
JPS607492A - Monosyllable voice recognition system

Info

Publication number: JPS607492A
Application number: JP58115573A
Authority: JP
Inventors: 寺尾　修
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1983-06-27
Filing date: 1983-06-27
Publication date: 1985-01-16

Abstract

(57) [Abstract] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

[Detailed Description of the Invention]
(a) Technical Field of the Invention
The present invention relates to a speaker-dependent speech recognition method that uses speaker registration (enrollment) and takes monosyllables uttered in a pseudo-continuous manner as its recognition targets.

(b) Background of the Technology
In recent years, with the development and spread of data processing technology, speech recognition has come into use as one of the data input/output means of data processing systems. Speech recognition and synthesis technology, initially limited to applications such as voice-controlled sorting and telephone guidance services, has been supported by advances in semiconductor technology, especially integration technology. Arithmetic processing circuits and high-speed, large-capacity memories for speech recognition, which must process large amounts of information at high speed and were previously considered difficult to build, have been realized as LSIs and are now available at low cost. Because voice input in Japanese is conversational, it does not require the special operator training that other input/output devices demand, and speech recognition has spread in data processing equipment that exploits this easy-to-operate conversion of input speech into digital data.

(c) Prior Art and Its Problems
FIG. 1 is a block diagram of a conventional monosyllable recognition method. In the figure, 1 is a control unit implemented by, for example, a microprocessor (MPU); 2 is a storage unit using high-speed semiconductor memory; 3 is a speech processing unit; 4 is an input pattern buffer; and 5 is a comparison unit.

In a speaker-dependent monosyllable recognition method, the set of monosyllables to be recognized in the input speech is defined in advance. The control unit 1 passes each monosyllable of the input speech to the speech processing unit 3, where it is applied to a bank of bandpass filters (not shown). The spectral output of the filters is normalized at a frame period of about 5 ms, and the steady-state portion of the resulting parameters corresponding to each phoneme is stored in the storage unit 2 as the standard pattern dictionary 23. With linear time normalization, a pattern of, for example, 512 bits can be created and stored for each monosyllable. In the preceding learning mode, an averaged standard pattern 23a-n is created from about 5 to 10 training samples and registered in the standard pattern dictionary 23. In the recognition mode, the comparison unit 5 computes, frame by frame at the same frame period, the similarity between the input pattern derived from the input speech and each standard pattern 23a-n, judged against a preset threshold level.

Recognition is based on a total of 101 Japanese monosyllables: 45 basic syllables (a through n), 18 voiced syllables (dakuon), 5 semi-voiced syllables (handakuon), and 33 contracted syllables (yōon).
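The learning and recognition modes described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: each frame is assumed to be a list of bandpass-filter channel values, linear interpolation stands in for linear time normalization, and mean Euclidean distance (lower is better) replaces the unspecified similarity measure. The names `linear_time_normalize`, `make_standard_pattern`, `pattern_distance`, and `primary_match` are hypothetical.

```python
import math
from statistics import fmean
from typing import Optional

# Hypothetical representation: a pattern is a list of frames,
# each frame holding one value per bandpass-filter channel.
Pattern = list[list[float]]


def linear_time_normalize(frames: Pattern, n_out: int) -> Pattern:
    """Stretch or compress a frame sequence to exactly n_out frames by linear interpolation."""
    n_in = len(frames)
    if n_in == 1:
        return [list(frames[0]) for _ in range(n_out)]
    out = []
    for j in range(n_out):
        # Position of output frame j on the input time axis.
        pos = j * (n_in - 1) / (n_out - 1) if n_out > 1 else 0.0
        i = min(int(pos), n_in - 2)
        w = pos - i
        out.append([(1 - w) * a + w * b for a, b in zip(frames[i], frames[i + 1])])
    return out


def make_standard_pattern(samples: list[Pattern], n_out: int) -> Pattern:
    """Learning mode: average several time-normalized training samples frame by frame."""
    normed = [linear_time_normalize(s, n_out) for s in samples]
    n_ch = len(normed[0][0])
    return [[fmean(s[t][k] for s in normed) for k in range(n_ch)] for t in range(n_out)]


def pattern_distance(a: Pattern, b: Pattern) -> float:
    """Mean per-frame Euclidean distance between two equal-length patterns."""
    return fmean(math.dist(fa, fb) for fa, fb in zip(a, b))


def primary_match(frames: Pattern, dictionary: dict[str, Pattern],
                  n_out: int, threshold: float) -> tuple[Optional[str], dict[str, float]]:
    """Recognition mode: nearest standard pattern, accepted only within the threshold."""
    x = linear_time_normalize(frames, n_out)
    dists = {label: pattern_distance(x, ref) for label, ref in dictionary.items()}
    best = min(dists, key=dists.get)
    return (best if dists[best] <= threshold else None), dists
```

Averaging the 5 to 10 time-normalized samples of a monosyllable yields one standard pattern; in recognition mode a label of `None` means no standard pattern met the threshold.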

Recognition is usually performed by separating each monosyllable into a consonant part and a vowel part, determining the vowel with the vowel standard patterns, and then recognizing the consonant. Because the vowel decision reduces the number of monosyllable candidates, this approach makes consonant recognition easier and reduces the amount of processing. However, in speech input by ordinary pseudo-continuous utterance, vowels are often insufficiently articulated, and for contracted syllables (yōon) containing semivowels in particular, the vowel is easily misrecognized. This lowers the recognition rate of the whole monosyllable, consonant part included.

(d) Object of the Invention
To eliminate this drawback, the invention provides in the dictionary, as vowel models, not only the conventional vowel standard patterns derived from isolated monosyllables but also transition patterns for the phoneme transitions (glides, "watari") that occur in successive combinations of vowels and/or semivowels. The difference between the input pattern and each transition pattern is computed, and the vowel is decided from the direction in which that difference moves, with the aim of improving the recognition rate of vowel extraction.
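The core of the proposed method, computing the difference between the input pattern and each vowel transition model and reading off the direction in which it moves, might look like the sketch below. The patent does not specify the distance measure or how the direction is measured; per-frame Euclidean distance and the change over the last `tail` frames are assumptions, and all function names are hypothetical.

```python
import math

# Hypothetical representation: input and models are already time-normalized
# to the same number of frames, each frame a list of filter-channel values.
Pattern = list[list[float]]


def difference_trajectory(x: Pattern, model: Pattern) -> list[float]:
    """Per-frame Euclidean distance between the input pattern and one vowel transition model."""
    if len(x) != len(model):
        raise ValueError("input and model must have the same number of frames")
    return [math.dist(fx, fm) for fx, fm in zip(x, model)]


def trend_direction(traj: list[float], tail: int, eps: float = 1e-9) -> int:
    """Direction of the difference over the last `tail` frames:
    -1 if it is shrinking (moving toward the model), +1 if growing, 0 if flat."""
    delta = traj[-1] - traj[-tail]
    if delta < -eps:
        return -1
    if delta > eps:
        return 1
    return 0


def trends(x: Pattern, models: dict[str, Pattern], tail: int) -> dict[str, int]:
    """Trend direction of the input's difference from every vowel transition model."""
    return {v: trend_direction(difference_trajectory(x, m), tail) for v, m in models.items()}
```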

(e) Structure of the Invention
This object is achieved by a monosyllable speech recognition method for a speech recognition device that matches unknown input speech against monosyllable standard patterns registered in a dictionary in advance, characterized as follows. The speech processing unit has the function of creating, when the speaker's monosyllable standard patterns are registered, not only standard patterns based on spectral time-series analysis but also transition patterns into the vowels for specific monosyllables such as contracted syllables and semivowels, and of registering them in the storage unit as a vowel transition model dictionary. For pseudo-continuously uttered input speech, the control unit performs a primary matching between the input pattern and the standard patterns via the speech processing unit and the matching unit. When this primary matching yields no similarity that satisfies the threshold, or when the distances to several vowels are close, the control unit computes the difference between the input pattern and each vowel according to the transition patterns of the model dictionary, determines how the distance to each vowel tends to change, and recognizes the vowel by matching that tendency against the vowel transition model dictionary.
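The control flow in the characterizing part, falling back to the transition models only when primary matching is inconclusive, can be sketched as below. Distances (lower is better) stand in for the patent's similarity, and the `threshold` and `margin` parameters, like the function names, are hypothetical. The transition-model decision is passed in as a callback so the sketch does not presume how it is computed.

```python
from typing import Callable


def needs_transition_check(dists: dict[str, float], threshold: float, margin: float) -> bool:
    """True when primary matching is unreliable: the best distance misses the threshold,
    or the two nearest vowels are within `margin` of each other."""
    ranked = sorted(dists.values())
    if ranked[0] > threshold:
        return True
    return len(ranked) > 1 and ranked[1] - ranked[0] < margin


def recognize_vowel(dists: dict[str, float], threshold: float, margin: float,
                    transition_decide: Callable[[], str]) -> str:
    """Use the primary result when it is reliable; otherwise defer to the
    transition-model decision supplied by the caller."""
    if needs_transition_check(dists, threshold, margin):
        return transition_decide()
    return min(dists, key=dists.get)
```

Keeping the check separate from the fallback means the cheaper primary matching handles clear cases, and the transition models are consulted only for the ambiguous ones the text singles out.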

(f) Embodiment of the Invention
An embodiment of the invention is described below with reference to the drawings. FIG. 2 is a block diagram of a monosyllable speech recognition method according to an embodiment of the invention, FIG. 3(a) shows the power transition pattern of an input pattern, and FIG. 3(b) shows how the difference from each vowel model changes. In the figures, 1a is a control unit, 2a is a storage unit, 3a is a speech processing unit, 4a is an input pattern buffer, and 5a is a comparison unit. In the storage area of the storage unit 2a, 23 is the standard pattern dictionary, 23a-n are standard patterns, 24 is a vowel model dictionary based on transition patterns, and 24p-s are vowel models. Components whose reference numerals are the same as in the conventional configuration have the same functions and characteristics as before; components whose numerals carry a suffix have additional functions on top of the conventional ones.

As before, the control unit 1a controls each unit according to the control program 21a and control data 22a in the storage unit 2a. In the learning mode, the speech processing unit 3a analyzes the speaker's pseudo-continuous utterances and the standard patterns 23a-n are registered in the standard pattern dictionary 23. In the recognition mode, the speech processing unit 3a analyzes the input speech, the input pattern is passed through the input pattern buffer 4a to the comparison unit 5a, and recognition is performed by matching it against the standard patterns 23a-n. In this embodiment, however, contracted syllables that contain successive combinations of vowels and/or semivowels with insufficiently articulated vowels do not produce a match that satisfies the threshold. Following the ordinary matching against the standard patterns, the difference between the input pattern shown in FIG. 3(a) and each of the vowel models 24p-s is therefore computed frame by frame at the same frame period, and its tendency is obtained as shown in FIG. 3(b). In this example the vowel U is closest on average, but toward the end of the phoneme the difference from U tends to grow while the difference from O tends toward zero, so the input is recognized as the vowel O. If instead the difference from U followed the dotted-line extension in the figure, the input pattern would be recognized as U. In this way, even when insufficient articulation causes the tail of the syllable to be lost, so that the input would otherwise be misrecognized as U, the desired result O is output, giving a recognition method with a higher recognition rate than the conventional one.

(g) Effects of the Invention
As described above, the invention is useful because it provides a monosyllable speech recognition method that raises the recognition rate by adding a matching method that captures more latent features, even for successive combinations of vowels and/or semivowels from which vowels are difficult to extract.
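The worked example in the embodiment, where U wins on the average difference but O wins on the trend at the end of the syllable, can be reproduced with the sketch below. The trajectories are invented illustrative numbers, not data from FIG. 3(b), and both the least-squares tail slope and the tie-break on the final difference are assumptions; the patent states only that the vowel is decided from the direction in which the difference moves. Function names are hypothetical.

```python
from statistics import fmean


def tail_slope(traj: list[float], tail: int) -> float:
    """Least-squares slope of the last `tail` values of a difference trajectory."""
    ys = traj[-tail:]
    n = len(ys)
    if n < 2:
        return 0.0
    mx = (n - 1) / 2
    my = fmean(ys)
    num = sum((i - mx) * (y - my) for i, y in enumerate(ys))
    den = sum((i - mx) ** 2 for i in range(n))
    return num / den


def decide_by_mean(trajs: dict[str, list[float]]) -> str:
    """Baseline: the vowel whose difference is smallest on average."""
    return min(trajs, key=lambda v: fmean(trajs[v]))


def decide_by_trend(trajs: dict[str, list[float]], tail: int) -> str:
    """Prefer vowels whose difference is shrinking at the end of the syllable;
    among those, take the one with the smallest final difference."""
    converging = {v: t for v, t in trajs.items() if tail_slope(t, tail) < 0}
    pool = converging or trajs
    return min(pool, key=lambda v: pool[v][-1])


# Hypothetical difference trajectories, loosely shaped like the description of FIG. 3(b).
trajs = {
    "U": [1.0, 0.8, 0.6, 0.7, 0.9],  # smallest on average, but rising at the end
    "O": [2.0, 1.6, 1.2, 0.8, 0.4],  # larger on average, falling toward zero
    "A": [3.0, 3.0, 3.0, 3.0, 3.0],
}
print(decide_by_mean(trajs), decide_by_trend(trajs, tail=3))  # U O
```

The averaging baseline picks U, as in the text, while the trend rule picks O because only O's difference is still decreasing over the final frames.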

[Brief Description of the Drawings]

FIG. 1 is a block diagram of a conventional monosyllable recognition method, FIG. 2 is a block diagram of a monosyllable recognition method according to an embodiment of the invention, FIG. 3(a) shows the power transition pattern of an input pattern, and FIG. 3(b) shows how the difference from each vowel model changes. In the figures, 1 and 1a are control units, 2a is a storage unit, 3a is a speech processing unit, 4a is an input pattern buffer, 5a is a comparison unit, 23 is a standard pattern dictionary, and 24 is a vowel model dictionary.

Agent: Patent Attorney Matsuoka

Claims

[Claims]

A monosyllable speech recognition method for a speech recognition device that matches unknown input speech against monosyllable standard patterns registered in a dictionary in advance, wherein the speech processing unit has a function of creating, when the speaker's monosyllable standard patterns are registered, not only standard patterns based on spectral time-series analysis but also transition patterns into the vowels for specific monosyllables such as contracted syllables and semivowels, and of registering them in the storage unit as a vowel transition model dictionary; and wherein, when a primary matching between the input pattern and the standard patterns, performed for pseudo-continuously uttered input speech via the speech processing unit and the matching unit, yields no similarity that satisfies the threshold, or when the distances to a plurality of vowels are close, the control unit computes the difference between the input pattern and each vowel according to the transition patterns of the model dictionary, determines the tendency of the difference in distance to each vowel, and recognizes the vowel by matching it against the vowel transition model dictionary.