JPS6332394B2

Info

Publication number: JPS6332394B2
Application number: JP56028139A
Authority: JP
Inventors: Masahiko Goto
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 1981-02-26
Filing date: 1981-02-26
Publication date: 1988-06-29
Also published as: JPS57141700A

Description

[Detailed Description of the Invention]

The present invention relates to a speech recognition device that involves a registration operation and incorporates measures to prevent misrecognition.

FIG. 1 shows an example configuration of a conventional speech recognition device. The speech waveform 2 picked up by the microphone 1 undergoes, for example, frequency spectrum analysis in the speech analysis/feature extraction circuit 3, which extracts a feature pattern 4 representing the temporal structure of the spectrum. The following pattern compression circuit 5 condenses this feature pattern 4 into a compressed pattern 6 of constant length, regardless of the duration of the utterance. The subsequent switch 7 selects between the learning (registration) mode and the recognition mode: it is set to the dotted-line side during a speech registration operation and to the solid-line side during recognition.
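The patent does not say how the pattern compression circuit 5 produces a constant-length pattern. As a minimal sketch, assuming simple linear interpolation along the time axis (an illustrative choice, not the patent's stated method), the function below resamples a variable-length sequence of spectral frames to a fixed number of frames. The name `compress_pattern` and the parameter `target_len` are hypothetical.

```python
from typing import List, Sequence

Frame = Sequence[float]  # one spectral frame, e.g. filter-bank outputs


def compress_pattern(frames: Sequence[Frame], target_len: int = 8) -> List[List[float]]:
    """Resample a variable-length feature pattern to exactly target_len frames.

    Assumption: linear interpolation in time stands in for the unspecified
    compression performed by circuit 5.
    """
    if not frames:
        raise ValueError("empty feature pattern")
    if target_len < 1:
        raise ValueError("target_len must be at least 1")
    n = len(frames)
    out: List[List[float]] = []
    for k in range(target_len):
        # Position of output frame k on the input time axis.
        pos = k * (n - 1) / (target_len - 1) if target_len > 1 else 0.0
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        w = pos - lo  # interpolation weight toward the later frame
        out.append([(1 - w) * a + w * b for a, b in zip(frames[lo], frames[hi])])
    return out
```

With this sketch, a short and a long utterance of the same word both yield patterns of `target_len` frames, so they can be compared element by element.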

During speech registration, the compressed pattern 6 passes through the dotted-line side of the mode changeover switch 7 and is written sequentially into the registered pattern memory 9. This memory has capacity for N registered words, which are normally stored in order as speech numbers No. 1, 2, 3, ..., N. Once a full set, for example 100 words, has been registered, the switch 7 is changed over to the solid-line side and the recognition operation begins. During recognition, the compressed pattern 6 passes through the solid-line side of the switch 7 and is temporarily stored in the input pattern memory 8. This memory 8 is updated and overwritten each time speech is input.

Next, the registered patterns 11 from the registered pattern memory 9 are compared one after another with the single input pattern 10 from the input pattern memory 8 in the recognition processing circuit 12, and the similarity between each pair is computed in turn. The registered word with the greatest similarity to the input speech is then selected and output as the judgment result 13. To avoid misrecognition, the recognition processing circuit 12 also monitors the similarity and generates the judgment control signal 14 only when the similarity exceeds a certain threshold. The judgment result 13 and the judgment control signal 14 are fed to a transfer gate 15, so that the final recognition result 16 is output only when the recognition score is good.

For example, if a similarity of 100 denotes a perfect match and the threshold is 90, then the judgment output for an input utterance whose similarity is 85 is rejected. This kind of judgment output control is an important function for coping with variations in speech patterns and ambient noise and for preventing misrecognition.
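The best-match selection and the thresholded transfer gate described above can be sketched as follows. The patent does not define how the similarity is computed, so the `similarity` function here (100 for identical patterns, falling with Euclidean distance) is purely an assumption, as are the names `best_match` and `transfer_gate`. The threshold of 90 and the rejected score of 85 come from the example in the text.

```python
import math
from typing import Dict, Optional, Sequence, Tuple


def similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Hypothetical score: 100 for a perfect match, lower as patterns differ."""
    dist = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return 100.0 / (1.0 + dist)


def best_match(input_pattern: Sequence[float],
               registered: Dict[str, Sequence[float]]) -> Tuple[Optional[str], float]:
    """Compare the input with every registered pattern; return the best word and score."""
    best_word, best_score = None, float("-inf")
    for word, pattern in registered.items():
        score = similarity(input_pattern, pattern)
        if score > best_score:
            best_word, best_score = word, score
    return best_word, best_score


def transfer_gate(word: Optional[str], score: float, threshold: float = 90.0) -> Optional[str]:
    """Pass the judgment result only when the similarity exceeds the threshold."""
    return word if score > threshold else None
```

Under these assumptions, `transfer_gate("six", 85.0)` returns `None`, which corresponds to the rejected judgment output in the example above.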

In the conventional device, however, highly similar words such as SIX /sikkusu/ and FIX /fikkusu/ may be registered inadvertently, which causes frequent misrecognition in recognition mode and disrupts system operation. Moreover, when misrecognition persists, the affected speech pattern must be updated, and because the recognition operation is interrupted in the meantime, the operating rate of the system drops significantly.

The present invention was made to eliminate these drawbacks of the conventional device. By performing the recognition operation during speech registration as well, and thereby preventing patterns with extremely high mutual similarity from being registered in the first place, it aims to suppress misrecognition and provide a highly reliable speech recognition device.

FIG. 2 is a block diagram showing an embodiment of the speech recognition device according to the present invention. In the figure, 20 is a transfer circuit consisting of a transfer path 18 and a route changeover switch 17; it transfers the input pattern to the input pattern memory 8 via the registered pattern memory 9 so that, during a registration operation, the input pattern can be compared with the patterns registered before it. Like the mode changeover switch 7, the route changeover switch 17 is set to the solid-line side during recognition and to the dotted-line side during a registration operation. 19 is a monitoring circuit that monitors the final recognition result 16, the output of the transfer gate 15, during the registration operation. If a highly similar pattern is already registered, that is, one whose similarity exceeds a reject threshold set lower than the similarity threshold used during recognition, the circuit issues an alarm and prompts re-input.

Speech registration in this device proceeds as follows.

Suppose, for example, that 1 /ichi/ is input as the first word. Its compressed pattern 6 passes through the dotted-line side of the switch 7 and is written into No. 1 of the registered pattern memory 9. Since there is nothing to compare the first word against, no recognition operation is performed. Now suppose that 7 /shichi/ is input as the second word. Its compressed pattern 6 is written into No. 2 of the registered pattern memory 9 and, at the same time, transferred to the input pattern memory 8 through the transfer path 18 and the dotted-line side of the route changeover switch 17. The recognition processing circuit 12 computes the similarity between this input pattern 10 and the registered patterns 11 registered so far, in this example the first word. Because /ichi/ and /shichi/ are extremely similar, the judgment result 13 outputs the first word /ichi/, the judgment control signal 14 is also issued, and the recognition result 16, /ichi/, appears at the output of the transfer gate 15. In other words, although /shichi/ was spoken, the misrecognized result /ichi/ is output.

The recognition result 16 is fed to the monitoring circuit 19, which detects this misrecognition and issues an alarm, a display, or the like to prompt the user to re-input the word. If the user then inputs the second word as 7 /nana/, its similarity to the first word 1 /ichi/ is low, so its compressed pattern is retained in No. 2 of the registered pattern memory 9 and registration moves on to the third word. When the third word is registered, it is compared for similarity with the first and second words registered so far. Likewise, when the N-th word is input, similarity judgments against the already registered first through (N-1)-th words are carried out in turn.
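The registration procedure walked through above can be sketched as a single function that checks each new pattern against everything registered so far. This is a simplified illustration under stated assumptions: the `similarity` function, the feature vectors, and the names `try_register` and `reject_threshold` are all hypothetical, and the sketch simply declines to store a conflicting pattern, whereas the device described writes it to memory and then has the user overwrite it by re-input.

```python
import math
from typing import Dict, List, Optional, Sequence


def similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Hypothetical score: 100 for a perfect match, lower as patterns differ."""
    dist = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return 100.0 / (1.0 + dist)


def try_register(word: str,
                 pattern: Sequence[float],
                 registered: Dict[str, List[float]],
                 reject_threshold: float) -> Optional[str]:
    """Register `pattern` unless it is too similar to an earlier word.

    Returns None on success, or the conflicting word (the case in which the
    monitoring circuit would raise an alarm and ask for re-input).
    """
    best_word, best_score = None, float("-inf")
    for other, other_pattern in registered.items():
        score = similarity(pattern, other_pattern)
        if score > best_score:
            best_word, best_score = other, score
    if best_word is not None and best_score > reject_threshold:
        return best_word  # too close to an existing entry: do not register
    registered[word] = list(pattern)
    return None


if __name__ == "__main__":
    memory: Dict[str, List[float]] = {}
    # Made-up feature vectors; 60.0 is an illustrative reject threshold.
    print(try_register("ichi", [1.0, 2.0, 3.0], memory, 60.0))
    print(try_register("shichi", [1.1, 2.0, 3.0], memory, 60.0))
    print(try_register("nana", [5.0, 0.0, 1.0], memory, 60.0))
```

The first word is always accepted because the memory is empty, which mirrors the statement that no recognition operation is performed for the first word.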

In this way, defective speech patterns that would induce misrecognition, whether caused by confusable words, speech contaminated by extraneous sounds, or unclear utterances, can be excluded before they are registered.

At recognition time, after the registration operation has been completed in this way, the mode changeover switch 7 and the route changeover switch 17 are set to the solid-line side and the same recognition processing as in the conventional device is performed. The threshold of the recognition processing circuit 12 at this time, however, is set higher than the threshold used during the registration operation. Because the quality of the registered patterns has been improved as described above, misrecognition can be kept extremely low and the recognition rate can be improved. Furthermore, in this embodiment the reject threshold used during registration is set lower than the similarity threshold used during recognition, so registration of similar patterns is strictly excluded. For example, if a similarity of 100 denotes a perfect match and 90 is the threshold during recognition, setting the reject threshold during registration to 60 (patterns whose similarity exceeds 60 are not registered) prevents defective patterns from being registered and makes it possible to improve the recognition rate.
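To make the effect of the two thresholds concrete, the short sketch below applies the values from the example (90 during recognition, 60 during registration) to a few similarity scores. The function name `decisions` and the sample scores 95, 75, and 50 are illustrative assumptions.

```python
from typing import Tuple

RECOGNITION_THRESHOLD = 90.0  # from the example: threshold during recognition
REJECT_THRESHOLD = 60.0       # from the example: reject threshold during registration


def decisions(score: float) -> Tuple[bool, bool]:
    """Return (blocked_at_registration, output_at_recognition) for a similarity score."""
    blocked_at_registration = score > REJECT_THRESHOLD
    output_at_recognition = score > RECOGNITION_THRESHOLD
    return blocked_at_registration, output_at_recognition


if __name__ == "__main__":
    for s in (95.0, 75.0, 50.0):
        print(s, decisions(s))
```

A score of 75 is the interesting case: it would not be enough to produce an output during recognition, yet it already blocks registration. That gap between 60 and 90 is the safety margin that keeps moderately similar words out of the registered set.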

The above embodiment performs the similarity comparison each time a word is input, but it is also possible to perform the similarity judgment after all words have been input and then update the defective patterns among them. The speech pattern registration method according to the present invention is also applicable to devices that do not compress the feature pattern and instead match patterns of unequal length by a dynamic programming technique.
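The text mentions dynamic programming matching of unequal-length patterns without giving details. One well-known technique of that kind is dynamic time warping (DTW); the sketch below assumes plain DTW with a Euclidean frame distance and no path constraints, which may differ from what any particular device uses. The names `frame_distance` and `dtw_distance` are hypothetical.

```python
import math
from typing import List, Sequence

Frame = Sequence[float]


def frame_distance(a: Frame, b: Frame) -> float:
    """Euclidean distance between two feature frames."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(a: Sequence[Frame], b: Sequence[Frame]) -> float:
    """Accumulated distance along the best time alignment of two frame sequences."""
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("patterns must be non-empty")
    inf = float("inf")
    # cost[i][j]: best accumulated distance aligning a[:i] with b[:j]
    cost: List[List[float]] = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # stretch b
                                 cost[i][j - 1],      # stretch a
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]
```

A smaller distance means a closer match, so a device using such a measure would convert it to a similarity, or invert the threshold comparison, before applying the registration-time check described in this patent.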

Furthermore, the present invention can be extended and applied to recognition devices for acoustic signals other than speech, image signals, and the like.

As explained above, in the speech recognition device according to the present invention, learning and recognition are performed in parallel and the quality of the registered patterns is improved, so a highly reliable system with very little misrecognition can be built. This reduces the need, present in the conventional device, to interrupt the recognition operation and re-register speech patterns, and the operating rate of the system can be raised significantly. In addition, the speech recognition device according to the present invention requires only the addition of a simple transfer circuit and a monitoring circuit to the conventional device, and can therefore be realized very easily and economically.

[Brief Description of the Drawings]

FIG. 1 is a block diagram showing an example of a conventional speech recognition device, and FIG. 2 is a block diagram showing an embodiment of the speech recognition device according to the present invention. 1: microphone, 2: speech waveform, 3: speech analysis/feature extraction circuit, 4: feature pattern, 7: mode changeover switch, 8: input pattern memory, 9: registered pattern memory, 10: input pattern, 11: registered pattern, 12: recognition processing circuit, 13: judgment result, 14: judgment control signal, 15: transfer gate, 16: final recognition result, 19: monitoring circuit, 20: transfer circuit. In the figures, the same reference numerals denote the same or corresponding parts.

Claims

[Claims]

1. A speech recognition device characterized by comprising:
a speech analysis/feature extraction circuit that analyzes an input speech waveform from a microphone and extracts a feature pattern;
a mode changeover switch that routes the feature pattern from the speech analysis/feature extraction circuit differently during a registration operation and during recognition;
a registered pattern memory that stores the feature pattern sent from the mode changeover switch during a registration operation;
an input pattern memory that stores the feature pattern sent from the mode changeover switch during recognition;
a recognition processing circuit that sequentially compares the input pattern from the input pattern memory with the registered patterns from the registered pattern memory, outputs the registered word having the greatest similarity as a judgment result, and outputs a control signal when that similarity exceeds a first threshold during recognition, or exceeds a second threshold set lower than the first threshold during a registration operation;
a transfer gate that, in response to the control signal, transfers the judgment result output and outputs a final recognition result;
a transfer circuit that transfers the input pattern to the input pattern memory via the registered pattern memory so that, during a registration operation, the input pattern is compared with all patterns registered before it; and
a monitoring circuit that monitors the final recognition result, which is the output of the transfer gate during the registration operation, and, if a pattern exceeding a threshold set lower than the similarity threshold used during recognition is already registered, issues an alarm and prompts re-input.