JPS5917598A - Voice recognition system

Info

Publication number: JPS5917598A
Application number: JP57125837A
Authority: JP
Inventors: 徳子松井
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1982-07-21
Filing date: 1982-07-21
Publication date: 1984-01-28

Abstract

(57) [Abstract] This publication contains application data that predates electronic filing, so no abstract data is recorded.

Description

[Detailed Description of the Invention] The present invention relates to a speech recognition device that stores a plurality of sets of standard speech patterns for each word to be recognized. In particular, it relates to a speech recognition method that improves the efficiency of correction processing in the event of a rejection, that is, when the similarity is not high enough to output a recognition result and recognition must be repeated (re-recognition).

In the conventional speech recognition method for this type of device, recognition was performed using all of the stored sets of standard speech patterns until a series of services was completed. Consequently, if a particular word uttered by a particular speaker, or by speakers in general, was prone to misrecognition or rejection with respect to a particular set of standard speech patterns, misrecognitions and rejections could concentrate on that word and occur frequently throughout the series of recognition processes.

In addition, when a misrecognition or rejection occurred, the speaker was made to repeat the same utterance, and exactly the same recognition processing as in the failed attempt was performed again.

However, a misrecognition or rejection means that the uttered speech pattern was closer to the standard speech pattern that caused the misrecognition or rejection than to the standard speech pattern that should have been judged as truly closest to the utterance.

Therefore, even if the same recognition processing is repeated as described above, there is a high probability that the same misrecognition or rejection will recur, and a considerable number of repeated utterances are needed before a correct recognition result is obtained. As a result, the time required for recognition increases and the burden on the speaker becomes heavier.

An object of the present invention is to eliminate the above drawbacks of the prior art and to provide a speech recognition method that improves the efficiency of correction processing, particularly in the case of a rejection.

The speech recognition method according to the present invention applies to a speech recognition device that stores a plurality of sets of standard speech pattern data for each word to be recognized, extracts features from the input speech, computes the similarity between that feature data and the standard pattern data, and determines and outputs the standard speech pattern with the highest similarity as the recognition result. In such a device, the method first performs recognition of the input speech against a predetermined set of standard speech patterns and, when the result is a rejection, controls the device so that recognition is repeated using the other sets of standard speech patterns, excluding the predetermined set.
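The two-pass behavior described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the feature representation, the similarity measure (negative squared Euclidean distance), and all names such as `recognize_with_fallback` and `get_utterance` are assumptions introduced here, since the source specifies none of them.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# Hypothetical types: a feature vector is a sequence of floats, and each
# pattern set maps a word to one stored standard pattern.
Features = Sequence[float]
PatternSet = Dict[str, Features]


def similarity(a: Features, b: Features) -> float:
    """Assumed measure: negative squared Euclidean distance (higher is closer)."""
    return -sum((x - y) ** 2 for x, y in zip(a, b))


def best_match(features: Features, sets: List[PatternSet]) -> Tuple[Optional[str], float]:
    """Return the word whose standard pattern is most similar to the input."""
    best_word, best_score = None, float("-inf")
    for pattern_set in sets:
        for word, pattern in pattern_set.items():
            score = similarity(features, pattern)
            if score > best_score:
                best_word, best_score = word, score
    return best_word, best_score


def recognize_with_fallback(
    get_utterance: Callable[[], Features],
    initial_sets: List[PatternSet],
    other_sets: List[PatternSet],
    reject_threshold: float,
) -> Optional[str]:
    """First pass on the predetermined sets; on rejection, retry on the others."""
    for sets in (initial_sets, other_sets):
        # Each pass consumes a fresh utterance, as the speaker is re-prompted.
        word, score = best_match(get_utterance(), sets)
        if score >= reject_threshold:
            return word  # similar enough to output as the recognition result
    return None  # rejected on both passes
```

Here `get_utterance` stands in for prompting the speaker and extracting features, so each pass works on a new utterance rather than re-scoring the old one.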

An embodiment of the present invention is described below with reference to the drawings.

FIG. 1 is a system configuration diagram of an embodiment of the speech recognition method according to the present invention, and FIG. 2 is its processing flowchart.

Here, 1 is a standard speech pattern memory that stores a plurality of sets of standard speech pattern data for each word to be recognized; 2 is a standard speech pattern selection unit that controls selection among those sets; 3 is a microphone for voice input; 4 is an analysis unit that extracts features from the input speech; 5 is a speech recognition unit that computes the similarity between the feature data and the standard speech pattern data (pattern matching); 6 is a determination unit that, based on those results, determines the set of standard speech patterns with high similarity to the input speech; 7 is a speech synthesis unit used to present recognition results; 8 is a speaker for the same purpose; 9 is a console unit used to confirm recognition results and to instruct repeated voice input; 10 is a control unit that controls the above units and performs other required processing; and 11 is a host device that performs the desired processing based on the recognition results.

First, before recognizing input speech from the microphone 3, the control unit 10 instructs the analysis unit 4 and the speech recognition unit 5 to prepare for voice input. It also instructs the standard speech pattern selection unit 2 to select, from the standard speech pattern memory 1, the predetermined sets of standard speech patterns to be used for recognition at that time (for example, one standard set each for male and female speakers) (process 21 in FIG. 2).

When these preparations are complete, the control unit instructs the speech synthesis unit 7 to output an input prompt message asking the speaker for voice input, and the message is emitted from the speaker 8 (process 22).

When the speaker then provides voice input through the microphone 3 (process 23), the analysis unit 4 analyzes the input speech and extracts its features (process 24).

The speech recognition unit 5 computes the similarity (pattern matching) between the feature data of the input speech and the data of the standard speech patterns that the standard speech pattern selection unit 2 has selected under the control of the control unit 10, as described above. From among the selected sets, it takes the pattern with the highest similarity to the input speech as the recognition result candidate, and passes all recognition results and similarities to the determination unit 6 (process 25).
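Process 25 can be illustrated with the following sketch. The source does not name a similarity measure, so cosine similarity is used here purely as an assumption; `match_patterns` and the set names in the usage are likewise hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed similarity measure; the source does not name one."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match_patterns(
    features: Sequence[float],
    selected_sets: Dict[str, Dict[str, Sequence[float]]],
) -> List[Tuple[float, str, str]]:
    """Score the input against every pattern in the selected sets.

    Returns (similarity, set_name, word) tuples sorted best first, so the
    first entry is the candidate and the full list can be passed on to a
    later judgment step.
    """
    results = [
        (cosine_similarity(features, pattern), set_name, word)
        for set_name, patterns in selected_sets.items()
        for word, pattern in patterns.items()
    ]
    results.sort(key=lambda r: r[0], reverse=True)
    return results
```

Returning the whole ranked list rather than only the top entry mirrors the description, in which all results and similarities are handed to the determination unit.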

The determination unit 6 determines the standard pattern with the highest similarity and reports that data to the control unit 10 (process 26).

The control unit 10 determines whether the similarity of the recognition result is lower than a predetermined constant (the reject constant), in which case the result is too doubtful to output and is treated as a rejection (decision 27). In the case of a rejection, it instructs the standard speech pattern selection unit 2 to select the remaining sets, that is, those other than the standard speech patterns used so far (process 33). It further instructs the speech synthesis unit 7 to output the same input prompt message again, and the message is emitted from the speaker 8 (process 34). Re-recognition processing (repeated recognition processing) then proceeds from process 23 onward in the same way as described above.
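Decision 27 and processes 33 and 34 amount to a threshold test followed by a change of pattern sets. The sketch below is one way to express that, under stated assumptions: sets are identified by hypothetical string names, a similarity exactly equal to the reject constant is accepted, and the returned action labels `"reprompt"` and `"confirm"` are invented for illustration.

```python
from typing import List, Tuple


def is_reject(similarity: float, reject_constant: float) -> bool:
    """Decision 27: the best similarity is below the preset reject constant."""
    return similarity < reject_constant


def handle_result(
    similarity: float,
    reject_constant: float,
    all_sets: List[str],
    current_sets: List[str],
) -> Tuple[str, List[str]]:
    """Return the next action and the pattern sets to select for it."""
    if is_reject(similarity, reject_constant):
        # Process 33: select the sets other than those used so far.
        remaining = [name for name in all_sets if name not in current_sets]
        return "reprompt", remaining  # process 34 repeats the input prompt
    return "confirm", list(current_sets)  # process 28 asks the speaker to confirm
```

The source does not say what happens if the remaining sets are also rejected; this sketch simply returns whatever sets are not currently selected, so a caller would need its own policy for that case.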

If, on the other hand, the result is not a rejection, a confirmation request message is emitted from the speaker 8 via the speech synthesis unit 7 so that the speaker can confirm whether the recognition result candidate is correct (process 28).

The speaker listens to this message, learns whether the input speech was recognized correctly (correct recognition) or incorrectly (misrecognition), and enters that information from the console unit 9 to the control unit 10 (process 29).

This confirmation of whether the recognition result is correct need not be entered by operating the console unit 9; it may instead be given as a confirmation utterance through the microphone 3. In that case, the confirmation words should be simple and hard to misrecognize so that they are recognized reliably.

Based on this confirmation information, the control unit 10 determines whether the recognition was correct, that is, whether the recognition candidate is the right one (decision 30). If it was correct, the control unit has the recognition result sent to the host device 11 as necessary and, in preparation for the next recognition, instructs the standard speech pattern selection unit 2 to select the same standard speech patterns that have been used so far (process 31).

The control unit further determines whether the series of service operations has finished (decision 32). If it has not, processing returns to process 22 described above and the next input speech is recognized. If it has, all recognition results are sent to the host device 11, the recognition processing for that input speech ends, and the device prepares for the next input.

If, on the other hand, the result was a misrecognition, the control unit 10 instructs the standard speech pattern selection unit 2 to select the same sets of standard speech patterns that have been used so far (process 35) and has the re-input prompt emitted (process 36). Repeated recognition processing then proceeds from process 23 onward as described above.
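The three outcomes in the flowchart (rejection, misrecognition, and correct recognition) differ only in which pattern sets are selected next and whether the speaker is asked to repeat the same word. A compact sketch of that policy follows; the `Outcome` enum, the function name, and the string set names are assumptions made for illustration only.

```python
from enum import Enum
from typing import List, Tuple


class Outcome(Enum):
    REJECT = "reject"                  # similarity below the reject constant
    MISRECOGNITION = "misrecognition"  # speaker says the candidate is wrong
    CORRECT = "correct"                # speaker confirms the candidate


def plan_next_attempt(
    outcome: Outcome, current_sets: List[str], all_sets: List[str]
) -> Tuple[List[str], bool]:
    """Return (pattern sets to select next, whether the same word is re-prompted)."""
    if outcome is Outcome.REJECT:
        # Process 33: switch to the sets not used so far.
        unused = [name for name in all_sets if name not in current_sets]
        return unused, True
    if outcome is Outcome.MISRECOGNITION:
        # Process 35: keep the same sets and ask for the word again.
        return list(current_sets), True
    # Process 31: keep the same sets for the next word.
    return list(current_sets), False
```

Only the rejection branch changes the selected sets, which matches the description: a misrecognition retries with the same patterns, while a rejection moves to unused ones.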

In this way, even when a rejection occurs, recognition is repeated using standard speech patterns that have not yet been used, so the probability of repeated rejections is reduced.

As described in detail above, the present invention improves the efficiency of correction processing when a rejection occurs and, in turn, improves the overall recognition rate. It therefore yields a marked improvement in the reliability, serviceability, and efficiency of this type of speech recognition system.

[Brief Description of the Drawings]

FIG. 1 is a system configuration diagram of an embodiment of the speech recognition method according to the present invention, and FIG. 2 is its processing flowchart.

1: standard speech pattern memory; 2: standard speech pattern selection unit; 3: microphone; 4: analysis unit; 5: speech recognition unit; 6: determination unit; 7: speech synthesis unit; 8: speaker; 9: console unit; 10: control unit; 11: host device.

Agent: Patent Attorney Kosaku Fukuda (and one other)

Claims

[Claims]

1. A speech recognition method for a speech recognition device that stores a plurality of sets of standard speech pattern data for each word to be recognized, extracts features from input speech, computes the similarity between that feature data and the standard pattern data, and determines and outputs the standard speech pattern with the highest similarity as the recognition result, the method being characterized in that recognition of the input speech is first performed against a predetermined set of standard speech patterns and, when the result is a rejection, control and processing are performed so that speech recognition is repeated using the other sets of standard speech patterns, excluding the predetermined set.