JPS5946696A

JPS5946696A - Voice recognition system

Info

Publication number: JPS5946696A
Application number: JP57155983A
Authority: JP
Inventors: 徳子松井; 俊宏木村
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1982-09-09
Filing date: 1982-09-09
Publication date: 1984-03-16

Abstract

(57) [Abstract] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

[Detailed Description of the Invention]

[Field of Application of the Invention]

The present invention relates to a speech recognition method for improving the recognition rate of a speech recognition device that stores a plurality of sets of standard speech patterns for each word to be recognized, and that outputs and displays, as the recognition result, the standard speech pattern of the set with the highest similarity to the input speech.

In the conventional speech recognition method for this type of device, speech recognition was performed using all of the built-in sets of standard speech patterns until a series of services was completed. As a result, a specific word uttered by a specific speaker could be prone to misrecognition or rejection with respect to a particular set of standard speech patterns. In such cases there was a risk that misrecognitions and rejections of that word would concentrate and occur frequently throughout the series of speech recognition processes.

[Object of the Invention]

An object of the present invention is to eliminate the above drawbacks of the prior art and, in particular, to provide a speech recognition method that prevents the speaker-dependent misrecognition of specific words against a specific set of standard speech patterns, and that thereby improves the overall recognition rate.

[Summary of the Invention]

The speech recognition method according to the present invention applies to a speech recognition device that stores a plurality of sets of standard speech pattern data for each word to be recognized, extracts features from the input speech, calculates the similarity between the feature data and each item of standard speech pattern data, and judges and outputs the standard speech pattern with the highest similarity as the recognition result. In such a device, clustering is performed on a predetermined keyword that is input first, prior to speech recognition processing. Based on the clustering result, the set of standard speech patterns corresponding to the input speech is selected, and the device is controlled so that the subsequent series of speech recognition processes is performed in accordance with that selection.

In short, at the start of speech recognition processing the speaker is made to utter a predetermined keyword (for example, the five vowels "a", "i", "u", "e", and "o" (あ, い, う, え, お), which form the basis of each individual's vocal characteristics). The feature parameters (spectrum) of each utterance are obtained, and the mutual distances between these and the corresponding words in each set of standard speech patterns are calculated. The closest set of standard speech patterns is selected as the set most likely to yield correct recognition of that speaker's input, and subsequent speech recognition processing is performed with it.
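The set selection summarized above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the Euclidean distance, the summation of distances over keywords, the function names, and the two-dimensional "spectra" are all assumptions introduced for the example, since the text only says that mutual distances are calculated and the closest set is chosen.

```python
import math

def euclidean(a, b):
    """Distance between two equal-length feature vectors (e.g. spectra)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def select_pattern_set(keyword_features, pattern_sets):
    """Return the name of the pattern set closest to the speaker's keywords.

    keyword_features: {keyword: feature vector} measured from the speaker.
    pattern_sets: {set name: {word: feature vector}} of standard patterns.
    Summing the per-keyword distances is an assumption of this sketch.
    """
    def total_distance(patterns):
        return sum(euclidean(vec, patterns[word])
                   for word, vec in keyword_features.items())
    return min(pattern_sets, key=lambda name: total_distance(pattern_sets[name]))

# Hypothetical standard patterns for two of the vowel keywords.
pattern_sets = {
    "set_A": {"a": [1.0, 0.0], "i": [0.0, 1.0]},
    "set_B": {"a": [3.0, 2.0], "i": [2.0, 3.0]},
}
speaker = {"a": [2.8, 2.1], "i": [2.2, 2.9]}
chosen = select_pattern_set(speaker, pattern_sets)
print(chosen)
```

Here the speaker's vowels lie much closer to the patterns of `set_B`, so that set would be used for the rest of the session.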

[Embodiments of the Invention]

An embodiment of the present invention is described below with reference to the drawings.

FIG. 1 is a system configuration diagram of an embodiment of the speech recognition method according to the present invention, and FIG. 2 is its processing flowchart.

Here, 1 is a control unit that controls each part of the speech recognition device to perform the required recognition processing, passes the recognition result to the host device HST, and has the host perform the desired service processing; 2 is a standard speech pattern memory in which a plurality of sets of standard speech pattern data are prepared for each word to be recognized; 3 is a standard speech pattern selection unit; 4 is a speech recognition unit; 5 is a judgment unit that determines the standard speech pattern for the input speech according to the result of the pattern matching; 6 is an analysis unit that extracts feature data from the input speech; 7 is a microphone for speech input; 8 is a speech synthesis unit for presenting the recognition result; 9 is the associated speaker; 10 is a console unit for confirming the recognition result and for repeated speech input; and 11 is a clustering unit that performs clustering on the analyzed input speech pattern.

First, prior to speech recognition processing, the control unit 1 instructs the analysis unit 6 and the clustering unit 11 to prepare for speech input, and instructs the standard speech pattern selection unit 3 to select, from the standard speech pattern memory 2, the standard speech patterns that are to be subject to clustering (process 21 in FIG. 2).

When these preparations are complete, the control unit 1 instructs the speech synthesis unit 8 to output an input prompt message asking the speaker to utter the keyword (for example, the vowels "a", "i", "u", "e", and "o"), and the prompt message is emitted from the speaker 9 (process 22).

When the speaker then utters the keyword into the microphone 7 (process 24), the analysis unit 6 analyzes the input speech and extracts feature data (process 25).

The clustering unit 11 then performs clustering between the standard speech patterns indicated by the standard speech pattern selection unit 3 and the input speech pattern (for example, in a manner similar to hierarchical clustering in the field of multivariate analysis), and determines which set of standard speech patterns the keyword belongs to (process 26).
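One way to read the clustering step is as an agglomerative (bottom-up hierarchical) clustering in which the input pattern is treated as one more point and assigned to whichever group of standard patterns it is merged with. The sketch below is only an illustration under that reading: the text says the clustering is "similar to" hierarchical clustering and gives no details, so the average-linkage criterion, the stopping rule, the majority vote, and all names are assumptions.

```python
from collections import Counter
import math

def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def linkage(c1, c2, points):
    """Average-linkage distance between two clusters of point indices."""
    total = sum(dist(points[i], points[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))

def assign_by_clustering(input_vec, labelled):
    """Assign the input to a pattern set by agglomerative clustering.

    labelled: list of (set name, feature vector) standard keyword patterns.
    The closest pair of clusters is merged repeatedly; as soon as the input
    is absorbed, the most common set name in that cluster is returned.
    """
    points = [vec for _, vec in labelled] + [input_vec]
    input_idx = len(points) - 1
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > 1:
        # Find the closest pair of clusters under average linkage.
        a, b = min(
            ((p, q) for p in range(len(clusters))
             for q in range(p + 1, len(clusters))),
            key=lambda pq: linkage(clusters[pq[0]], clusters[pq[1]], points),
        )
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
        if input_idx in merged:
            names = [labelled[i][0] for i in merged if i != input_idx]
            return Counter(names).most_common(1)[0][0]
    raise ValueError("no standard patterns given")

# Hypothetical keyword patterns from two sets.
labelled = [
    ("set_A", [0.0, 0.0]), ("set_A", [0.3, 0.0]),
    ("set_B", [5.0, 5.0]), ("set_B", [5.1, 4.8]),
]
result = assign_by_clustering([4.7, 5.2], labelled)
print(result)
```

With these hypothetical points, each set's own patterns merge first and the input then joins the `set_B` cluster.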

Based on the clustering result, the control unit 1 instructs the standard speech pattern selection unit 3 which set of standard speech patterns to select for the subsequent speech recognition processing (process 27).

Next, an input prompt message is emitted from the speaker 9 via the speech synthesis unit 8, asking the speaker to utter the speech that is actually to be recognized (process 23).

When the speaker inputs speech through the microphone 7 (process 24), the analysis unit 6 analyzes the input speech and extracts feature data (process 25).

The speech recognition unit 4 performs pattern matching between the standard speech patterns indicated by the standard speech pattern selection unit 3 and the input speech pattern, and sends the resulting similarities to the judgment unit 5 (process 28).
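The pattern matching of process 28 can be illustrated by scoring the input against every word pattern in the selected set. The text does not name a similarity measure, so the cosine similarity used here is an assumption, as are the function names and the sample vectors.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def match_patterns(input_vec, selected_set):
    """Score the input against each word pattern in the selected set.

    selected_set: {word: feature vector} for the set chosen during the
    keyword step. Returns {word: similarity}.
    """
    return {word: cosine_similarity(input_vec, vec)
            for word, vec in selected_set.items()}

# Hypothetical patterns for two words in the selected set.
selected_set = {"yes": [1.0, 0.0, 0.2], "no": [0.0, 1.0, 0.1]}
scores = match_patterns([0.9, 0.1, 0.2], selected_set)
print(max(scores, key=scores.get))
```

Only the selected set is searched, which is the point of the keyword step: patterns from sets that fit the speaker poorly never compete for the top score.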

The judgment unit 5 sends the candidate with the highest similarity to the control unit 1 as the recognition result (process 29).

In the case of a reject, that is, when even the most likely similarity value for the input speech is so low that adopting it as the recognition result would be doubtful, the control unit 1 instructs the standard speech pattern selection unit 3 to select the same standard speech patterns as before (process 30), and further controls the speech synthesis unit 8 so that a message prompting the speaker to repeat the input is emitted from the speaker 9 (process 31). Processing from process 24 onward is then repeated.
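The judgment and reject decision can be sketched as a threshold test on the best similarity. The threshold value, the function name, and the return convention are assumptions made for illustration; the text says only that a reject occurs when the best similarity is too low to be trusted.

```python
def judge(scores, reject_threshold=0.8):
    """Pick the best-scoring word, or signal a reject.

    scores: {word: similarity} from pattern matching.
    Returns (word, similarity); word is None when the best similarity is
    below reject_threshold (an assumed value) or there are no scores.
    """
    if not scores:
        return None, 0.0
    best_word = max(scores, key=scores.get)
    best = scores[best_word]
    if best < reject_threshold:
        return None, best  # reject: the caller should prompt for a repeat
    return best_word, best

accepted = judge({"yes": 0.95, "no": 0.30})
rejected = judge({"yes": 0.50, "no": 0.30})
print(accepted, rejected)
```

On a reject the caller would keep the same pattern set and prompt the speaker again, as in processes 30 and 31.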

To let the speaker confirm whether the recognition result is correct, the control unit 1 has the speech synthesis unit 8 output a confirmation request message, which is emitted from the speaker 9 (process 32).

The speaker listens to this message, learns whether the input speech was recognized correctly or misrecognized, and enters the result into the control unit 1 from the console unit 10 (process 33).

The confirmation of whether the recognition result is correct need not be entered into the control unit 1 by operating the console unit 10; it may instead be given as a spoken confirmation through the microphone 7. In that case, the spoken content should preferably be simple and hard to misrecognize, so that it is recognized reliably.

If the confirmation information indicates that the recognition candidate is correct, the control unit 1 sends it to the host device HST as the recognition result, ends the processing for that speech input, and prepares for the next input.

If, on the other hand, the confirmation information indicates a misrecognition, the control unit 1, as in the reject case described above, has the speech synthesis unit 8 send a message asking the speaker to input the same speech again, and recognition from process 24 onward is performed once more.

The above operations are repeated until the series of services is completed.
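The retry behaviour described above (repeat on a reject or on a misrecognition reported by the speaker, then move on to the next input) can be sketched as a simple loop. The `recognize` and `confirm` callables, the list-of-attempts representation, and the `max_attempts` limit are assumptions for illustration; the text itself places no limit on repetitions.

```python
def run_session(utterances, recognize, confirm, max_attempts=3):
    """Process each requested input, retrying on reject or misrecognition.

    utterances: list of lists; utterances[k] holds the speaker's successive
        attempts for the k-th input.
    recognize: attempt -> recognized word, or None for a reject.
    confirm: word -> bool, the speaker's confirmation of the result.
    max_attempts: assumed safety limit, not taken from the text.
    Returns one accepted word (or None) per requested input.
    """
    results = []
    for attempts in utterances:
        accepted = None
        for attempt in attempts[:max_attempts]:
            word = recognize(attempt)
            if word is not None and confirm(word):
                accepted = word  # would be sent to the host device
                break
            # Otherwise: reject or misrecognition, so prompt for a repeat.
        results.append(accepted)
    return results

# Hypothetical recognizer: unknown sounds are rejected.
table = {"hello": "hello", "helo": "yellow", "bye": "bye"}
results = run_session(
    [["mumble", "helo", "hello"], ["bye"]],
    recognize=table.get,
    confirm=lambda word: word != "yellow",
)
print(results)
```

In this run the first input is rejected once, misrecognized once, and accepted on the third attempt, while the second input is accepted immediately.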

Thus, according to this embodiment, the misrecognition of specific words uttered by a speaker against a specific set of standard speech patterns is reduced, and the recognition rate can be improved.

[Effects of the Invention]

As described in detail above, according to the present invention, speech recognition can be performed with standard speech patterns suited to each speaker. Misrecognition of a particular speaker's utterances against a specific set of standard speech patterns is therefore reduced, which has a marked effect in improving both the recognition rate and the serviceability of the speech recognition device.

[Brief Description of the Drawings]

FIG. 1 is a system configuration diagram of an embodiment of the speech recognition method according to the present invention, and FIG. 2 is its processing flowchart.

1: control unit, 2: standard speech pattern memory, 3: standard speech pattern selection unit, 4: speech recognition unit, 5: judgment unit, 6: analysis unit, 7: microphone, 8: speech synthesis unit, 9: speaker, 10: console unit, 11: clustering unit.

Agent: Patent attorney Kosaku Fukuda (and one other)

Claims

[Scope of Claims]

1. A speech recognition method for a speech recognition device having a function of storing a plurality of sets of standard speech pattern data for each word to be recognized, extracting features from input speech, calculating the similarity between the feature data and each item of the standard speech pattern data, and judging and outputting the standard speech pattern with the highest similarity as the recognition result, the method being characterized in that clustering is performed on a predetermined keyword that is input first, prior to speech recognition processing; a set of standard speech patterns corresponding to the input speech is selected based on the result of the clustering; and control and processing are carried out so that the subsequent series of speech recognition processes is performed in accordance with that selection.