JPS61230198A

JPS61230198A - Voice recognition equipment

Info

Publication number: JPS61230198A
Application number: JP60070265A
Authority: JP
Inventors: 杉田　守男
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1985-04-03
Filing date: 1985-04-03
Publication date: 1986-10-14

Abstract

(57) [Summary] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

DETAILED DESCRIPTION OF THE INVENTION

Field of Industrial Application

The present invention relates to a speech recognition device that is intended for unspecified speakers and whose recognition results are output for use in controlling other devices, exchanging information, and the like.

Prior Art

FIG. 2 shows the configuration of a conventional speech recognition device. In FIG. 2, 1 is a speech analysis section that converts the speech input signal from analog to digital and extracts feature parameters for each frame interval of 10 to 20 msec. 3 is a standard pattern memory that stores, in the form of feature parameters, data obtained by analyzing in advance the words to be recognized. A plurality of standard patterns are stored in the memory 3 for each word in order to cover variations in the input speech. 2 is a matching section that matches the output of the speech analysis section 1 against the standard patterns in the standard pattern memory 3. 5 is a word determination section that selects the minimum value among the matching degrees of the words and outputs the corresponding word as the recognition result.
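The frame-by-frame analysis performed by the speech analysis section 1 can be illustrated roughly as follows. This is only a sketch: the patent does not say which feature parameters are extracted, so the 8 kHz sample rate, the non-overlapping frames, and the mean-energy feature in `frame_energy` are all assumptions chosen for illustration.

```python
def split_into_frames(samples, sample_rate_hz, frame_ms=10):
    """Split a sampled signal into consecutive, non-overlapping frames."""
    frame_len = int(sample_rate_hz * frame_ms / 1000)
    if frame_len <= 0:
        raise ValueError("frame length must be at least one sample")
    # Drop a trailing partial frame so that every frame has the same length.
    n_frames = len(samples) // frame_len
    return [samples[i * frame_len:(i + 1) * frame_len] for i in range(n_frames)]


def frame_energy(frame):
    """Placeholder feature (assumption): mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)


# Toy digitized signal: 800 samples at an assumed 8 kHz, i.e. 100 ms of audio.
samples = [0.0, 1.0, 0.0, -1.0] * 200
frames = split_into_frames(samples, sample_rate_hz=8000, frame_ms=10)
features = [frame_energy(f) for f in frames]
```

With a 10 msec frame at 8 kHz each frame holds 80 samples, so the 800-sample toy signal yields one feature value for each of its ten frames.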

Next, the operation of the above conventional example will be described. In FIG. 2, speech uttered by the user is input to the speech analysis section 1 as a speech input signal, where it is converted from analog to digital. Feature parameters are extracted for each suitable frame interval of 10 to 20 ms and sent to the distance calculation section 2. The distance calculation section 2 calculates, in real time and as a time series, the distance to the standard speech patterns stored in the standard pattern memory 3, and transmits the result to the matching section 4. The matching section 4 then calculates the degree of matching from the distance values for one word while performing time-axis correction and the like.
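One common way to accumulate frame distances while correcting the time axis is dynamic time warping (DTW). The patent does not name the algorithm used by the distance calculation section 2 and the matching section 4, so the sketch below is an assumption, as are the function names and the Euclidean frame distance.

```python
def frame_distance(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def dtw_distance(utterance, template):
    """Accumulated frame distance along the best time alignment (DTW)."""
    n, m = len(utterance), len(template)
    inf = float("inf")
    # cost[i][j]: best accumulated distance aligning the first i utterance
    # frames with the first j template frames.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(utterance[i - 1], template[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # utterance frame repeated
                                 cost[i][j - 1],      # template frame repeated
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]


template = [[0.0], [1.0], [2.0]]
slow_utterance = [[0.0], [0.0], [1.0], [1.0], [2.0]]  # same word, spoken slowly
other_utterance = [[2.0], [1.0], [0.0]]               # a different pattern
```

In this toy data the slowly spoken utterance aligns perfectly with the template, whereas the different pattern accumulates a positive distance.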

The word determination section 5 then selects the minimum value of the matching degree between the speech input signal and the (multiple) standard patterns and outputs it as the recognition result. If the speech input signal differs markedly from the standard patterns, it is rejected by the matching section 4 or the word determination section 5, and a result indicating that recognition is not possible is output.
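The minimum-value selection and rejection described above could be sketched as follows. The data layout (a dictionary from each word to the distances of its several standard patterns) and the `reject_threshold` parameter are hypothetical; the patent does not state how the rejection criterion is defined.

```python
def judge_word(distances_by_word, reject_threshold):
    """Pick the word whose best standard pattern is closest to the input.

    distances_by_word maps each word to a list of distances, one per stored
    standard pattern. Returns (word, distance), or (None, distance) when the
    best match is still too far away and the input is rejected.
    """
    best_word, best_distance = None, float("inf")
    for word, distances in distances_by_word.items():
        d = min(distances)  # several patterns per word cover speaker variation
        if d < best_distance:
            best_word, best_distance = word, d
    if best_word is None or best_distance > reject_threshold:
        return None, best_distance  # "recognition not possible"
    return best_word, best_distance


example = {"start": [12.0, 7.5, 9.1], "stop": [20.3, 18.4]}
```

Calling `judge_word(example, reject_threshold=10.0)` selects "start" through its second pattern, while a stricter threshold such as 5.0 causes the same input to be rejected.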

In this way, the above conventional speech recognition device can also provide a recognition result when the user inputs speech.

Problems to Be Solved by the Invention

However, when the users of the above conventional speech recognition device are unspecified, a wide variety of speech inputs must be taken into account. The standard patterns have to be built from a huge database, that is, from utterance samples collected under every condition (speaker, speaking habits, dialect, and so on), analyzed by computer or the like, with representative patterns selected and registered. As a result, the amount of work and the cost grow geometrically with the required performance. Moreover, even standard patterns created in this way recognize the speech of certain individuals poorly, and from the user's point of view this gives rise to a sense of distrust.

The present invention eliminates the above drawbacks of the conventional device, and its object is to provide an excellent speech recognition device that makes it possible to judge the degree of recognition with a simple configuration.

Means for Solving the Problems

To achieve the above object, the present invention provides a matching index calculation section that converts the degree of matching into an index, and informs the user, visibly or audibly, of how closely the uttered speech matches the standard speech patterns held by the recognition device.

Operation

According to the present invention, when the user speaks, the device reports the matching index on a display or by voice, for example as a score out of 100 points. The user therefore pays attention to his or her own utterance and makes an effort to obtain a higher score. When the mismatch is significant, the device trains the user's utterance using standard speech, so the user can easily understand how to speak in order for the machine to recognize his or her voice. This has the advantage that the user comes to input utterances that are more desirable for the recognition device.

Embodiment

FIG. 1 shows the configuration of an embodiment of the present invention. In FIG. 1, 1 is a speech analysis section, 2 is a distance calculation section, 3 is a standard pattern memory section, 4 is a matching section, and 5 is a word determination section. The configuration and operation from input speech to recognition result are the same as in the conventional example. 6 is a matching index calculation section that computes a desired matching index from the distance values and determination results output by the matching section 4 and the word determination section 5. 7 is a display section that visibly displays the calculation result of the calculation section 6. 8 is a speech synthesis section that, according to the calculation result of the calculation section 6, selects or combines the guidance speech pattern in memory 9, the matching index pattern in memory 10, or the standard utterance pattern in memory 11, creates synthesized speech from the pattern memories 9, 10, and 11, and sends it out as output speech.

Next, the operation of the above embodiment will be described. In the above embodiment, when the user utters input speech, the speech analysis section 1 analyzes it, and the distance calculation section 2 compares it with the standard patterns in the standard pattern memory 3.

The distance calculation is performed in this way, the matching section 4 calculates the degree of matching with the standard patterns in the memory 3, and the recognition result is output through the word determination section 5.

The results of the matching section 4 and the word determination section 5 are further sent to the matching index calculation section 6, where they are converted into an index that is easy for a person to understand, such as a score out of 100 points. This index is sent to the visible display section 7 and shown by a visible display means, and is also sent to the speech synthesis section 8, where synthesized speech based on the patterns in the matching index pattern memory 10 is created and presented to the user. Furthermore, when the degree of matching in the matching section 4 or the matching index in the matching index calculation section 6 is markedly low, the speech synthesis section 8 uses the guidance pattern in the guidance speech pattern memory 9 to inform the user that the device has entered utterance practice mode, and then uses the standard utterance pattern in the standard utterance pattern memory 11 to produce the standard utterance. When the user repeats it, the matching index is reported again, and this operation is repeated.
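The conversion to a 100-point index and the switch into practice mode might look like the sketch below. The patent leaves the index formula open ("a desired matching index"), so the linear mapping, the `worst_distance` normalizer, the `practice_below` threshold of 40 points, and the action labels are all assumptions made for illustration.

```python
def matching_index(distance, worst_distance):
    """Map a matching distance to a 0-100 score (assumed linear mapping).

    A distance of 0 scores 100; worst_distance or more scores 0.
    """
    if worst_distance <= 0:
        raise ValueError("worst_distance must be positive")
    clipped = min(max(distance, 0.0), worst_distance)
    return round(100 * (1 - clipped / worst_distance))


def feedback(distance, worst_distance, practice_below=40):
    """Return the score and the list of outputs the device would produce."""
    score = matching_index(distance, worst_distance)
    # Display section 7 and speech synthesis section 8 both report the score.
    actions = ["display score", "speak score"]
    if score < practice_below:
        # Markedly low index: announce practice mode (memory 9), play the
        # standard utterance (memory 11), then score the user's repetition.
        actions += ["speak guidance", "play standard utterance",
                    "wait for repeat"]
    return score, actions
```

For example, `feedback(5.0, 50.0)` only reports a high score, while `feedback(45.0, 50.0)` also triggers the guidance and standard-utterance outputs of practice mode.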

Thus, according to the above embodiment, the user can learn, visibly or audibly and through an easy-to-understand index, how much his or her utterance differs from the standard patterns of the recognition device, and a system that is easy to use as a man-machine interface can be constructed. Furthermore, even for markedly mismatched speech that the conventional device would have rejected, the above embodiment lets the user hear the correct utterance through speech synthesis and practice his or her own utterance.

In the above embodiment, the user is notified of the matching index by both visible and audible output, but either one may be omitted depending on the configuration requirements of the system. In addition, if the user forgets which words or phrasings can be recognized, the guidance and standard speech may be played immediately at the user's request (by an operation or the like).

Effects of the Invention

As is clear from the above embodiment, the present invention can notify the user of the matching index, so the user can use the device with greater confidence and acceptance than a system that outputs only the recognition result. In utterance practice mode, the user can practice while listening to the standard utterance, so the user's utterances become more desirable and the recognition rate improves further.

Furthermore, according to the present invention, even if the user forgets a recognizable word or how to say it, the user can easily recall it by listening to the standard utterance and can then utter it correctly. Because users naturally adapt their utterances to what is desirable for the device, the amount of database analysis needed for the standard patterns and the number of standard patterns can be reduced, which allows a significant reduction in cost.

[Brief Description of the Drawings]

FIG. 1 is a block diagram of a speech recognition device according to an embodiment of the present invention, and FIG. 2 is a block diagram of a conventional speech recognition device.

1: speech analysis section, 2: distance calculation section, 3: standard pattern memory section, 4: matching section, 5: word determination section, 6: matching index calculation section, 7: visible display section, 8: speech synthesis section, 9: guidance speech pattern memory section, 10: matching index pattern memory section, 11: standard utterance pattern memory section.

Claims

[Scope of Claims]

A speech recognition device comprising: a distance calculation section that analyzes speech input and calculates the distance to standard patterns stored in advance as data; a word determination section that matches the distance-calculated pattern against the standard patterns and outputs the recognition result; a matching index calculation section that calculates and outputs an index of the degree of matching; a display section that visibly displays the matching index; and a speech synthesis section having functions such as spoken usage guidance for the user and speech output based on the matching index and standard utterance patterns; the device being configured to notify the user of the matching index visibly on the display section or audibly, to operate the speech synthesis section with the standard utterance pattern when the mismatch is significant so that the user hears the standard speech, and to output the matching index again for the user's subsequent utterance.