JP3384165B2

JP3384165B2 - Voice recognition device

Info

Publication number: JP3384165B2
Application number: JP01486495A
Authority: JP
Inventors: 照博後藤
Original assignee: Toyota Motor Corp
Current assignee: Toyota Motor Corp
Priority date: 1995-02-01
Filing date: 1995-02-01
Publication date: 2003-03-10
Anticipated expiration: 2018-03-10
Also published as: JPH08211892A

Description

[Detailed Description of the Invention]

[0001] [Industrial Field of Application] The present invention relates to a speech recognition device that recognizes a speaker's utterances as words, and more particularly to the detection of erroneous recognition.

[0002] [Prior Art] Speech recognition has long been regarded as the basis for an excellent man-machine interface, one that places little burden on the operator. For example, if various operations can be commanded by the operator's voice, operability is very good. Speech recognition has therefore been studied extensively, and various speech recognition devices have been proposed.

[0003] For example, in a vehicle such as an automobile, it would be very convenient, and would reduce the burden on the driver, if devices such as the audio system, the air conditioner, and the navigation system could be controlled by voice. To achieve this, a speech recognition device recognizes the uttered sound as a control word, and the device in question is controlled based on the recognition result.

[0004] Speech recognition is performed with considerable accuracy when the input speech contains no noise and is clearly pronounced. In a moving vehicle, however, the input speech is likely to contain noise such as engine sound. In particular, when the accelerator or brake is pressed hard or the gear is changed, the noise level in the vehicle is likely to change greatly.

[0005] In Japanese Patent Application Laid-Open No. Hei 4-24696, means is provided for detecting driving operation states such as accelerator and brake operation. When an operation that is expected to change the in-vehicle noise greatly is detected, a signal to that effect is sent to the speech recognition device, and input of the speech signal to the speech recognition device is canceled to prevent erroneous recognition. Such a device can reduce erroneous recognition caused by increases in cabin noise due to driving operations.

[0006] [Problems to Be Solved by the Invention] However, the conventional speech recognition device described above does not take into account engine sound and other noise during normal driving. To raise the probability of correct recognition, it is desirable to eliminate as much noise as possible. Moreover, in an automobile the speech to be recognized is in most cases the driver's. The driver performs various operations such as accelerator, brake, and steering operations, and depending on the operation state may be unable to speak clearly enough. For example, when the driver brakes suddenly in the middle of an utterance, the uttered speech often cannot be recognized correctly. It is therefore desirable to perform speech recognition with such cases taken into account.

[0007] The present invention has been made in view of the above problems, and its object is to provide a speech recognition device capable of more accurate recognition by taking noise and the speaker's utterance state into account.

[0008] [Means for Solving the Problems] The present invention comprises: a voice input unit that generates a speech signal from speech uttered by a speaker; a recognition unit that recognizes the speech signal input from the voice input unit as a predetermined word; an utterance state detection unit that detects the operation status of a vehicle and thereby detects the speaker's utterance state, that is, whether the speaker is in a state in which normal utterance is possible; an utterance state determination unit that determines the speaker's erroneous-utterance level based on the detection result of the utterance state detection unit; and a recognition result determination unit that determines the probability that the recognition result of the recognition unit is erroneous, based on the degree of match in the recognition result of the recognition unit and the erroneous-utterance level determined by the utterance state determination unit. When the recognition result determination unit determines that the probability of erroneous recognition is high, the output of the recognition unit at that time is canceled.

[0009] [0010] [Operation] According to the present invention, the recognition unit performs ordinary speech recognition in the same manner as described above.
The utterance state detection unit then detects the speaker's utterance state. For example, when the vehicle is cornering at high speed and the driver is tense, the driver's voice may become high-pitched and differ from the normal voice. The utterance state detection unit detects such conditions, and when it is determined from the detection result that normal utterance is not being made, the recognition result of the recognition unit at that time is canceled.

[0011] In this way, the invention of the present application effectively prevents erroneous recognition and enables efficient speech recognition.

[0012] [Embodiments] An embodiment of the present invention is described below with reference to the drawings.

[0013] [Embodiment] FIG. 1 is a block diagram showing the overall configuration of the embodiment. A recognition unit 32 is connected to a microphone 30.
Sound picked up by the microphone 30 is supplied to the recognition unit 32 as a speech signal. A recognition dictionary 34 is connected to the recognition unit 32, and the recognition unit 32 recognizes the input speech in the same manner as the main recognition unit 12 described later. The data for the recognized word and its distance are then supplied to a recognition result determination unit 36 as the recognition result. An utterance state determination unit 38 is connected to the recognition result determination unit 36, and the recognition result determination unit 36 determines the reliability of the recognition result of the recognition unit 32 based on information from the utterance state determination unit 38. When the recognition result determination unit 36 judges the reliability to be high, the recognition result of the recognition unit 32 is output as is. When the reliability is judged to be low, the recognition result of the recognition unit 32 is invalidated and the speaker is prompted to input the speech again.

[0014] In this embodiment, signals from various sensors are supplied to the utterance state determination unit 38. Specifically, the following are supplied: steering information from a steering sensor 40; information on accelerator operation from an accelerator sensor 42; information on clutch operation from a clutch sensor 44; information on brake operation from a brake sensor 46; information on shift lever operation from a shift lever sensor 48; information on wiper and turn signal operation from a combination switch sensor 50; information on whether the seat position is being moved from a seat sensor 52; information on vehicle speed and acceleration from a vehicle speed sensor 54; and information on the driver's heart rate from a heart rate sensor 56.
The driver's blinking and similar signs are also related to the driver's degree of tension, so they may be detected as well.

[0015] The utterance state determination unit 38 determines the state of the speaker based on the information supplied from the various sensors.
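A minimal sketch of how sensor readings of this kind might be mapped to an abnormal-utterance level is given here for illustration only. The patent names fuzzy inference or a pre-built map as possible methods but gives no concrete rules, so the `DrivingState` fields, the three-level scale, and every threshold below are hypothetical assumptions.

```python
from dataclasses import dataclass


@dataclass
class DrivingState:
    """Snapshot of sensor readings; field names and units are illustrative."""
    accel_g: float = 0.0              # signed; negative values mean braking
    steering_rate_deg_s: float = 0.0  # steering wheel turning rate
    speed_kmh: float = 0.0
    heart_rate_bpm: float = 70.0
    shifting: bool = False
    turn_signal_operated: bool = False
    seat_moving: bool = False


def abnormal_utterance_level(s: DrivingState) -> int:
    """Return 0 (normal), 1 (somewhat abnormal) or 2 (clearly abnormal).

    All thresholds are placeholder assumptions, not values from the patent.
    """
    # Sudden acceleration, braking or steering: normal speech is unlikely.
    if abs(s.accel_g) > 0.4 or abs(s.steering_rate_deg_s) > 180.0:
        return 2
    # Milder conditions that each raise the chance of abnormal speech.
    minor_conditions = [
        s.shifting,
        s.turn_signal_operated,
        s.seat_moving,
        s.speed_kmh > 100.0,
        s.heart_rate_bpm > 110.0,
    ]
    # Assumption: one minor condition gives level 1, two or more give level 2.
    return min(2, sum(minor_conditions))
```

A fixed lookup like this corresponds loosely to the "map" option; a fuzzy-inference version would replace the hard thresholds with graded membership values.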
For example, in driving states such as sudden acceleration, sudden braking, or sudden steering, the driver is unlikely to be able to speak as usual. The probability that the utterance is not normal is also high during gear shifts, turn signal operation, seat adjustment, high-speed driving, periods of high heart rate, and the like. The utterance state determination unit 38 determines from this information whether the speaker uttered correctly. This determination is made, for example, by fuzzy inference, or on the basis of a map prepared in advance according to the output states of the various sensors. The result of the utterance state determination is sent to the recognition result determination unit 36, which determines the reliability of the result from the recognition unit 32 based on the information from the utterance state determination unit 38. Specifically, when the distance in the recognition result supplied from the recognition unit 32 is large to some extent and the utterance state is abnormal to at least a certain degree, the recognition result is deemed unreliable and its output is invalidated. If the utterance state is normal, on the other hand, the recognition result is output as is.

[0016] Voice quality also often differs with the utterance state. For example, a tense speaker's voice often becomes high-pitched. It is therefore preferable to prepare a plurality of dictionaries corresponding to such states. That is, in addition to the dictionary used in the normal state, the recognition dictionary 34 is provided with a dictionary that stores reference speech data for the tense state, and the dictionary in use is switched according to the determination result of the utterance state determination unit 38. This improves the accuracy of speech recognition.

[0017] Furthermore, when recognition could not be performed normally, the speaker may be prompted to input again, the first determination result may be evaluated from the analysis of the re-input speech, and the evaluation may be learned and reflected in the determinations of the recognition result determination unit 36. It is also preferable to rewrite the dictionary through learning.

[0018] [Related Configuration Example] FIG. 2 is a block diagram showing the overall configuration of a noise-reduction configuration example related to the present invention. It has, as a voice input unit, a microphone 10 that converts speech uttered by the driver or another person into an electrical speech signal. The speech signal output from the microphone 10 is input to a main recognition unit 12. A recognition dictionary 14 is connected to the main recognition unit 12, which uses the recognition dictionary 14 to recognize the input speech signal and outputs a main recognition result. The speech signal is also input to a sub-recognition unit 16. A reject dictionary 18 is connected to the sub-recognition unit 16, which uses the reject dictionary 18 to recognize the input speech signal and outputs a sub recognition result.

[0019] A recognition result determination unit 20 is connected to the main recognition unit 12 and the sub-recognition unit 16, and the main recognition result and the sub recognition result are supplied to it.
The recognition result determination unit 20 determines the reliability of the main recognition result from the main and sub recognition results and decides whether to output the main recognition result. When the reliability is at or above a predetermined level, the main recognition result is output. When the reliability is at or below the predetermined level, the main recognition result is canceled and input of the speech signal is requested again.

[0020] The speech recognition processing in the main recognition unit 12 is as follows. The recognition dictionary 14 connected to the main recognition unit 12 stores reference speech data for the words to be recognized (for example, "air conditioner," "radio," "on," and "off"). Various data formats are conceivable, but it is preferable to hold, as numerical values, a group of coefficients representing the frequency characteristics of the speech signal, such as the coefficient sequence obtained by linear predictive coding (LPC), that is, the LPC cepstrum. In the case of a vehicle, the people who use it are often limited to some extent. The data stored in the recognition dictionary 14 should therefore be created from the voices of those particular people. For example, when the device is first used, the user is asked to utter the words to be recognized, and the data of the recognition dictionary 14 is created from these utterances. There may be more than one user.

[0021] The main recognition unit 12 obtains the LPC cepstrum from the input speech signal, calculates the difference between it and the LPC cepstrum of each word stored in the recognition dictionary as a distance, and selects the closest word. The data for the selected word and its distance form the recognition result of the main recognition unit 12.
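The matching step just described can be sketched as a nearest-template search. This is an illustration under simplifying assumptions, not the patent's implementation: each word is represented by a single fixed-length feature vector, the distance is plain Euclidean distance, and the feature values are made up. Extracting real LPC cepstra and aligning utterances of different lengths are outside the scope of the sketch.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def recognize(features, dictionary):
    """Return (word, distance) for the dictionary entry closest to features.

    `dictionary` maps each word to its stored reference vector and must
    not be empty.
    """
    candidates = ((word, distance(features, ref)) for word, ref in dictionary.items())
    return min(candidates, key=lambda pair: pair[1])


# Hypothetical two-dimensional "cepstra" for two command words.
recognition_dictionary = {"on": [1.0, 0.1], "off": [0.0, 1.0]}
word, dist = recognize([1.0, 0.0], recognition_dictionary)
```

The returned distance is what the later determination stages compare: a smaller value means a closer match.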
The distance calculated here becomes smaller as the match between the selected word and the input speech becomes better. Even for the word with the shortest distance, if that distance is at or above a predetermined value, the input should be treated as unrecognizable.

[0022] The reject dictionary 18 connected to the sub-recognition unit 16 stores data (LPC cepstra) for sounds that are likely to be input from the microphone 10 but are not in the recognition dictionary 14 (that is, are not recognition targets), such as "er," "um," sneezes, and clunking noises. The sub-recognition unit 16 calculates the LPC cepstrum of the incoming speech signal, computes its distance from the data stored in the reject dictionary 18, and selects the closest entry. The data for the selected entry and its distance form the recognition result of the sub-recognition unit 16. A dedicated calculation unit may be provided to compute the LPC cepstrum of the input speech signal, with its output supplied to both the main recognition unit 12 and the sub-recognition unit 16.

[0023] The recognition result determination unit 20 compares the distances output by the main recognition unit 12 and the sub-recognition unit 16. When the distance obtained by the sub-recognition unit 16 is smaller, the recognition result of the main recognition unit 12 is invalidated, and the speaker is prompted through a loudspeaker to speak again. When the distance obtained by the main recognition unit 12 is smaller, the recognition result obtained by the main recognition unit 12 is adopted and output.
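The comparison made by the recognition result determination unit 20 can be sketched as follows. The function name `judge`, the `(label, distance)` tuple layout, and the handling of exact ties (accepted here) are assumptions; the text only specifies the two cases where one distance is strictly smaller.

```python
from typing import Optional, Tuple

Result = Tuple[str, float]  # (label of the closest entry, its distance)


def judge(main: Result, sub: Result) -> Optional[str]:
    """Return the main word if it is accepted, or None to request re-utterance."""
    main_word, main_dist = main
    _, sub_dist = sub
    # The input is closer to a reject-dictionary sound than to any command word.
    if sub_dist < main_dist:
        return None
    # Assumption: a tie is treated like the case where the main distance is smaller.
    return main_word
```

For instance, `judge(("radio", 0.9), ("sneeze", 0.2))` returns `None`, so the caller would prompt the speaker again rather than act on "radio".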
For example, if the recognition result is a command to turn on the air conditioner, the air conditioner is turned on based on this recognition result.

[0024] For sounds that are added to normal speech, such as "er," "um," or a sneeze, the sub-recognition unit 16 should also detect the corresponding section of the input speech. When such an added sound is recognized, this data is passed to the main recognition unit 12. Based on this data, the main recognition unit 12 excludes that section from the input speech and performs speech recognition. This can greatly increase the reliability of speech recognition in the main recognition unit 12. This processing may be carried out by supplying a signal directly from the sub-recognition unit 16 to the main recognition unit 12. Alternatively, the recognition result determination unit 20 may make the determination from the recognition results of the main and sub recognition units 12 and 16 and send the main recognition unit 12 a command to reset the speech section, whereupon the main recognition unit 12 resets the speech section and performs speech recognition again.

[0025] [Effects of the Invention] As described above, according to the present invention, the utterance state detection unit detects the speaker's state, and when it is determined from the detection result that normal utterance is not being made, the recognition result of the recognition unit at that time is canceled. Erroneous recognition can thereby be effectively prevented.
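The cancellation rule summarized above, as described for the embodiment, combines the recognition distance with the utterance state. A minimal sketch follows, assuming a numeric distance and an integer abnormal-utterance level. Both thresholds and the function name `accept_result` are hypothetical, since the patent only says "large to some extent" and "abnormal to at least a certain degree".

```python
from typing import Optional


def accept_result(word: str, dist: float, abnormal_level: int,
                  dist_threshold: float = 0.5,
                  level_threshold: int = 1) -> Optional[str]:
    """Return the recognized word, or None when the output is canceled.

    The result is canceled only when the match is weak (large distance)
    AND the speaker's state is judged abnormal; threshold values are
    placeholder assumptions.
    """
    if dist >= dist_threshold and abnormal_level >= level_threshold:
        return None  # unreliable: cancel and prompt for re-input
    return word
```

Requiring both conditions means a close match is still accepted during hard braking, and a weak match is still accepted when the driver is calm, which follows the embodiment's wording.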

[Brief Description of the Drawings]
[FIG. 1] A block diagram showing the configuration of the embodiment.
[FIG. 2] A block diagram showing the configuration of the related configuration example.
[Description of Reference Numerals] 10: microphone; 12: main recognition unit; 14: recognition dictionary; 16: sub-recognition unit; 18: reject dictionary; 20: recognition result determination unit; 30: microphone; 32: recognition unit; 34: recognition dictionary; 36: recognition result determination unit; 38: utterance state determination unit.

Continuation of the front page
(51) Int. Cl.⁷, identification symbol, FI: G10L 3/00 561A
(56) References: JP-A-1-158498 (JP, A); JP-A-61-57996 (JP, A); JP-A-63-132300 (JP, A); JP-A-58-76893 (JP, A); JP-A-63-191198 (JP, A); JP-A-3-129400 (JP, A); JP-A-2-308299 (JP, A); JP-A-5-119792 (JP, A)
(58) Fields searched (Int. Cl.⁷, DB name): G10L 15/00; G10L 15/28

Claims

(57) [Claims]
[Claim 1] A speech recognition device comprising: a voice input unit that generates a speech signal from speech uttered by a speaker; a recognition unit that recognizes the speech signal input from the voice input unit as a predetermined word; an utterance state detection unit that detects the operation status of a vehicle and thereby detects the speaker's utterance state, that is, whether the speaker is in a state in which normal utterance is possible; an utterance state determination unit that determines the speaker's erroneous-utterance level based on the detection result of the utterance state detection unit; and a recognition result determination unit that determines the probability that the recognition result of the recognition unit is erroneous, based on the degree of match in the recognition result of the recognition unit and the erroneous-utterance level determined by the utterance state determination unit, wherein, when the recognition result determination unit determines that the probability of erroneous recognition is high, the output of the recognition unit at that time is canceled.