JP2009210703A

JP2009210703A - Speech recognition device

Info

Publication number: JP2009210703A
Application number: JP2008051975A
Authority: JP
Inventors: Hiroyuki Sato; 浩之佐藤
Original assignee: Alpine Electronics Inc
Current assignee: Alpine Electronics Inc
Priority date: 2008-03-03
Filing date: 2008-03-03
Publication date: 2009-09-17
Anticipated expiration: 2028-03-03
Also published as: JP5189858B2

Abstract

PROBLEM TO BE SOLVED: To provide a speech recognition device that reduces the discomfort and stress caused by misrecognition and improves operability.

SOLUTION: A re-input request output means 12 raises the degree of politeness of the apology expression included in the re-input request speech as the number of speech recognition failures increases.

COPYRIGHT: (C)2009, JPO&INPIT

Description

The present invention relates to a speech recognition device, and more particularly to a speech recognition device that performs speech recognition on uttered speech.

In general, speech recognition is known as processing in which a computer analyzes a person's uttered speech and extracts the utterance content as character data, thereby recognizing the language that the uttered speech represents.

One known speech recognition method, for example, learns the features of speech from training data containing a large number of recorded utterances and, by comparing the input uttered speech against those features, outputs the most likely language sequence as the recognition result.

Speech recognition devices that perform such speech recognition are attracting attention as a command input means that replaces keyboards, remote controls, touch panels, and the like, and they are used in a wide range of fields, including personal computers and in-vehicle devices.

JP 2000-193463 A; JP H11-37766 A; JP 2001-166794 A

However, current speech recognition devices cannot always recognize uttered speech accurately, and misrecognition sometimes occurs.

When such misrecognition occurs several times, the user's discomfort grows, and when speech recognition ultimately fails after several misrecognitions, the discomfort becomes very great.

FIG. 3 shows an operation example of a speech recognition device applied to an in-vehicle navigation device, as an example of how current speech recognition devices behave when such misrecognition occurs.

As shown in FIG. 3, first, in step 1 (ST1), the speech recognition device speaks to prompt the user to input an address by voice.

Next, in step 2 (ST2), the user speaks, inputting the uttered speech "Tōkyō-to Shinagawa-ku" to the speech recognition device.

Next, in step 3 (ST3), the speech recognition device speaks the recognition result "Dōkyō-to Taitō-ku" for the uttered speech input in step 2 (ST2), then emits a beep and prompts for input of the next command.

However, because the recognition result in step 3 (ST3) is a misrecognition, the user selects the "Back" command by button operation in the following step 4 (ST4) to inform the speech recognition device of the misrecognition.

Next, in step 5 (ST5), the speech recognition device speaks to prompt the user again to input the address by voice.

Next, in step 6 (ST6), the user speaks again, inputting the uttered speech "Tōkyō-to Shinagawa-ku" to the speech recognition device once more.

Next, in step 7 (ST7), the speech recognition device speaks the recognition result "Dōkyō-to Chūō-ku" for the uttered speech input in step 6 (ST6), then emits a beep and prompts for input of the next command.

However, because the recognition result in step 7 (ST7) is again a misrecognition, the user selects the "Back" command again in the following step 8 (ST8) to inform the speech recognition device of the misrecognition. At this point, the second misrecognition will likely leave the user uncomfortable and irritated.

Next, in step 9 (ST9), the speech recognition device speaks to prompt the user again to input the address by voice.

Next, in step 10 (ST10), the user speaks again, inputting the uttered speech "Tōkyō-to Shinagawa-ku" to the speech recognition device once more.

Next, in step 11 (ST11), the speech recognition device speaks the recognition result "Dōkyō-to Shinjuku-ku" for the uttered speech input in step 10 (ST10), then emits a beep and prompts for input of the next command.

However, because the recognition result in step 11 (ST11) is yet again a misrecognition, the user selects the "Back" command again in the following step 12 (ST12) to inform the speech recognition device of the misrecognition. At this point, the third misrecognition will likely increase the user's discomfort further.

Next, in step 13 (ST13), the speech recognition device announces by speech that speech recognition has failed, and no further request to re-input the command is made. This will likely make the user's discomfort extremely great.
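The conventional flow of FIG. 3 can be summarized as a simple loop. The following Python sketch is only an illustration of steps ST1 to ST13 as described above; the function name, the prompt wording, and the limit of three attempts are assumptions made for the example and are not taken from the publication.

```python
from typing import Iterable, List

MAX_ATTEMPTS = 3                         # assumption: FIG. 3 gives up after three misrecognitions
PROMPT = "Please say the address."       # hypothetical wording
FAILURE = "Speech recognition failed."   # hypothetical wording


def conventional_dialog(intended: str, recognition_results: Iterable[str]) -> List[str]:
    """Return everything the device says, in order, for one address entry."""
    spoken: List[str] = []
    results = iter(recognition_results)
    for _ in range(MAX_ATTEMPTS):
        spoken.append(PROMPT)    # ST1, ST5, ST9: identical prompt each time
        result = next(results)   # ST2, ST6, ST10: user speaks, device recognizes
        spoken.append(result)    # ST3, ST7, ST11: talk-back of the result
        if result == intended:   # the user accepts a correct result
            return spoken
        # otherwise the user presses "Back" (ST4, ST8, ST12) and the loop repeats
    spoken.append(FAILURE)       # ST13: no further re-input request
    return spoken
```

Note that the prompt never changes between attempts, which is the behavior the invention sets out to improve.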

Thus, conventional speech recognition devices not only cause the user discomfort each time a misrecognition occurs, but also repeatedly force the user to re-input the uttered speech while that discomfort persists, which additionally imposes operational stress.

The present invention has been made in view of these problems, and its object is to provide a speech recognition device that can reduce the discomfort and operational stress associated with misrecognition.

To achieve the above object, a speech recognition device according to the present invention comprises: speech recognition means that performs speech recognition for recognizing the language represented by uttered speech input via a microphone; recognition result output means that outputs, via a speaker, speech representing the recognition result of the speech recognition means; misrecognition input means that allows input indicating that the recognition result represented by the speech output by the recognition result output means is a misrecognition; re-input request output means that, in response to the input of the misrecognition by the misrecognition input means, outputs via the speaker a re-input request speech, which is speech requesting re-input of the uttered speech; and recognition failure notification means that, when the misrecognition has occurred a predetermined number of times in succession, outputs via the speaker a failure notification speech, which is speech notifying that the speech recognition has failed. The device is characterized in that the re-input request output means is configured to output the re-input request speech with an apology expression included, and to raise the degree of politeness of the apology expression included in the re-input request speech as the number of misrecognitions increases.

With this configuration, the re-input request output means can raise the degree of politeness of the apology expression included in the re-input request speech as the number of misrecognitions increases, so the discomfort and stress associated with misrecognition can be reduced.

In this specification, reducing discomfort and stress covers not only reduction in the sense of preventing the discomfort and stress from arising in the first place, but also reduction in the sense that discomfort and stress arise once but are immediately eased or eliminated. In other words, the accumulation of discomfort and stress over a series of user operations for speech recognition is reduced compared with the conventional case.

The recognition result output means is preferably configured to raise the degree of politeness of the expression used when outputting the speech representing the recognition result as the number of misrecognitions increases.

With this configuration, the recognition result output means can raise the degree of politeness of the expression used when outputting the speech representing the recognition result as the number of misrecognitions increases, so the discomfort and stress associated with misrecognition can be reduced more effectively.

Further, the recognition failure notification means is preferably configured to output the failure notification speech with an apology expression included.

With this configuration, the recognition failure notification means can output failure notification speech that includes an apology expression, so the discomfort associated with a speech recognition failure can be reduced.

Furthermore, the device preferably comprises character display processing means that displays, on a display unit, an image of an anthropomorphic character according to the operating state of the speech recognition device body, the character display processing means being configured to display, when the re-input request speech is output, an image of the character in an apologetic posture, and to raise the degree of politeness of the apologetic posture shown by the character image as the number of misrecognitions increases.

With this configuration, the character display processing means can raise the degree of politeness of the apologetic posture shown by the character image when the re-input request speech is output as the number of misrecognitions increases. Aided by the soothing effect of such a character image, the discomfort and stress associated with misrecognition can be reduced more effectively.

Also, preferably, the recognition result output means is configured to raise the degree of politeness of the expression used when outputting the speech representing the recognition result as the number of misrecognitions increases, and the character display processing means is configured to display, when the speech representing the recognition result is output, a character image whose courteous posture becomes more pronounced as the number of misrecognitions increases.

With this configuration, the character display processing means can raise the degree of courteous posture shown by the character image as the number of misrecognitions increases, so the discomfort and stress associated with misrecognition can be reduced more effectively.

Further, preferably, the recognition failure notification means is configured to output the failure notification speech with an apology expression included, and the character display processing means is configured to display, when the failure notification speech is output, an image of the character in an apologetic posture.

With this configuration, the character display processing means can display an image of the character in an apologetic posture when the failure notification speech is output, so the discomfort associated with a speech recognition failure can be reduced more effectively.

Furthermore, the present invention is preferably applied to an in-vehicle device.

With this configuration, the discomfort and stress associated with misrecognition can be reduced even when the invention is applied to an in-vehicle device, which in turn improves driving safety and comfort.

According to the present invention, the discomfort and operational stress associated with misrecognition can be reduced.

An embodiment of the speech recognition device according to the present invention is described below with reference to FIGS. 1 and 2.

FIG. 1 shows a speech recognition device 1 according to this embodiment. The speech recognition device 1 has a microphone 2 and an utterance button 3.

By pressing the utterance button 3 and then speaking into the microphone 2, the user can input uttered speech into the speech recognition device 1 via the microphone 2.

The speech recognition device 1 of this embodiment also has a speech recognition unit 5 serving as the speech recognition means, to which the microphone 2 and the utterance button 3 are each connected.

When the utterance button 3 is pressed, the speech recognition unit 5 enters a state of waiting for speech input, and the user's uttered speech is input to the speech recognition unit 5 via the microphone 2.

The speech recognition unit 5 then performs, on the input uttered speech (speech data), speech recognition that recognizes the language the uttered speech represents. This speech recognition may be performed, for example, by registering in advance in a speech dictionary database an acoustic model that associates character strings of the recognition target language with their speech patterns, comparing feature quantities calculated from the input uttered speech with the feature quantities of the acoustic model, searching for the speech pattern with the highest similarity, and recognizing the character string corresponding to that speech pattern as the character string represented by the uttered speech.
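The dictionary lookup described above can be illustrated with a toy nearest-pattern search. This is a minimal sketch under stated assumptions: the publication does not specify the feature quantities or the similarity measure, so the three-element feature vectors, the use of cosine similarity, and the names `SPEECH_DICTIONARY` and `recognize` are all hypothetical.

```python
import math
from typing import Dict, Sequence

# Hypothetical speech dictionary: character string -> registered feature vector.
SPEECH_DICTIONARY: Dict[str, Sequence[float]] = {
    "Shinagawa-ku": [1.0, 0.0, 0.2],
    "Taito-ku": [0.1, 1.0, 0.3],
    "Chuo-ku": [0.2, 0.3, 1.0],
}


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Similarity measure assumed for this sketch; the source names none."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recognize(features: Sequence[float],
              dictionary: Dict[str, Sequence[float]] = SPEECH_DICTIONARY) -> str:
    """Return the character string whose registered pattern is most similar."""
    return max(dictionary, key=lambda text: cosine_similarity(features, dictionary[text]))
```

A feature vector close to a registered pattern yields the matching string, while a noisy or distorted vector can land nearer to a different pattern, which corresponds to the misrecognition case the rest of the embodiment deals with.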

Further, the speech recognition device 1 of this embodiment has a recognition result output unit 6 serving as the recognition result output means, to which the speech recognition unit 5 is connected. A speaker 8 is connected to the recognition result output unit 6 via a speech synthesis unit 7. The speech synthesis unit 7 is, for example, a Text to Speech (TTS) engine that takes text-based information as input, converts it to audio, and outputs it, thereby reading the text aloud.

The recognition result output unit 6 acquires from the speech recognition unit 5 the recognition result of the speech recognition performed on the uttered speech. The recognition result output unit 6 then generates speech data corresponding to the acquired recognition result (hereinafter, recognition result speech data) and outputs the generated recognition result speech data to the speech synthesis unit 7 one character at a time, using, for example, the TTS engine. The speech synthesis unit 7 inputs the recognition result speech data output from the recognition result output unit 6 into, for example, the TTS engine as a character string (word) and outputs it as speech via the speaker 8.

In this way, the recognition result output unit 6 causes speech representing the recognition result of the speech recognition unit 5 to be output (talk-back) via the speech synthesis unit 7 and the speaker 8.

By listening to the speech output by the recognition result output unit 6, the user can judge whether the language represented by that speech is a misrecognition.

Furthermore, the speech recognition device 1 of this embodiment has a command input request output unit 10, to which the utterance button 3, the recognition result output unit 6, and the speech synthesis unit 7 are each connected.

According to the operating state of the speech recognition device 1, the command input request output unit 10 generates speech data for a command input request speech, which is speech requesting the user to input a command (hereinafter, command input request speech data), and outputs the generated command input request speech data to the speech synthesis unit 7. The speech synthesis unit 7 outputs, via the speaker 8, the command input request speech corresponding to the command input request speech data output from the command input request output unit 10.

In this way, the command input request output unit 10 causes the command input request speech to be output via the speech synthesis unit 7 and the speaker 8.

In this embodiment, when the recognition result output unit 6 outputs a recognition result, the command input request output unit 10 may output speech requesting input of the command (for example, uttered speech) that follows the most recent command input so far (that is, the uttered speech corresponding to that recognition result).

The speech recognition device 1 of this embodiment also has a back button 11 serving as the misrecognition input means. By pressing the back button 11 in response to the recognition result output by the recognition result output unit 6, the user can input an indication that the recognition result represented by the output speech is a misrecognition (hereinafter, misrecognition input).

Further, in this embodiment, the command input request output unit 10 has a re-input request output unit 12 serving as the re-input request output means, to which the back button 11 is connected.

The re-input request output unit 12 acquires the misrecognition input made with the back button 11 and, in response, generates speech data for a re-input request speech, which is speech requesting re-input of the misrecognized uttered speech (hereinafter, re-input request speech data). In this embodiment, the re-input request speech is one form of command input request speech, and the re-input request speech data is one form of command input request speech data. The re-input request output unit 12 outputs the generated re-input request speech data to the speech synthesis unit 7. The speech synthesis unit 7 outputs, via the speaker 8, the re-input request speech corresponding to the re-input request speech data output from the re-input request output unit 12. In this way, the re-input request output unit 12 can output the re-input request speech via the speech synthesis unit 7 and the speaker 8 in response to a misrecognition input.

Furthermore, the speech recognition device 1 of this embodiment has a misrecognition count measurement unit 14, to which the back button 11 and the re-input request output unit 12 are each connected. The misrecognition count measurement unit 14 measures the number of misrecognitions that have occurred, based on the number of misrecognition inputs from the back button 11.
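The counting behavior of the misrecognition count measurement unit 14 can be sketched as a small counter driven by back-button presses. The class and method names below are hypothetical, and the `reset` method reflects an assumption (that the count starts over for a new input) which the passage above does not state explicitly.

```python
class MisrecognitionCounter:
    """Counts misrecognition inputs, one per press of the back button."""

    def __init__(self) -> None:
        self._count = 0

    def on_back_button(self) -> int:
        """Record one misrecognition input and return the updated count."""
        self._count += 1
        return self._count

    def reset(self) -> None:
        """Assumed behavior: start counting again for a new input."""
        self._count = 0

    @property
    def count(self) -> int:
        return self._count
```

Other units, such as the re-input request output unit 12, would read `count` to decide how to respond.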

The speech recognition device 1 of this embodiment also has a recognition failure notification unit 15 serving as the recognition failure notification means, to which the misrecognition count measurement unit 14 and the speech synthesis unit 7 are each connected.

The recognition failure notification unit 15 acquires the measurement result of the misrecognition count measurement unit 14 and, based on that result, when misrecognition has occurred a predetermined number of times in succession, generates speech data for a failure notification speech, which is speech notifying that the speech recognition has failed (hereinafter, failure notification speech data), and outputs the generated failure notification speech data to the speech synthesis unit 7. The speech synthesis unit 7 outputs, via the speaker 8, the failure notification speech corresponding to the speech data output from the recognition failure notification unit 15. In this way, the recognition failure notification unit 15 can output the failure notification speech via the speech synthesis unit 7 and the speaker 8.
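The threshold check performed by the recognition failure notification unit 15 might look like the following sketch. The publication only says "a predetermined number of times", so the value 3 is an assumption borrowed from the FIG. 3 example, and the function name and message wording are hypothetical.

```python
from typing import Optional

FAILURE_THRESHOLD = 3  # assumption: the source only says "a predetermined number of times"


def failure_notification(consecutive_misrecognitions: int,
                         threshold: int = FAILURE_THRESHOLD) -> Optional[str]:
    """Return the failure notification text once the threshold is reached, else None."""
    if consecutive_misrecognitions >= threshold:
        return "Speech recognition has failed."  # hypothetical wording
    return None
```

Returning `None` below the threshold leaves the decision to request re-input with the caller, which keeps the failure check separate from the re-input request logic.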

Further, the speech recognition device 1 of this embodiment has a character drawing unit 16 serving as the character display processing means, to which the recognition result output unit 6, the command input request output unit 10 (including the re-input request output unit 12), and the recognition failure notification unit 15 are each connected. A display 19 serving as the display unit is connected to the character drawing unit 16 via a display processing unit 18.

According to the operating state of the speech recognition device 1, the character drawing unit 16 generates drawing data for an image of an anthropomorphic character and outputs the generated drawing data to the display processing unit 18. The display processing unit 18 displays, on the display 19, the character image corresponding to the drawing data output from the character drawing unit 16. In this way, the character drawing unit 16 can display the character image on the display 19 via the display processing unit 18.

More specifically, in this embodiment, when the speech recognition device 1 speaks through the speech output of the recognition result output unit 6, the command input request output unit 10, the re-input request output unit 12, or the recognition failure notification unit 15, the character drawing unit 16 displays a character image whose display state matches that speaking action.

Such a character image may be, for example, an image of a character modeled on a person, an animal, or the like, whose display state, such as facial expression (mouth, etc.) and gestures, shows a state (movement) matching the speaking action in conjunction with the speaking action of the speech recognition device 1.
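One way to picture how the character drawing unit 16 chooses a display state is a lookup keyed by the unit that is currently speaking. This is a sketch only: the enum, the state labels, and the table are hypothetical, and the choice of an apologetic state for the re-input request and the failure notification follows the preferred configurations described earlier in this document rather than any concrete rendering logic in the publication.

```python
from enum import Enum, auto


class SpeechSource(Enum):
    """Which unit is producing the device's speech (names are illustrative)."""
    RECOGNITION_RESULT = auto()     # recognition result output unit 6
    COMMAND_INPUT_REQUEST = auto()  # command input request output unit 10
    REINPUT_REQUEST = auto()        # re-input request output unit 12
    FAILURE_NOTIFICATION = auto()   # recognition failure notification unit 15


# Hypothetical display states; the publication does not name concrete states.
CHARACTER_STATE = {
    SpeechSource.RECOGNITION_RESULT: "talking",
    SpeechSource.COMMAND_INPUT_REQUEST: "asking",
    SpeechSource.REINPUT_REQUEST: "apologizing",
    SpeechSource.FAILURE_NOTIFICATION: "apologizing",
}


def character_state(source: SpeechSource) -> str:
    """Return the display state that matches the current speaking action."""
    return CHARACTER_STATE[source]
```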

On top of the basic configuration described above, in the speech recognition device 1 of this embodiment the re-input request output unit 12 is configured to output re-input request speech that includes an apology expression, and to raise the degree of politeness of that apology expression as the number of misrecognitions measured by the misrecognition count measurement unit 14 increases.

That is, in this embodiment, the more often the uttered speech has been misrecognized, the more politely worded the apology expression in the re-input request speech that responds to the misrecognition input and prompts the user to re-input the uttered speech.

As a specific example, when an address is being recognized, the re-input request output unit 12 may output a re-input request speech such as "Sumimasen, jūsho o ohanashi kudasai." ("Sorry, please say the address.") for the first misrecognition, and a re-input request speech such as "Mōshiwake gozaimasen. Mō ichido jūsho o ohanashi kudasai." ("We sincerely apologize. Please say the address once more.") for the second misrecognition.

The re-input request output unit 12 may also hold a database of a plurality of re-input request speech patterns that differ in the politeness of the apology expression, each associated with a number of misrecognitions, and output the re-input request speech whose pattern corresponds to the measurement result acquired from the misrecognition count measurement unit 14.
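The pattern database described above can be sketched as a table keyed by the misrecognition count. The two entries are English renderings of the example phrases given for the first and second misrecognition; the names `REINPUT_PROMPTS` and `reinput_prompt`, and the choice to reuse the most polite entry for counts beyond the table, are assumptions made for illustration.

```python
# Misrecognition count -> re-input request text, in increasing politeness.
REINPUT_PROMPTS = {
    1: "Sorry, please say the address.",
    2: "We sincerely apologize. Please say the address once more.",
}


def reinput_prompt(misrecognition_count: int) -> str:
    """Pick the re-input request text for the given misrecognition count."""
    if misrecognition_count < 1:
        raise ValueError("a re-input request follows at least one misrecognition")
    # Assumption: counts beyond the table reuse the most polite entry.
    level = min(misrecognition_count, max(REINPUT_PROMPTS))
    return REINPUT_PROMPTS[level]
```

Keeping the phrases in a table rather than in branching code makes it straightforward to add further politeness levels without touching the selection logic.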

ここで、音声認識の誤認識が繰り返されれば、ユーザの不快感は徐々に高まっていくことが多い。しかし、誤認識が生じる度ごとに謝罪の言葉をかけられ、また、謝罪の言葉が誤認識の回数の増加にともなってより丁寧なものになれば、ユーザの不快感は軽減されるであろう。また、不快感が軽減された状態で発話音声の再入力を行えば、再入力の際のストレスも軽減されるであろう。 Here, if misrecognition of voice recognition is repeated, user discomfort often increases gradually. However, if the misrecognition occurs every time an apology is made and the apology becomes more polite as the number of misrecognitions increases, the user's discomfort will be reduced. . In addition, if the speech voice is re-input in a state where the discomfort is reduced, the stress at the time of re-input will also be reduced.

Therefore, according to this embodiment, even when misrecognition is repeated, the re-input request can use a more polite apology as the misrecognition count increases, which reduces both the user's discomfort caused by misrecognition and the stress of re-entering the utterance.

In addition to the above configuration, in this embodiment the recognition failure notification unit 15 outputs a failure notification voice that includes an apology.

As a specific example, the recognition failure notification unit 15 may output a failure notification voice such as "I am terribly sorry. Speech recognition has failed."

With this configuration, even when speech recognition finally fails after misrecognition has been repeated several times, a failure notification voice containing an apology can be output, which reduces the discomfort caused by the failure. The apology in the failure notification voice is preferably more polite than the apology in the re-input request voice.

In addition to the above configuration, the recognition result output unit 6 may raise the politeness of the expression it uses when outputting the voice representing the recognition result as the misrecognition count measured by the misrecognition count measurement unit 14 increases.

As a specific example, the recognition result output unit 6 may output the misrecognized result "Dokyo-to Taito-ku" as is for the first recognition result, and then output the second recognition result in a polite form such as "Is Dokyo-to Chuo-ku correct?"

In this way, the re-recognition result can be output in more courteous language as the misrecognition count increases, which reduces the user's discomfort caused by misrecognition and the stress of re-entering the utterance still more effectively.
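A minimal sketch of this read-back behavior follows, assuming a single polite form is used from the first misrecognition onward. The function name `format_recognition_result` and the English question template are hypothetical; the text only gives the Japanese examples of a plain read-back followed by a polite confirmation question.

```python
def format_recognition_result(result: str, misrecognition_count: int) -> str:
    """Phrase the recognition result according to the misrecognition count.

    With no misrecognition so far the result is read back as is; after one
    or more misrecognitions it is wrapped in a polite confirmation question.
    """
    if misrecognition_count < 0:
        raise ValueError("misrecognition_count must not be negative")
    if misrecognition_count == 0:
        return result
    return f"Is {result} correct?"
```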

In addition to the above configuration, in this embodiment the character drawing unit 16 displays an image of a character in an apologetic posture when the re-input request output unit 12 outputs the re-input request voice, and raises the politeness of that apologetic posture as the misrecognition count increases.

As a specific example, the character drawing unit 16 may display a character that bows its head more deeply as the misrecognition count increases.

The character drawing unit 16 may also hold a database of character patterns that differ in the politeness of their apologetic posture, each associated with a misrecognition count, and display the character of the pattern that corresponds to the count obtained from the misrecognition count measurement unit 14.

With this configuration, the character's apologetic posture also helps, so the discomfort and stress caused by misrecognition can be reduced still more effectively.

In addition to the above configuration, in this embodiment the character drawing unit 16 also displays a character in an apologetic posture when the recognition failure notification unit 15 outputs the failure notification voice.

As a specific example, when the recognition failure notification unit 15 outputs the failure notification voice, the character drawing unit 16 may display a character that bows even more deeply than for a misrecognition, or that kneels and prostrates itself (dogeza).

With this configuration, the discomfort caused by a speech recognition failure can be reduced still more effectively.
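The posture selection for the character, covering both the escalating bow and the failure case, might be sketched as below. The `Pose` enumeration, its four levels, and the `select_pose` function are illustrative assumptions; the text only states that the bow deepens with the misrecognition count and that the failure posture is deeper still, for example a prostration.

```python
from enum import IntEnum


class Pose(IntEnum):
    """Hypothetical posture levels, ordered by degree of apology."""
    NORMAL = 0      # no apology (e.g. the initial prompt)
    BOW = 1         # first misrecognition
    DEEP_BOW = 2    # second and later misrecognitions
    PROSTRATE = 3   # reserved for the failure notification (dogeza)


def select_pose(misrecognition_count: int, failed: bool = False) -> Pose:
    """Pick the character posture for the current dialogue state."""
    if failed:
        return Pose.PROSTRATE
    if misrecognition_count <= 0:
        return Pose.NORMAL
    # Clamp to the deepest ordinary bow so the failure pose stays distinct.
    return Pose(min(misrecognition_count, Pose.DEEP_BOW))
```

Reserving the top level for failure keeps the failure posture strictly more apologetic than any re-input posture, matching the preference stated for the failure notification.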

In addition to the above configuration, when the politeness of the expression used to output the voice representing the recognition result is raised as the misrecognition count increases, as described above, the character drawing unit 16 may display a character image whose degree of courteous posture rises as the misrecognition count increases.

A specific example of a courteous posture is a deferential stance, such as a bow, in which the character appears to be timidly watching the user's expression.

In this way, the degree of courteous posture shown by the displayed character rises together with the politeness of the expression used to output the voice representing the recognition result, so the discomfort and stress caused by misrecognition can be reduced more effectively.

The speech recognition device 1 of this embodiment may be applied to in-vehicle equipment. Specifically, it can be applied to voice input of an address when setting a destination or waypoint in an in-vehicle navigation device. It can also be applied to selecting a playback target (song, video work, program) by voice input in in-vehicle audio devices, DVD players, radios, televisions, and the like.

Applying the speech recognition device 1 of this embodiment to in-vehicle equipment in this way reduces the discomfort and stress caused by misrecognition, which leads to safer and more comfortable driving.

Next, an example of the operation of the speech recognition device 1 is described as the operation of this embodiment.

FIG. 2 shows, as an operation example of the speech recognition device 1, the case in which the device is applied to an in-vehicle navigation device and an address is input when setting a destination or waypoint.

In this operation example, as shown in FIG. 2, first, in step 21 (ST21), the command input request output unit 10 outputs a voice such as "Please say the address." through the speaker 8 as an utterance operation prompting the user to input an address by voice. The utterance operation of step 21 (ST21) is performed after the user has carried out an operation on the in-vehicle navigation device (not shown) to switch to voice input of an address.

Also in step 21 (ST21), the character drawing unit 16 displays a character image as an on-screen icon on the display 19 and makes the display state of the character image match the utterance operation of step 21 (ST21). The character in step 21 (ST21) is in its normal state and does not show an apologetic posture.

Next, in step 22 (ST22), the user holds down the speech button 3 and says "Tokyo-to Shinagawa-ku" into the microphone 2, and this uttered speech is input to the speech recognition device 1.

Next, in step 23 (ST23), the speech recognition unit 5 performs speech recognition on the uttered speech input in step 22 (ST22), and the recognition result output unit 6 performs an utterance operation that outputs the recognition result "Dokyo-to Taito-ku". Immediately afterwards, the command input request output unit 10 emits a beep and performs an utterance operation prompting input of the next command.

In step 23 (ST23), as in step 21 (ST21), the character drawing unit 16 makes the display state of the character image shown on the display 19 match the utterance operation of step 23 (ST23). The character in step 23 (ST23) is also in its normal state and does not show an apologetic posture.

However, because the recognition result in step 23 (ST23) is a misrecognition, in the following step 24 (ST24) the user operates the back button 11 to input the "Back" command, thereby making a misrecognition input that tells the speech recognition device the result was wrong.

As a result of the operation in step 24 (ST24), the misrecognition count measurement unit 14 counts the first misrecognition.

Next, in step 25 (ST25), based on the measurement result of the misrecognition count measurement unit 14, the re-input request output unit 12 performs an utterance operation that outputs "Excuse me, please say the address." as the re-input request voice responding to the first misrecognition. Unlike the prompt in step 21 (ST21), this re-input request voice includes an apology.

In step 25 (ST25) as well, the character drawing unit 16 makes the display state of the character image shown on the display 19 match the utterance operation of step 25 (ST25). Unlike in step 21 (ST21), however, the character image in step 25 (ST25) shows an apologetic posture (for example, a lowered head).

Next, in step 26 (ST26), the user speaks again, and the uttered speech "Tokyo-to Shinagawa-ku" is input to the speech recognition device 1 once more.

Next, in step 27 (ST27), the speech recognition unit 5 performs speech recognition on the uttered speech input in step 26 (ST26), and the recognition result output unit 6 performs an utterance operation that outputs the recognition result "Dokyo-to Chuo-ku". Immediately afterwards, the command input request output unit 10 emits a beep and performs an utterance operation prompting input of the next command.

In step 27 (ST27) as well, the character drawing unit 16 makes the display state of the character image shown on the display 19 match the utterance operation of step 27 (ST27).

However, because the recognition result in step 27 (ST27) is once again a misrecognition, in the following step 28 (ST28) the user operates the back button 11 to input the "Back" command and so tell the speech recognition device that the result was wrong.

As a result of the operation in step 28 (ST28), the misrecognition count measurement unit 14 counts the second misrecognition.

Next, in step 29 (ST29), based on the measurement result of the misrecognition count measurement unit 14, the re-input request output unit 12 performs an utterance operation that outputs "I am sorry. Please say the address once more." as the re-input request voice responding to the second misrecognition. This re-input request voice is more polite and more apologetic than the one in step 25 (ST25).

In step 29 (ST29) as well, the character drawing unit 16 makes the display state of the character image shown on the display 19 match the utterance operation of step 29 (ST29).

The character image in step 29 (ST29) shows a more apologetic posture than in step 25 (ST25) (for example, an even deeper bow).

Next, in step 30 (ST30), the user speaks again, and the uttered speech "Tokyo-to Shinagawa-ku" is input to the speech recognition device 1 once more.

Next, in step 31 (ST31), the speech recognition unit 5 performs speech recognition on the uttered speech input in step 30 (ST30), and the recognition result output unit 6 performs an utterance operation that outputs the recognition result "Dokyo-to Shinjuku-ku". Immediately afterwards, the command input request output unit 10 emits a beep and performs an utterance operation prompting input of the next command.

At this point, the recognition result output unit 6 may output the recognition result in a polite form such as "Is Dokyo-to Shinjuku-ku correct?" It may also output the recognition result in a faint voice to convey remorse.

In step 31 (ST31) as well, the character drawing unit 16 makes the display state of the character image shown on the display 19 match the utterance operation of step 31 (ST31).

However, because the recognition result in step 31 (ST31) is yet again a misrecognition, in the following step 32 (ST32) the user operates the back button 11 to input the "Back" command and so tell the speech recognition device that the result was wrong.

As a result of the operation in step 32 (ST32), the misrecognition count measurement unit 14 counts the third misrecognition.

Next, in step 33 (ST33), based on the measurement result of the misrecognition count measurement unit 14, the recognition failure notification unit 15 performs an utterance operation that outputs the failure notification voice "I am terribly sorry. Speech recognition has failed."

In step 33 (ST33) as well, the character drawing unit 16 makes the display state of the character image shown on the display 19 match the utterance operation of step 33 (ST33).

The character image in step 33 (ST33) shows an even more apologetic posture than in step 29 (ST29).
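The sequence from step 21 (ST21) to step 33 (ST33) can be condensed into a small dialogue loop. The sketch below rests on stated assumptions: `FAILURE_THRESHOLD = 3` mirrors this example (the claims only say "a predetermined number of times"), the `recognize` and `is_correct` callables stand in for the speech recognition unit and the user's back-button judgment, and all names are hypothetical. Beeps and character display are omitted.

```python
from typing import Callable, List, Optional, Tuple

FAILURE_THRESHOLD = 3  # assumed from the walkthrough: failure on the third misrecognition

INITIAL_PROMPT = "Please say the address."
REINPUT_PROMPTS = {
    1: "Excuse me, please say the address.",
    2: "I am sorry. Please say the address once more.",
}
FAILURE_NOTICE = "I am terribly sorry. Speech recognition has failed."


def run_dialogue(
    recognize: Callable[[int], str],
    is_correct: Callable[[str], bool],
) -> Tuple[Optional[str], List[str]]:
    """Run the prompt / recognize / confirm loop.

    recognize(n) returns the recognition result for the utterance made after
    n misrecognitions; is_correct(result) is False when the user presses the
    back button. Returns (accepted result or None, everything the device said).
    """
    spoken = [INITIAL_PROMPT]                      # ST21
    misrecognition_count = 0
    while True:
        result = recognize(misrecognition_count)   # ST22-23, ST26-27, ST30-31
        spoken.append(result)                      # result is read back
        if is_correct(result):
            return result, spoken
        misrecognition_count += 1                  # ST24, ST28, ST32
        if misrecognition_count >= FAILURE_THRESHOLD:
            spoken.append(FAILURE_NOTICE)          # ST33
            return None, spoken
        # ST25, ST29: the apology grows more polite with the count.
        key = min(misrecognition_count, max(REINPUT_PROMPTS))
        spoken.append(REINPUT_PROMPTS[key])
```

Replaying the three misrecognitions from the walkthrough ("Dokyo-to Taito-ku", "Dokyo-to Chuo-ku", "Dokyo-to Shinjuku-ku") through `run_dialogue` yields the two escalating re-input prompts followed by the failure notice.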

As described above, according to this embodiment, the politeness and the degree of apology of the wording in the re-input request voice can be raised as the number of misrecognitions increases, so the discomfort and stress caused by misrecognition can be reduced.

In addition, when discomfort has built up, it is often difficult to speak properly, which can lower the eventual success rate of speech recognition. Reducing discomfort as in the present invention therefore also helps improve the eventual success rate of speech recognition.

The present invention is not limited to the embodiment described above, and various modifications can be made as necessary.

For example, the embodiment described above uses the back button 11 as the misrecognition input means, but the present invention is not limited to this configuration. The microphone 2 may be made to function as the misrecognition input means so that a misrecognition input can be made by voice. In that case, however, the device must be able to determine that a misrecognition input has been made. For example, the speech recognition device holds in advance specific words that correspond to a misrecognition input (for example, "wrong", "no good", or specific abusive language) and determines whether a misrecognition input has been made according to whether one of these words has been input.
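A minimal sketch of the voice-based misrecognition input check follows, assuming the recognizer delivers text and that an exact match against a stored word set is sufficient. The set `MISRECOGNITION_WORDS`, its English entries, and the function name are hypothetical stand-ins for the stored words mentioned in the text.

```python
# Hypothetical stored vocabulary that counts as a misrecognition input.
MISRECOGNITION_WORDS = frozenset({"wrong", "no good"})


def is_misrecognition_input(recognized_text: str) -> bool:
    """Return True when the recognized utterance is one of the stored words.

    Matching is exact after trimming whitespace and lowercasing; anything
    else is treated as an ordinary utterance (for example, an address).
    """
    return recognized_text.strip().lower() in MISRECOGNITION_WORDS
```

With such a check in place, the device can route an utterance either to the misrecognition counter or to ordinary address recognition before deciding how to respond.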

In addition, as the number of misrecognitions increases, the character image may gradually be made more soothing (for example, more rounded).

FIG. 1 is a block diagram showing an embodiment of the speech recognition device according to the present invention.
FIG. 2 is a process chart showing an operation example in the embodiment of the speech recognition device according to the present invention.
FIG. 3 is a process chart showing an operation example of a conventional speech recognition device.

Explanation of Reference Numerals

1 Speech recognition device
2 Microphone
5 Speech recognition unit
6 Recognition result output unit
8 Speaker
11 Back button
12 Re-input request output unit
15 Recognition failure notification unit
16 Character drawing unit
19 Display

Claims

A speech recognition device comprising:
speech recognition means for performing speech recognition that recognizes the language represented by uttered speech input via a microphone;
recognition result output means for outputting, via a speaker, a voice representing the recognition result of the speech recognition means;
misrecognition input means that allows input indicating that the recognition result represented by the voice output by the recognition result output means is a misrecognition;
re-input request output means for outputting, via the speaker, a re-input request voice, which is a voice requesting re-input of the uttered speech, in response to the input by the misrecognition input means indicating the misrecognition; and
recognition failure notification means for outputting, via the speaker, a failure notification voice, which is a voice notifying that the speech recognition has failed, when the misrecognition has occurred a predetermined number of times in succession,
wherein the re-input request output means is configured to output the re-input request voice including an apology expression, and to raise the degree of politeness of the apology expression included in the re-input request voice as the number of misrecognitions increases.

The speech recognition device according to claim 1, wherein the recognition result output means is configured to raise the degree of politeness of the expression used when outputting the voice representing the recognition result as the number of misrecognitions increases.

The speech recognition device according to claim 1 or claim 2, wherein the recognition failure notification means is configured to output the failure notification voice including an apology expression.

The speech recognition device according to any one of claims 1 to 3, further comprising character display processing means for displaying, on a display unit, an image of an anthropomorphic character according to the operating state of the speech recognition device body,
wherein the character display processing means is configured to display, as the image of the character, an image of a character in an apologetic posture when the re-input request voice is output, and to raise the degree of politeness of the apologetic posture shown by the image of the character as the number of misrecognitions increases.

The speech recognition device according to claim 4, wherein the recognition result output means is configured to raise the degree of politeness of the expression used when outputting the voice representing the recognition result as the number of misrecognitions increases, and
the character display processing means is configured to display, as the image of the character when the voice representing the recognition result is output, an image of a character whose degree of courteous posture rises as the number of misrecognitions increases.

The speech recognition device according to claim 4 or claim 5, wherein the recognition failure notification means is configured to output the failure notification voice including an apology expression, and
the character display processing means is configured to display, as the image of the character, an image of a character in an apologetic posture when the failure notification voice is output.

The speech recognition device according to any one of claims 1 to 6, wherein the speech recognition device is applied to in-vehicle equipment.