JP2656234B2

JP2656234B2 - Conversation voice understanding method

Info

Publication number: JP2656234B2
Application number: JP60178615A
Authority: JP
Inventors: 浩之千本
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1985-08-15
Filing date: 1985-08-15
Publication date: 1997-09-24
Anticipated expiration: 2012-09-24
Also published as: JPS6239899A

Description

[Detailed description of the invention]

[Technical field of the invention]

The present invention relates to a conversation voice understanding method used in information systems that take voice input.

[Technical background of the invention and its problems]

In recent years, speech recognition and synthesis technology has advanced remarkably. For example, continuous speech recognition and speech recognition for unspecified speakers have become possible, and highly accurate speech synthesis has also become possible.

Telephone voice response services that provide various services over public telephone lines using such technology have been developed, and conversation voice understanding systems that take this a step further are now being developed. However, the users of this kind of system are unspecified, and many of them, for example the elderly, children, and women, are unfamiliar with the system, so the system often misrecognizes speech or fails to understand the content of the conversation. In conventional systems in particular, the user is made to utter the same word again when re-entering after a misrecognition, so once recognition stumbles it often continues to fail, and the conversation does not proceed smoothly.

[Object of the invention]

An object of the present invention is to provide a conversation voice understanding method that enables a conversation between a human and a machine to be carried out smoothly and accurately.

[Summary of the invention]

The present invention is characterized by recognizing conversational speech uttered by a speaker, giving a response or asking a question based on the recognition result, and, when a misrecognition or an unrecognizable state occurs during the resulting conversation, changing the questions to a selection format as designated by the speaker and continuing in that format once the change has been made.

[Effects of the invention]

According to the present invention, when the system has difficulty understanding, for example when misrecognitions are frequent, putting the questions in a selection format removes the need for repeated re-entry, allows the conversation to proceed smoothly, and makes the system more practical for the user.

[Embodiments of the invention]

Embodiments of the present invention are described below with reference to the drawings.

FIG. 1 is a block diagram of a first embodiment of the present invention, and FIG. 2 is a flowchart of the first embodiment. In the first embodiment, when misrecognitions occur frequently or the content of the conversation cannot be understood during a conversation with the system, the conversation with the system automatically switches to the selection format.

First, before a conversation with the system begins, the counter (N) in the system is cleared to 0 (step 10), and 1 is then added to the counter (N) (step 11). After the counter is set, the conversation/selection mode is set to conversation mode (step 12), and the question from the system is generated by conversation generation unit A (4 in FIG. 1) according to the counter value and output externally through response unit 8 and voice output unit 9 (step 13). The speaker's spoken reply to this question undergoes acoustic processing such as A/D conversion in voice input unit 1, and voice recognition unit 3 performs speech recognition on the processed voice data by comparing it with dictionary 2 (step 15). Conversation generation unit A 4 evaluates the recognition result. If it is judged to be a misrecognition, 1 is added to the miss counter m inside conversation generation unit A 4 (steps 16, 17), and the value of counter m is compared with a threshold M (step 18). If the miss count m is smaller than M, the question is asked again and the speaker is asked to speak again.

If this input is recognized normally, conversation generation unit A 4 checks the content of the utterance (steps 22, 23). If the mode is conversation mode, the utterance has been judged in steps 23 and 24 to fit the conversation, so the counter N, which counts the words in the conversation, is incremented by one so that the next word in the conversation can be checked for misrecognition (step 16), and a question about the next word in the conversation is asked (steps 24, 13-1, 13). The conversation continues through this cycle.

Meanwhile, the misrecognitions that occur during the conversation are accumulated in the miss counter's count m. If the count m exceeds the threshold M during the conversation (step 18), or if the user gives an answer the system did not expect at all (step 23), selection unit 6 switches switch 7 to conversation generation unit B 5, and the mode is changed from conversation mode to selection mode (step 19). After the change to selection mode, conversation generation unit B 5 decides the question from the conversation so far (step 20) and outputs it through response unit 8 and voice output unit 9 (step 21). The cycle in which the speaker answers such a question is then carried out in selection mode until the end.
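The control flow of the first embodiment can be illustrated with a short sketch. This is not code from the patent: the names `run_dialog`, `DialogState`, and `recognize`, the status strings, and the idea of supplying each question in both an open and a selection wording are all hypothetical. The text says the question is repeated while m is smaller than M and the mode changes once m is larger than M, leaving the case m = M unstated; the sketch assumes the switch happens as soon as m reaches M. It also assumes that a failed answer in selection mode is simply asked again, which the description does not cover.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

CONVERSATION = "conversation"
SELECTION = "selection"


@dataclass
class DialogState:
    n: int = 0                      # counter N: index of the current question
    m: int = 0                      # miss counter m
    mode: str = CONVERSATION
    answers: List[str] = field(default_factory=list)


# Hypothetical recognizer: takes (prompt, mode) and returns (status, value),
# where status is "ok", "misrecognized", or "unexpected".
Recognizer = Callable[[str, str], Tuple[str, str]]


def run_dialog(questions: List[Tuple[str, str]],
               recognize: Recognizer,
               threshold_m: int) -> DialogState:
    """questions holds (open wording, selection wording) pairs."""
    state = DialogState()           # step 10: clear the counters
    state.n = 1                     # step 11: N = N + 1
    state.mode = CONVERSATION       # step 12: start in conversation mode
    while state.n <= len(questions):
        open_q, choice_q = questions[state.n - 1]
        prompt = open_q if state.mode == CONVERSATION else choice_q
        status, value = recognize(prompt, state.mode)   # steps 13 to 15
        if status == "ok":
            state.answers.append(value)
            state.n += 1            # step 24: move on to the next question
            continue
        if state.mode == CONVERSATION:
            if status == "misrecognized":
                state.m += 1        # steps 16, 17: count the miss
                if state.m >= threshold_m:   # step 18 (assumed: switch at m = M)
                    state.mode = SELECTION   # step 19
            else:                   # answer the system did not expect (step 23)
                state.mode = SELECTION       # step 19
        # Assumption: in selection mode a failed answer is just asked again.
    return state
```

Once the mode has changed, the loop never sets it back, which mirrors the statement that the remaining cycle runs in selection mode until the end.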

According to the above embodiment, when the system frequently misrecognizes speech during a conversation, the mode is automatically changed to selection mode and the dialogue proceeds in the selection format, so that needless re-entry caused by misrecognition is avoided and the conversation can proceed smoothly.

Next, a second embodiment of the present invention is described with reference to the drawings.

FIG. 3 is a block diagram of a second embodiment of the present invention, and FIG. 4 is a flowchart of the second embodiment. In the second embodiment, not only does the conversation with the system automatically switch to the selection format when misrecognitions occur frequently or the conversation cannot be understood, but the speaker (user) can also bring in the selection format at any time as needed.

First, before a conversation with the system begins, the counter (N) in the system is cleared to 0 (step 34), and 1 is then added to the counter (N) (step 35). Next, the conversation/selection mode is set to conversation mode (step 36), which completes the initial settings for the conversation with the system. After initialization, conversation generation unit A (28 in FIG. 3) decides the question according to the counter value N and outputs it through response unit 32 and voice output unit 33 (step 37). If a problem arises at this point, for example the user does not understand the content or meaning of the question well, the mode is changed by an input such as a switch from external selection unit 31 (step 38). Conversation generation unit B 29 on the selection mode side then decides the question anew from the conversation so far, and the question is output through response unit 32 and voice output unit 33.

Suppose instead that the conversation is proceeding in conversation mode. The speaker's spoken reply to the question undergoes acoustic processing such as A/D conversion in voice input unit 25, and voice recognition unit 27 performs speech recognition on the processed voice data by comparing it with dictionary 26 (step 41). Conversation generation unit A 28 evaluates the recognition result. If it is judged to be a misrecognition, 1 is added to the miss counter m inside conversation generation unit A 28 (steps 42, 43), and the value of counter m is compared with the threshold M (step 44). If the miss count m is smaller than M, the speaker is asked to give the input again. If this input is recognized normally, conversation generation unit A 28 checks the content of the utterance (steps 48, 48). If the mode is conversation mode, the utterance has been judged in steps 49 and 50 to fit the conversation, so the counter N, which counts the words in the conversation, is incremented by one so that the next word in the conversation can be checked for misrecognition (step 42), and a question about the next word in the conversation is asked (steps 50, 37-1, 37). The conversation continues through this cycle.

Meanwhile, the misrecognitions that occur during the conversation are accumulated in the miss counter's count m. If the miss count m exceeds the threshold M during the conversation (step 44), or if the user gives an answer the system did not expect at all (step 49), selection unit 30 automatically switches switch 34 to conversation generation unit B 29 for selection mode, and the mode is changed from conversation mode to selection mode (step 45). After the change to selection mode, conversation generation unit B 29 decides the question from the conversation so far (step 46) and outputs it through response unit 32 and voice output unit 33 (step 47). The cycle in which the speaker answers such a question then continues.
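The mode decision of the second embodiment differs from the first only in that the user can also request the change. A minimal sketch of that decision as a single function follows. The names `Mode` and `next_mode` and the boolean parameters are hypothetical, chosen for illustration. As an assumption, the automatic switch is taken to happen as soon as m reaches M, since the text only states the cases m smaller than M and m larger than M.

```python
from enum import Enum


class Mode(Enum):
    CONVERSATION = "conversation"
    SELECTION = "selection"


def next_mode(mode: Mode,
              miss_count: int,
              threshold_m: int,
              unexpected_answer: bool,
              user_requested: bool) -> Mode:
    """Return the mode to use for the next question."""
    if mode is Mode.SELECTION:
        # Once in selection mode, the dialogue stays there.
        return Mode.SELECTION
    if user_requested:
        # Step 38: input from the external selection unit 31.
        return Mode.SELECTION
    if miss_count >= threshold_m or unexpected_answer:
        # Steps 44 and 49: automatic change (assumed to trigger at m = M).
        return Mode.SELECTION
    return Mode.CONVERSATION
```

Keeping the decision in one pure function makes the three triggers (user request, miss count, unexpected answer) easy to check independently of the recognizer.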

According to the above embodiment, not only is the mode automatically changed to selection mode when the system frequently misrecognizes speech during a conversation, but a user who is, for example, unaccustomed to the conversation can also switch to the selection format at any time at their own discretion. This reduces needless re-entry caused by misrecognition, relieves the user's anxiety, and allows the conversation to proceed smoothly.

The present invention is not limited to the above embodiments. For example, when the mode is changed externally in the second embodiment, the change may be made by voice input during the conversation instead of by providing a switch as the external selection unit. Any of the various conventionally known methods may be adopted as appropriate for recognizing the input speech, for speech synthesis, and for judging content. In short, the present invention can be practiced with various modifications without departing from its gist.

[Brief description of the drawings]

FIG. 1 is a block diagram of the first embodiment of the present invention, FIG. 2 is a flowchart of the first embodiment of the present invention, FIG. 3 is a block diagram of the second embodiment of the present invention, and FIG. 4 is a flowchart of the second embodiment of the present invention.

Description of symbols:
1: voice input unit
2: dictionary
3: voice recognition unit
4: conversation generation unit A
5: conversation generation unit B
6: selection unit
7: switch
8: response unit
9: voice output unit
25: voice input unit
26: dictionary
27: voice recognition unit
28: conversation generation unit A
29: conversation generation unit B
30: selection unit
31: external selection unit
32: response unit
33: voice output unit
34: switch

Claims

(57) [Claims]

1. A conversation voice understanding method characterized by recognizing conversational speech uttered by a speaker, giving a response or asking a question based on the recognition result, changing the question to a selection format as designated by the speaker when a misrecognition or an unrecognizable state occurs during the conversation thus conducted, and continuing in that format after the change to the selection format.

2. The conversation voice understanding method according to claim 1, characterized in that the change to the selection format is made automatically by extracting keywords from the conversation.