JP6736225B2

JP6736225B2 - Interactive device, interactive device control method, and program

Info

Publication number: JP6736225B2
Application number: JP2017063689A
Authority: JP
Inventors: 喜昭野田; 節夫山田; 杉崎　正之; 正之杉崎
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2017-03-28
Filing date: 2017-03-28
Publication date: 2020-08-05
Anticipated expiration: 2037-03-28
Also published as: JP2018165805A

Description

The present invention relates to a dialogue device that responds to a speaker, a method for controlling the dialogue device, and a program.

Dialogue systems have been studied that acquire, as utterance content, the result of speech recognition on a speaker's voice, analyze the acquired utterance content, and respond to the speaker according to the analysis result. With such a dialogue system, for example, a customer can call a call center and speak a question about something unclear, and an answer to the question can be presented to the customer automatically. Non-Patent Document 1 also describes that, in such a dialogue system, backchannel responses (aizuchi) from the system side make the user feel that it is easier to talk.

Non-Patent Document 1: Mikio Nakano et al., "Natural Language Processing Series 7: Dialogue Systems", Corona Publishing, February 13, 2015 (pp. 212-218)

In a dialogue system as described above, it is important to return to the speaker a natural response, as if the speaker were talking with a person.

People, however, cannot always organize what they want to convey and speak without hesitation; they stop to think or falter partway through. As a result, the speaker's voice may break off while the speaker is still talking.

In a conventional dialogue system, even though the speaker has not actually finished speaking, a response may be made according to the analysis of incomplete utterance content obtained by speech recognition up to the point where the speaker's voice broke off. Such a response is inappropriate and causes the speaker to feel that the interaction is unnatural.

Moreover, in a conversation between people, the listener usually gives backchannel responses or nods at moments such as when the speaker pauses, which lets the speaker feel that the listener is listening attentively. As noted above, having the dialogue system give the speaker a response indicating that it is listening, such as a backchannel response or a nod, is important for keeping the speaker from feeling that the interaction is unnatural. In conventional dialogue systems, however, making a response indicating that the system is listening to the speaker has not been sufficiently studied.

Conventional dialogue systems thus have the problem that they cannot respond to the speaker more appropriately.

An object of the present invention, made in view of the above problems, is to provide a dialogue device, a method for controlling the dialogue device, and a program that can respond to a speaker more appropriately.

To solve the above problems, a dialogue device according to the present invention is a dialogue device including a response unit that responds to a speaker, and comprises: an end-of-speech determination device including an end-of-speech determination unit that performs speech recognition on the voice of the speaker's utterance and determines whether the speaker has finished speaking solely on the basis of whether a specific phrase used at the end of speech is detected in each recognition result message obtained as a result of the speech recognition; and a response control unit that, when the end-of-speech determination device determines that the speaker has finished speaking, causes the response unit to make a response according to the speaker's utterance content up to the end of speech and, when the end-of-speech determination device determines that the speaker has not finished speaking, causes the response unit to make a response indicating that it is listening to the speaker. The response control unit accumulates the recognition result messages until the determination unit determines that the speaker has finished speaking and, when the determination unit determines that the speaker has finished speaking, causes the response unit to make a response according to the semantic content of the accumulated messages combined with the recognition result message determined to be the end of the speaker's speech.

To solve the above problems, a method for controlling a dialogue device according to the present invention includes: a step of performing speech recognition on the voice of a speaker's utterance; a step of determining whether the speaker has finished speaking solely on the basis of whether a specific phrase used at the end of speech is detected in each recognition result message obtained as a result of the speech recognition; and a step of causing the response unit of the dialogue device to make a response according to the speaker's utterance content up to the end of speech when it is determined that the speaker has finished speaking, and causing the response unit to make a response indicating that it is listening to the speaker when it is determined that the speaker has not finished speaking. In the method, the recognition result messages are accumulated until it is determined that the speaker has finished speaking and, when it is determined that the speaker has finished speaking, the response unit is caused to make a response according to the semantic content of the accumulated messages combined with the recognition result message determined to be the end of the speaker's speech.

To solve the above problems, a program according to the present invention causes a computer to function as the above dialogue device.

According to the dialogue device, the method for controlling the dialogue device, and the program of the present invention, it is possible to respond to a speaker more appropriately.

FIG. 1 is a block diagram showing a configuration example of the dialogue device according to the first embodiment of the present invention.
FIG. 2 is a diagram conceptually showing the operation of the dialogue device shown in FIG. 1.
FIG. 3 is a block diagram showing a configuration example of the dialogue device according to the second embodiment of the present invention.

Embodiments of the present invention are described below with reference to the drawings.

(First embodiment)
FIG. 1 is a block diagram showing a configuration example of a dialogue device 10 according to the first embodiment of the present invention. The dialogue device 10 according to this embodiment receives speech uttered by a speaker and responds to the speaker according to the input speech. In the following, the dialogue device 10 is assumed to be mounted on, for example, a humanoid robot device, or configured integrally with the robot device, and to respond to a speaker who talks to the robot device by controlling the robot device to output voice and to move.

The dialogue device 10 shown in FIG. 1 includes an end-of-speech determination unit 11, a response unit 12, and a response control unit 15.

When speech uttered by the speaker is input, the end-of-speech determination unit 11 performs speech recognition on the input speech and converts it to text. For each unit message obtained by converting the input speech to text through speech recognition (recognition result message), the end-of-speech determination unit 11 determines whether the speaker has finished speaking and outputs the determination result to the response control unit 15. For example, when a recognition result message is obtained indicating that a silent period in which the speaker produces no voice has continued for a predetermined time or longer, the end-of-speech determination unit 11 determines that the speaker is waiting for a response from the dialogue device 10, that is, that the speaker has finished speaking. The end-of-speech determination unit 11 also determines that the speaker has finished speaking when, for example, a phrase often used at the end of speech is detected (for example, 「〜でしょうか」 ("... deshou ka") or 「〜ですが」 ("... desu ga")). The method of determining, based on the speech recognition result, whether the speaker has finished speaking is not limited to these, and various methods can be used.
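The two example criteria above (a sufficiently long silent period, or a phrase typically used at the end of speech) can be sketched as follows. This is a minimal illustration, not the patented implementation: the `RecognitionMessage` type, the function name, the 3-second threshold, and the punctuation handling are all assumptions. Only the two ending phrases come from the text.

```python
from dataclasses import dataclass

# The two ending phrases given as examples in the text.
END_PHRASES = ("でしょうか", "ですが")
# Assumed: trailing punctuation and spaces are ignored before matching.
TRAILING_PUNCTUATION = "、。？?！! 　"


@dataclass
class RecognitionMessage:
    """Hypothetical container for one recognition result message."""
    text: str                 # recognized text; empty for a silence message
    silence_sec: float = 0.0  # length of the silent period, if any


def is_end_of_speech(msg: RecognitionMessage,
                     silence_threshold_sec: float = 3.0) -> bool:
    """Return True if the message suggests the speaker has finished."""
    if not msg.text:
        # Silence message: finished only if the silence lasted long enough.
        return msg.silence_sec >= silence_threshold_sec
    # Text message: finished if it ends with one of the ending phrases.
    stripped = msg.text.rstrip(TRAILING_PUNCTUATION)
    return stripped.endswith(END_PHRASES)
```

Suffix matching is used here because the example phrases are sentence-final forms. A real system would likely need a larger phrase list and morphological analysis rather than plain string comparison.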

Under the control of the response control unit 15, the response unit 12 responds to the speaker through voice output, motion of the robot device, and the like. The response unit 12 includes a speech synthesis unit 13 and an operation unit 14.

To output voice as a response to the speaker, the speech synthesis unit 13 synthesizes the speech to be output from a voice output unit (not shown) of the robot device. The speech synthesis unit 13 then causes the voice output unit to output the synthesized speech.

To move the robot device as a response to the speaker, the operation unit 14 generates an operation command that controls the motion of the robot device and outputs the command to the drive mechanism that moves the robot device. Responses to the speaker are not limited to the voice output from the voice output unit and the motion of the robot device described above. For example, the robot device may be provided with a display unit, and the response to the speaker may be shown on that display unit.

The response control unit 15 controls the response of the response unit 12 to the speaker according to the determination result of the end-of-speech determination unit 11.

Specifically, the response control unit 15 sequentially accumulates recognition result messages (partial utterance content) until the end-of-speech determination unit 11 determines that the speaker has finished speaking. When it is determined that the speaker has finished speaking, the response control unit 15 combines the recognition result messages accumulated so far with the current recognition result message (the one determined to be the end of speech) and acquires them together as the utterance content. The response control unit 15 then evaluates the acquired utterance content against predetermined dialogue rules and causes the response unit 12 to make a response according to the conditions described in the dialogue rules.
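The text does not specify what a dialogue rule looks like. As one possible reading, the sketch below treats each rule as a set of keywords that must all appear in the combined utterance, paired with a response identifier. The rule format, the identifiers `explain_address_change` and `ask_to_rephrase`, and the function name are all hypothetical.

```python
from typing import Sequence, Tuple

# Hypothetical rule format: (keywords that must all appear, response id).
Rule = Tuple[Tuple[str, ...], str]

DIALOGUE_RULES: Sequence[Rule] = (
    (("住所変更",), "explain_address_change"),
)
DEFAULT_RESPONSE = "ask_to_rephrase"  # assumed fallback when no rule matches


def select_response(utterance: str,
                    rules: Sequence[Rule] = DIALOGUE_RULES,
                    default: str = DEFAULT_RESPONSE) -> str:
    """Return the response id of the first rule whose keywords all occur."""
    for keywords, response_id in rules:
        if all(keyword in utterance for keyword in keywords):
            return response_id
    return default
```

Because the rules are evaluated against the whole combined utterance, filler such as "えーっと" at the start does not prevent the address-change rule from matching.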

When the end-of-speech determination unit 11 determines that a recognition result message is not the end of speech, the response control unit 15 causes the response unit 12 to make a response indicating that it is listening to the speaker, such as a backchannel response or a nod.

FIG. 2 conceptually shows the operation of the dialogue device 10 according to this embodiment. In FIG. 2, assume that the input speech to the dialogue device 10 consists of a silent period, the speaker's utterance "um" (えーっと), another silent period, and the utterance "I moved yesterday, so I would like to change my address" (昨日引っ越したので、住所変更をしたいのですが).

Assume that speech recognition on this input speech yields the recognition result messages "silence", "um" (えーっと), "silence", "I moved yesterday, so" (昨日、引越しをしたので、), and "I would like to change my address" (住所変更をしたいのですが). In a conventional dialogue system, the speaker's utterance content is analyzed at each utterance break (per recognition result message), such as "um", "I moved yesterday, so", and "I would like to change my address", and a response is made according to each analysis result. Consequently, for a recognition result message obtained before the speaker has finished stating the request, such as "um" or "I moved yesterday, so", an inappropriate response that does not match the speaker's intent, such as "The content of your question is unclear", could be made. In addition, conventional dialogue systems give no response between the speaker's remarks, such as a backchannel response or a nod, to indicate that they are listening, so the speaker could feel uneasy about whether the speech was being recognized.

In this embodiment, when the dialogue device 10 determines that the speaker has not finished speaking, it makes a response indicating that it is listening to the speaker (a backchannel response or a nod). When the dialogue device 10 determines that the speaker has finished speaking, it combines the recognition result messages up to that point, analyzes the utterance content, and makes a response according to the analysis result.

In the example shown in FIG. 2, when the dialogue device 10 determines that the recognition result message "um" is not the end of speech, it outputs by voice a backchannel response indicating that it is listening to the speaker (for example, "yes" (はい)). When the dialogue device 10 further determines that the recognition result message "I moved yesterday, so" is not the end of speech, it performs a motion of the robot device indicating that it is listening to the speaker (for example, a nod). The dialogue device 10 also sequentially accumulates the recognition result messages determined not to be the end of speech ("um" and "I moved yesterday, so").

When the dialogue device 10 then determines that the recognition result message "I would like to change my address" is the end of speech, it combines the recognition result messages accumulated so far ("um" and "I moved yesterday, so") with the current recognition result message "I would like to change my address" and acquires them together as the utterance content. The dialogue device 10 then analyzes the acquired utterance content and outputs by voice an answer to it (for example, how to change an address).
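The FIG. 2 walkthrough can be traced with a small sketch of the accumulate-then-respond behaviour. The class name `ResponseController`, the method `on_message`, and the `("backchannel", ...)` / `("answer", ...)` result tags are hypothetical, and the end-of-speech check is reduced to suffix matching on the two example phrases. Silence messages are left out for brevity.

```python
from typing import Callable, List, Tuple

END_PHRASES = ("でしょうか", "ですが")  # example ending phrases from the text


def ends_with_end_phrase(message: str) -> bool:
    """Assumed check: the message ends with one of the ending phrases."""
    return message.rstrip("、。？?！! 　").endswith(END_PHRASES)


class ResponseController:
    """Hypothetical stand-in for the response control unit 15."""

    def __init__(self, is_end: Callable[[str], bool] = ends_with_end_phrase):
        self._is_end = is_end
        self.buffer: List[str] = []  # messages judged not to be the end

    def on_message(self, message: str) -> Tuple[str, str]:
        if not self._is_end(message):
            # Not finished: store the message and signal "I am listening".
            self.buffer.append(message)
            return ("backchannel", "")
        # Finished: combine stored messages with the current one.
        utterance = "".join(self.buffer) + message
        self.buffer.clear()
        return ("answer", utterance)


controller = ResponseController()
results = [controller.on_message(m) for m in
           ("えーっと", "昨日、引越しをしたので、", "住所変更をしたいのですが")]
```

With the three text messages from FIG. 2, the first two calls yield a backchannel result and the third yields the combined utterance, which would then be analyzed to produce the actual answer. Whether the backchannel is rendered as a spoken "はい" or a nod is left to the response unit in this sketch.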

Thus, in this embodiment, the dialogue device 10 includes the end-of-speech determination unit 11, which determines whether the speaker has finished speaking, and the response control unit 15, which causes the response unit 12 to make a response according to the speaker's utterance content up to the end of speech when the end-of-speech determination unit 11 determines that the speaker has finished speaking, and causes the response unit 12 to make a response indicating that it is listening to the speaker when the end-of-speech determination unit 11 determines that the speaker has not finished speaking.

By responding according to the utterance content so far when the speaker is determined to have finished speaking, and by making a response indicating that it is listening when the speaker is determined not to have finished, the device reduces the likelihood of responding to the analysis of incomplete utterance content obtained before the speaker finishes, and it can show the speaker that it is listening. The device can therefore respond to the speaker more appropriately.

(Second embodiment)
FIG. 3 is a diagram showing a configuration example of a dialogue device 10A according to the second embodiment of the present invention. In FIG. 3, components that are the same as in FIG. 1 are given the same reference signs, and their description is omitted.

The dialogue device 10A shown in FIG. 3 differs from the dialogue device 10 shown in FIG. 1 in that the end-of-speech determination unit 11 is replaced with an end-of-speech determination unit 11A.

The end-of-speech determination unit 11A receives the speaker's voice during the utterance and video of the speaker, and determines from the input voice and video whether the speaker has finished speaking. For example, the end-of-speech determination unit 11A determines that the speaker has finished speaking when it detects from the input voice that a silent period in which the speaker produces no voice has continued for a predetermined time or longer and also detects from the video of the speaker that the speaker's mouth has remained closed for a predetermined time or longer. The end-of-speech determination unit 11A may instead determine whether the speaker has finished speaking from only one of the speaker's voice and the video of the speaker.
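The audio-visual determination described above can be sketched as below, assuming that upstream components (not shown, and not described in the text) already measure how long the speaker has been silent and how long the mouth has been closed. The function name, parameter names, and 2-second thresholds are hypothetical. Passing `None` for one input models the case where only one of the two sources is used.

```python
from typing import Optional


def is_end_of_speech_av(silence_sec: Optional[float],
                        mouth_closed_sec: Optional[float],
                        audio_threshold_sec: float = 2.0,
                        video_threshold_sec: float = 2.0) -> bool:
    """Judge end of speech from silence length and/or mouth-closed length.

    Every modality that is supplied must meet its threshold; a modality
    passed as None is ignored. With no modality at all, return False.
    """
    checks = []
    if silence_sec is not None:
        checks.append(silence_sec >= audio_threshold_sec)
    if mouth_closed_sec is not None:
        checks.append(mouth_closed_sec >= video_threshold_sec)
    return bool(checks) and all(checks)
```

Requiring both conditions when both inputs are available guards against a speaker who is silent but still visibly forming words, or whose mouth is closed only briefly.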

In the first embodiment, the end-of-speech determination unit 11 determines whether the speaker has finished speaking based on the result of speech recognition on the speaker's voice. In this embodiment, by contrast, the end-of-speech determination unit 11A determines whether the speaker has finished speaking based on at least one of acoustic information and visual information from the speaker's utterance, without performing speech recognition.

As in the first embodiment, the response control unit 15 acquires the utterance content from the recognition result messages obtained by speech recognition on the speaker's voice, analyzes the acquired utterance content, and causes the response unit 12 to make a response according to the analysis result.

The first and second embodiments above were described using an example in which voice output and motion of the robot device are performed as responses to the speaker, but the invention is not limited to this; for example, the response may be voice output only. In that case, the present invention can be applied to, for example, conversations with customers at a call center.

Errors can also occur in speech recognition of the speaker's voice. Processing using the N-best method, which prepares multiple candidates as the recognition result for a word, may therefore be performed.
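The text does not say how the N-best candidates would be used. One possible use, assumed here purely for illustration, is to check every candidate for an ending phrase, so that a misrecognized top candidate does not hide the end of speech. The function names and the `min_votes` parameter are hypothetical.

```python
from typing import Sequence

END_PHRASES = ("でしょうか", "ですが")  # example ending phrases from the text


def ends_with_end_phrase(text: str) -> bool:
    """Assumed check: the text ends with one of the ending phrases."""
    return text.rstrip("、。？?！! 　").endswith(END_PHRASES)


def is_end_of_speech_nbest(candidates: Sequence[str],
                           min_votes: int = 1) -> bool:
    """Return True if at least min_votes candidates end with an ending phrase."""
    votes = sum(1 for candidate in candidates if ends_with_end_phrase(candidate))
    return votes >= min_votes


# Hypothetical N-best list in which the top candidate is misrecognized.
n_best = ["住所変更をしたいのですか", "住所変更をしたいのですが", "住所変更をしたいのです"]
```

Raising `min_votes` trades sensitivity for robustness: with `min_votes=1` a single matching candidate is enough, while a higher value requires agreement among several candidates.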

Although not specifically mentioned in the embodiments, a program for executing each process performed by a computer functioning as the dialogue device 10 or 10A may be provided. The program may be recorded on a computer-readable medium. Using a computer-readable medium makes it possible to install the program on a computer. The computer-readable medium on which the program is recorded may be a non-transitory recording medium. The non-transitory recording medium is not particularly limited and may be, for example, a recording medium such as a CD-ROM or DVD-ROM.

Although the above embodiments have been described as representative examples, it will be apparent to those skilled in the art that many modifications and substitutions are possible within the spirit and scope of the present invention. The present invention should therefore not be construed as limited by the above embodiments, and various modifications and changes are possible without departing from the scope of the claims. For example, multiple configuration blocks shown in the configuration diagrams of the embodiments may be combined into one, or one configuration block may be divided.

10, 10A Dialogue device
11, 11A End-of-speech determination unit
12 Response unit
13 Speech synthesis unit
14 Operation unit
15 Response control unit

Claims

A dialogue device including a response unit that responds to a speaker, the dialogue device comprising:
an end-of-speech determination device including an end-of-speech determination unit that performs speech recognition on the voice of the speaker's utterance and determines whether the speaker has finished speaking solely on the basis of whether a specific phrase used at the end of speech is detected in each recognition result message obtained as a result of the speech recognition; and
a response control unit that, when the end-of-speech determination device determines that the speaker has finished speaking, causes the response unit to make a response according to the speaker's utterance content up to the end of speech and, when the end-of-speech determination device determines that the speaker has not finished speaking, causes the response unit to make a response indicating that it is listening to the speaker,
wherein the response control unit accumulates the recognition result messages until the determination unit determines that the speaker has finished speaking and, when the determination unit determines that the speaker has finished speaking, causes the response unit to make a response according to the semantic content of the accumulated messages combined with the recognition result message determined to be the end of the speaker's speech.

A method for controlling a dialogue device including a response unit that responds to a speaker, the method comprising:
a step of performing speech recognition on the voice of the speaker's utterance;
a step of determining whether the speaker has finished speaking solely on the basis of whether a specific phrase used at the end of speech is detected in each recognition result message obtained as a result of the speech recognition; and
a step of causing the response unit to make a response according to the speaker's utterance content up to the end of speech when it is determined that the speaker has finished speaking, and causing the response unit to make a response indicating that it is listening to the speaker when it is determined that the speaker has not finished speaking,
wherein the recognition result messages are accumulated until it is determined that the speaker has finished speaking and, when it is determined that the speaker has finished speaking, the response unit is caused to make a response according to the semantic content of the accumulated messages combined with the recognition result message determined to be the end of the speaker's speech.

A program for causing a computer to function as the dialogue device according to claim 1.