JP2008249893A

JP2008249893A - Speech response device and its method

Info

Publication number: JP2008249893A
Application number: JP2007089640A
Authority: JP
Inventors: Kazuhiko Abe; 一彦阿部
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 2007-03-29
Filing date: 2007-03-29
Publication date: 2008-10-16

Abstract

PROBLEM TO BE SOLVED: To provide a speech response device capable of responding in accordance with the user's behavior.

SOLUTION: The speech response device 10 comprises an input acquiring section 101, a response content generation section 102, a response output section 103, a peripheral sound acquiring section 104, and a listening status judging section 105. The user's speech, the response speech, and the environmental sound contained in the peripheral sound information are analyzed. (1) When the user answers the speech response, a new speech response to the answer is generated. (2) When the user does not hear the response speech, the speech response is repeated. (3) When the user cannot answer the speech response, the speech response is suspended.

COPYRIGHT: (C)2009,JPO&INPIT

Description

The present invention relates to a voice response device and method suitable for home appliances, such as a television, that have a voice response function.

In recent years, many household products equipped with a voice interface have been commercialized. In particular, spoken guidance on how to use a product makes home appliances that require complicated operations easier to use, so that their various functions can be used effectively. However, the user is not always in a position to listen to a system response such as guidance and to make an appropriate input, so a technique is needed that performs response processing according to the user's situation.

Patent Document 1 therefore proposes a processing method that estimates the user's workload from the complexity of the user's input and the time taken to make it, and changes the content of the system's response accordingly.

Patent Document 2 proposes a function for a car navigation system or similar device with a spoken dialogue interface that determines the user's situation from vehicle information and changes the response processing accordingly.

Patent Document 3 proposes a function that changes the response content according to information such as the listener's laughter or applause.

Patent Document 4 proposes a method that estimates the noise level from the user's input speech and changes the response content accordingly.

Patent Document 5 proposes a method of helping the user answer: when the user does not reply or perform an operation within a certain period after the device outputs speech, the device outputs another response prompting the user for input.

Patent Document 1: JP 2003-108191 A
Patent Document 2: JP 2004-233676 A
Patent Document 3: JP 2004-24867 A
Patent Document 4: US Patent Application Publication 2002-95295
Patent Document 5: JP 2005-22065 A

Home appliances with a voice response function or a spoken dialogue function, as in the background art above, assume that when a voice response is output, the user is able to listen to it and carry out the necessary answer or operation.

In actual household use, however, two kinds of situation must be considered: various household noises may prevent the user from hearing the device's response speech, and the user may be engaged in some activity other than interacting with the device and therefore be unable to answer or operate the device in response.

For example, when operating a television with a voice response function, the operating sound of other home appliances near the user may make the voice of the voice response device inaudible at the moment a voice response is output. The user may also be unable to answer or operate the device in response to its output, for example because the user starts talking on a mobile phone while the response is being output.

If, when the user cannot hear the response output and therefore cannot make an input, the voice response device carries on with its processing instead of pausing, an operation the user did not intend may be executed. Conversely, if the device keeps repeating its response output while the user is doing something unrelated to the voice response device, the user is subjected to unnecessary response output.

In view of the above problems, the present invention provides a voice response device, and a method therefor, capable of responding in accordance with the user's behavior.

The present invention is a voice response device that gives voice responses to a user, comprising: an ambient sound acquisition unit that acquires ambient sound information on the sounds around the user when the voice response is output; a listening state determination unit that analyzes the user's speech, the response speech, and the environmental sound contained in the ambient sound information and determines whether (1) the user is answering the response speech, (2) the user cannot hear the response speech, or (3) the user is unable to answer the voice response; and a response content generation unit that (1) generates a new voice response to the answer when the user answers the voice response, (2) repeats the voice response when the user cannot hear the response speech, and (3) suspends the voice response when the user is unable to answer it.

According to the present invention, determining the user's listening state makes it possible to control responses flexibly according to the user's situation, even when it is difficult for the user to answer the voice response device.

A voice response device according to an embodiment of the present invention will be described below with reference to the drawings.

(First Embodiment)
A voice response device 10 according to the first embodiment of the present invention will be described with reference to FIGS. 1 to 8.

(1) Configuration of Voice Response Device
FIG. 1 shows an example configuration of the voice response device 10 according to this embodiment. The voice response device 10 comprises an input acquisition unit 101, a response content generation unit 102, a response output unit 103, an ambient sound acquisition unit 104, and a listening state determination unit 105.

(1-1) Input Acquisition Unit 101
The input acquisition unit 101 acquires input from the user. For example, it captures the speech signal uttered by the user through a microphone or the like, performs speech recognition processing, converts the result into text information, and outputs it. Alternatively, it receives an infrared signal from a remote controller or the like and outputs the received content.

(1-2) Response Content Generation Unit 102
The response content generation unit 102 determines the response content based on the user input information acquired by the input acquisition unit 101.

The response content generation unit 102 passes the input information to an application. When the application is a database system, for example, the unit obtains the search result corresponding to the input search request and generates response content based on that result.

In addition, regardless of whether the user has made an input, the response content generation unit 102 can generate and output response content whenever a response speech becomes necessary, for example at a preset time or when the situation in the application changes.

(1-3) Response Output Unit 103
The response output unit 103 obtains the response content generated by the response content generation unit 102, for example as text, converts it into synthesized speech, and outputs it. Alternatively, the output speech may simply be playback of prerecorded speech.

(1-4) Ambient Sound Acquisition Unit 104
The ambient sound acquisition unit 104 acquires the ambient sound information picked up by a microphone near the user, a microphone attached to a device near the user such as a remote controller, a microphone attached to a robot near the user, or a microphone installed on a wall or ceiling.

(1-5) Listening State Determination Unit 105
The listening state determination unit 105 analyzes the ambient sound information output by the ambient sound acquisition unit 104, uses the analysis result to determine the user's listening state according to predetermined criteria, and outputs the determination as a listening state determination result.

(2) Listening State Determination Method
The listening state is determined by analyzing the sounds around the user and applying predetermined rules to decide the listening state. Specific examples are described below.

(2-1) First Example
FIG. 2 illustrates a first example of the listening state determination method.

The listening state determination unit 105 obtains the utterance start time and utterance end time of the response speech output by the response output unit 103, analyzes the ambient sound information from the start time to the end time of the response speech, and obtains its average volume. If the average volume exceeds a reference value, the listening state is determined to be "audible"; if it falls below the reference value, the listening state is determined to be "not audible". This listening state determination result is then output. The reference value may be predetermined, or it may be calculated from the average volume of the device's response speech in the ambient sound information at the times when the user answered a response speech during past use of the device.
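The first example can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes the ambient sound is available as a list of float samples at a known sample rate, measures "volume" as RMS amplitude, and uses hypothetical function names. The text only says the reference value may be computed from past volumes at which the user answered, so using their plain mean is also an assumption.

```python
import math
from typing import List, Sequence


def average_volume(samples: Sequence[float], sample_rate: int,
                   start_s: float, end_s: float) -> float:
    """RMS amplitude of the ambient sound between start_s and end_s (seconds)."""
    window = samples[int(start_s * sample_rate):int(end_s * sample_rate)]
    if len(window) == 0:
        return 0.0  # nothing captured during the response speech
    return math.sqrt(sum(x * x for x in window) / len(window))


def judge_by_volume(samples: Sequence[float], sample_rate: int,
                    start_s: float, end_s: float, reference: float) -> str:
    """'audible' if the average volume over the response exceeds the reference."""
    volume = average_volume(samples, sample_rate, start_s, end_s)
    return "audible" if volume > reference else "not audible"


def reference_from_history(answered_volumes: List[float]) -> float:
    """Hypothetical: reference = mean response volume observed in past
    sessions at the moments when the user did answer."""
    return sum(answered_volumes) / len(answered_volumes)
```

For example, with one second of silence followed by one second at amplitude 0.5 (8 kHz), a response spoken during the second interval is judged "audible" against a reference of 0.1, and one spoken during the silent interval is judged "not audible".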

Although this embodiment uses the average volume, the noise level may instead be measured from the ambient sound information, and the listening state determined to be "not audible" when the noise level exceeds a predetermined value. For example, the ratio of the noise level to the speech signal may be calculated, and the listening state determined to be "not audible" when that ratio is 0 dB or more.
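A sketch of this noise-ratio variant. It assumes the powers of the response speech and of the background noise have already been estimated separately from the ambient sound (how to separate them is outside this example), and it reads the 0 dB criterion as noise relative to speech, which is the direction consistent with a "not audible" verdict. The function name is hypothetical.

```python
import math


def judge_by_noise_ratio(speech_power: float, noise_power: float,
                         limit_db: float = 0.0) -> str:
    """Judge audibility from the noise-to-speech power ratio in dB.

    speech_power and noise_power are hypothetical inputs, e.g. mean squared
    amplitudes of the response speech and of the background noise. The
    response is treated as 'not audible' when the noise is at least as
    strong as the speech (ratio >= limit_db, 0 dB by default).
    """
    if speech_power <= 0.0:
        return "not audible"  # no response speech reached the microphone
    if noise_power <= 0.0:
        return "audible"      # no measurable noise at all
    ratio_db = 10.0 * math.log10(noise_power / speech_power)
    return "not audible" if ratio_db >= limit_db else "audible"
```

Equal powers give exactly 0 dB and are therefore judged "not audible"; noise 10 dB below the speech is judged "audible".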

(2-2) Second Example
FIG. 3 illustrates a second example of the listening state determination method.

The listening state determination unit 105 obtains, from the ambient sound information, the utterance segments and the speaker of each segment. Specifically, it obtains the utterance start time and utterance end time of the response speech output by the response output unit 103, and if it confirms that both the user's own speech and the speech of someone other than the user occur within the response speech period, it determines that "the user is talking with another speaker and cannot answer" and outputs this as the listening state determination result.

Alternatively, if the user's speech segments occupy more than a certain proportion of the response speech period, for example 50%, it may be determined that "the user is talking with another speaker and cannot answer".
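A sketch of the second example and its 50% alternative. It assumes a separate, hypothetical speaker-segmentation step has already labeled each utterance segment in the ambient sound as the user's or someone else's; "talking to someone" is shorthand for the result label, and the 50% limit is the example value from the text.

```python
from typing import List, Optional, Tuple

# (start_s, end_s, speaker) where speaker is "user" or "other".
Segment = Tuple[float, float, str]
TALKING = "talking to someone"


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def judge_conversation(segments: List[Segment], resp_start: float,
                       resp_end: float, ratio_limit: float = 0.5) -> Optional[str]:
    """Second example: look at who speaks while the response is playing."""
    user_time = 0.0
    other_present = False
    for start, end, speaker in segments:
        shared = overlap(start, end, resp_start, resp_end)
        if shared <= 0.0:
            continue
        if speaker == "user":
            user_time += shared
        else:
            other_present = True
    # Rule A: both the user and someone else speak during the response.
    if user_time > 0.0 and other_present:
        return TALKING
    # Rule B (alternative): the user speaks for more than ratio_limit of it.
    duration = resp_end - resp_start
    if duration > 0.0 and user_time / duration > ratio_limit:
        return TALKING
    return None  # this rule gives no verdict
```

Rule B covers the mobile-phone case, where the far-end voice is usually not picked up by the room microphone and only the user's side of the conversation is heard.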

(2-3) Third Example
FIG. 4 illustrates a third example of the listening state determination method.

The listening state determination unit 105 obtains, from the ambient sound information, the utterance segments and the speaker of each segment. Specifically, it obtains the utterance start time and utterance end time of the response speech output by the response output unit 103, and if the start time or end time of the response speech falls within one of the user's utterance segments obtained from the ambient sound information, it determines that "the user is talking with another speaker and cannot answer" and outputs the determination result.
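The third example reduces to an interval-membership test. A sketch, assuming the user's utterance segments are given as (start, end) pairs in seconds on the same clock as the response times; the function name and label are hypothetical.

```python
from typing import List, Optional, Tuple

TALKING = "talking to someone"


def judge_boundary_in_user_speech(user_segments: List[Tuple[float, float]],
                                  resp_start: float,
                                  resp_end: float) -> Optional[str]:
    """Third example: the response started or ended while the user was speaking."""
    for start, end in user_segments:
        if start <= resp_start <= end or start <= resp_end <= end:
            return TALKING
    return None
```

A user utterance lying entirely inside the response, touching neither boundary, does not trigger this rule; that case is what the second example covers.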

(2-4) Fourth Example
FIG. 5 illustrates a fourth example of the listening state determination method.

The listening state determination unit 105 determines, from the ambient sound information, whether there are sounds other than human speech that are unrelated to use of the device. The sounds checked for are non-speech sounds that can occur where the device is used, such as objects colliding or a telephone or intercom ringing; examples include sounds that never appeared during past use of the device and sounds that occurred when the user failed to answer the device's response during past use.

When the acoustic features of such a non-speech sound are found in the ambient sound information, it is determined that a specific sound unrelated to use of the device is present.

Specifically, the unit obtains the utterance start time and utterance end time of the response speech output by the response output unit 103. If a specific sound unrelated to use of the device is detected within the response speech period, or within a fixed interval such as from 10 seconds before the utterance start time to the utterance end time, it determines that "the user is doing something else and cannot answer" and outputs the determination result.
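A sketch of the time-window check in the fourth example. It assumes an upstream, hypothetical sound classifier has already turned the ambient sound into timestamped event labels; the label set below is illustrative (the text also allows it to be learned from past sessions). Setting lookback_s to 0 restricts the check to the response speech period itself.

```python
from typing import Iterable, Optional, Tuple

BUSY = "doing something else"
# Hypothetical labels produced by some non-speech sound classifier.
UNRELATED_SOUNDS = {"phone_ring", "intercom_chime", "impact"}


def judge_other_activity(events: Iterable[Tuple[float, str]],
                         resp_start: float, resp_end: float,
                         lookback_s: float = 10.0) -> Optional[str]:
    """Fourth example: an unrelated non-speech sound occurred between
    lookback_s seconds before the response started and the response end."""
    window_start = resp_start - lookback_s
    for time_s, label in events:
        if label in UNRELATED_SOUNDS and window_start <= time_s <= resp_end:
            return BUSY
    return None
```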

(3) Generation of Response Content
The response content generation unit 102 generates response content based on the listening state determination result obtained by the listening state determination unit 105.

When the listening state is determined to be "audible" and the user answers, a response to that answer is generated.

On the other hand, when the listening state is determined to be "not audible", the same response content is output again, either at a higher volume or with the voice quality changed to make it easier to hear (for example, from a male voice to a female voice).

When the listening state is determined to be "the user is talking with another speaker and cannot answer" or "the user is doing something else and cannot answer", response content indicating that the voice response device 10 is suspending is generated and output, and the voice response device 10 then suspends its operation.

The device may also be configured to generate response content based on the listening state determination result only after a fixed time has elapsed since the end of the response speech.

Even while the operation of the voice response device 10 is suspended, the user's listening state continues to be determined at fixed intervals. Once the determination "doing something else and cannot answer" or "the user is talking with another speaker and cannot answer" is no longer made, the response content generation unit 102 generates and outputs response content indicating that the suspension has ended, and the voice response device 10 resumes operation.
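The suspend-and-resume behavior can be sketched as a loop over periodic re-judgments. For testability this assumes the listening states judged at each fixed interval are supplied as a sequence (a hypothetical input); a real device would sleep between checks and call its determination unit. The state labels are shorthand.

```python
from typing import Iterable, List, Tuple

BLOCKING = {"talking to someone", "doing something else"}


def suspend_and_resume(periodic_states: Iterable[str],
                       suspend_message: str,
                       resume_message: str) -> Tuple[List[str], bool]:
    """Return the messages output and whether the device resumed.

    periodic_states stands in for the listening state judged at each fixed
    interval while suspended.
    """
    outputs = [suspend_message]
    for state in periodic_states:
        if state not in BLOCKING:
            outputs.append(resume_message)  # suspension has ended
            return outputs, True
    return outputs, False  # still blocked when the checks ran out
```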

Generating response content from the listening state determination result as described above can be realized by defining in advance a response generation method for each listening state.
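One way to "define in advance a response generation method for each listening state" is a lookup table. This is a hypothetical sketch: the Action fields, the shorthand state labels, and the 6 dB gain are illustrative choices, not values from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    kind: str                    # "answer", "repeat" or "suspend"
    volume_gain_db: float = 0.0  # only used when repeating
    voice: str = "default"       # could switch voice to be easier to hear


# Hypothetical policy table: one predefined response method per state.
POLICY = {
    "audible": Action("answer"),
    # Repeat louder; switching the voice (e.g. male -> female) is the
    # alternative the text mentions. The 6 dB figure is arbitrary.
    "not audible": Action("repeat", volume_gain_db=6.0),
    "talking to someone": Action("suspend"),
    "doing something else": Action("suspend"),
}


def decide(state: str) -> Action:
    """Look up the predefined response method for a listening state."""
    return POLICY[state]
```

Keeping the policy as data rather than branching code makes it easy to change per product, which matches the text's point that the handling of each state is specified beforehand.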

(4) Processing Procedure
Processing procedure A of the voice response device 10 is described in detail below. FIG. 6 is a flowchart illustrating processing procedure A.

In step A1, ambient sound extraction is started.

In step A2, the voice response device 10 outputs a response speech.

In step A3, the listening state is determined after the response speech is output. If the listening state determination result is "audible", the process proceeds to step A4. If the result indicates that the device's response speech has not reached the user, such as "not audible", "talking with another person and cannot answer", or "doing something else and cannot answer", the process proceeds to step A5.

In step A4, if the user makes an input within a predetermined time, processing for that input is executed. If there is no user input within the predetermined time, the process returns to step A2 to output the response content again.

In step A5, if the listening state determination result is "not audible", the process returns to step A2 to output the response again. In this case, the volume of the response output may be changed, either increased or decreased. If the listening state determination result indicates that it is difficult for the user to respond to the device, such as "talking with another person and cannot answer" or "doing something else and cannot answer", the operation of the response device is suspended.
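Steps A1 to A5 can be sketched as a loop with injected hooks. All callables are hypothetical stand-ins for the units described above. Two details are added assumptions: the max_rounds cap (the flowchart simply loops back to A2), and the choice to raise the volume by 6 dB on each "not audible" retry (the text also allows lowering it).

```python
from typing import Callable, Optional


def procedure_a(start_capture: Callable[[], None],
                output_response: Callable[[float], None],
                judge_state: Callable[[], str],
                wait_for_input: Callable[[], Optional[str]],
                handle_input: Callable[[str], None],
                max_rounds: int = 3) -> str:
    """Sketch of steps A1-A5 of FIG. 6; every callable is a hypothetical hook."""
    start_capture()                        # A1: start ambient sound capture
    gain_db = 0.0
    for _ in range(max_rounds):
        output_response(gain_db)           # A2: output the response speech
        state = judge_state()              # A3: judge the listening state
        if state == "audible":
            user_input = wait_for_input()  # A4: wait a predetermined time
            if user_input is not None:
                handle_input(user_input)
                return "handled"
            continue                       # no input: back to A2
        if state == "not audible":
            gain_db += 6.0                 # A5: repeat, here a little louder
            continue
        return "suspended"                 # A5: user cannot answer now
    return "gave up"
```

For example, if the first judgment is "not audible" and the second is "audible" with the user answering, the response is output at gains 0 dB and 6 dB and the function returns "handled".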

(5) Specific Example
The above processing is described in detail below, taking as a specific example a television that incorporates the voice response device 10.

FIG. 7 is an example of an exchange between a user and a television incorporating the voice response device 10. The television is assumed to manage baseball results in a database, answer the user's inquiries, and play back related video.

The dialogue example in FIG. 7 assumes a situation in which, during use of the device, the user starts talking with another person on a mobile phone while device response 2 is being output, and can no longer answer. Processing in this situation is described below.

After the voice response device 10 finishes playing device response 2, which asks the user "Do you want to play the home run scene?", the listening state determination unit 105 receives the output of the ambient sound acquisition unit 104 and outputs a listening state determination result.

Here, listening state determination rules are used to determine the listening state. FIG. 8 shows an example of these rules.

For example, if the user is talking on a mobile phone and the user's speech time during playback of the response speech exceeds a reference value, the listening state determination unit 105 applies the listening state determination rules and outputs the determination result "talking with another person and cannot answer".
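A rule table of this kind can be evaluated as an ordered list of (condition, result) pairs where the first match wins. The contents of FIG. 8 are not reproduced in the text, so the feature names, rule order, and thresholds below are entirely hypothetical; only the first rule mirrors the mobile-phone example above.

```python
from typing import Callable, Dict, List, Tuple

Features = Dict[str, float]

# Hypothetical rule list in the spirit of FIG. 8; feature names and their
# order are illustrative, not taken from the figure.
RULES: List[Tuple[Callable[[Features], bool], str]] = [
    (lambda f: f["user_speech_s"] > f["user_speech_ref_s"], "talking to someone"),
    (lambda f: f["unrelated_sounds"] > 0, "doing something else"),
    (lambda f: f["avg_volume"] <= f["volume_ref"], "not audible"),
]


def apply_rules(features: Features, default: str = "audible") -> str:
    """Return the result of the first rule whose condition holds."""
    for condition, result in RULES:
        if condition(features):
            return result
    return default
```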

On receiving the listening state determination result "talking with another person and cannot answer", the response content generation unit 102 judges that it is difficult for the user to respond to device response 2 and suspends the response processing, thereby stopping unnecessary response output from the voice response device 10. At this point, the device may also output device response 3: "Suspending the response."

Although the response processing is suspended, ambient sound information continues to be acquired at regular intervals. When the user's speech is no longer detected, the suspended response processing is resumed with a response output such as "Team A's home run scene can be played", so that the user can carry on interacting with the television.

The handling of each listening state determination result is defined in advance, and the device need only operate according to that definition. Even when the result is "talking with another person and cannot answer", the device may, depending on the importance of the response, be configured to repeat the response.
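The importance override can be sketched as one extra condition in the decision. The text does not say how importance is represented, so the [0, 1] score and the 0.8 threshold here are hypothetical, as are the shorthand state labels.

```python
BLOCKING = {"talking to someone", "doing something else"}


def decide_with_importance(state: str, importance: float,
                           threshold: float = 0.8) -> str:
    """Hypothetical variant: importance is a score in [0, 1] attached to each
    response; sufficiently important responses are repeated even when the
    user seems unable to answer."""
    if state == "audible":
        return "answer"
    if state == "not audible":
        return "repeat"
    if state in BLOCKING and importance >= threshold:
        return "repeat"  # important enough to repeat despite the distraction
    return "suspend"
```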

(6) Effects
As described above, this embodiment makes it possible to control voice responses according to the user's listening state. Even when the user becomes unable to answer the voice response device, the voice response device 10 does not perform unintended operations and places no extra burden on the user.

In addition, when the user is doing something unrelated to the voice response device 10, the device does not output responses that interfere with the user's activity, so the user is not burdened.

(Second Embodiment)
Next, a voice response device 20 according to a second embodiment of the present invention will be described with reference to FIG. 9.

(1) Configuration of Voice Response Device
FIG. 9 shows an example configuration of the voice response device 20 according to this embodiment.

The voice response device 20 comprises an input acquisition unit 201, a response content generation unit 202, and a response output unit 203. A separate listening state determination device 21 comprises an ambient sound acquisition unit 211 and a listening state determination unit 212. The voice response device 20 and the listening state determination device 21 are connected via a network such as a wireless LAN, so that they can freely exchange the necessary information.

(1-1) Input Acquisition Unit 201
The input acquisition unit 201, for example, captures the speech signal uttered by the user through a microphone or the like, performs speech recognition processing, converts the result into text information, and outputs it. Alternatively, it receives an infrared signal from a remote controller or the like and outputs the received content.

(1-2) Response Content Generation Unit 202
The response content generation unit 202 determines the response content based on the user input information acquired by the input acquisition unit 201.

The response content generation unit 202 passes the input information to an application. When the application is a database system, for example, it obtains the search result corresponding to the input search request and generates response content based on that result.

Regardless of whether the user has made an input, the response content generation unit 202 can generate and output response content whenever a response speech becomes necessary, for example at a preset time or when the situation in the application changes.

(1-3) Response Output Unit 203
The response output unit 203 obtains the response content generated by the response content generation unit 202, for example as text, converts it into synthesized speech, and outputs it. Alternatively, the output speech may simply be playback of prerecorded speech.

(1-4) Ambient Sound Acquisition Unit 211
The ambient sound acquisition unit 211 acquires the ambient sound information picked up by a microphone near the user, a microphone attached to a robot near the user, or a microphone installed on a wall or ceiling.

(1-5) Listening State Determination Unit 212
The listening state determination unit 212 analyzes the ambient sound information output by the ambient sound acquisition unit 211, uses the analysis result to determine the user's listening state according to predetermined criteria, and outputs the listening state determination result to the voice response device 20.

If the listening state determination unit 212 needs the response utterance start time of the voice response device 20 to determine the listening state, it can obtain that time from the voice response device 20.

Even if the utterance start time of the voice response device 20 cannot be obtained, the listening state can still be determined by analyzing the acquired ambient sound information and identifying the segment that contains the response speech from the voice response device 20.
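The text does not say how the response speech segment is identified. One possible approach, offered purely as an assumption: because device 20 knows the waveform it played, device 21 could search the ambient recording for it by cross-correlation. The brute-force, un-normalized version below only illustrates the idea; a practical system would more likely use an FFT-based normalized correlation and account for room acoustics.

```python
from typing import Sequence, Tuple


def locate_response(ambient: Sequence[float], response: Sequence[float],
                    sample_rate: int) -> Tuple[float, float]:
    """Estimate where the known response waveform occurs in the ambient
    recording by brute-force cross-correlation; returns (start_s, end_s)."""
    if len(response) == 0 or len(response) > len(ambient):
        raise ValueError("response must be non-empty and no longer than ambient")
    best_offset, best_score = 0, float("-inf")
    for offset in range(len(ambient) - len(response) + 1):
        # Correlation of the response with the ambient signal at this lag.
        score = sum(a * r for a, r in zip(ambient[offset:], response))
        if score > best_score:
            best_offset, best_score = offset, score
    start_s = best_offset / sample_rate
    return start_s, start_s + len(response) / sample_rate
```

The returned start and end times can then be fed to any of the first-embodiment checks in place of the timestamps normally received from the voice response device.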

(2) Response Content
After outputting a response, the response content generation unit 202 queries the listening state determination unit 212 for the listening state determination result and decides on the response processing according to that result. Alternatively, the response content generation unit 202 may obtain the listening state determination result only when the user has made no input for a predetermined time.
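The split between the two devices can be sketched with a small interface. ListeningStateSource is a hypothetical stand-in for the network link to device 21 (no particular protocol is implied), StaticSource is a test double, and query_only_on_timeout models the variant where the result is fetched only when the user has not replied in time.

```python
from typing import Optional, Protocol


class ListeningStateSource(Protocol):
    """Hypothetical interface to the separate listening state determination
    device 21, reached for example over a wireless LAN."""

    def current_state(self) -> str: ...


class StaticSource:
    """Stand-in for device 21 that always reports the same state and
    counts how often it was queried."""

    def __init__(self, state: str) -> None:
        self.state = state
        self.queries = 0

    def current_state(self) -> str:
        self.queries += 1
        return self.state


def next_step(source: ListeningStateSource, user_input: Optional[str],
              query_only_on_timeout: bool = False) -> str:
    """Decide what response content generation unit 202 does after a response."""
    if query_only_on_timeout and user_input is not None:
        return "answer"  # the user replied in time; no query needed
    state = source.current_state()
    if state == "audible":
        return "answer" if user_input is not None else "repeat"
    if state == "not audible":
        return "repeat"
    return "suspend"
```

With query_only_on_timeout enabled, a timely reply never touches the network, which keeps the voice response device's own workload low, in line with the effect described for this embodiment.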

(3) Effects
With the configuration of this embodiment described above, response content generation can be controlled based on the listening state determination result even if the voice response device 20 itself has no microphone for acquiring ambient sound information.

Furthermore, even if the processing load of the listening state determination unit 212 is very large, that load does not fall on the voice response device 20, so response content generation can be controlled without affecting the processing of the voice response device 20.

(Modifications)
The present invention is not limited to the above embodiments and can be modified in various ways without departing from its gist.

The above embodiments have been described using a television with a voice response function as the example. The present invention can also be applied to other equipment with a voice response function, such as a reception system that serves visitors by voice response, or a car navigation system.

Furthermore, the above embodiments assume that the user gives input by voice. The invention can also be applied to a voice response device that assumes user input from a touch panel or a remote control.

Note that the present invention is not limited to the above embodiments exactly as described. At the implementation stage, the constituent elements can be modified and embodied without departing from the gist of the invention. Various inventions can also be formed by appropriately combining the constituent elements disclosed in the embodiments. For example, some constituent elements may be omitted from the full set shown in an embodiment, and constituent elements from different embodiments may be combined as appropriate.

FIG. 1 is a block diagram of a voice response device according to a first embodiment of the present invention.
FIG. 2 shows a first example of the listening state determination method.
FIG. 3 shows a second example of the listening state determination method.
FIG. 4 shows a third example of the listening state determination method.
FIG. 5 shows a fourth example of the listening state determination method.
FIG. 6 is a flowchart explaining processing procedure A according to the first embodiment.
FIG. 7 shows an example of an interaction between a user and a television incorporating the voice response device according to the first embodiment.
FIG. 8 shows an example of listening state determination rules according to the first embodiment.
FIG. 9 is a block diagram of a voice response device according to a second embodiment of the present invention.

Explanation of Symbols

10 Voice response device
101 Input acquisition unit
102 Response content generation unit
103 Response output unit
104 Ambient sound acquisition unit
105 Listening state determination unit

Claims

A voice response device that responds to a user by voice, comprising:
an ambient sound acquisition unit that acquires ambient sound information on the sounds around the user when the voice response is made;
a listening state determination unit that analyzes the user voice, the response voice, and the environmental sounds contained in the ambient sound information, and determines whether (1) the user is answering the response voice, (2) the user cannot hear the response voice, or (3) the user is unable to answer the voice response; and
a response content generation unit that (1) generates a new voice response to the user's answer when the user is answering the voice response, (2) repeats the voice response when the user cannot hear the response voice, and (3) interrupts the voice response when the user is unable to answer it.
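The three-way mapping from listening state to action in the claim above can be sketched as a small dispatch. The enum, the `Action` record, and the placeholder reply text are hypothetical illustrations. Real dialogue logic for case (1) is not described in this section.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class ListeningState(Enum):
    ANSWERING = auto()      # (1) the user is answering the response voice
    NOT_HEARD = auto()      # (2) the user cannot hear the response voice
    CANNOT_ANSWER = auto()  # (3) the user is unable to answer the voice response


@dataclass
class Action:
    kind: str               # "respond", "repeat" or "interrupt" (hypothetical labels)
    text: Optional[str] = None


def generate_response(
    state: ListeningState, last_response: str, user_answer: Optional[str] = None
) -> Action:
    """Map a listening state to the next action, following cases (1)-(3)."""
    if state is ListeningState.ANSWERING:
        # (1) New response to the answer; placeholder for real dialogue logic.
        return Action("respond", f"Understood: {user_answer}")
    if state is ListeningState.NOT_HEARD:
        # (2) Repeat the previous response unchanged.
        return Action("repeat", last_response)
    # (3) Interrupt the response; nothing is spoken.
    return Action("interrupt")
```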

The voice response device according to claim 1, further comprising a voice recognition unit that recognizes speech uttered by the user,
wherein the response content generation unit generates the voice response based on the recognized user speech.

The voice response device according to claim 1, wherein the listening state determination unit determines that the user cannot hear the voice response when the volume of the response voice contained in the ambient sound information does not reach a reference value.
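A minimal sketch of the volume test in the claim above. It assumes the samples passed in are the response-voice portion of the ambient recording, normalized to [-1, 1], and it measures "volume" as the RMS level in dBFS. The -40 dBFS default reference value is hypothetical; the claim does not give a number.

```python
import math
from typing import Sequence


def rms_dbfs(samples: Sequence[float]) -> float:
    """RMS level in dB relative to full scale; samples are assumed to lie in [-1, 1]."""
    if len(samples) == 0:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20.0 * math.log10(rms) if rms > 0 else float("-inf")


def response_not_heard(response_samples: Sequence[float], reference_dbfs: float = -40.0) -> bool:
    """Judge the response inaudible if its level at the user's position stays
    below the reference value (hypothetical default of -40 dBFS)."""
    return rms_dbfs(response_samples) < reference_dbfs
```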

The voice response device according to claim 1, wherein, when the listening state determination unit determines that the user cannot hear the response voice, the response content generation unit repeats the response voice at an increased volume.
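The claim above only says the response is repeated at a higher volume. The sketch below adds several assumptions that are not in the text:
- a fixed volume step;
- a volume cap;
- a repeat limit;
- hypothetical `play` and `is_heard` hooks that stand in for the response output unit and a fresh listening state check.

```python
from typing import Callable


def repeat_louder(
    play: Callable[[str, int], None],
    is_heard: Callable[[], bool],
    text: str,
    volume: int,
    step: int = 3,
    max_volume: int = 30,
    max_repeats: int = 3,
) -> int:
    """Repeat `text`, raising the volume by `step` each time (capped at
    `max_volume`) until the user is judged to have heard it or `max_repeats`
    attempts are used up. Returns the last volume used."""
    for _ in range(max_repeats):
        volume = min(volume + step, max_volume)
        play(text, volume)
        if is_heard():
            break
    return volume
```

The repeat limit keeps the device from escalating indefinitely when the user has simply left the room.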

The voice response device according to claim 1, wherein the listening state determination unit determines whether the user is unable to answer the voice response based on sounds other than the response voice that are contained in the ambient sound information during the utterance period of the response voice.
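The first step implied by the claim above is to collect the non-response sounds that overlap the response utterance period. This sketch assumes a hypothetical upstream sound-event classifier that emits labeled time intervals; the labels are illustrative. Concrete decision rules on the result are given in claims 6 to 8.

```python
from typing import Iterable, List, NamedTuple


class SoundEvent(NamedTuple):
    label: str    # hypothetical labels, e.g. "response", "user", "other_voice", "physical"
    start: float  # seconds
    end: float    # seconds


def other_sounds_during_response(
    events: Iterable[SoundEvent], resp_start: float, resp_end: float
) -> List[SoundEvent]:
    """Return the non-response events that overlap the response utterance period."""
    return [
        e for e in events
        if e.label != "response" and e.start < resp_end and e.end > resp_start
    ]
```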

The voice response device according to claim 5, wherein the listening state determination unit determines that the user is unable to answer the voice response when the ambient sound information during the utterance period of the response voice contains both the user voice and the voice of someone other than the user.
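The rule in the claim above is read here as "the user is in a conversation with someone else". It is sketched below under the assumption that a hypothetical speaker-diarization step supplies (speaker, start, end) segments. The speaker labels "user" and "system" are also assumptions.

```python
from typing import Iterable, Tuple

Segment = Tuple[str, float, float]  # (speaker_id, start_s, end_s) from a hypothetical diarizer


def conversation_detected(
    segments: Iterable[Segment], resp_start: float, resp_end: float, user_id: str = "user"
) -> bool:
    """True if both the user and some other person speak during the response."""
    speakers = {spk for spk, s, e in segments if s < resp_end and e > resp_start}
    speakers.discard("system")  # hypothetical label for the device's own response voice
    return user_id in speakers and any(spk != user_id for spk in speakers)
```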

The voice response device according to claim 5, wherein the listening state determination unit determines that the user is unable to answer the voice response when a user utterance section obtained from the ambient sound information contains the utterance start time or the utterance end time of the response voice.
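The containment test in the claim above reduces to an interval check. The sketch assumes user utterance sections have already been extracted as (start, end) pairs in seconds; the function names are hypothetical. Under this rule, an utterance lying strictly inside the response does not count, because it contains neither boundary.

```python
from typing import Iterable, Tuple


def contains_boundary(user_section: Tuple[float, float], resp_start: float, resp_end: float) -> bool:
    """True if the user's utterance section contains the response start or end time."""
    u_start, u_end = user_section
    return u_start <= resp_start <= u_end or u_start <= resp_end <= u_end


def talking_over_response(
    user_sections: Iterable[Tuple[float, float]], resp_start: float, resp_end: float
) -> bool:
    """Apply the boundary rule to every user utterance section found."""
    return any(contains_boundary(s, resp_start, resp_end) for s in user_sections)
```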

The voice response device according to claim 5, wherein the listening state determination unit determines that the user is unable to answer the voice response when the ambient sound information during the utterance period of the response voice contains any physical sound.
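The text does not say how a physical sound is detected. As a swapped-in heuristic, the sketch flags a frame whose energy jumps well above the median frame energy. It assumes the input is a residual signal from which the response voice has already been removed, for example by echo cancellation. An energy spike cannot distinguish a door slam from a sudden shout, so a real system would more likely use a sound-event classifier. The frame length, ratio, and floor are illustrative values only.

```python
import statistics
from typing import List, Sequence


def frame_energies(samples: Sequence[float], frame_len: int) -> List[float]:
    """Mean energy of consecutive, non-overlapping frames (a trailing partial frame is dropped)."""
    return [
        sum(x * x for x in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def has_transient(
    residual: Sequence[float], frame_len: int = 4, ratio: float = 10.0, floor: float = 1e-6
) -> bool:
    """Flag a short, loud event (door slam, dropped object, etc.) in the residual signal.

    A frame counts as a transient if its energy exceeds `ratio` times the median
    frame energy; `floor` keeps the baseline from collapsing to zero in silence.
    """
    energies = frame_energies(residual, frame_len)
    if not energies:
        return False
    baseline = max(statistics.median(energies), floor)
    return any(e > ratio * baseline for e in energies)
```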

The voice response device according to claim 1, further comprising a response restart processing unit that restarts the voice response based on ambient sound information acquired after the interruption.
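The claim above says only that the response is restarted "based on ambient sound information acquired after the interruption". One hypothetical policy is sketched below: resume once the surroundings have stayed quiet for a number of consecutive frames. It assumes per-frame ambient levels in dBFS are available. The -45 dBFS threshold and the 20-frame run length are assumptions.

```python
from typing import Iterable, Optional


def frames_until_resume(
    levels_db: Iterable[float], quiet_db: float = -45.0, quiet_frames: int = 20
) -> Optional[int]:
    """Return the index of the frame at which the interrupted response may resume,
    i.e. the first point where `quiet_frames` consecutive frames fall below
    `quiet_db`; return None if that never happens."""
    run = 0
    for i, level in enumerate(levels_db):
        run = run + 1 if level < quiet_db else 0
        if run >= quiet_frames:
            return i
    return None
```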

A voice response method for responding to a user by voice, comprising:
an ambient sound acquisition step of acquiring ambient sound information on the sounds around the user when the voice response is made;
a listening state determination step of analyzing the user voice, the response voice, and the environmental sounds contained in the ambient sound information, and determining whether (1) the user is answering the response voice, (2) the user cannot hear the response voice, or (3) the user is unable to answer the voice response; and
a response content generation step of (1) generating a new voice response to the user's answer when the user is answering the voice response, (2) repeating the voice response when the user cannot hear the response voice, and (3) interrupting the voice response when the user is unable to answer it.

A voice response program, executed by a computer, for responding to a user by voice, the program causing the computer to implement:
an ambient sound acquisition function of acquiring ambient sound information on the sounds around the user when the voice response is made;
a listening state determination function of analyzing the user voice, the response voice, and the environmental sounds contained in the ambient sound information, and determining whether (1) the user is answering the response voice, (2) the user cannot hear the response voice, or (3) the user is unable to answer the voice response; and
a response content generation function of (1) generating a new voice response to the user's answer when the user is answering the voice response, (2) repeating the voice response when the user cannot hear the response voice, and (3) interrupting the voice response when the user is unable to answer it.