JP2007072351A

JP2007072351A - Speech recognition device

Info

Publication number: JP2007072351A
Application number: JP2005261782A
Authority: JP
Inventors: Michihiro Yamazaki; 道弘山崎
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 2005-09-09
Filing date: 2005-09-09
Publication date: 2007-03-22

Abstract

PROBLEM TO BE SOLVED: To provide a speech recognition device capable of suppressing misrecognition even when the voice input part is movable.
SOLUTION: The speech recognition device comprises a speaker that outputs sound into an audio/visual space; a microphone that captures sound in the audio/visual space and outputs an input speech signal of the captured sound; an acceleration sensor that monitors the movement state of the microphone; and a voice processing unit that cancels the spatial echo of the sound output from the speaker from the input speech signal and performs speech recognition on the echo-cancelled input speech signal, executing this voice processing according to the movement state of the microphone monitored by the acceleration sensor.
COPYRIGHT: (C)2007, JPO&INPIT

Description

The present invention relates to a speech recognition device that outputs sound into a viewing space, captures sound from the viewing space, and performs speech recognition on the captured input speech.

In a speech recognition device that performs recognition while it is itself outputting sound, the microphone captures not only the speech the user utters for recognition but also, as a spatial echo, the sound corresponding to the audio signal the device is outputting. Conventional speech recognition devices therefore use the audio signal output by the device as a reference signal, cancel the spatial echo of that audio signal from the input speech signal captured by the microphone, and recognize the echo-cancelled input speech signal, thereby improving recognition accuracy (see, for example, Patent Document 1).

Patent Document 1: JP 2001-100785 A (paragraphs [0061] to [0076], FIG. 2)

However, when the microphone of a conventional speech recognition device is movable, as when it is mounted in the remote control of an AV device, the echo path of the sound output from the AV device changes as the microphone moves, echo cancellation fails, and speech is misrecognized. Moreover, because the user cannot tell why echo cancellation failed, the user has no way to remedy the misrecognition.

The present invention has been made to solve these problems, and its object is to provide a speech recognition device that suppresses misrecognition even when the microphone capturing sound in the viewing space is movable.

In the speech recognition device according to the present invention, voice processing, which cancels the spatial echo of the sound output from a speaker from the input speech signal and recognizes the echo-cancelled input speech signal, is executed according to the movement state of the microphone as monitored by an acceleration sensor.

According to the present invention, the movement of the microphone is monitored and voice processing is executed according to its movement state. This suppresses misrecognition caused by echo cancellation failing when the microphone moves and the echo path changes.

Embodiment 1.
FIG. 1 is a block diagram showing a speech recognition device according to Embodiment 1 of the present invention. For example, the speech recognition device may be connected to an AV device, TV, or the like (not shown), so that the AV device or TV operates in response to the user's utterances according to the recognition results of the speech recognition device.
In FIG. 1, an audio output unit 1 outputs the audio signal of the sound to be output into the viewing space. A speaker 2 outputs sound corresponding to the audio signal from the audio output unit 1 into the viewing space. A microphone 3 captures sound in the viewing space and outputs an input speech signal of the captured sound. An acceleration sensor 4 monitors the movement state of the microphone 3, for example movement start and stop, movement speed, movement direction, and stop duration.

A voice processing unit 5 cancels the spatial echo of the sound output from the speaker 2 from the input speech signal and performs speech recognition on the echo-cancelled input speech signal. It executes this voice processing according to the movement state of the microphone 3 monitored by the acceleration sensor 4; here, it is configured to start and stop processing based on the movement state of the microphone 3.

The voice processing unit 5 consists of an echo canceller 6 and a speech recognition unit 7. The echo canceller 6 learns the path of the spatial echo (the echo path) along which sound corresponding to the audio signal output by the speaker 2 travels around to the microphone 3, and uses the learning result to cancel the spatial echo of the speaker output from the input speech signal. Here, the audio signal output by the audio output unit 1 serves as the reference signal for this cancellation. The speech recognition unit 7 performs speech recognition on the input speech signal whose spatial echo has been cancelled by the echo canceller 6.
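The description does not tie the echo path learning of the echo canceller 6 to a particular algorithm. As a minimal sketch, assuming a normalized LMS (NLMS) adaptive FIR filter (a common technique swapped in here for illustration), the reference signal from the audio output unit 1 drives an estimate of the echo, which is subtracted from each microphone sample. The class name, tap count, and step size are hypothetical.

```python
import numpy as np


class NlmsEchoCanceller:
    """Sketch of an echo canceller that learns the echo path with NLMS."""

    def __init__(self, taps=256, mu=0.5, eps=1e-6):
        self.w = np.zeros(taps)  # learned echo-path impulse response
        self.x = np.zeros(taps)  # reference-signal delay line, newest sample first
        self.mu = mu             # step size (0 < mu < 2 for NLMS stability)
        self.eps = eps           # regularizer so a silent reference does not divide by zero

    def process(self, ref_sample, mic_sample, adapt=True):
        """Return the echo-cancelled microphone sample."""
        # Push the newest reference (audio output unit 1) sample into the delay line.
        self.x = np.roll(self.x, 1)
        self.x[0] = ref_sample
        echo_estimate = self.w @ self.x
        error = mic_sample - echo_estimate
        if adapt:
            # Echo path learning: normalized LMS update driven by the residual.
            self.w += (self.mu / (self.x @ self.x + self.eps)) * error * self.x
        return error
```

With a fixed echo path, `w` converges toward the true impulse response and the residual shrinks. If the microphone moves, the true path drifts away from the learned `w` faster than the filter can follow, and the residual (uncancelled echo) grows, which is the failure mode this embodiment guards against.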

Next, the operation will be described.
When the audio output unit 1 outputs the audio signal of sound to be output into the viewing space, for example sound from an AV device such as TV broadcast audio, VTR/DVD playback audio, audio from an external input device, or output audio for operation guidance, the speaker 2 outputs the corresponding sound into the viewing space, so that the user can hear it.

Meanwhile, when the user speaks, for example to give a spoken instruction to the AV device, the microphone 3 captures the speech uttered by the user in the viewing space (the user utterance) and outputs the input speech signal of the captured sound. If the speaker 2 is outputting sound at this time, the microphone 3 also captures that sound as a spatial echo in addition to the user utterance, and the combination of the two becomes the input sound.

At the same time, the acceleration sensor 4 monitors the movement state of the microphone 3, such as movement start and stop, movement speed, movement direction, and stop duration. The voice processing unit 5 is configured to execute voice processing according to the movement state monitored by the acceleration sensor 4. Here, it judges transitions of the recognition state based on the movement state of the microphone 3 and executes voice processing according to the result. The voice processing includes, for example, echo path learning, echo cancellation, and speech recognition. The recognition state transition is judged, for example, at fixed intervals; the resulting recognition state is one of a recognition-stopped state, a recognition-waiting state, and a recognizing state, and each process of the voice processing is executed, started, or stopped according to that state.

When the acceleration sensor 4 indicates that the microphone 3 is moving, the path of the spatial echo (the echo path) by which sound corresponding to the audio signal output from the speaker 2 reaches the microphone 3 is changing, and the echo canceller 6 fails to cancel the echo because it cannot track the change. The output of the echo canceller 6 then consists of the user utterance with the uncancelled spatial echo of the speaker output superimposed on it, which degrades the S/N. The speech recognition unit 7 becomes prone to misrecognition, which in turn causes operations based on the recognition result to malfunction. Therefore, when the acceleration sensor 4 detects movement of the microphone 3, speech recognition in the voice processing unit 5 is stopped, which prevents malfunctions due to misrecognition. When the acceleration sensor 4 detects that the microphone 3 has stopped, the echo canceller 6 performs echo path learning; after the time required for this learning has elapsed, echo cancellation by the echo canceller 6 and speech recognition by the speech recognition unit 7 are started whenever there is input speech.

Regarding the recognition state transition: the recognition-stopped state is a state in which processing related to speech recognition, such as echo cancellation and speech recognition, is stopped or not performed; the device is in this state, for example, while the microphone 3 is moving and while the echo path is being learned. The recognition-waiting state is a state in which processing related to speech recognition can be performed, that is, processing is executed as soon as speech is input. The recognizing state is a state in which processing related to speech recognition is in progress.

When the voice processing unit 5 performs processing, it learns the echo path in the recognition-stopped state. If speech is input in the recognition-waiting state, then as the echo cancellation process of the voice processing unit 5, the echo canceller 6 cancels, from the input speech signal output by the microphone 3, the spatial echo of the sound corresponding to the audio signal output from the speaker 2, and outputs the echo-cancelled input speech signal. The echo canceller 6 uses the audio signal output by the audio output unit 1 as the reference signal for this cancellation. Next, as the speech recognition process of the voice processing unit 5, the speech recognition unit 7 performs speech recognition on the echo-cancelled input speech signal from the echo canceller 6.

In this way, sound in the viewing space, for example a user's spoken instruction to the AV device, is recognized, and an operation, for example of the AV device, can be performed based on the recognition result.

FIG. 2 is an explanatory diagram showing an example of the recognition state transition judgment.
In FIG. 2, the transition judgment first obtains the movement state from the acceleration sensor 4 in a movement judgment (step 101) and determines whether the microphone 3 is moving. If the microphone 3 is judged to be moving, the state transitions to the recognition-stopped state (step 106a) regardless of the current recognition state. Setting the recognition-stopped state and stopping processing related to speech recognition while the microphone 3 is moving thus prevents misrecognition due to failed echo cancellation. If, on the other hand, the movement judgment (step 101) finds the microphone stationary, the current recognition state is determined in a current state judgment (step 102).

If the current state judgment (step 102) finds the recognition-stopped state, a wait-start judgment (step 103) decides whether to remain in the recognition-stopped state (step 106a) or move to the recognition-waiting state (step 106b). The wait-start judgment (step 103) moves to the recognition-waiting state (step 106b) once at least a fixed time has elapsed since the microphone stopped. This is because the echo canceller 6 has not finished learning the echo path immediately after the microphone stops, so the recognition-waiting state is entered only after enough time has passed for the echo path to be learned.

If the current state judgment (step 102) finds the recognition-waiting state, a speech input judgment (step 104) determines whether there is speech input; if there is none, the state remains the recognition-waiting state (step 106b). If speech input has started and is present, the voice processing unit 5 performs processing related to speech recognition, such as echo cancellation and speech recognition.

If the current state judgment (step 102) finds the recognizing state, the voice processing unit 5 is in the middle of processing related to speech recognition, such as echo cancellation and speech recognition.

A recognition continuation judgment (step 105) then decides whether to continue processing related to speech recognition. For example, if the speech input has not yet ended, processing continues and the state is set to the recognizing state (step 106c). If the speech input is judged to have ended, the recognition result is output and the state moves to the recognition-waiting state (step 106b).
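The FIG. 2 judgment can be sketched as a pure function evaluated at each fixed interval. This is an illustrative reading of the text, not a reproduction of the figure: the state names, the boolean inputs, the assumption that speech detected in the waiting state leads to the recognizing state, and the 1-second learning time are all hypothetical.

```python
from enum import Enum, auto


class RecState(Enum):
    STOPPED = auto()      # recognition-stopped: mic moving or echo path being learned
    WAITING = auto()      # recognition-waiting: processing starts when speech arrives
    RECOGNIZING = auto()  # echo cancellation and recognition in progress


LEARN_TIME_S = 1.0  # hypothetical time needed for echo path learning


def next_state_fig2(state, mic_moving, time_since_stop_s, speech_present, speech_ended):
    """One transition judgment, evaluated at a fixed interval."""
    # Step 101: movement forces the recognition-stopped state (106a) from any state.
    if mic_moving:
        return RecState.STOPPED
    # Step 102: branch on the current recognition state.
    if state is RecState.STOPPED:
        # Step 103: wait until the echo path has had time to be learned.
        if time_since_stop_s >= LEARN_TIME_S:
            return RecState.WAITING
        return RecState.STOPPED
    if state is RecState.WAITING:
        # Step 104: start processing once speech input is present.
        return RecState.RECOGNIZING if speech_present else RecState.WAITING
    # Step 105: keep recognizing (106c) until the input ends, then output and wait (106b).
    return RecState.WAITING if speech_ended else RecState.RECOGNIZING
```

Keeping the judgment free of side effects makes each branch easy to test in isolation; the caller would start or stop echo cancellation and recognition according to the returned state.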

In the transitions of FIG. 2, recognition is interrupted and the state moves to the recognition-stopped state whenever the microphone is judged to be moving, even if the user is still speaking in the recognizing state. Alternatively, recognition may be continued when it is already in progress.

FIG. 3 is an explanatory diagram showing an example of the recognition state transition judgment in which processing related to speech recognition, once in progress, is continued. Processes identical or similar to those in FIG. 2 are given the same reference numerals, and their description is omitted.

Compared with the example of FIG. 2, the example of FIG. 3 performs a current state judgment (step 102b) before the movement judgment (step 101). If the current state is the recognizing state, processing related to speech recognition is performed without the movement judgment (step 101), followed by the recognition continuation judgment (step 105). The subsequent judgments are the same as in FIG. 2, except that the current state judgment (step 102a) can no longer encounter the recognizing state and therefore only distinguishes between the recognition-stopped state and the recognition-waiting state.
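A hypothetical sketch of the FIG. 3 ordering follows. The state names, inputs, and learning time are assumptions; the point is only that the recognizing state is checked first (step 102b), so movement is ignored until the utterance ends.

```python
from enum import Enum, auto


class RecState(Enum):
    STOPPED = auto()
    WAITING = auto()
    RECOGNIZING = auto()


LEARN_TIME_S = 1.0  # hypothetical time needed for echo path learning


def next_state_fig3(state, mic_moving, time_since_stop_s, speech_present, speech_ended):
    # Step 102b: an utterance already being recognized is not interrupted by movement.
    if state is RecState.RECOGNIZING:
        # Step 105: continue until the speech input ends.
        return RecState.WAITING if speech_ended else RecState.RECOGNIZING
    # Step 101: the movement judgment now applies only to STOPPED and WAITING.
    if mic_moving:
        return RecState.STOPPED
    # Step 102a: only the recognition-stopped and recognition-waiting states remain.
    if state is RecState.STOPPED:
        # Step 103: wait-start judgment.
        return RecState.WAITING if time_since_stop_s >= LEARN_TIME_S else RecState.STOPPED
    # Step 104: speech input judgment in the recognition-waiting state.
    return RecState.RECOGNIZING if speech_present else RecState.WAITING
```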

FIG. 4 is an explanatory diagram showing an example of the recognition state transition judgment in which recognition is not stopped when the estimated leakage of spatial echo is small, and in which recognition is made harder to stop when processing related to speech recognition is already in progress. Processes identical or similar to those in the preceding figures are given the same reference numerals, and their description is omitted.

Compared with the example of FIG. 2, the example of FIG. 4 adds a leakage estimate judgment (step 107), which determines whether the estimated leakage of spatial echo is large; if the estimated leakage is small, the movement judgment (step 101) is skipped.

Here, the leakage estimate judgment (step 107) performs, for example, the following judgments.
(1) Even when the signal level of the audio signal output from the audio output unit 1 is high, the amount of spatial echo leaking into the microphone 3 can be small, for example when the volume is turned down or the user is listening through headphones. If, in such a case, the estimated leakage is judged to remain at or below a threshold (Plth), the estimated leakage is judged to be small.
(2) When the audio signal output from the audio output unit 1 is loud and echo does leak into the microphone 3, the leakage into the microphone 3 becomes large again once the signal level rises, even if the level has dropped temporarily. If, in such a case, the estimated leakage is judged to exceed the threshold (Plth), the estimated leakage is judged to be large.
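The description gives no formula for the leakage estimate. One possible sketch, assuming the estimate is the level of the reference audio signal multiplied by an assumed speaker-to-microphone coupling gain, covers both cases: a small coupling gain (low volume, headphones) keeps the estimate small even for loud audio, as in (1), and a slow-release peak hold keeps it large through a momentary quiet passage, as in (2). `LeakageEstimator`, `coupling_gain`, and `release` are hypothetical.

```python
import numpy as np


class LeakageEstimator:
    """Hypothetical estimate of how much loudspeaker echo reaches the microphone."""

    def __init__(self, coupling_gain, release=0.999):
        # coupling_gain: assumed linear gain from the audio signal to the mic,
        # e.g. derived from the volume setting; near 0 when headphones are used.
        self.coupling_gain = coupling_gain
        self.release = release  # per-frame decay of the peak-hold level
        self.peak_level = 0.0

    def update(self, ref_frame):
        # RMS level of one frame of the reference signal from audio output unit 1.
        level = float(np.sqrt(np.mean(np.square(ref_frame))))
        # Peak hold with slow release: a brief quiet passage does not
        # immediately lower the estimate (case (2)).
        self.peak_level = max(level, self.peak_level * self.release)
        return self.coupling_gain * self.peak_level


def leakage_is_large(estimate, plth):
    # Case (1): estimate <= Plth is judged small; case (2): estimate > Plth is judged large.
    return estimate > plth
```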

In the leakage estimate judgment (step 107), the threshold (Plth) may be changed according to the current recognition state. For example, to reduce cases in which processing related to speech recognition is stopped partway through a user's utterance, Plth may be set higher in the recognizing state.
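The state-dependent threshold can be sketched as follows, assuming hypothetical threshold values. The movement judgment (step 101) is consulted only when the leakage estimate exceeds the threshold for the current state, so a higher Plth while recognizing makes it harder to abort an utterance midway.

```python
from enum import Enum, auto


class RecState(Enum):
    STOPPED = auto()
    WAITING = auto()
    RECOGNIZING = auto()


PLTH_DEFAULT = 0.05     # hypothetical Plth
PLTH_RECOGNIZING = 0.2  # hypothetical, raised Plth while an utterance is being recognized


def leakage_threshold(state):
    """Plth for the current recognition state."""
    return PLTH_RECOGNIZING if state is RecState.RECOGNIZING else PLTH_DEFAULT


def movement_judgment_needed(leak_estimate, state):
    """Step 107: only a large estimated leakage makes microphone movement relevant."""
    return leak_estimate > leakage_threshold(state)
```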

FIG. 5 is an explanatory diagram showing an example of the recognition state transition judgment in which, even though the leakage estimate is used, neither the leakage estimate judgment nor the movement judgment is performed when the current recognition state is the recognizing state. Processes identical or similar to those in the preceding figures are given the same reference numerals, and their description is omitted.

Relative to the example of FIG. 4, the example of FIG. 5 splits the current state judgment (step 102) into two places, step 102a and step 102b; if step 102b finds the recognizing state, processing related to speech recognition continues.

FIG. 6 is an explanatory diagram showing an example of the recognition state transition judgment that avoids immediately entering the recognition-stopped state when the microphone 3 moves while processing related to speech recognition is in progress. It judges whether the spatial echo leakage is at a level that still allows that processing to continue and, if recognition is unaffected, remains in the recognizing state. Processes identical or similar to those in the preceding figures are given the same reference numerals, and their description is omitted.

The example of FIG. 6 adds a current state judgment (step 108) and a recognition stop judgment (step 109) to the example of FIG. 4. If the current state judgment (step 108) finds the recognizing state even though the microphone 3 is moving, the recognition stop judgment (step 109) decides whether to continue or stop recognition. For example, the recognition stop judgment (step 109) stops recognition if, within the speech segment recognized so far, the proportion of time during which the estimated leakage is at or above the threshold reaches a fixed ratio; otherwise it continues recognition.
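The example stop criterion of step 109 reduces to a ratio over the frames of the current utterance. A minimal sketch, assuming one leakage estimate per frame and a hypothetical fixed ratio of 0.3:

```python
STOP_RATIO = 0.3  # hypothetical "fixed ratio" for step 109


def should_stop_recognition(leak_history, plth, stop_ratio=STOP_RATIO):
    """Step 109 sketch: leak_history holds per-frame leakage estimates for the
    utterance recognized so far; stop once enough of it is echo-dominated."""
    if not leak_history:
        return False
    over = sum(1 for v in leak_history if v >= plth)
    return over / len(leak_history) >= stop_ratio
```

Measuring the proportion of the whole utterance, rather than reacting to a single loud frame, is what lets a brief burst of echo during movement pass without discarding the user's speech.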

As described above, according to this embodiment, the voice processing that cancels the spatial echo of the sound output from the speaker 2 from the input speech signal and recognizes the echo-cancelled input speech signal is executed according to the movement state of the microphone 3 monitored by the acceleration sensor 4; in particular, the voice processing is started and stopped according to that movement state. This suppresses misrecognition caused by echo cancellation failing when the microphone 3 moves and the echo path changes.
Although this embodiment starts and stops processing related to speech recognition, such as echo cancellation and speech recognition, according to the movement state of the microphone 3, it suffices to start and stop at least the speech recognition process; this alone suppresses misrecognition caused by echo cancellation failing when the microphone 3 moves and the echo path changes.

In addition, performing the wait-start judgment after stopping processing related to speech recognition allows that processing to restart only after the time needed for echo path learning has elapsed, which suppresses misrecognition.

Furthermore, voice processing may be executed according to the current voice processing state in addition to the movement state of the microphone monitored by the acceleration sensor. For example, as in the transition example of FIG. 3, processing related to speech recognition that is already in progress is continued rather than immediately entering the recognition-stopped state. This suppresses misrecognition while making it less likely that recognition stops mid-utterance, which improves usability.

Voice processing may also be executed according to the estimated spatial echo leakage in addition to the movement state of the microphone monitored by the acceleration sensor. For example, as in the transition example of FIG. 4, the leakage estimate is judged first; if it is small, the transition is judged regardless of the movement state of the microphone 3, so processing related to speech recognition is not stopped even while the microphone 3 is moving. Recognition can thus proceed whenever misrecognition is unlikely despite the movement, which improves usability.

Voice processing may also be executed according to the current voice processing state and the estimated spatial echo leakage in addition to the movement state of the microphone monitored by the acceleration sensor. For example, as in the transition example of FIG. 5, even when the leakage estimate is judged, neither the leakage estimate judgment nor the movement judgment is performed if the current recognition state is the recognizing state, and processing related to speech recognition that is in progress is continued rather than immediately stopped. This suppresses misrecognition while making it less likely that recognition stops mid-utterance, which improves usability.

Likewise, as in the transition example of FIG. 6, voice processing may be executed according to the current voice processing state and the estimated spatial echo leakage in addition to the movement state of the microphone monitored by the acceleration sensor. So that the device does not immediately enter the recognition-stopped state when the microphone 3 moves during processing related to speech recognition, it is judged whether the spatial echo leakage is at a level that allows that processing to continue, and if there is no problem, processing continues. This suppresses misrecognition even while the microphone 3 is moving and makes it less likely that recognition stops mid-utterance, which improves usability.

In the present embodiment, it is sufficient that processing related to speech recognition is started and stopped according to the moving state of the microphone 3 so that speech recognition is not affected by movement of the microphone 3. Accordingly, echo path learning may be started once the microphone 3 has stopped, as determined from its moving state, or may be started periodically at predetermined intervals.

In the present embodiment, the recognition state transition determination was described for the case where processing related to speech recognition is performed when speech is input while the recognition state is the recognition waiting state. Alternatively, when the microphone 3 shifts from the moving state to the stopped state, the input audio from the microphone 3 may be buffered, and after echo path learning is completed, echo cancellation and speech recognition may be performed on the audio starting from the moment the microphone entered the stopped state. By buffering the input audio in the stopped state and re-running echo cancellation after echo path learning, the user can speak immediately after the microphone stops moving, which improves usability.
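One way to realize this buffering is sketched below. It assumes that input is processed in frames and that each microphone frame is stored together with the loudspeaker reference frame played at the same time, so that echo cancellation can be re-run once the new echo path estimate is available. The class, its methods, and the `cancel` callback are hypothetical.

```python
from collections import deque
from typing import Callable, Deque, List, Sequence, Tuple

# (microphone samples, loudspeaker reference samples) for one frame; an assumption.
Frame = Tuple[Sequence[float], Sequence[float]]


class PostStopBuffer:
    """Hypothetical buffer holding audio captured after the microphone stops,
    until echo path learning has finished."""

    def __init__(self, max_frames: int) -> None:
        self._frames: Deque[Frame] = deque(maxlen=max_frames)
        self.buffering = False

    def on_mic_stopped(self) -> None:
        # Start a fresh buffer at the moment the microphone enters the stopped state.
        self._frames.clear()
        self.buffering = True

    def push(self, mic: Sequence[float], ref: Sequence[float]) -> None:
        # Frames arriving while not buffering are ignored here.
        if self.buffering:
            self._frames.append((mic, ref))

    def on_learning_done(
        self, cancel: Callable[[Sequence[float], Sequence[float]], List[float]]
    ) -> List[List[float]]:
        """Re-run echo cancellation on the buffered frames with the newly learned
        echo path and return the cleaned frames in arrival order."""
        self.buffering = False
        cleaned = [cancel(mic, ref) for mic, ref in self._frames]
        self._frames.clear()
        return cleaned
```

In use, `on_mic_stopped` would be called when the acceleration sensor reports that the microphone has stopped and `on_learning_done` when echo path learning converges; the returned frames would then be passed to speech recognition before live processing resumes.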

Further, in the present embodiment, the transition determination was described as being based on whether the microphone is moving or stopped. Alternatively, the microphone may be regarded as moving when it moves at or above a predetermined speed and as stopped when it moves below that speed. That is, the microphone may be regarded as moving when its speed is at or above a predetermined speed at which echo cancellation fails, and as stopped when its speed is below a predetermined speed at which echo cancellation does not fail.
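The speed-based criterion could be approximated as follows. An acceleration sensor measures acceleration rather than speed, so this sketch integrates the samples into a velocity estimate. It assumes gravity has already been removed from the samples and uses a leaky integrator to limit drift; the leak factor and threshold are illustrative, and in practice the threshold would be tuned to the speed at which echo cancellation starts to fail.

```python
import math
from typing import Iterable, List, Tuple


def classify_motion(accel: Iterable[Tuple[float, float, float]], dt: float,
                    speed_threshold: float, leak: float = 0.95) -> List[bool]:
    """Return one flag per sample: True means the microphone is treated as moving.

    accel: gravity-compensated (ax, ay, az) samples in m/s^2 (assumption).
    dt: sampling interval in seconds.
    speed_threshold: speed in m/s at or above which echo cancellation is assumed to fail.
    leak: per-sample decay of the velocity estimate, limiting integration drift.
    """
    vx = vy = vz = 0.0
    flags = []
    for ax, ay, az in accel:
        # Leaky integration of acceleration into a velocity estimate.
        vx = leak * vx + ax * dt
        vy = leak * vy + ay * dt
        vz = leak * vz + az * dt
        speed = math.sqrt(vx * vx + vy * vy + vz * vz)
        flags.append(speed >= speed_threshold)
    return flags
```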

Embodiment 2.
In the second embodiment, a case in which the user is notified of the current recognition state will be described.

FIG. 7 is a block diagram showing a speech recognition device according to Embodiment 2 for carrying out the present invention. In FIG. 7, parts that are the same as or similar to those in FIG. 1 are given the same reference numerals, and their description is omitted. However, in addition to the configuration described above, the voice processing unit 5 is configured to output the current voice processing state (recognition state).
The recognition state notification unit 8 notifies the user of the voice processing state (recognition state) output from the voice processing unit 5. For example, it notifies the user which of the recognition stopped state, the recognition waiting state, and the recognizing state is current.

Next, the operation will be described.
The voice processing unit 5 outputs the voice processing state (recognition state), for example "recognition waiting state", "recognition stopped state", or "recognizing state". The output voice processing state (recognition state) is input to the recognition state notification unit 8, which notifies the user of that state in a form the user can understand.

For example, in AV equipment in which the microphone of the speech recognition device is attached to the remote control, the notification may be given by displaying an icon or the like indicating the voice processing state (recognition state) on a display unit such as the display of the AV equipment main body.

Also, for example, in a TV in which the microphone of the speech recognition device is attached to the remote control, an icon or the like may be displayed on the TV screen.
FIGS. 8 and 9 are explanatory diagrams showing examples of notification in this case.
For example, when the remote control moves and processing related to speech recognition is stopped or not performed (recognition stopped state), an icon indicating that input is not being accepted is displayed in a corner of the screen, as shown in FIG. 8. While processing related to speech recognition is in progress (recognizing state), an icon indicating that recognition is in progress is displayed in a corner of the screen, as shown in FIG. 9. When processing related to speech recognition can be performed (recognition waiting state), nothing is displayed. In this way, the user is notified of the current speech recognition state.

The voice processing state (recognition state) may also be notified to the user by voice guidance or by a notification sound. In this case, the corresponding sound is output when the state switches. Notification may also combine the display and the notification sound.
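The display and sound rules described for FIGS. 8 and 9 and for audible notification can be summarized as a lookup plus change detection. The icon and sound identifiers below are hypothetical placeholders; the description specifies only what each icon means and that a sound is output when the state switches.

```python
from enum import Enum
from typing import Dict, Optional, Tuple


class RecogState(Enum):
    STOPPED = "recognition stopped"
    WAITING = "recognition waiting"
    RECOGNIZING = "recognizing"


# Hypothetical icon identifiers; FIGS. 8 and 9 describe only their meaning.
ICON_FOR_STATE: Dict[RecogState, Optional[str]] = {
    RecogState.STOPPED: "icon_input_not_accepted",  # FIG. 8, screen corner
    RecogState.RECOGNIZING: "icon_recognizing",     # FIG. 9, screen corner
    RecogState.WAITING: None,                       # nothing displayed
}


class RecognitionStateNotifier:
    """Maps the recognition state to an icon and emits a sound only on state changes."""

    def __init__(self) -> None:
        self._last: Optional[RecogState] = None

    def update(self, state: RecogState) -> Tuple[Optional[str], Optional[str]]:
        changed = state is not self._last
        self._last = state
        # A sound identifier is produced only when the state has just switched.
        sound = f"sound_{state.name.lower()}" if changed else None
        return ICON_FOR_STATE[state], sound
```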

As described above, according to the present embodiment, providing the recognition state notification unit that notifies the user of the voice processing state (recognition state) allows the user to know the state of voice processing. This prevents situations in which the user speaks while recognition is stopped and the device does not respond, which improves usability.

Embodiment 3.
In the third embodiment, a case in which the user can specify when to speak will be described.

FIG. 10 is a block diagram showing a speech recognition device according to Embodiment 3 for carrying out the present invention. FIG. 10 adds to the configuration of Embodiment 2 described above a speech switch 9 for starting processing related to speech recognition when the user speaks.
The speech switch 9 outputs an activation command for the voice processing unit 5 based on an instruction from the user. For example, it converts the user's instruction into an electrical signal and outputs the activation command for the voice processing unit 5 based on that signal.

In FIG. 10, parts that are the same as or similar to those in the preceding figures are given the same reference numerals, and their description is omitted. However, in addition to the configuration described above, the voice processing unit 5 is configured to execute voice processing on the input audio signal in response to a user instruction; here, it accepts the activation command from the speech switch 9 and operates based on that command.

Next, the operation will be described.
When the user presses the speech switch 9 to start speaking, the speech switch 9 outputs an activation command for the voice processing unit 5 based on the user's instruction, for example by converting the instruction into an electrical signal. Upon receiving the activation command, the voice processing unit 5 starts or stops processing based on the moving state of the microphone 3 and the activation command from the speech switch 9. Specifically, it performs the recognition state transition determination based on these two inputs and carries out voice processing according to the result.

The recognition state notification unit 8 notifies the user of the voice processing state (recognition state) based on the result of the recognition state transition determination by the voice processing unit 5. The user starts speaking after confirming that voice processing has entered the recognition waiting state.

FIG. 11 is an explanatory diagram showing an example of the recognition state transition determination in the present embodiment, in which the voice processing unit 5 operates based on the activation command from the speech switch 9 in addition to the configuration described above. Steps that are the same as or similar to those in the preceding figures are given the same reference numerals, and their description is omitted.

The example in FIG. 11 differs from the example in FIG. 6 in that a current state determination (step 110) is performed before the leakage estimate determination (step 107), and, if the current recognition state is not the recognizing state, a speech switch state determination (step 111) is performed. If the user does not start speaking within a fixed time after pressing the speech switch, the state shifts to the recognition stopped state (step 106a); within that fixed time, the determination proceeds as in the example of FIG. 6. However, because the present embodiment performs one recognition per press of the speech switch, after recognition finishes the state becomes the recognition stopped state (step 106a) instead of returning to the recognition waiting state (step 106b).

Thus, in the example in FIG. 11, the state becomes the recognition stopped state (step 106a) if the user does not start speaking within a fixed time after pressing the speech switch. If the user does start speaking within that time, processing related to speech recognition continues while the user is speaking (during recognition) even after the fixed time has elapsed. To achieve this behavior, the speech switch state determination (including the time elapsed since the press) and the current state determination are performed before the leakage estimate determination (step 107).
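The FIG. 11 flow can be sketched as a small state machine. This is a hypothetical illustration: the class, its parameters, and the scalar leakage threshold are assumptions, and the step numbers in the comments refer to the steps named in the text. The press is consumed when recognition ends, so that one press of the speech switch yields one recognition.

```python
from enum import Enum
from typing import Optional


class RecogState(Enum):
    STOPPED = "recognition stopped"
    WAITING = "recognition waiting"
    RECOGNIZING = "recognizing"


class SwitchGatedRecognizer:
    """Hypothetical sketch of the FIG. 11 transition determination."""

    def __init__(self, timeout_s: float, leakage_threshold: float) -> None:
        self.timeout_s = timeout_s
        self.leakage_threshold = leakage_threshold
        self.state = RecogState.STOPPED
        self._pressed_at: Optional[float] = None

    def press(self, now: float) -> None:
        """Called when the user presses the speech switch."""
        self._pressed_at = now

    def step(self, now: float, mic_moving: bool, leakage: float,
             speech_detected: bool, utterance_ended: bool) -> RecogState:
        if self.state is RecogState.RECOGNIZING:  # current state determination (step 110)
            # No timeout check here: an utterance begun in time may run past it.
            too_leaky = mic_moving and leakage > self.leakage_threshold
            if too_leaky or utterance_ended:
                # Go to STOPPED, not WAITING: one recognition per press.
                self.state = RecogState.STOPPED
                self._pressed_at = None
        elif self._pressed_at is None or now - self._pressed_at > self.timeout_s:
            # Speech switch state determination (step 111): no press, or timed out.
            self.state = RecogState.STOPPED
            self._pressed_at = None
        elif mic_moving or leakage > self.leakage_threshold:
            # Leakage estimate and movement determination, as in FIG. 6.
            self.state = RecogState.STOPPED
        else:
            self.state = RecogState.RECOGNIZING if speech_detected else RecogState.WAITING
        return self.state
```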

As described above, according to the present embodiment, the speech switch 9 is provided, and voice processing is executed according to the activation command that the speech switch 9 issues on the user's instruction, in addition to the moving state of the microphone 3. In particular, starting and stopping processing related to speech recognition, such as echo cancellation and speech recognition, in this way suppresses misrecognition caused by utterances the user did not intend to be recognized or by noise.

Embodiment 4.
In the fourth embodiment, as an example of the voice processing unit 5 executing voice processing according to the moving state of the microphone 3, a case in which the echo canceller performs echo path learning according to the moving state of the microphone will be described.

FIG. 12 is a block diagram showing a speech recognition device according to Embodiment 4 for carrying out the present invention. In FIG. 12, parts that are the same as or similar to those in the preceding figures are given the same reference numerals, and their description is omitted. However, in addition to the configuration described above, the echo canceller 6 is configured to perform echo path learning according to the moving state of the microphone 3. For example, it controls the echo path learning coefficients, such as the learning convergence timing and the learning accuracy, according to the moving state of the microphone 3. As another example, when echo path learning is performed at predetermined intervals, learning is made faster while the microphone is moving so as to follow changes in the echo path, and learning accuracy is increased while it is stopped.
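A minimal sketch of movement-dependent echo path learning follows. The description does not name an adaptive algorithm; normalized LMS (NLMS) is used here as a common choice, and the mapping is an assumption: a larger step size while the microphone is moving makes the filter converge and track a changing echo path faster, while a smaller step size when it is stopped lowers the steady-state estimation error, which corresponds to higher learning accuracy. The class name and step-size values are illustrative.

```python
from typing import List


class NLMSEchoCanceller:
    """Hypothetical NLMS echo canceller whose step size depends on microphone movement."""

    def __init__(self, taps: int, mu_moving: float = 0.8,
                 mu_stopped: float = 0.1, eps: float = 1e-6) -> None:
        self.w: List[float] = [0.0] * taps  # echo path estimate (FIR coefficients)
        self.x: List[float] = [0.0] * taps  # recent loudspeaker samples, newest first
        self.mu_moving = mu_moving
        self.mu_stopped = mu_stopped
        self.eps = eps                      # regularizer to avoid division by zero

    def process(self, mic_sample: float, ref_sample: float, mic_moving: bool) -> float:
        # Shift the newest loudspeaker (reference) sample into the delay line.
        self.x = [ref_sample] + self.x[:-1]
        echo_est = sum(wi * xi for wi, xi in zip(self.w, self.x))
        err = mic_sample - echo_est  # echo-cancelled output sample
        # Faster adaptation while moving, more accurate adaptation while stopped.
        mu = self.mu_moving if mic_moving else self.mu_stopped
        norm = self.eps + sum(xi * xi for xi in self.x)
        gain = mu * err / norm
        self.w = [wi + gain * xi for wi, xi in zip(self.w, self.x)]
        return err
```

Each call returns the echo-cancelled sample that would be passed on to the speech recognition unit 7.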

As described above, according to the present embodiment, monitoring the movement of the microphone 3 and controlling echo path learning according to its moving state maintains echo cancellation accuracy while the microphone 3 is not moving and also allows echo cancellation while it moves slowly. Speech recognition can therefore be performed even when the microphone moves slower than a certain speed.

Further, in the present embodiment, echo path learning was described as being controlled based on whether the microphone is moving or stopped. Alternatively, the microphone may be regarded as moving when it moves at or above a predetermined speed and as stopped when it moves below that speed.

Although the present invention is not limited to a specific application, it is particularly useful for, for example, remote controls of AV equipment, TVs, and the like to which the microphone of the speech recognition device is attached.

FIG. 1 is a block diagram showing the configuration of the speech recognition device according to Embodiment 1.
FIG. 2 is an explanatory diagram showing an example of the recognition state transition determination in Embodiment 1.
FIG. 3 is an explanatory diagram showing an example of the recognition state transition determination in Embodiment 1.
FIG. 4 is an explanatory diagram showing an example of the recognition state transition determination in Embodiment 1.
FIG. 5 is an explanatory diagram showing an example of the recognition state transition determination in Embodiment 1.
FIG. 6 is an explanatory diagram showing an example of the recognition state transition determination in Embodiment 1.
FIG. 7 is a block diagram showing the configuration of the speech recognition device according to Embodiment 2.
FIG. 8 is an explanatory diagram showing an example of notification on a TV when the recognition state is recognition stopped.
FIG. 9 is an explanatory diagram showing an example of notification on a TV when the recognition state is recognizing.
FIG. 10 is a block diagram showing the configuration of the speech recognition device according to Embodiment 3.
FIG. 11 is an explanatory diagram showing an example of the recognition state transition determination in Embodiment 3.
FIG. 12 is a block diagram showing the configuration of the speech recognition device according to Embodiment 4.

Explanation of symbols

1 Voice output unit, 2 Speaker, 3 Microphone, 4 Acceleration sensor, 5 Voice processing unit, 6 Echo canceller, 7 Speech recognition unit, 8 Recognition state notification unit, 9 Speech switch.

Claims

A speech recognition device comprising:
a speaker that outputs audio to a viewing space;
a microphone that takes in audio from the viewing space and outputs an input audio signal of the captured input audio;
an acceleration sensor that monitors the moving state of the microphone; and
a voice processing unit that executes, according to the moving state of the microphone monitored by the acceleration sensor, voice processing that cancels the spatial echo of the audio output from the speaker from the input audio signal and performs speech recognition on the input audio signal from which the spatial echo has been cancelled.

The speech recognition device according to claim 1, wherein the voice processing unit is configured to execute the voice processing according to the current state of the voice processing in addition to the moving state of the microphone monitored by the acceleration sensor.

The speech recognition device according to claim 1, wherein the voice processing unit is configured to execute the voice processing according to an estimated amount of spatial echo leakage in addition to the moving state of the microphone monitored by the acceleration sensor.

The speech recognition device according to claim 1, further comprising a recognition state notification unit that notifies a user of the state of the voice processing of the voice processing unit.

The speech recognition device according to claim 1, wherein the voice processing unit is configured to execute the voice processing according to a user instruction in addition to the moving state of the microphone monitored by the acceleration sensor.

The speech recognition device according to claim 1, wherein the voice processing unit is configured to perform echo path learning according to the moving state of the microphone monitored by the acceleration sensor.