JP2015103824A

JP2015103824A - Voice generation system and stand for voice generation apparatus

Info

Publication number: JP2015103824A
Application number: JP2013240485A
Authority: JP
Inventors: Takehiko Kawahara (川原 毅彦); Kiyoshi Yamaki (山木 清志)
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 2013-11-20
Filing date: 2013-11-20
Publication date: 2015-06-04

Abstract

PROBLEM TO BE SOLVED: To provide a technique, in a voice generation system having a voice generation apparatus, for directing the voice generation apparatus in a desired direction by using a voice emitted from a sound source other than the voice generation apparatus.
SOLUTION: A voice generation system 10 has voice generation apparatuses 21, 22, N microphones 31, 32 (N is an integer of 1 or more), N echo cancellers 51, 52, and a drive section for changing the direction of the voice generation apparatuses 21, 22. The N echo cancellers 51, 52 remove the voice generated by the voice generation apparatus from the voice collected by the N microphones 31, 32, and output N types of voice that are emitted from a sound source other than the voice generation apparatus and arrive at the N microphones 31, 32. The drive section changes the direction of the voice generation apparatus according to the N types of voice output from the N echo cancellers 51, 52.

Description

The present invention relates to a technique for directing a sound generating device in a desired direction by using sound emitted by a sound source separate from that device, in a sound generating device or in a sound generating system having such a device.

In video conference systems, to capture an image of a speaker, the speaker's direction is identified using a plurality of microphones, and the imaging device is pointed in the identified direction (see, for example, Patent Document 1).

Patent Document 1: JP 2001-296343 A

However, when there are sound sources other than the speaker, it is difficult to identify the speaker's direction accurately from the speaker's voice. For this reason, techniques that identify the direction of a sound source from the sound it emits have not conventionally been applied to pointing a sound-producing device toward the speaker. For the same reason, using sound emitted by a specific sound source to point a sound-producing device in a desired direction has not conventionally been considered.

The present invention was made to solve the conventional problems described above. Its object is to provide, in a sound generating device or a sound generating system having one, a technique for directing the sound generating device in a desired direction by using sound emitted by a sound source separate from the sound generating device.

To achieve at least part of the above object, a sound generating system of the present invention is a system that generates sound and comprises: a sound generating device; N microphones (N is an integer of 1 or more); N echo cancellers that remove the sound generated by the sound generating device from the sound collected by the N microphones and each output one of N types of sound that are emitted by a sound source separate from the sound generating device and arrive at the N microphones; and a drive unit that changes the direction of the sound generating device according to the N types of sound output by the N echo cancellers.

With this configuration, the N echo cancellers remove the sound generated by the sound generating device from the sound collected by the N microphones, and output N types of sound that are emitted by a sound source separate from the sound generating device and arrive at the N microphones. Sound from that separate source can therefore be extracted as the N types of sound even while the sound generating device is itself producing sound. By having the drive unit change the direction of the sound generating device according to the N types of sound, the device can thus be pointed in a desired direction using sound emitted by a separate source.

In the sound generating system, N may be 2 or more, and the system may further comprise a sound source direction estimation unit that estimates the direction of the sound source based on the N types of sound output by the N echo cancellers, with the drive unit changing the direction of the sound generating device based on the sound source direction estimated by that unit.

With this configuration, the direction of the sound source is estimated from the sound that the source emitted, which has been extracted as the N types of sound separately from the sound of the sound generating device. Because the direction of the sound generating device is then changed based on the estimated source direction, the device can more easily be pointed toward that source.

The sound generating system may further comprise a voice recognition unit that determines the content of the voice emitted by the sound source based on the N types of sound output by the N echo cancellers, with the drive unit changing the direction of the sound generating device according to the content determined by the voice recognition unit.

With this configuration, the direction of the sound generating device is changed according to the voice content determined by the voice recognition unit, so the device can be pointed in the desired direction more accurately.

In the sound generating system, N may be 2 or more, and the system may further comprise: a sound source direction estimation unit that estimates the direction of the sound source based on the N types of sound output by the N echo cancellers; and a speech enhancement unit that, based on the same N types of sound, enhances the sound arriving from the source direction estimated by the sound source direction estimation unit. The voice recognition unit then determines the content of the voice emitted by the sound source from the sound enhanced by the speech enhancement unit.

With this configuration, sound from the estimated source direction is enhanced, so the voice emitted by the source becomes louder relative to ambient noise. The voice recognition unit can therefore determine the content of that voice more accurately, which makes it easier to point the sound generating device accurately in the desired direction.

The present invention can be realized in various forms, for example, as a sound generating system, a stand on which a sound generating device is placed, a method of controlling such a system or stand, or a program implementing that control method.

FIG. 1 is an explanatory drawing showing the external shape of a speaker system according to a first embodiment.
FIG. 2 is a block diagram showing the functional configuration of the speaker system.
FIG. 3 is a flowchart showing the flow of processing in the speaker system.
FIG. 4 is an explanatory drawing showing the speaker system in use.
FIG. 5 is an explanatory drawing showing the speaker system in use.
FIG. 6 is an explanatory drawing showing a speaker stand according to a second embodiment in use.
FIG. 7 is a block diagram showing the functional configuration of the speaker stand.
FIG. 8 is an explanatory drawing showing a modification of how the speaker stand and the speaker system are connected.
FIG. 9 is an explanatory drawing showing a modification of how the speaker stand and the speaker system are connected.
FIG. 10 is an explanatory drawing showing a speaker stand according to a third embodiment in use.
FIG. 11 is a block diagram showing the functional configuration of the speaker stand.
FIG. 12 is a block diagram showing the functional configuration of a speaker system according to a fourth embodiment.
FIG. 13 is a flowchart showing the flow of processing in the speaker system.

A. First embodiment:
A1. Speaker system configuration:
FIG. 1 is an explanatory drawing showing the external shape of a speaker system 10 according to a first embodiment of the present invention. FIG. 1(a) is a perspective view and FIG. 1(b) a bottom view of the speaker system 10. In the speaker system 10 shown in FIG. 1, two speaker units 21 and 22, two microphones 31 and 32, and a leg 40 are attached to a housing 11. The leg 40 is attached to the housing 11 via a motor (not shown); driving this motor rotates the housing 11 left and right about a rotation axis 41. Because the speaker system 10 then changes its orientation as a whole about the rotation axis 41, the speaker system 10 itself can be regarded as rotating about that axis. Although the leg 40 is drawn as a cylinder in FIG. 1, its shape can be varied.

FIG. 2 is a block diagram showing the functional configuration of the speaker system 10. The speaker system 10 has two speaker units 21 and 22; two microphones 31 and 32; two echo cancellers 51 and 52 connected to the microphones 31 and 32, respectively; a direction adjustment unit 60; a motor 70 that rotates the leg 40 (FIG. 1) relative to the housing 11; and a motor control unit 71 that controls the motor 70. The direction adjustment unit 60 has an arrival direction estimation unit 61 that estimates (as described later) the direction from which sound reaches the speaker system 10 from outside.

The speaker system 10 is provided with an interface (not shown) for a predetermined audio signal, through which a two-channel (L and R) audio signal, that is, a stereo audio signal, is input from a connected music player MP. Any type of audio interface, wired or wireless, can be used as long as it can carry the audio signal. Of the stereo audio signal, the L-channel signal is supplied to the speaker unit 21 and the R-channel signal to the speaker unit 22, so that the speaker system 10 reproduces stereo sound. In the following, "audio signal" refers to the stereo audio signal unless otherwise noted.

Since the speaker system 10 of the first embodiment is a device that generates sound and, as a whole, has a sound-generating function, it can also be called a sound generating system. In that case, the speaker units 21 and 22 can be regarded as the sound generating device within the sound generating system.

A2. Process flow and operation of the speaker system:
FIG. 3 is a flowchart showing the flow of processing in the speaker system 10 (FIG. 2). The speaker system 10 functions as a computer having a CPU, a ROM, a RAM, and interfaces (none shown) to the microphones 31 and 32, the motor 70, and so on. When the CPU executes a program stored in the ROM or RAM, the direction adjustment process shown in FIG. 3(a) and the motor control process shown in FIG. 3(b) are executed, realizing the echo cancellers 51 and 52, the direction adjustment unit 60, and the motor control unit 71 as functional units. Signals are exchanged between these functional units by storing data in a predetermined area of the RAM and reading the stored data back.

FIGS. 4 and 5 are explanatory drawings showing the speaker system 10 in use, illustrating its operation when the direction adjustment process and the motor control process of FIG. 3 are executed. In the example of FIGS. 4 and 5, the speaker units 21 and 22 are playing music from an audio signal supplied by the music player MP (FIG. 2). In the initial state shown in FIG. 4(a), the user USR is located to the right of the front direction of the speaker system 10 (indicated by a dash-dot line), as seen when facing the speaker system 10.

The direction adjustment process shown in FIG. 3(a) is executed periodically, for example by a timer interrupt. In step S11 of the direction adjustment process, echo cancellation is performed once for each microphone of the speaker system 10, with these operations running in parallel. In the speaker system 10 of the first embodiment, the echo cancellers 51 and 52 (FIG. 2) apply echo cancellation to the microphone input sound picked up by the microphones 31 and 32 connected to them. Specifically, each echo canceller uses the audio signal input from the music player MP (FIG. 2) as a reference signal and subtracts from the microphone input sound a pseudo-echo signal obtained by passing the reference signal through an adaptive filter (see, for example, IEICE "Knowledge Forest" (http://www.ieice-hbkb.org/), Group 2, Part 6, Chapter 5). This removes the acoustic echo produced when sound emitted by the speaker units 21 and 22 reaches the microphones 31 and 32, and extracts the sound that arrived from outside the speaker system 10, emitted by a sound source separate from it (the external incoming sound).
The extracted external incoming sound is supplied from the echo cancellers 51 and 52 to the arrival direction estimation unit 61 of the direction adjustment unit 60 (FIG. 2). Because the speaker system 10 has two microphones 31 and 32 and two echo cancellers 51 and 52, the external incoming sound reaches the arrival direction estimation unit 61 as two types of sound, one from each echo canceller.
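The echo cancellation in step S11 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the text only specifies "an adaptive filter", so the normalized LMS (NLMS) update, the function name `nlms_echo_cancel`, and the parameters `taps` and `mu` are hypothetical choices. One instance would run per microphone. For simplicity the sketch uses a single reference channel; with the stereo signal described above, a canceller would have to model both the L and R echo paths.

```python
def nlms_echo_cancel(mic, ref, taps=32, mu=0.5, eps=1e-8):
    """Remove the acoustic echo of `ref` from `mic` with an NLMS adaptive filter.

    mic: samples from one microphone (acoustic echo + external incoming sound)
    ref: the playback signal sent to the speaker units (reference signal)
    Returns the residual, i.e. the estimated external incoming sound.
    """
    w = [0.0] * taps   # adaptive filter coefficients (estimate of the echo path)
    x = [0.0] * taps   # most recent reference samples, newest first
    out = []
    for m, r in zip(mic, ref):
        x = [r] + x[:-1]                                  # shift in the newest reference sample
        echo_hat = sum(wi * xi for wi, xi in zip(w, x))   # pseudo-echo signal
        e = m - echo_hat                                  # mic input minus pseudo-echo
        g = mu * e / (eps + sum(xi * xi for xi in x))     # normalised step size
        w = [wi + g * xi for wi, xi in zip(w, x)]         # NLMS coefficient update
        out.append(e)
    return out
```

Only components correlated with the reference are subtracted, so the user's voice, which is uncorrelated with the music, largely remains in the residual. A practical canceller would also need double-talk detection to stop the filter from adapting on the user's voice; this sketch omits it.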

In the example of FIG. 4(a), the user USR speaks while the speaker units 21 and 22 are playing music. The user's voice is picked up by the microphones 31 and 32 of the speaker system 10, so the microphone input sound contains both the voice of the user USR and the music reproduced by the speaker units 21 and 22 (that is, acoustic echo). Applying echo cancellation to the microphone input sound (step S11 in FIG. 3(a)) extracts the voice of the user USR as the external incoming sound.

In step S12 of FIG. 3(a), the arrival direction estimation unit 61 (FIG. 2) estimates the arrival direction of the sound based on the external incoming sound supplied from the echo cancellers 51 and 52. The arrival direction is estimated with a well-known technique such as the delay-time estimation method (see, for example, IEICE "Knowledge Forest" (http://www.ieice-hbkb.org/), Group 2, Part 6, Chapter 3). Because the arrival direction estimation unit 61 estimates the arrival direction of external incoming sound emitted by a source separate from the speaker system 10, it can also be called a sound source direction estimation unit that estimates the direction of that source.
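Delay-time estimation for the two-microphone case can be sketched as below. The sketch assumes a far-field source, a hypothetical spacing `d` between two microphones labelled A (at x = -d/2) and B (at x = +d/2), the speaker facing +y, and the counterclockwise-positive convention (viewed from above) used in this description. The delay τ = t_B - t_A then satisfies τ = d·sin θ / c, so θ = arcsin(cτ/d). The function names and the speed of sound c = 343 m/s are illustrative assumptions.

```python
import math

def estimate_delay(a, b, max_lag):
    """Delay-time estimation by brute-force cross-correlation.

    Returns the lag (in samples) maximising sum(a[i] * b[i + lag]).
    A positive lag means b is a delayed copy of a, i.e. the sound reached mic A first.
    """
    best_lag, best_val = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        s = sum(a[i] * b[i + lag] for i in range(len(a)) if 0 <= i + lag < len(b))
        if s > best_val:
            best_lag, best_val = lag, s
    return best_lag

def rotation_angle_deg(lag, fs, d, c=343.0):
    """Convert a lag of mic B behind mic A into a signed rotation angle in degrees.

    Assumes mic A at x = -d/2, mic B at x = +d/2, the speaker facing +y,
    and counterclockwise-positive angles viewed from above.
    """
    s = c * lag / (fs * d)       # sin(theta) = c * tau / d, with tau = lag / fs
    s = max(-1.0, min(1.0, s))   # clip: estimation error can push |s| slightly past 1
    return math.degrees(math.asin(s))
```

For example, with `fs` = 16000 Hz and `d` = 0.2 m, a lag of 3 samples corresponds to about 18.8 degrees counterclockwise. Integer lags limit the angular resolution; refinements such as sub-sample interpolation or GCC-PHAT weighting are common but outside this sketch.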

The arrival direction estimated by the arrival direction estimation unit 61 is given as the rotation angle needed to turn the speaker system 10 toward the arrival direction of the external incoming sound (hereinafter simply the "rotation angle"). In the following, the rotation angle is a signed angle that is positive counterclockwise and negative clockwise, viewed from above the speaker system 10. The opposite convention, with clockwise positive and counterclockwise negative, may be used instead.

In step S13, the direction adjustment unit 60 (FIG. 2) determines whether the arrival direction of the external incoming sound has been identified. Specifically, the arrival direction is judged not identified if the volume of the external incoming sound extracted by the echo cancellers 51 and 52 is below a preset threshold, or if sound arrives from several directions so that no dominant arrival direction can be identified. If, on the other hand, the volume is sufficiently high and a dominant arrival direction can be identified, the arrival direction is judged identified. In that case control passes to step S14; otherwise the direction adjustment process ends.
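The decision in step S13 could look like the following sketch. The patent gives only the two criteria (a volume threshold and the presence of a dominant direction); the RMS level measure, the histogram of per-frame angle estimates, the function name, and every threshold (`level_threshold`, `bin_deg`, `min_share`) are hypothetical choices.

```python
import math
from collections import Counter

def identified_direction(samples, frame_angles, level_threshold=0.01,
                         bin_deg=10.0, min_share=0.6):
    """Step S13 sketch: return the dominant arrival angle in degrees, or None.

    samples: external incoming sound from one echo canceller
    frame_angles: per-frame angle estimates (degrees) over the same interval
    """
    rms = math.sqrt(sum(v * v for v in samples) / len(samples)) if samples else 0.0
    if rms < level_threshold or not frame_angles:
        return None                                   # too quiet: not identified
    bins = Counter(round(a / bin_deg) for a in frame_angles)
    best_bin, count = bins.most_common(1)[0]
    if count / len(frame_angles) < min_share:
        return None                                   # no dominant direction
    members = [a for a in frame_angles if round(a / bin_deg) == best_bin]
    return sum(members) / len(members)                # mean angle of the dominant bin
```

Returning `None` corresponds to ending the direction adjustment process; a number corresponds to proceeding to step S14 with that rotation angle.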

In step S14 of FIG. 3(a), the direction adjustment unit 60 (FIG. 2) sends the arrival direction estimated by the arrival direction estimation unit 61, that is, the rotation angle, to the motor control unit 71 as a control signal, and the direction adjustment process then ends. If the absolute value of the rotation angle is smaller than a preset value, sending the control signal to the motor control unit 71 may be skipped.
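Step S14, together with the RAM-based signal exchange described for FIG. 3, might be modelled as below. `ControlArea` is a hypothetical stand-in for the predetermined RAM area, and the 5-degree dead band is an assumed value for the optional "preset value".

```python
from dataclasses import dataclass

@dataclass
class ControlArea:
    """Hypothetical stand-in for the predetermined RAM area shared by the units."""
    pending: bool = False    # flag: a control signal is waiting for the motor control unit
    angle_deg: float = 0.0   # rotation angle carried by the control signal

def send_rotation(area: ControlArea, angle_deg: float, dead_band_deg: float = 5.0) -> bool:
    """Step S14 sketch: post the rotation angle unless it is below the dead band."""
    if abs(angle_deg) < dead_band_deg:
        return False          # optional skip for negligible corrections
    area.angle_deg = angle_deg
    area.pending = True       # raise the flag checked in step S21
    return True
```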

FIG. 4(b) shows the state in which the arrival direction of the voice of the user USR (that is, the external incoming sound) has been estimated. The user USR speaks from a position to the right of the front direction of the speaker system 10 (dash-dot line), as seen when facing the speaker system 10. In step S12 of FIG. 3(a), the arrival direction is therefore estimated to be the direction of the user USR, indicated by a two-dot chain line. In the example of FIG. 4(b), this direction is estimated to be the front direction of the speaker system 10 rotated counterclockwise by an angle θ, viewed from above. In step S13 of FIG. 3(a), the arrival direction is judged identified, and in step S14 the rotation angle θ is sent to the motor control unit 71 as the control signal.

The motor control process shown in FIG. 3(b) is executed periodically, for example by a timer interrupt. In step S21 of the motor control process, the motor control unit 71 (FIG. 2) determines whether a control signal has been sent from the direction adjustment unit 60 (FIG. 2). If so, control passes to step S22; if not, control passes to step S23. Whether a control signal has been sent can be determined from the on/off state of a flag stored in a predetermined area of the RAM, or from the value of the data corresponding to the control signal. Although in FIG. 3(b) control passes to step S23 when no control signal has been sent, the motor control process may instead simply end at that point.

In step S22, the motor control unit 71 (FIG. 2) updates the target direction in which to point the speaker system 10, based on the rotation angle sent as the control signal and the direction the speaker system 10 is facing at the time of execution (the current direction). The target and current directions are expressed as angles from a reference direction defined relative to the leg 40 (FIG. 4), for example the mechanical origin between the housing 11 and the leg 40. The target direction is updated to the current direction plus the rotation angle.
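Steps S21 and S22 of the motor control process can be sketched like this. `MotorState` and `update_target` are hypothetical names, and passing `pending_angle=None` stands in for the flag check of step S21 finding no control signal. The returned difference is what step S23 would then drive.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class MotorState:
    current_deg: float = 0.0  # current direction, measured from the mechanical origin
    target_deg: float = 0.0   # direction the housing should end up facing

def update_target(state: MotorState, pending_angle: Optional[float]) -> float:
    """Steps S21/S22 sketch; returns the angle difference left for step S23."""
    if pending_angle is not None:                               # S21: control signal sent?
        state.target_deg = state.current_deg + pending_angle    # S22: target = current + rotation
    return state.target_deg - state.current_deg
```

Keeping the target across runs means that, when no new control signal arrives, step S23 can continue a rotation that has not yet completed.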

FIG. 5(a) shows the state after the target direction has been updated in step S22 of FIG. 3(b). In the example of FIG. 5(a), the current direction is the front direction of the speaker system 10 (dash-dot line). Because the rotation angle is θ, the target direction is the direction of the user USR (two-dot chain line).

In step S23 of FIG. 3(b), the motor control unit 71 (FIG. 2) drives the motor 70 to rotate the speaker system 10 toward the target direction. Specifically, the motor control unit 71 drives the motor 70 according to the angular difference between the target direction and the current direction (that is, the rotation angle), so the speaker system 10 turns by the rotation angle and faces the target direction. Because the motor 70 and the motor control unit 71 together change the direction of the speaker system 10, they can also be regarded as a single functional unit (the drive unit).
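If the motor 70 were a stepper motor (an assumption; the text does not specify the motor type), step S23 could convert the angular difference into a signed number of whole steps. The 1.8-degree step angle and the function name `drive_toward` are illustrative.

```python
def drive_toward(current_deg, target_deg, step_deg=1.8):
    """Step S23 sketch for a hypothetical stepper motor.

    Returns (signed step count, direction reached); positive steps turn
    counterclockwise viewed from above, matching the sign convention in the text.
    """
    steps = round((target_deg - current_deg) / step_deg)  # quantise to whole steps
    return steps, current_deg + steps * step_deg
```

Because the result is quantised, the direction reached can differ from the target by up to half a step; for a 30-degree rotation the sketch issues 17 steps and ends at 30.6 degrees.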

FIG. 5(b) shows the speaker system 10 after the motor 70 has been driven. As shown, rotating the motor 70 in step S23 of FIG. 3(b) turns the housing 11 of the speaker system 10 by the rotation angle θ relative to the leg 40. The speaker system 10, which faced the direction shown by the dash-dot line, now faces the target direction shown by the two-dot chain line (that is, the direction of the user USR).

As described above, in the speaker system 10 of the first embodiment, echo cancellation is applied to the sound picked up by the two microphones 31 and 32 (the microphone input sound), removing the sound transmitted from the speaker units 21 and 22 to the microphones 31 and 32 (acoustic echo). The arrival direction is then estimated from the remaining sound (the external incoming sound), and the speaker system 10 is rotated according to the estimated direction. Thus, even while the speaker system 10 is reproducing sound, it can be turned toward the arrival direction of the voice of the user USR, that is, toward the user USR.

Although the speaker system 10 of the first embodiment uses two microphones 31 and 32 and two echo cancellers 51 and 52, any number of microphones and echo cancellers of two or more can be used. With three or more, the arrival direction can be estimated as rotation angles in both the horizontal (left-right) and vertical (up-down) directions. By also using a motor that rotates the system vertically as well as horizontally, the speaker system can then be pointed at the user's head regardless of the user's posture.
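With three microphones, the same delay-time idea yields both angles. The sketch below assumes an L-shaped array on the front face of the housing (hypothetical positions: mic 1 at the origin, mic 2 a distance `d` along +x, mic 3 a distance `d` along +z, with +y the speaker's front and +z up), a far-field source in front, and delays `tau12` = t2 - t1 and `tau13` = t3 - t1 in seconds obtained by any delay-time estimation method. For a unit vector u toward the source, t_i = t_0 - (p_i · u)/c, so τ12 = -d·u_x/c and τ13 = -d·u_z/c.

```python
import math

def direction_from_three_mics(tau12, tau13, d, c=343.0):
    """Far-field azimuth and elevation (degrees) from two delays of an L-shaped array.

    Azimuth is counterclockwise-positive viewed from above, measured from +y;
    elevation is positive upward.
    """
    ux = -c * tau12 / d                                   # x component of the source direction
    uz = -c * tau13 / d                                   # z component of the source direction
    uy = math.sqrt(max(0.0, 1.0 - ux * ux - uz * uz))     # assume the source is in front (+y)
    azimuth = math.degrees(math.atan2(-ux, uy))
    elevation = math.degrees(math.asin(max(-1.0, min(1.0, uz))))
    return azimuth, elevation
```

The front-only assumption resolves the ambiguity of a planar array, which cannot tell a source in front from its mirror image behind.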

In step S23 of the motor control process (FIG. 3(b)), the speaker system 10 of the first embodiment drives the motor 70 according to the angular difference (rotation angle) between the target direction and the current direction. Alternatively, the motor 70 may be driven toward the target direction by a preset upper-limit angle when the rotation angle is at or above that limit, and by the rotation angle itself when it is below the limit. Repeating the motor control process several times still turns the speaker system 10 toward the arrival direction of the external incoming sound (the user's direction). Setting an upper-limit angle has the following effects. If the user speaks after moving somewhere temporarily, the speaker system 10 does not immediately turn all the way toward the new position, so when the user returns, the speaker system 10 is still facing close to the user's direction. Moreover, when the returning user speaks to turn the speaker system 10 back, the required rotation angle is small, so the speaker system 10 can be turned toward the user more quickly.
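The upper-limit variant can be sketched as a clamp applied before driving the motor. The 30-degree limit and the function name `limited_rotation` are assumed example values, and the loop mimics repeated runs of the motor control process.

```python
import math

def limited_rotation(diff_deg, limit_deg=30.0):
    """Return the angle to turn in one motor-control run, capped at limit_deg."""
    if abs(diff_deg) >= limit_deg:
        return math.copysign(limit_deg, diff_deg)  # turn by the upper limit toward the target
    return diff_deg                                # below the limit: turn the full difference

# Repeated runs close a 75-degree gap in three runs: 30 + 30 + 15.
current, target = 0.0, 75.0
for _ in range(3):
    current += limited_rotation(target - current)
print(current)  # 75.0
```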

B. Second embodiment:
B1. Speaker stand configuration:
FIG. 6 is an explanatory diagram showing a usage state of a speaker stand 80 as the second embodiment of the present invention. The speaker stand 80 is configured so that a speaker system 10a is placed on the upper surface of its housing 81. The speaker system 10a has two speaker units 21a and 22a attached to a housing 11a. In the speaker stand 80, two microphones 31a and 32a and a leg portion 40a are attached to the housing 81. As in the speaker system 10 of the first embodiment, the leg portion 40a is attached to the housing 81 via a motor (not shown), so the speaker stand 80 can rotate together with the speaker system 10a placed on the upper surface of the housing 81. Because the speaker system 10a is a device that generates sound, it can also be referred to as a sound generating device.

FIG. 7 is a block diagram showing the functional configuration of the speaker stand 80. The speaker stand 80 differs from the speaker system 10 of the first embodiment in two respects: the speaker units 21 and 22 are omitted, and the audio signal is supplied to the externally connected speaker system 10a instead of to the speaker units 21 and 22. In other respects, it is the same as the speaker system 10 of the first embodiment.

The speaker stand 80 also performs the same processing (FIG. 3) as the speaker system 10 of the first embodiment. Therefore, as in the first embodiment, even while the speaker system 10a is reproducing sound, the user can speak to turn the speaker stand 80 and the speaker system 10a toward the arrival direction of the user's voice, that is, toward the user.

B2. Modifications of the connection mode:
The speaker stand 80 is provided with an interface (not shown) for a predetermined audio signal. In the example of FIG. 7, the music player MP and the speaker system 10a are connected to the speaker stand 80 via this interface. The audio signal output by the music player MP is first input to the speaker stand 80 and then supplied from the speaker stand 80 to the speaker system 10a. However, the speaker stand, the music player, and the speaker system may also be connected in a way that differs from FIG. 7. Any type of interface, wired or wireless, can be used for the audio signal, as long as it can transmit the audio signal.

FIGS. 8 and 9 are explanatory diagrams showing modifications of the connection mode between a speaker stand 80a and speaker systems 10a and 10g. In these cases, the speaker stand 80a does not need to output an audio signal. Therefore, as shown in FIGS. 8 and 9, the line for outputting the audio signal is omitted from the speaker stand 80a, and the input audio signal is used only as the reference signal for the echo cancellers 51 and 52. However, the speaker stand 80 shown in FIG. 7 can also be used as it is in place of the speaker stand 80a shown in FIGS. 8 and 9.
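This section does not specify the adaptive algorithm inside the echo cancellers 51 and 52. For illustration, the Python sketch below assumes a normalized least-mean-squares (NLMS) adaptive filter, a common choice for acoustic echo cancellation. The filter uses the reference signal to predict the echo at the microphone, and the prediction error becomes the echo-cancelled output. The tap count, step size, and the synthetic echo path in `demo` are hypothetical.

```python
import random


def nlms_echo_cancel(reference, mic, taps=8, mu=0.5, eps=1e-6):
    """Normalized LMS echo canceller.

    reference: signal sent to the speaker (the echo canceller's reference signal).
    mic:       microphone signal = echo of the reference + external incoming sound.
    Returns the error signal, i.e. the mic signal with the estimated echo removed.
    """
    w = [0.0] * taps     # adaptive estimate of the speaker-to-mic echo path
    buf = [0.0] * taps   # most recent reference samples, newest first
    out = []
    for x_n, d_n in zip(reference, mic):
        buf = [x_n] + buf[:-1]
        echo_estimate = sum(wi * xi for wi, xi in zip(w, buf))
        e = d_n - echo_estimate
        norm = eps + sum(xi * xi for xi in buf)
        w = [wi + mu * e * xi / norm for wi, xi in zip(w, buf)]
        out.append(e)
    return out


def demo(n=5000):
    """Echo-only synthetic test: residual/echo energy ratio over the last 1000 samples."""
    rng = random.Random(0)
    ref = [rng.gauss(0.0, 1.0) for _ in range(n)]
    h = [0.6, -0.3, 0.2, 0.1]  # hypothetical echo path, shorter than the filter
    mic = [sum(h[k] * ref[i - k] for k in range(len(h)) if i - k >= 0) for i in range(n)]
    out = nlms_echo_cancel(ref, mic)
    echo_energy = sum(v * v for v in mic[-1000:])
    residual_energy = sum(v * v for v in out[-1000:])
    return residual_energy / echo_energy
```

When the reference is silent, the output equals the microphone signal, so sound from an external source passes through untouched. A production canceller would also need double-talk handling, which this sketch omits.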

In the first modification of the connection mode, shown in FIG. 8, the speaker stand 80a and the speaker system 10a are connected in parallel to the music player MP. The audio signal output from the music player MP is supplied directly to each of the speaker stand 80a and the speaker system 10a. Because this simplifies the transmission path of the audio signal to the speaker system 10a, it helps prevent the sound quality from degrading while the audio signal is being transmitted.

In the second modification of the connection mode, shown in FIG. 9, the audio signal output by the music player MP is first input to a speaker system 10g and then supplied from the speaker system 10g to the speaker stand 80a. In this case, a line for outputting the audio signal is added to the speaker system 10g, which makes the speaker system 10g more complex than the speaker system 10a of FIGS. 7 and 8. However, the audio signal output by the speaker system 10g can be corrected to match, for example, the reproduction characteristics of the speaker system 10g. As a result, the speaker stand 80a can remove the acoustic echo more reliably and estimate the arrival direction of the external incoming sound more accurately.

C. Third embodiment:
FIG. 10 is an explanatory diagram showing a usage state of a speaker stand 80b as the third embodiment of the present invention. FIG. 10 shows a speaker set 1 made up of two pairs, each consisting of a speaker stand 80b and a speaker system 10b placed on the upper surface of the stand's housing 81b. The speaker system 10b has one speaker unit 23 attached to a housing 11b. In the speaker stand 80b, one microphone 33 and a leg portion 40b are attached to the housing 81b. As in the speaker system 10 of the first embodiment and the speaker stand 80 of the second embodiment, the leg portion 40b is attached to the housing 81b via a motor (not shown), so the speaker stand 80b can rotate together with the speaker system 10b placed on the upper surface of the housing 81b. As shown in FIG. 10, the speaker set 1 has two speaker systems 10b and two speaker stands 80b. In the following, when several instances of a component such as the speaker system 10b or the speaker stand 80b need to be distinguished, [1] or [2] is appended to the reference numeral.

Because the speaker set 1 of the third embodiment has two speaker systems 10b, it can reproduce stereo sound: one speaker system 10b reproduces the L-channel audio signal and the other reproduces the R-channel audio signal. Since the speaker set 1 as a whole therefore has the function of generating sound, it can also be referred to as a sound generation system.

FIG. 11 is a block diagram showing the functional configuration of the speaker stand 80b. Because the two speaker stands 80b[1] and 80b[2] have the same configuration, FIG. 11 shows the configuration of only one of them, the speaker stand 80b[1]. The speaker stand 80b[1] differs from the speaker stand 80 of the second embodiment (FIG. 7) in what it uses in place of the two microphones 31a and 32a and the two echo cancellers 51 and 52. Instead, it uses its own microphone 33[1] and echo canceller 53[1] together with the microphone 33[2] and echo canceller 53[2] of the other speaker stand 80b[2]. In other respects, it is the same as the speaker stand 80 of the second embodiment.

In the speaker set 1 of the third embodiment, the microphone input sound picked up by the microphone 33[1] contains two components: the sound transmitted from the speaker unit 23[1] (FIG. 10) of the speaker system 10b[1], and the sound that has propagated through the air from the speaker unit 23[2] of the speaker system 10b[2]. The echo canceller 53[1] treats both of these sounds as acoustic echoes and removes them from the microphone input sound picked up by the microphone 33[1]. The external incoming sound from which the acoustic echo has been removed is supplied both to the arrival direction estimation unit 61[1] and to the arrival direction estimation unit 61[2] of the other speaker stand 80b[2].

The arrival direction estimation unit 61[1] estimates the arrival direction of the sound from two inputs: the external incoming sound supplied by the echo canceller 53[1] of the speaker stand 80b[1], and the external incoming sound supplied by the echo canceller 53[2] of the other speaker stand 80b[2]. In this case, estimating the arrival direction requires information on the positional relationship of the two speaker stands 80b[1] and 80b[2], such as the distance between them. The user inputs this positional information in advance. Alternatively, the positional information may be set without user input. In that case, each speaker system 10b[1], 10b[2] reproduces a position measurement sound, a separately provided microphone picks up that sound, and the recording is analyzed. Besides such sound analysis, the positional relationship of the two speaker stands 80b[1] and 80b[2] can also be obtained by various other position measurement methods, such as infrared position measurement.
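One way the positional information could enter the estimate is through the microphone spacing in a time-difference-of-arrival (TDOA) calculation. The Python sketch below assumes a far-field source, a known spacing between the microphones 33[1] and 33[2], and echo-cancelled input signals. It finds the delay by brute-force cross-correlation and converts it to an angle with sin θ = c·τ/d. The speed of sound, sample rate, and function names are assumptions, not details taken from the patent.

```python
import math
import random

SPEED_OF_SOUND = 343.0  # m/s; assumed room-temperature value


def estimate_delay(sig_a, sig_b, max_lag):
    """Lag (samples) maximizing sum(sig_a[i] * sig_b[i + lag]).

    A positive result means mic B hears the sound later than mic A,
    i.e. the source is closer to mic A.
    """
    n = min(len(sig_a), len(sig_b))
    best_lag, best_score = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(max(0, -lag), min(n, n - lag)):
            score += sig_a[i] * sig_b[i + lag]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def arrival_angle_deg(delay_samples, sample_rate, spacing_m, c=SPEED_OF_SOUND):
    """Far-field angle from broadside in degrees; positive angles point toward mic A."""
    s = c * (delay_samples / sample_rate) / spacing_m
    s = max(-1.0, min(1.0, s))  # clamp: noise can push |s| slightly above 1
    return math.degrees(math.asin(s))


def demo(true_delay=5, sample_rate=48000, spacing_m=0.5):
    """Synthetic check: mic B receives white noise `true_delay` (>= 1) samples after mic A."""
    rng = random.Random(1)
    src = [rng.gauss(0.0, 1.0) for _ in range(2000)]
    mic_a = src
    mic_b = [0.0] * true_delay + src[:-true_delay]
    # Largest physically possible lag for this spacing, plus one sample of margin.
    max_lag = int(spacing_m / SPEED_OF_SOUND * sample_rate) + 1
    lag = estimate_delay(mic_a, mic_b, max_lag)
    return lag, arrival_angle_deg(lag, sample_rate, spacing_m)
```

Because the two stands can be placed far apart, the usable lag range scales with their separation, so `max_lag` is derived from the spacing rather than fixed.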

Thus, in the speaker set 1 of the third embodiment as well, the external incoming sound arriving from outside the speaker set 1 is extracted by the two microphones 33[1] and 33[2] and by the two echo cancellers 53[1] and 53[2] connected to them. The arrival direction estimation units 61 then estimate the arrival direction of the external incoming sound from the extracted sound. Each of the two speaker stands 80b can then drive its own motor 70 to turn itself and its speaker system 10b toward that arrival direction. Therefore, as in the first and second embodiments, even while the speaker set 1 is reproducing sound, the user can speak to turn the speaker stands 80b and the speaker systems 10b toward the arrival direction of the user's voice, that is, toward the user.

In the speaker stand 80b of the third embodiment, the audio signal input to the speaker stand 80b is passed unchanged to the speaker system 10b. However, the audio signal output to the speaker system 10b may instead be delayed based on the arrival direction of the external incoming sound. This way, even when the user is closer to one of the two speaker systems 10b[1] and 10b[2], the sounds reaching the user from the two systems can be brought into phase, so the user hears more natural sound.
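The source does not spell out how the delay is derived from the arrival direction. Assume that the distance from each speaker system to the user has already been computed from the estimated direction and the stands' positional relationship. The delays could then be chosen as in this hypothetical Python sketch, which delays each nearer speaker by its extra path length so that all the sounds arrive at the same time. The speed of sound and function names are assumptions.

```python
SPEED_OF_SOUND = 343.0  # m/s; assumed value


def alignment_delays(distances_m, sample_rate, c=SPEED_OF_SOUND):
    """Whole-sample delay for each speaker so all sounds reach the listener together.

    The farthest speaker is not delayed; each nearer speaker is delayed by the
    travel time of its path-length difference.
    """
    farthest = max(distances_m)
    return [round((farthest - d) / c * sample_rate) for d in distances_m]


def apply_delay(signal, delay_samples):
    """Delay a signal by prepending zeros, keeping its original length."""
    if delay_samples <= 0:
        return list(signal)
    return ([0.0] * delay_samples + list(signal))[: len(signal)]
```

For a listener 2 m from one speaker and 3 m from the other at 48 kHz, `alignment_delays([2.0, 3.0], 48000)` returns `[140, 0]`. That is about 2.9 ms of delay on the nearer speaker.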

In the speaker set 1 of the third embodiment, two pairs of a speaker stand 80b and a speaker system 10b are used to reproduce stereo sound. However, any number of such pairs can be used, as long as there are two or more. For example, five such pairs could be used together with a subwoofer to reproduce 5.1-channel surround sound. In this case, the 5.1-channel audio signal is supplied to the speaker stands 80b, and echo cancellation is performed using that audio signal as the reference signal.

The above description presents the third embodiment of the present invention as a speaker stand 80b with the speaker system 10b placed on its housing 81b. However, the third embodiment may also be configured as a single speaker system that has the configurations and functions of both the speaker system 10b and the speaker stand 80b.

D. Fourth embodiment:
FIG. 12 is a block diagram showing the functional configuration of a speaker system 10c as the fourth embodiment of the present invention. The speaker system 10c of the fourth embodiment differs from the speaker system 10 of the first embodiment in two respects. First, a mode changeover switch 91, a voice data storage unit 92, and a voice data registration unit 93 are added. Second, the direction adjustment unit 60c is configured differently. In other respects, it is the same as the speaker system 10 of the first embodiment.

The speaker system 10c has two operation modes. The voice registration mode (described later) is used to register voice data. The direction adjustment mode adjusts the direction of the speaker system 10c in the same way as the speaker system 10 of the first embodiment. The mode changeover switch 91 is an alternate-action (position-holding) switch, and its position determines whether the speaker system 10c operates in the voice registration mode or the direction adjustment mode.

FIG. 13 is a flowchart showing the flow of processing in the speaker system 10c. The speaker system 10c functions as a computer with a CPU, a ROM, a RAM, a secondary storage device, and interfaces to the microphones 31 and 32, the motor 70, the mode changeover switch 91, and so on (none of which are shown). When the CPU executes programs stored in the ROM or RAM, it carries out the processes shown in FIGS. 13(a) to 13(c). This implements the functional units: the echo cancellers 51 and 52, the direction adjustment unit 60c, the motor control unit 71, and the voice data registration unit 93. The motor control process shown in FIG. 13(c) is the same as the motor control process of the first embodiment shown in FIG. 3(b), so its description is omitted here.

The mode handling process shown in FIG. 13(a) is executed periodically, for example by a timer interrupt. In step S31 of the mode handling process, the speaker system 10c checks the position of the mode changeover switch 91 to determine whether its operation mode is the voice registration mode. If it is, control proceeds to step S32, and the voice registration mode processing is executed. If it is not, that is, if the operation mode is the direction adjustment mode, control proceeds to step S34.

In step S32, the voice data registration unit 93 temporarily records the sound picked up by the microphone 31 and analyzes the recording to extract the user's voice from it. For example, suppose sound at or above a certain level within a frequency band contained in the human voice is detected continuously for a predetermined time (for example, 0.5 to 2 seconds). That continuously detected sound can then be extracted as the user's voice.
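The extraction criterion in step S32 can be sketched as a run-length check over frame levels. This Python sketch assumes that the recording has already been band-limited to the voice band and reduced to one level value per frame. The function name, frame length, and threshold are hypothetical, and `min_duration_s` stands for the predetermined time (for example, 0.5 to 2 seconds).

```python
import math


def find_voice_segment(frame_levels, frame_s, threshold, min_duration_s=1.0):
    """Return (start, end) frame indices of the first run whose level stays at or
    above `threshold` for at least `min_duration_s` seconds, or None.

    `end` is exclusive; the whole continuous run is returned, not just its first
    `min_duration_s` seconds.
    """
    needed = max(1, math.ceil(min_duration_s / frame_s - 1e-9))  # tolerate float error
    run_start = None
    # A trailing sentinel below any threshold closes a run that lasts to the end.
    for i, level in enumerate(list(frame_levels) + [-math.inf]):
        if level >= threshold:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            if i - run_start >= needed:
                return run_start, i
            run_start = None
    return None
```

With 100 ms frames and a 0.5 s requirement, a two-frame burst is ignored, and the next run of at least five loud frames is returned as the voice segment.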

In step S33, the voice data registration unit 93 registers voice data representing the user's voice extracted in step S32. Specifically, it stores this voice data in the voice data storage unit 92, which is allocated in a predetermined area of the secondary storage device (not shown). Once the voice data has been registered, the mode handling process shown in FIG. 13(a) ends.

If step S31 determines that the operation mode is not the voice registration mode, control passes to step S34, where the direction adjustment process shown in FIG. 13(b) is executed. After the direction adjustment process ends, the mode handling process ends.
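The branching of the mode handling process (steps S31 to S34) can be summarized in a short Python sketch. The callbacks are hypothetical stand-ins for the recording and extraction of step S32, the registration of step S33, and the direction adjustment process of step S34. The check for an empty extraction result is an added assumption that the text does not specify.

```python
from enum import Enum
from typing import Callable, Optional


class Mode(Enum):
    VOICE_REGISTRATION = 1
    DIRECTION_ADJUSTMENT = 2


def mode_handling(
    mode: Mode,
    record_and_extract: Callable[[], Optional[bytes]],
    register: Callable[[bytes], None],
    adjust_direction: Callable[[], None],
) -> None:
    """One periodic run of the mode handling process (FIG. 13(a))."""
    if mode is Mode.VOICE_REGISTRATION:      # S31: the switch position selects the mode
        voice = record_and_extract()         # S32: record and extract the user's voice
        if voice is not None:                # assumption: skip registration if nothing was extracted
            register(voice)                  # S33: store in the voice data storage unit
    else:
        adjust_direction()                   # S34: direction adjustment process (FIG. 13(b))
```

A timer callback would call `mode_handling` with the current switch position on every tick.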

The direction adjustment process of FIG. 13(b) differs from the direction adjustment process of the first embodiment (FIG. 3(a)) in two respects: step S13 is replaced by step S43, and two steps, S41 and S42, are added between step S12 and step S43. In other respects, it is the same as the direction adjustment process of the first embodiment.

In step S41, an arrival direction emphasizing unit 62 emphasizes the sound (the external incoming sound) arriving from the direction estimated by the arrival direction estimation unit 61. Specifically, it applies beamforming to the external incoming sound supplied from the echo cancellers 51 and 52, based on information indicating the arrival direction of the external incoming sound (direction information) supplied from the arrival direction estimation unit 61. For beamforming, see, for example, Group 2, Part 6, Chapter 2 of the knowledge base of the Institute of Electronics, Information and Communication Engineers (IEICE), http://www.ieice-hbkb.org/portal/. As a result, the sound from the arrival direction estimated by the arrival direction estimation unit 61 is emphasized.
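The patent refers to beamforming only in general terms. As an assumption, the Python sketch below uses its simplest form, a delay-and-sum beamformer with whole-sample steering delays. Each channel is shifted so that sound from the estimated direction lines up across the microphones, and then the channels are averaged. This reinforces sound from that direction relative to sound from other directions. The names and the synthetic signals in `demo` are hypothetical.

```python
import random


def delay_and_sum(channels, steering_delays):
    """Delay-and-sum beamformer with whole-sample steering delays.

    steering_delays[m] is how many samples later channel m receives the wanted
    source than the reference channel; reading each channel that far ahead lines
    the wanted source up across channels before averaging.
    """
    n = min(len(ch) for ch in channels)
    out = []
    for i in range(n):
        acc = 0.0
        for ch, d in zip(channels, steering_delays):
            j = i + d
            if 0 <= j < len(ch):
                acc += ch[j]
        out.append(acc / len(channels))
    return out


def energy(x):
    return sum(v * v for v in x)


def demo(delay=4, length=2000):
    """Two mics; the wanted source reaches mic 1 `delay` (>= 1) samples after mic 0."""
    rng = random.Random(2)
    src = [rng.gauss(0.0, 1.0) for _ in range(length)]
    mic0 = src
    mic1 = [0.0] * delay + src[:-delay]
    steered = delay_and_sum([mic0, mic1], [0, delay])  # aimed at the source
    unsteered = delay_and_sum([mic0, mic1], [0, 0])    # aimed broadside
    return energy(steered), energy(unsteered)
```

Steered at the source, the two copies add coherently, so the output keeps roughly the full source energy. Steered elsewhere, the copies are misaligned and roughly half the energy is lost, which is the effect step S41 relies on.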

In step S42, a voice recognition unit 63 recognizes the user's voice in the sound (the emphasized voice) that the arrival direction emphasizing unit 62 has emphasized and supplied to it. Specifically, it uses a known voice recognition technique such as pattern matching to detect, in the emphasized voice, the voice registered in the voice data storage unit 92 (the registered voice).

In step S43, the direction adjustment unit 60c uses the voice recognition unit 63's detection result to determine whether the emphasized voice matches the registered voice. If the registered voice is detected in the emphasized voice, the two are determined to match, and control proceeds to step S14. If the registered voice is not detected, the two are determined not to match, and the direction adjustment process ends.
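The text names pattern matching only as an example of a known technique. One classic pattern-matching method for comparing an utterance with a stored template is dynamic time warping (DTW), which tolerates differences in speaking rate. The Python sketch below is an assumption, not the patent's recognizer. For brevity it compares 1-D feature sequences (a practical system would compare frames of spectral features), and it applies a hypothetical distance threshold for the step S43 decision.

```python
import math


def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D feature sequences."""
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])  # local cost: absolute feature difference
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def matches_registered(features, templates, threshold):
    """Step S43 analogue: True if the emphasized voice is close to any registered template."""
    return any(dtw_distance(features, t) <= threshold for t in templates)
```

A slowly spoken version of a registered word (each frame repeated) warps onto the template at zero cost, while a different utterance does not.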

Thus, when the voice of a previously registered user is detected, step S14 sends the rotation angle estimated by the arrival direction estimation unit 61 to the motor control unit 71 as a control signal. The motor control process shown in FIG. 13(c) is then executed, and the speaker system 10c turns toward the arrival direction of the user's voice.

As described above, the speaker system 10c of the fourth embodiment also applies echo cancellation to the microphone input sound picked up by the two microphones 31 and 32, removing the acoustic echo from it. It then estimates the arrival direction of the sound from the resulting external incoming sound and rotates the speaker system 10c based on that estimate. Therefore, even while the speaker system 10c is reproducing sound, it can be turned toward the arrival direction of the user's voice, that is, toward the user.

Furthermore, the speaker system 10c of the fourth embodiment applies echo cancellation to the microphone input sound picked up by the two microphones 31 and 32, estimates the arrival direction of the sound from the resulting external incoming sound, and emphasizes the external incoming sound from that direction. In the emphasized sound, the voice of a user located in the arrival direction is louder relative to the surrounding noise. Performing voice recognition on the emphasized sound therefore lets the user's voice be detected more accurately. However, the emphasis of the external incoming sound may also be omitted.

In addition, the speaker system 10c of the fourth embodiment rotates based on the estimated arrival direction only when it detects the voice of a previously registered user. This prevents the voices of unregistered people or ambient noise from rotating the speaker system 10c against the user's intention.

In the fourth embodiment, step S14 of the direction adjustment process (FIG. 13(b)) sends, as the control signal, the rotation angle estimated from the arrival direction of the external incoming sound. Alternatively, a preset rotation angle may be sent regardless of that arrival direction. For example, several utterances by the user indicating rotation directions (such as "right" and "left") may be registered. Voice recognition then determines which registered voice has been detected, and a control signal based on the result is sent. In this case, the rotation angle sent as the control signal is set to a value that rotates the speaker system by a preset angle in the direction indicated by the user's voice. Because the speaker system 10c rotates only after determining whether the external incoming sound indicates a rotation direction, it can be said to change its direction according to the content of the external incoming sound.

In addition to the utterances indicating rotation directions, an utterance indicating that no rotation direction is specified (for example, "here") may also be registered. When voice recognition detects this utterance, the rotation angle estimated from the arrival direction of the external incoming sound may be sent as the control signal.
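The word-to-command mapping described in the two preceding paragraphs could look like the following hypothetical Python sketch. The vocabulary follows the examples in the text ("right", "left", "here"). The 30° preset step and the convention that positive angles mean rotation to the right are assumptions.

```python
from typing import Optional

# Hypothetical preset step for direction words; the text only says "a preset angle".
PRESET_STEP_DEG = 30.0


def rotation_command(word: Optional[str], estimated_angle_deg: float) -> Optional[float]:
    """Map a recognized registered word to the rotation angle sent as the control signal.

    "right"/"left": rotate by the preset step regardless of where the sound came from.
    "here": use the angle estimated from the external incoming sound's arrival direction.
    Anything else (including no recognized word): send no rotation command.
    """
    if word == "right":
        return PRESET_STEP_DEG
    if word == "left":
        return -PRESET_STEP_DEG
    if word == "here":
        return estimated_angle_deg
    return None
```

Only the "here" branch needs the arrival-direction estimate. This is why the following paragraph notes that the estimate, and even the second microphone, can be dropped when only direction words are used.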

Consider the case where voice recognition determines which of the registered rotation-direction voices has been detected, and the control signal is sent based on that result. If the emphasis of the external incoming sound is also omitted in this case, the estimation of the arrival direction of the external incoming sound can be omitted as well. The speaker system can then have only one microphone.

In the fourth embodiment, the operation mode of the speaker system 10c is switched with the alternate-action mode changeover switch 91, but other switching methods can also be used. For example, a momentary-action (self-returning) switch could switch the operation mode. In this case, voice data can be registered by executing steps S32 and S33 of the mode handling process (FIG. 13(a)) whenever a press of the switch is detected. If the direction adjustment process (FIG. 13(b)) is also executed periodically, for example by a timer interrupt, the speaker system 10c can be turned toward the user as in the fourth embodiment.

In the fourth embodiment, voice data representing the user's voice is registered in the voice registration mode. However, the voice data may also be registered using an external computer such as a personal computer, a smartphone, or a tablet terminal. In this case, the speaker system 10c is provided with an interface for connecting the external computer to it.

The above description presents the fourth embodiment of the present invention as a single speaker system 10c. However, as in the second embodiment, the fourth embodiment may also be configured with the speaker system and the speaker stand as separate units.

E. Modifications:
The present invention is not limited to the above embodiments and can be carried out in various forms without departing from its gist. For example, the following modifications are possible.

E1. Modification 1:
In each of the above embodiments, the functional units are implemented by a CPU (not shown) executing programs. These units are the echo cancellers 51, 52, and 53, the direction adjustment units 60 and 60c, the motor control unit 71, and the voice data registration unit 93. However, at least some of these functional units may be implemented in hardware.

E2. Modification 2:
In each of the above embodiments, the present invention is applied to a speaker set, a speaker system, and a speaker stand on which a speaker system is placed. However, the present invention can also be applied to various systems that generate sound (sound generation systems), such as televisions, and to various stands on which sound generating devices are placed (sound generating device stands), such as television stands and stands for smartphones or tablet terminals.

E3. Modification 3:
In each of the above embodiments, the user who speaks is the sound source separate from the speaker system, but the user does not necessarily have to speak. For example, the direction of the speaker system may be changed by the user ringing a bell, a buzzer, or the like. In this case, the bell, buzzer, or similar device rung by the user is the sound source separate from the speaker system.

Description of reference signs: 1 ... speaker set; 10, 10a, 10b, 10c, 10g ... speaker system; 11, 11a, 11b ... housing; 21, 21a, 22, 22a, 23 ... speaker unit; 31, 31a, 32, 32a, 33 ... microphone; 40, 40a, 40b ... leg; 41 ... rotation shaft; 51, 52, 53 ... echo canceller; 60, 60c ... direction adjustment unit; 61 ... arrival direction estimation unit; 62 ... arrival direction enhancement unit; 63 ... speech recognition unit; 70 ... motor; 71 ... motor control unit; 80, 80a, 80b ... speaker stand; 81, 81b ... housing; 91 ... mode changeover switch; 92 ... voice data storage unit; 93 ... voice data registration unit; MP ... music player; USR ... user

Claims

A sound generation system for generating sound, comprising:
a sound generating device;
N microphones (N is an integer of 1 or more);
N echo cancellers that remove the sound generated by the sound generating device from the sound collected by the N microphones and each output one of N types of sound that are emitted by a sound source separate from the sound generating device and arrive at the N microphones; and
a drive unit that changes the direction of the sound generating device according to the N types of sound output by the N echo cancellers.
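The claim does not say which algorithm the echo cancellers use. A normalized LMS (NLMS) adaptive filter is one common choice; the sketch below assumes a single loudspeaker reference signal per canceller, and the function name and default parameter values are illustrative, not taken from the patent. The filter learns the echo path from the loudspeaker to one microphone and outputs the residual, which approximates the sound from the other (separate) source.

```python
from typing import List


def nlms_echo_cancel(mic: List[float], ref: List[float],
                     taps: int = 8, mu: float = 0.5,
                     eps: float = 1e-8) -> List[float]:
    """Remove the loudspeaker's own output (ref) from one microphone signal.

    NLMS adaptive filter (an assumed choice): w models the echo path, and
    the error e = mic - w.x is returned as the "other source" signal.
    """
    if len(mic) != len(ref):
        raise ValueError("mic and ref must have the same length")
    w = [0.0] * taps      # echo-path estimate
    x = [0.0] * taps      # recent reference samples, newest first
    out = []
    for m, r in zip(mic, ref):
        x = [r] + x[:-1]                          # shift in newest sample
        echo_est = sum(wi * xi for wi, xi in zip(w, x))
        e = m - echo_est                          # residual after echo removal
        norm = eps + sum(xi * xi for xi in x)     # input power normalization
        w = [wi + mu * e * xi / norm for wi, xi in zip(w, x)]
        out.append(e)
    return out
```

For N microphones, one such canceller would run per microphone with the same reference, giving the N types of sound that the drive unit acts on. A practical system would also need double-talk handling, which this sketch omits.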

The sound generation system according to claim 1, wherein
N is 2 or more,
the sound generation system further comprises a sound source direction estimation unit that estimates the direction of the sound source based on the N types of sound output by the N echo cancellers, and
the drive unit changes the direction of the sound generating device based on the direction of the sound source estimated by the sound source direction estimation unit.
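How the sound source direction estimation unit works is not specified in the claim. For N = 2, one standard approach (assumed here) is to find the inter-microphone delay that maximizes the cross-correlation of the two echo-cancelled signals and convert it to an angle under a far-field, two-microphones-on-a-line assumption. The function name, the 343 m/s speed of sound, and the sign convention are illustrative.

```python
import math
from typing import List


def estimate_direction_deg(left: List[float], right: List[float],
                           mic_spacing_m: float, fs_hz: float,
                           c_m_s: float = 343.0) -> float:
    """Estimate the arrival angle in degrees (0 = straight ahead).

    Assumed convention: a positive angle means the source is nearer the
    right microphone, so the left signal lags the right one.
    """
    n = len(left)
    # Largest physically possible delay between the two microphones.
    max_lag = int(math.ceil(mic_spacing_m / c_m_s * fs_hz))
    best_lag, best_val = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        # Cross-correlation: sum of left[i] * right[i - lag]
        s = 0.0
        for i in range(max(0, lag), min(n, n + lag)):
            s += left[i] * right[i - lag]
        if s > best_val:
            best_lag, best_val = lag, s
    # Convert the delay (in samples) to sin(theta), clamped to [-1, 1].
    sin_theta = best_lag / fs_hz * c_m_s / mic_spacing_m
    sin_theta = max(-1.0, min(1.0, sin_theta))
    return math.degrees(math.asin(sin_theta))
```

The resolution is limited to whole-sample delays; real systems often use GCC-PHAT or sub-sample interpolation instead.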

The sound generation system according to claim 1, further comprising:
a speech recognition unit that determines the content of the speech emitted by the sound source based on the N types of sound output by the N echo cancellers,
wherein the drive unit changes the direction of the sound generating device according to the speech content determined by the speech recognition unit.
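Once the speech recognition unit has produced text, the drive unit only needs a mapping from recognized content to a rotation. The sketch below assumes a hypothetical two-phrase vocabulary, a 15-degree step, and a ±90-degree mechanical range; none of these values come from the patent.

```python
from typing import Dict, Optional

# Hypothetical command vocabulary and step sizes (illustrative only).
COMMANDS: Dict[str, float] = {
    "turn left": -15.0,
    "turn right": +15.0,
}


def next_heading(current_deg: float, recognized_text: str,
                 limit_deg: float = 90.0) -> Optional[float]:
    """Return the new heading for the drive unit, or None if the
    recognized text is not a command (device stays where it is)."""
    delta = COMMANDS.get(recognized_text.strip().lower())
    if delta is None:
        return None
    # Clamp to the assumed mechanical rotation range of the stand.
    return max(-limit_deg, min(limit_deg, current_deg + delta))
```

Returning None for unrecognized speech keeps ordinary conversation from moving the device.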

The sound generation system according to claim 3, wherein
N is 2 or more,
the sound generation system further comprises:
a sound source direction estimation unit that estimates the direction of the sound source based on the N types of sound output by the N echo cancellers; and
a speech enhancement unit that, based on the N types of sound output by the N echo cancellers, enhances the sound arriving from the direction of the sound source estimated by the sound source direction estimation unit, and
the speech recognition unit determines the content of the speech emitted by the sound source based on the sound enhanced by the speech enhancement unit.
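The claim leaves the enhancement method open. A delay-and-sum beamformer is one simple possibility (an assumption here): the earlier-arriving channel is delayed so that sound from the estimated direction lines up, and the channels are averaged, which attenuates sound from other directions. The sketch uses two microphones, integer-sample delays, and a sign convention in which a positive angle means the source is nearer the right microphone; all names are illustrative.

```python
import math
from typing import List


def delay_and_sum(left: List[float], right: List[float], angle_deg: float,
                  mic_spacing_m: float, fs_hz: float,
                  c_m_s: float = 343.0) -> List[float]:
    """Steer a two-microphone array toward angle_deg and average.

    Positive angle_deg: source nearer the right microphone, so the right
    channel hears it first and is delayed to align with the left one.
    """
    d = round(mic_spacing_m * math.sin(math.radians(angle_deg))
              / c_m_s * fs_hz)
    out = []
    for i in range(len(left)):
        if d >= 0:
            # Right channel leads by d samples: delay it.
            l = left[i]
            r = right[i - d] if i - d >= 0 else 0.0
        else:
            # Left channel leads by -d samples: delay it.
            l = left[i + d] if i + d >= 0 else 0.0
            r = right[i]
        out.append(0.5 * (l + r))
    return out
```

Sound from the steered direction passes through unchanged, while uncorrelated sound from elsewhere is reduced, which should make the subsequent speech recognition more robust.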

A stand for a sound generating device, on which the sound generating device is placed, comprising:
N microphones (N is an integer of 1 or more);
N echo cancellers that remove the sound generated by the sound generating device from the sound collected by the N microphones and each output one of N types of sound that are emitted by a sound source separate from the sound generating device and arrive at the N microphones; and
a drive unit that rotates the sound generating device to change its direction according to the N types of sound output by the N echo cancellers.