JP4920511B2

JP4920511B2 - Multichannel echo canceller

Info

Publication number: JP4920511B2
Application number: JP2007175430A
Authority: JP
Inventors: 剛樹西川; 丈郎金森; 考一郎水島
Original assignee: Panasonic Corp; Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Corp; Panasonic Holdings Corp
Priority date: 2006-07-06
Filing date: 2007-07-03
Publication date: 2012-04-18
Anticipated expiration: 2027-07-03
Also published as: JP2008033307A

Abstract

PROBLEM TO BE SOLVED: To always perform stable echo cancellation without sound quality deterioration during multichannel reproduction, regardless of whether double talk or single talk is occurring. SOLUTION: A multichannel echo canceller according to the present invention includes an echo cancellation section which: receives loudspeaker input signals (sp1, sp2), which include a second acoustic signal and are to be input respectively to a plurality of loudspeakers (10, 20) provided in a first location, and detection signals (m1, m2) detected by a plurality of microphones (11, 21) provided in the first location; separates a first acoustic signal and the second acoustic signal, both included in the detection signals (m1, m2), by performing signal processing based on independent component analysis; and cancels, as an echo, the second acoustic signal included in the detection signals (m1, m2) by outputting only the separated first acoustic signal to a plurality of loudspeakers (30, 40) provided in a second location. COPYRIGHT: (C)2008,JPO&INPIT

Description

The present invention relates to a multichannel echo canceller, and more particularly to a multichannel echo canceller used in conference systems, hands-free telephones, and the like.

In recent years, multichannel acoustic systems such as conference systems and hands-free telephones have been realized that mutually transmit acoustic signals, namely the voices of talkers at remote locations. When such an acoustic system is implemented between, for example, a first location and a second location, each location is provided with a plurality of microphones for detecting the local talker's own voice and a plurality of loudspeakers for listening to the voice of the talker at the remote location. Each loudspeaker at the first location is connected to a microphone at the second location, and each microphone at the first location is connected to a loudspeaker at the second location. As a result, talker S1 at the first location can, for example, hear the voice of talker S2 at the second location through the loudspeakers at the first location, and can make his or her own voice heard by talker S2 through the microphones at the first location.

Such an acoustic system, however, has the problem that echo must be cancelled. For example, when talker S2 speaks, the voice is picked up by the microphones at the second location and reproduced by the loudspeakers at the first location. Because microphones are also provided at the first location, the voice of talker S2 reproduced by the loudspeakers at the first location is detected by the microphones at the first location. As a result, talker S2 hears, through the loudspeakers at the second location, not only the voice of talker S1 but also his or her own voice. A talker's own voice, reproduced by loudspeakers that are meant for listening to the remote talker, is thus an unwanted echo for that talker.

Multichannel echo cancellers using adaptive filters have therefore been proposed to cancel such echo. FIG. 8 shows the configuration of a conventional adaptive-filter-based multichannel echo canceller 9 used in an acoustic system. The acoustic system of FIG. 8 has two channels. Talker S1 is present on the near-end side as a sound source (near-end sound source), and talker S2 is present on the far-end side as a sound source (far-end sound source). The near-end side is provided with loudspeakers 10 and 20, which reproduce in stereo the far-end acoustic signal consisting of the voice of far-end talker S2, and microphones 11 and 21, which detect the near-end acoustic signal consisting of the voice of near-end talker S1. The far-end side is provided with loudspeakers 30 and 40, which reproduce the near-end acoustic signal in stereo, and microphones 31 and 41, which detect the far-end acoustic signal. In this example, the multichannel echo canceller 9 is provided only on the near-end side.

In FIG. 8, the multichannel echo canceller 9 consists of adaptive filters 91 to 94, adders 95 and 97, and subtractors 96 and 98. Adaptive filter 91 estimates the transfer characteristic h11(ω) from loudspeaker 10 to microphone 11 based on the output signal of subtractor 96, where ω is frequency, and convolves the estimate eh11(ω) with the loudspeaker input signal sp1 to be input to loudspeaker 10. Adaptive filter 92 estimates the transfer characteristic h21(ω) from loudspeaker 20 to microphone 11 based on the output signal of subtractor 96, and convolves the estimate eh21(ω) with the loudspeaker input signal sp2 to be input to loudspeaker 20. Adaptive filter 93 estimates the transfer characteristic h12(ω) from loudspeaker 10 to microphone 21 based on the output signal of subtractor 98, and convolves the estimate eh12(ω) with sp1. Adaptive filter 94 estimates the transfer characteristic h22(ω) from loudspeaker 20 to microphone 21 based on the output signal of subtractor 98, and convolves the estimate eh22(ω) with sp2. Each adaptive filter outputs the result of its convolution.

Adder 95 receives and adds the output signals of adaptive filters 91 and 92. Subtractor 96 receives the detection signal m1 detected by microphone 11 and the output signal of adder 95, and subtracts the latter from m1. The output signal y1 of subtractor 96 is therefore a signal from which the echo, that is, the voice of far-end talker S2, has been cancelled. The output signal y1 is transmitted to the far-end side and reproduced by far-end loudspeaker 30. Likewise, adder 97 receives and adds the output signals of adaptive filters 93 and 94. Subtractor 98 receives the detection signal m2 detected by microphone 21 and the output signal of adder 97, and subtracts the latter from m2. The output signal y2 of subtractor 98 is therefore a signal from which the echo, that is, the voice of far-end talker S2, has been cancelled. The output signal y2 is transmitted to the far-end side and reproduced by far-end loudspeaker 40.

The transfer characteristics in adaptive filters 91 to 94 are estimated with a method commonly used for training adaptive filters, such as the learning identification method (LMS). Specifically, adaptive filters 91 and 92 estimate their transfer characteristics so that the power of the output signal y1 of subtractor 96 is minimized, and adaptive filters 93 and 94 estimate their transfer characteristics so that the power of the output signal y2 of subtractor 98 is minimized.
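The power-minimizing estimation described above can be illustrated with a small time-domain sketch. This is not the patent's implementation: it assumes a single loudspeaker-to-microphone path, a normalized LMS update (one common form of this family of algorithms), and made-up values for the true path `h_true`, the step size `mu`, and the white-noise loudspeaker signal `x`. The function name `nlms_identify` is hypothetical.

```python
import random

def nlms_identify(x, d, taps, mu=0.5, eps=1e-8):
    """Estimate an FIR path from input x to observation d with normalized LMS.

    Returns the estimated coefficients and the error signal (d minus pseudo echo).
    """
    w = [0.0] * taps          # current estimate of the transfer characteristic
    buf = [0.0] * taps        # most recent input samples, newest first
    errors = []
    for xn, dn in zip(x, d):
        buf = [xn] + buf[:-1]
        y = sum(wi * bi for wi, bi in zip(w, buf))   # pseudo echo
        e = dn - y                                   # subtractor output
        norm = sum(b * b for b in buf) + eps
        # Move the estimate in the direction that reduces the error power.
        w = [wi + mu * e * bi / norm for wi, bi in zip(w, buf)]
        errors.append(e)
    return w, errors

random.seed(0)
h_true = [0.6, -0.3, 0.1]                        # assumed echo path
x = [random.gauss(0.0, 1.0) for _ in range(2000)]
d = [sum(h_true[k] * x[n - k] for k in range(len(h_true)) if n - k >= 0)
     for n in range(len(x))]
w_est, err = nlms_identify(x, d, taps=3)
```

With a single excited path and no near-end speech, `w_est` approaches `h_true` and the error signal decays toward zero, which corresponds to the monaural case in which this kind of estimation is well posed.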

The problems of the conventional multichannel echo canceller 9 are described below. To obtain the echo cancellation effect in FIG. 8, each of adaptive filters 91 to 94 must estimate the correct transfer characteristic. For adaptive filter 91, for example, the estimate eh11(ω) must match the transfer characteristic h11(ω). In the conventional multichannel echo canceller 9, however, the correct transfer characteristics can be estimated only when just one of the loudspeaker input signals sp1 and sp2 is being reproduced. In other words, the correct transfer characteristics cannot be estimated except in monaural reproduction, where only one of loudspeakers 10 and 20 is operating.

During multichannel reproduction (stereo reproduction here), both loudspeaker 10 and loudspeaker 20 usually operate, and correlated signals are input to them. Suppose, for example, that the voice of talker S2 is detected in stereo by the far-end microphones 31 and 41 shown in FIG. 8. Let the voice of talker S2 be s2(ω), the transfer characteristic from talker S2 to microphone 31 be a21(ω), and the transfer characteristic from talker S2 to microphone 41 be a22(ω). The loudspeaker input signal sp1 input to loudspeaker 10 is then s2(ω)·a21(ω), and the loudspeaker input signal sp2 input to loudspeaker 20 is s2(ω)·a22(ω). Because sp1 and sp2 both contain s2(ω), they are correlated. The detection signal m1(ω) detected by microphone 11 is given by Equation (1).

The s2(ω) component in Equation (1) is the echo. Adaptive filters 91 and 92 therefore only need to estimate the transfer characteristics so that the output signal of adder 95, which is the pseudo echo, equals the s2(ω) component in Equation (1). When it does, the power of the output signal y1 is minimized (that is, y1 contains only the s1(ω) component) and the echo is cancelled.
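Equation (1) itself is not reproduced in this text. From the definitions given above, and assuming a transfer characteristic from talker S1 to microphone 11, written here as a11(ω) (a symbol that does not appear in the surviving text), it would plausibly take the following form. This is a reconstruction under those assumptions, not the patent's own typesetting.

```latex
m_1(\omega) = s_1(\omega)\,a_{11}(\omega)
            + s_2(\omega)\,a_{21}(\omega)\,h_{11}(\omega)
            + s_2(\omega)\,a_{22}(\omega)\,h_{21}(\omega)
```

The first term is the near-end speech, and the last two terms are the echo contributed through loudspeakers 10 and 20 respectively.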

However, m1(ω) in Equation (1) contains s2(ω) multiplied by certain transfer characteristics, and the loudspeaker input signals sp1 and sp2 likewise consist of s2(ω) multiplied by certain transfer characteristics. This means that the s2(ω) component in Equation (1) can be reproduced using either sp1 alone or sp2 alone. Consequently, there are multiple solutions (for example, Equation (2) or Equation (3)) for the transfer characteristic eh11(ω) estimated by adaptive filter 91 and the transfer characteristic eh21(ω) estimated by adaptive filter 92.
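The non-uniqueness can be checked numerically at a single frequency. Equations (2) and (3) are missing from this text, so the two alternative solutions below are only what such solutions would look like under the definitions above (one using sp1 alone, one using sp2 alone); all numeric values are made up for illustration.

```python
# Single-frequency illustration; every value here is an arbitrary assumption.
s2 = 0.8 + 0.3j                      # far-end talker's voice s2(w)
a21, a22 = 1.0 - 0.2j, 0.6 + 0.5j    # talker S2 -> microphones 31, 41
h11, h21 = 0.9 + 0.1j, 0.4 - 0.3j    # loudspeakers 10, 20 -> microphone 11

sp1 = s2 * a21                       # loudspeaker input signals
sp2 = s2 * a22
echo = sp1 * h11 + sp2 * h21         # s2 component of m1

def pseudo_echo(eh11, eh21):
    """Output of adder 95 for a given pair of estimated characteristics."""
    return sp1 * eh11 + sp2 * eh21

candidates = {
    "true": (h11, h21),
    "sp1_only": (h11 + h21 * a22 / a21, 0j),   # reproduces the echo from sp1 alone
    "sp2_only": (0j, h21 + h11 * a21 / a22),   # reproduces the echo from sp2 alone
}
residuals = {name: abs(echo - pseudo_echo(*eh)) for name, eh in candidates.items()}
```

All three candidate pairs leave essentially zero residual echo, yet only the first equals the true transfer characteristics. Minimizing the power of y1 alone therefore cannot tell them apart, and the "wrong" pairs stop cancelling the echo as soon as a21 or a22 changes, for instance when the far-end talker moves.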

Thus, during multichannel reproduction, the conventional multichannel echo canceller 9 cannot estimate the correct transfer characteristics because the solution is indeterminate, and a stable echo cancellation effect cannot be obtained.

To address this, a technique has been proposed that compares the signal levels of the channels and selects a single channel on which to perform the estimation (for example, Patent Document 1). A technique has also been proposed that estimates the correct transfer characteristics by adding an additional signal to the loudspeaker input signals sp1 and sp2 (for example, Patent Document 2). These techniques have conventionally been adopted as countermeasures against the indeterminacy of the solution in the conventional multichannel echo canceller 9.

Japanese Patent No. 3407392
Japanese Patent No. 3073976

With the technique disclosed in Patent Document 1, however, when the difference in signal level between channels is small, the relative levels of the channels cannot be judged correctly and the correct transfer characteristics cannot be estimated. That technique therefore cannot always perform echo cancellation stably. With the technique disclosed in Patent Document 2, an additional signal is added to the loudspeaker input signals sp1 and sp2 in order to estimate the correct transfer characteristics. The loudspeakers therefore reproduce the additional signal along with the talker's voice, and the additional signal degrades sound quality. Thus, the techniques of Patent Documents 1 and 2, proposed as countermeasures against the indeterminacy of the solution, either cannot always perform stable echo cancellation or cause sound quality degradation.

An object of the present invention is therefore to provide a multichannel echo canceller that can always perform stable echo cancellation during multichannel reproduction without degrading sound quality, and that can do so regardless of whether double talk or single talk is occurring.

A multichannel echo canceller according to the present invention solves the above problem. It is used in an acoustic system that mutually transmits, between a first location and a second location, a first acoustic signal from one or more sound sources at the first location, detected by a plurality of microphones provided at the first location, and a second acoustic signal from one or more sound sources at the second location, detected by a plurality of microphones provided at the second location, by using a plurality of loudspeakers provided at each of the two locations. The microphones at the first location detect, in addition to the first acoustic signal, the second acoustic signal reproduced by the loudspeakers at the first location. The multichannel echo canceller comprises an echo cancellation section that receives the loudspeaker input signals, which include the second acoustic signal and are to be input to the respective loudspeakers at the first location, together with the detection signals of the microphones at the first location; applies signal processing based on independent component analysis to separate the first acoustic signal and the second acoustic signal contained in each detection signal; and outputs only the separated first acoustic signal to the loudspeakers at the second location, thereby cancelling, as echo, the second acoustic signal contained in each detection signal.

Because signal processing based on independent component analysis is applied, the first acoustic signal and the second acoustic signal contained in each detection signal can be separated even when the loudspeaker input signals contain mutually correlated second acoustic signals. This resolves the problem of the indeterminate solution during multichannel reproduction and allows stable echo cancellation at all times without degrading sound quality. Moreover, stable echo cancellation can be performed regardless of whether double talk or single talk is occurring.

The first location corresponds, for example, to the near-end location in the embodiments described later, and the first acoustic signal corresponds to the near-end acoustic signal in those embodiments. The second location corresponds, for example, to the far-end location in the embodiments described later, and the second acoustic signal corresponds to the far-end acoustic signal in those embodiments.

More preferably, the echo cancellation section has a sound source separation section that receives the loudspeaker input signals and the detection signals; applies signal processing based on independent component analysis to separate the first acoustic signal and the second acoustic signal contained in each detection signal and, in addition, to separate from the first acoustic signal as many mutually weakly correlated signals as there are detection signals; and outputs only those separated, mutually weakly correlated signals to the loudspeakers at the second location.

More preferably, the echo cancellation section has a plurality of sound source separation sections, one for each of the microphones at the first location. Each sound source separation section receives the detection signal of its corresponding microphone and the loudspeaker input signals; applies signal processing based on independent component analysis to separate the first acoustic signal and the second acoustic signal contained in that detection signal; and outputs only the separated first acoustic signal to one of the loudspeakers at the second location.

More preferably, a separation matrix for separating the first acoustic signal and the second acoustic signal contained in each detection signal is set in the echo cancellation section in advance. The separation matrix includes a plurality of first matrix elements that relate to the transfer characteristics from the loudspeakers at the first location to the microphones at the first location and that are learned according to independent component analysis. The echo cancellation section multiplies the separation matrix by an input vector made up of the loudspeaker input signals and the detection signals, thereby subtracting from each detection signal the second acoustic signal contained in it, and so separates the first acoustic signal and the second acoustic signal contained in each detection signal.
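The matrix multiplication described above can be sketched at a single frequency as follows. The layout of the separation matrix `W`, the ordering of the input vector as (sp1, sp2, m1, m2), and all numeric values are assumptions made for illustration; this section does not fix them. The first matrix elements are taken to be already learned, so they are set directly from assumed transfer characteristics.

```python
def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in W]

# Assumed loudspeaker -> microphone transfer characteristics at one frequency.
h11, h21 = 0.9 + 0.1j, 0.4 - 0.3j    # loudspeakers 10, 20 -> microphone 11
h12, h22 = 0.3 + 0.2j, 0.8 - 0.1j    # loudspeakers 10, 20 -> microphone 21

sp1, sp2 = 0.5 + 0.4j, 0.2 - 0.6j    # loudspeaker input signals
near1, near2 = 0.7 - 0.2j, -0.1 + 0.5j   # near-end speech at each microphone

# Detection signals: near-end speech plus echo from both loudspeakers.
m1 = near1 + h11 * sp1 + h21 * sp2
m2 = near2 + h12 * sp1 + h22 * sp2

# Hypothetical separation matrix: the lower-left block holds the (negated)
# first matrix elements; the remaining blocks are identity or zero.
W = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [-h11, -h21, 1, 0],
    [-h12, -h22, 0, 1],
]
y = matvec(W, [sp1, sp2, m1, m2])
y1, y2 = y[2], y[3]                  # outputs sent to the far end
```

Under these assumptions `y1` and `y2` equal the near-end components `near1` and `near2`, and this holds whether or not `sp1` and `sp2` are correlated, because the echo is removed by subtraction once the matrix elements are correct.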

More preferably, the separation matrix further includes a plurality of second matrix elements relating to the transfer characteristics from the one or more sound sources at the second location to the microphones at the second location, and those second matrix elements that lie off the diagonal of the separation matrix are 0.

More preferably, the separation matrix further includes a plurality of second matrix elements for using the detection signals to separate, from the second acoustic signal in the loudspeaker input signals, as many mutually weakly correlated signals as there are loudspeaker input signals, and all of the second matrix elements are 0.

More preferably, the separation matrix further includes a plurality of second matrix elements relating to the transfer characteristics from the one or more sound sources at the second location to the microphones at the second location, and a plurality of third matrix elements for using the detection signals to separate, from the second acoustic signal in the loudspeaker input signals, as many mutually weakly correlated signals as there are loudspeaker input signals. Those second matrix elements that lie off the diagonal of the separation matrix are 0, and all of the third matrix elements are 0.

More preferably, the separation matrix further includes a plurality of second matrix elements relating to the transfer characteristics from the one or more sound sources at the first location to the microphones at the first location, and those second matrix elements that lie off the diagonal of the separation matrix are 0.

The present invention is also directed to a multichannel echo cancellation method. The method is used in an acoustic system that mutually transmits, between a first location and a second location, a first acoustic signal from one or more sound sources at the first location, detected by a plurality of microphones provided at the first location, and a second acoustic signal from one or more sound sources at the second location, detected by a plurality of microphones provided at the second location, by using a plurality of loudspeakers provided at each of the two locations. The microphones at the first location detect, in addition to the first acoustic signal, the second acoustic signal reproduced by the loudspeakers at the first location. The method comprises: an input step of receiving the loudspeaker input signals, which include the second acoustic signal and are to be input to the respective loudspeakers at the first location, and the detection signals of the microphones at the first location; a separation step of applying signal processing based on independent component analysis to the loudspeaker input signals and detection signals received in the input step, thereby separating the first acoustic signal and the second acoustic signal contained in each detection signal; and a cancellation step of outputting only the first acoustic signal separated in the separation step to the loudspeakers at the second location, thereby cancelling, as echo, the second acoustic signal contained in each detection signal.

More preferably, the separation step comprises: a first signal calculation step of multiplying an input vector, made up of the loudspeaker input signals and detection signals received in the input step, by a separation matrix for separating the first acoustic signal and the second acoustic signal contained in each detection signal, the separation matrix including a plurality of matrix elements relating to the transfer characteristics from the loudspeakers at the first location to the microphones at the first location, thereby calculating a plurality of output signals that make up the resulting output vector; a matrix calculation step of calculating a correlation matrix whose elements are higher-order correlations between the output signals calculated in the first signal calculation step; a learning step of learning, using the correlation matrix calculated in the matrix calculation step, the matrix elements of the separation matrix to be updated; an update step of updating the matrix elements of the separation matrix used in the first signal calculation step to the matrix elements learned in the learning step; and a second signal calculation step of multiplying the input vector, made up of the loudspeaker input signals and detection signals received in the input step, by the separation matrix whose matrix elements were updated in the update step, thereby calculating a plurality of output signals that make up an output vector in which the first acoustic signal and the second acoustic signal contained in each detection signal are separated. In the cancellation step, of the output signals calculated in the second signal calculation step, the output signals containing only the first acoustic signal are output to the loudspeakers at the second location.
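The sequence of steps above (compute outputs, form a higher-order correlation matrix, learn, update, recompute) can be sketched as follows. The concrete update rule is an assumption: the sketch uses a natural-gradient ICA update with a tanh nonlinearity on real-valued signals, which is one widely used choice, whereas this section does not state which rule or nonlinearity is used. The function names, the step size `mu`, and the sample data are hypothetical.

```python
import math

def separate(W, frames):
    """Signal calculation step: output vector y = W x for every input vector x."""
    return [[sum(w * x for w, x in zip(row, xv)) for row in W] for xv in frames]

def higher_order_correlation(outputs):
    """Matrix calculation step: R[i][j] = mean over frames of tanh(y_i) * y_j."""
    n = len(outputs[0])
    R = [[0.0] * n for _ in range(n)]
    for y in outputs:
        for i in range(n):
            g = math.tanh(y[i])          # nonlinearity brings in higher-order statistics
            for j in range(n):
                R[i][j] += g * y[j]
    return [[v / len(outputs) for v in row] for row in R]

def ica_update(W, R, mu=0.1):
    """Learning and update steps: W <- W + mu * (I - R) W (assumed rule)."""
    n = len(W)
    G = [[(1.0 if i == j else 0.0) - R[i][j] for j in range(n)] for i in range(n)]
    GW = [[sum(G[i][k] * W[k][j] for k in range(n)) for j in range(n)]
          for i in range(n)]
    return [[W[i][j] + mu * GW[i][j] for j in range(n)] for i in range(n)]

# One pass through the steps on made-up two-channel data.
X = [[0.5, -1.0], [1.5, 0.2], [-0.7, 0.9]]
W0 = [[1.0, 0.0], [0.0, 1.0]]
Y0 = separate(W0, X)                 # first signal calculation step
R = higher_order_correlation(Y0)     # matrix calculation step
W1 = ica_update(W0, R)               # learning and update steps
Y1 = separate(W1, X)                 # second signal calculation step
```

In this assumed rule the separation matrix stops changing once the higher-order correlation matrix equals the identity, that is, once the outputs no longer show the cross-correlations that the nonlinearity exposes. A practical system would iterate these steps over many frames, and a frequency-domain version would need complex-valued signals and a suitable complex nonlinearity.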

The present invention is also directed to a program. The program is executed by a computer used in an acoustic system that mutually transmits, between a first location and a second location, a first acoustic signal from one or more sound sources at the first location, detected by a plurality of microphones provided at the first location, and a second acoustic signal from one or more sound sources at the second location, detected by a plurality of microphones provided at the second location, by using a plurality of loudspeakers provided at each of the two locations. The microphones at the first location detect, in addition to the first acoustic signal, the second acoustic signal reproduced by the loudspeakers at the first location. The program causes the computer to execute: an input step of receiving the loudspeaker input signals, which include the second acoustic signal and are to be input to the respective loudspeakers at the first location, and the detection signals of the microphones at the first location; a separation step of applying signal processing based on independent component analysis to the loudspeaker input signals and detection signals received in the input step, thereby separating the first acoustic signal and the second acoustic signal contained in each detection signal; and a cancellation step of outputting only the first acoustic signal separated in the separation step to the loudspeakers at the second location, thereby cancelling, as echo, the second acoustic signal contained in each detection signal.

The present invention is also directed to an integrated circuit. The integrated circuit according to the present invention is used in an acoustic system that mutually transmits, between a first location and a second location, a first acoustic signal from one or more sound sources present at the first location, detected by a plurality of microphones provided at the first location, and a second acoustic signal from one or more sound sources present at the second location, detected by a plurality of microphones provided at the second location, by using a plurality of speakers provided at each of the first and second locations. The microphones provided at the first location detect, in addition to the first acoustic signal, the second acoustic signal amplified by the speakers provided at the first location. The integrated circuit includes an echo cancellation unit that receives speaker input signals, each including the second acoustic signal to be input to one of the speakers provided at the first location, and the detection signals of the microphones provided at the first location; applies signal processing based on independent component analysis to separate the first acoustic signal and the second acoustic signal included in each detection signal; and outputs only the separated first acoustic signal to the speakers provided at the second location, thereby canceling, as an echo, the second acoustic signal included in each detection signal.

Further, a multi-channel echo canceller according to the present invention operates on the detection signals of one or more microphones that contain, in addition to the acoustic signal of a near-end sound source to be detected, acoustic signals amplified by a plurality of speakers. It cancels the acoustic signals amplified by the speakers as echoes and outputs only the acoustic signal of the near-end sound source. The echo canceller includes a sound source separation unit that receives the detection signals of the one or more microphones, which contain the acoustic signal of the near-end sound source and the acoustic signals amplified by the speakers, together with the speaker input signals, which have a sense of sound direction and are to be input to the respective speakers. During a predetermined period in which the acoustic signal of the near-end sound source and the acoustic signals amplified by the speakers occur simultaneously, the sound source separation unit operates adaptively so that the signal to be output retains the sound quality of the acoustic signal of the near-end sound source and so that the acoustic signals amplified by the speakers are canceled from it. It thereby separates the acoustic signal of the near-end sound source from the acoustic signals amplified by the speakers, both of which are contained in the detection signals, and outputs only the separated acoustic signal of the near-end sound source.

Here, the acoustic signal of the near-end sound source is a signal representing sound generated by one or more sound sources present where the one or more microphones are provided, or a statistic characterizing that sound; it is the signal obtained by canceling the acoustic signals amplified by the plurality of speakers from the detection signals of the one or more microphones. A speaker input signal having a sense of sound direction means a signal with which a plurality of characteristics (such as level ratio and time delay) of the acoustic signals detected by, for example, the far-end microphones can be reproduced using the plurality of near-end speakers. A signal that retains sound quality means a signal that retains the frequency characteristics (such as the amplitude-frequency characteristic and the amplitude-phase frequency characteristic) of the acoustic signal of the near-end sound source input to the sound source separation unit.

More preferably, the sound source separation unit operates adaptively so as to estimate each transfer characteristic from each speaker to the one or more microphones, calculate, using the estimated transfer characteristics, the acoustic signal amplified by each speaker and detected by the one or more microphones, and subtract the calculated acoustic signal from the detection signals of the one or more microphones.

According to the present invention, it is possible to provide a multi-channel echo canceller that always performs stable echo cancellation without sound quality degradation during multi-channel reproduction, and that does so regardless of whether double talk or single talk is taking place.

以下、本発明の実施形態について、図面を参照しながら説明する。 Hereinafter, embodiments of the present invention will be described with reference to the drawings.

(First Embodiment)
The configuration of the multi-channel echo canceller according to the first embodiment of the present invention is described with reference to FIG. 1. FIG. 1 shows a configuration example of the multi-channel echo canceller according to the first embodiment as used in an acoustic system. The acoustic system mutually transmits acoustic signals between a near-end location and a far-end location. In the acoustic system shown in FIG. 1, speakers S1 and S2 are present on the near-end side as mutually distinct sound sources (near-end sound sources), and speakers S3 and S4 are present on the far-end side as mutually distinct sound sources (far-end sound sources). The near-end side is provided with speakers 10 and 20 for amplifying the far-end acoustic signal, which consists of the voices of the far-end speakers S3 and S4, and with microphones 11 and 21 for detecting the near-end acoustic signal, which consists of the voices of the near-end speakers S1 and S2. The far-end side is provided with speakers 30 and 40 for amplifying the near-end acoustic signal and with microphones 31 and 41 for detecting the far-end acoustic signal. The speakers (10, 20, 30, 40) and the microphones (11, 21, 31, 41) are the same as those described with reference to FIG. 8 and bear the same reference numerals. In the acoustic system shown in FIG. 1, it is assumed as an example that the multi-channel echo canceller according to this embodiment is provided only on the near-end side. It is also assumed, as an example, that so-called double talk is taking place, in which the near-end speakers S1 and S2 and the far-end speakers S3 and S4 are talking at the same time.

図１において、本実施形態に係るマルチチャンネルエコーキャンセラは、エコーキャンセル部１により構成される。エコーキャンセル部１は、音源分離部１００、変換部１１０〜１１３、逆変換部１２０および１２１により構成される。 In FIG. 1, the multi-channel echo canceller according to the present embodiment is configured by an echo cancellation unit 1. The echo cancel unit 1 includes a sound source separation unit 100, conversion units 110 to 113, and inverse conversion units 120 and 121.

The conversion unit 110 receives the speaker input signal sp2(t), which contains the far-end acoustic signal to be input to the speaker 20, and converts it from a time-domain (t) signal into a frequency-domain (ω) signal. The converted speaker input signal sp2(ω) is output to the sound source separation unit 100. The conversion unit 111 receives the speaker input signal sp1(t), which contains the far-end acoustic signal to be input to the speaker 10, and converts it from a time-domain signal into a frequency-domain signal. The converted speaker input signal sp1(ω) is output to the sound source separation unit 100. The conversion unit 112 receives the detection signal m2(t) detected by the microphone 21, which contains the near-end acoustic signal and the far-end acoustic signal amplified by the speakers 10 and 20, and converts it from a time-domain signal into a frequency-domain signal. The converted detection signal m2(ω) is output to the sound source separation unit 100. The conversion unit 113 receives the detection signal m1(t) detected by the microphone 11, which contains the near-end acoustic signal and the far-end acoustic signal amplified by the speakers 10 and 20, and converts it from a time-domain signal into a frequency-domain signal. The converted detection signal m1(ω) is output to the sound source separation unit 100.
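The description does not say which time-to-frequency transform the conversion units 110 to 113 use. As a minimal sketch, assuming a short-time Fourier transform with a Hann window (the function name `to_frequency_domain`, the frame length, the hop size, and the random test signals are illustrative choices, not taken from the source), the four inputs could be converted as follows:

```python
import numpy as np

def to_frequency_domain(x, frame_len=512, hop=256):
    """Assumed STFT: window overlapping frames of x(t) and FFT each frame."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[k * hop:k * hop + frame_len] * window
                       for k in range(n_frames)])
    # Row k holds the complex spectrum of frame k (one value per frequency bin)
    return np.fft.rfft(frames, axis=1)

# Hypothetical stand-ins for sp1(t), sp2(t), m1(t), m2(t)
rng = np.random.default_rng(0)
signals = {name: rng.standard_normal(2048) for name in ("sp1", "sp2", "m1", "m2")}
spectra = {name: to_frequency_domain(x) for name, x in signals.items()}
```

In frequency-domain independent component analysis, the separation is typically applied separately to each frequency bin, that is, to each column of these arrays.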

音源分離部１００は、検出信号（ｍ１（ω）、ｍ２（ω））とスピーカ入力信号（ｓｐ１（ω）、ｓｐ２（ω））とを入力とする。音源分離部１００は、入力された信号に対し、独立成分分析に基づく音源分離処理を施す。この音源分離処理により、検出信号（ｍ１（ω）、ｍ２（ω））に含まれる近端音響信号と遠端音響信号とが分離される。独立成分分析に基づく音源分離処理については後述にて詳細に説明する。音源分離部１００は、分離した近端音響信号のみを出力信号ｙ１（ω）およびｙ２（ω）として出力する。ここで、遠端側の話者Ｓ３およびＳ４の音声からなる遠端音響信号は、話者Ｓ３およびＳ４にとって不要な信号、つまりエコーに相当する。したがって、音源分離部１００から近端音響信号のみが出力されることで、検出信号（ｍ１（ω）、ｍ２（ω））に含まれる遠端音響信号をエコーとしてキャンセルすることができる。 The sound source separation unit 100 receives the detection signals (m1 (ω), m2 (ω)) and the speaker input signals (sp1 (ω), sp2 (ω)) as inputs. The sound source separation unit 100 performs sound source separation processing based on independent component analysis on the input signal. By this sound source separation process, the near-end acoustic signal and the far-end acoustic signal included in the detection signals (m1 (ω), m2 (ω)) are separated. The sound source separation process based on the independent component analysis will be described in detail later. The sound source separation unit 100 outputs only the separated near-end acoustic signals as output signals y1 (ω) and y2 (ω). Here, the far-end acoustic signal composed of the voices of the far-end speakers S3 and S4 corresponds to a signal unnecessary for the speakers S3 and S4, that is, an echo. Therefore, by outputting only the near-end acoustic signal from the sound source separation unit 100, the far-end acoustic signal included in the detection signals (m1 (ω), m2 (ω)) can be canceled as an echo.

逆変換部１２０は、音源分離部１００からの出力信号ｙ１（ω）を入力とし、周波数領域（ω）の信号から時間領域（ｔ）の信号に変換する。逆変換部１２０において変換された音響信号ｙ１（ｔ）は、スピーカ３０へ出力され、スピーカ３０で拡声される。逆変換部１２１は、音源分離部１００からの出力信号ｙ２（ω）を入力とし、周波数領域（ω）の信号から時間領域（ｔ）の信号に変換する。逆変換部１２１において変換された出力信号ｙ２（ｔ）は、スピーカ４０へ出力され、スピーカ４０で拡声される。 The inverse conversion unit 120 receives the output signal y1 (ω) from the sound source separation unit 100 and converts the signal in the frequency domain (ω) into a signal in the time domain (t). The acoustic signal y 1 (t) converted by the inverse conversion unit 120 is output to the speaker 30 and is amplified by the speaker 30. The inverse conversion unit 121 receives the output signal y2 (ω) from the sound source separation unit 100 and converts the signal in the frequency domain (ω) into a signal in the time domain (t). The output signal y 2 (t) converted by the inverse conversion unit 121 is output to the speaker 40 and is amplified by the speaker 40.

The sound source separation process based on independent component analysis performed by the sound source separation unit 100 is described in detail below. First, the detection signals (m1(ω), m2(ω)) and the speaker input signals (sp1(ω), sp2(ω)) input to the sound source separation unit 100 are described. These signals are expressed as in Expression (4).

In Expression (4), the voice of the speaker S1 is s1(ω), the voice of the speaker S2 is s2(ω), the voice of the speaker S3 is s3(ω), and the voice of the speaker S4 is s4(ω). Further, the transfer characteristic from the speaker S1 to the microphone 11 is a11(ω), from the speaker S1 to the microphone 21 is a12(ω), from the speaker S2 to the microphone 11 is a21(ω), from the speaker S2 to the microphone 21 is a22(ω), from the speaker S3 to the microphone 31 is a31(ω), from the speaker S3 to the microphone 41 is a32(ω), from the speaker S4 to the microphone 31 is a41(ω), and from the speaker S4 to the microphone 41 is a42(ω).
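Expression (4) itself is not reproduced in this text. A reconstruction consistent with the symbol definitions above, and with the transfer characteristics h11 (speaker 10 to microphone 11) and h21 (speaker 20 to microphone 11) named later in the description, might read as follows. It assumes that sp1 is the signal of microphone 31 and sp2 the signal of microphone 41, and it introduces h12 and h22 (speakers 10 and 20 to microphone 21) by analogy; neither assumption is confirmed by the source. The argument (ω) is omitted for brevity.

```latex
\begin{aligned}
m_1 &= a_{11}\,s_1 + a_{21}\,s_2 + h_{11}\,\mathit{sp}_1 + h_{21}\,\mathit{sp}_2 \\
m_2 &= a_{12}\,s_1 + a_{22}\,s_2 + h_{12}\,\mathit{sp}_1 + h_{22}\,\mathit{sp}_2 \\
\mathit{sp}_1 &= a_{31}\,s_3 + a_{41}\,s_4 \\
\mathit{sp}_2 &= a_{32}\,s_3 + a_{42}\,s_4
\end{aligned}
```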

次に、図２を参照して、音源分離部１００の詳細な構成について説明する。図２は、音源分離部１００の詳細な構成を示す図である。図２において、音源分離部１００は、分離部１０１および学習部１０２により構成される。 Next, a detailed configuration of the sound source separation unit 100 will be described with reference to FIG. FIG. 2 is a diagram illustrating a detailed configuration of the sound source separation unit 100. In FIG. 2, the sound source separation unit 100 includes a separation unit 101 and a learning unit 102.

A separation matrix W(4, 4), composed of matrix elements wij (where the row index i and the column index j are integers from 1 to 4), is set in the separation unit 101. In the initial state, for example, the identity matrix is set as the separation matrix W(4, 4). The separation unit 101 receives the detection signals (m1(ω), m2(ω)) and the speaker input signals (sp1(ω), sp2(ω)). It calculates the output signals y1 to y4 according to Expression (5), which is based on the set separation matrix W(4, 4), and outputs them. Specifically, as shown in Expression (5), the separation unit 101 multiplies the input vector composed of the detection signals (m1(ω), m2(ω)) and the speaker input signals (sp1(ω), sp2(ω)) by the set separation matrix W(4, 4), thereby calculating the output vector composed of the output signals y1(ω) to y4(ω).
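The matrix-vector product performed by the separation unit 101 can be sketched for a single frequency bin as below. The ordering of the input vector as (m1, m2, sp1, sp2) is an assumption, chosen because the description later pairs w13 and w14 with the speaker input signals; the function name `separate` and the sample values are hypothetical.

```python
import numpy as np

def separate(W, m1, m2, sp1, sp2):
    """Multiply the 4x4 separation matrix by the input vector for one frequency bin."""
    x = np.array([m1, m2, sp1, sp2], dtype=complex)  # assumed input ordering
    return W @ x                                     # output vector (y1, y2, y3, y4)

# Initial state: the identity matrix, so the outputs equal the inputs
W0 = np.eye(4, dtype=complex)
y = separate(W0, 1 + 2j, 0.5j, -1.0, 2.0)
```

With the identity matrix as the starting point, no separation has taken place yet; it is the learning unit that moves W away from this initial state.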

学習部１０２は、出力信号ｙ１（ω）〜ｙ４（ω）を入力とし、独立成分分析に従って分離行列Ｗ（４、４）を学習する。具体的には、学習部１０２は、出力信号ｙ１（ω）〜ｙ４（ω）が互いに独立した信号となるように、分離行列Ｗ（４、４）を学習する。ここで独立とは、相関がないこと、つまり相関が０（ゼロ）であることを意味する。学習部１０２は、分離部１０１に設定された分離行列Ｗ（４、４）を、学習した分離行列Ｗ（４、４）に更新する。 The learning unit 102 receives the output signals y1 (ω) to y4 (ω) and learns the separation matrix W (4, 4) according to the independent component analysis. Specifically, the learning unit 102 learns the separation matrix W (4, 4) so that the output signals y1 (ω) to y4 (ω) are independent signals. Here, independent means that there is no correlation, that is, the correlation is 0 (zero). The learning unit 102 updates the separation matrix W (4, 4) set in the separation unit 101 to the learned separation matrix W (4, 4).

The learning method of the learning unit 102 is described more specifically below. A learning rule commonly used for frequency-domain independent component analysis with the gradient method is given by Expression (6). The learning rule used for the independent component analysis is not limited to Expression (6); another learning rule may be used.

In Expression (6), the output signals y1(ω) to y4(ω) are complex frequency-domain signals, and the matrix elements of the separation matrices W(4, 4)i and W(4, 4)i−1 are complex coefficients. I denotes the 4 × 4 identity matrix, ε{·} denotes a time average, and * denotes the complex conjugate. φ(·) denotes a nonlinear function. The nonlinear function is preferably one that corresponds to the derivative of the logarithm of the probability density function of the signal, and tanh(·) is generally used. α denotes a step-size parameter that controls the learning speed. i denotes the number of learning iterations, and learning proceeds by substituting W(4, 4)i on the right-hand side for W(4, 4)i−1 on the left-hand side. The matrix inside the braces of ε{·} is a higher-order correlation matrix.
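Expression (6) is not reproduced in this text. The sketch below assumes it takes the widely used natural-gradient form W_i = α[I − ε{φ(y) y^H}] W_{i−1} + W_{i−1}, which matches the symbols described above but is not confirmed by the source. Applying tanh separately to the real and imaginary parts is one common way to extend the nonlinearity to complex data and is likewise an assumption; the names `phi` and `ica_update` are hypothetical.

```python
import numpy as np

def phi(y):
    # Assumed complex nonlinearity: tanh on the real and imaginary parts separately
    return np.tanh(y.real) + 1j * np.tanh(y.imag)

def ica_update(W, X, alpha=0.1):
    """One learning iteration for one frequency bin.

    X has shape (4, n_frames): each column is the input vector of one time frame.
    """
    Y = W @ X                                   # current outputs y1..y4 per frame
    R = (phi(Y) @ Y.conj().T) / X.shape[1]      # time average of phi(y) y^H
    return W + alpha * (np.eye(W.shape[0]) - R) @ W

# Usage with random complex data standing in for real spectra
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 200)) + 1j * rng.standard_normal((4, 200))
W = np.eye(4, dtype=complex)
W1 = ica_update(W, X)
```

The update vanishes when the time-averaged matrix equals the identity, which is the condition under which the outputs are treated as mutually independent.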

ここで、話者Ｓ１〜Ｓ４はすべて異なる話者であり、互いに独立した音源である。よって、式（４）中のｓ１（ω）〜ｓ４（ω）は互い独立しており、互いに相関のない音声になるといえる。また、検出信号（ｍ１（ω）、ｍ２（ω））は２つ入力され、この検出信号の数は近端側の話者（Ｓ１およびＳ２）の数と一致する。また、スピーカ入力信号（ｓｐ１（ω）、ｓｐ２（ω））は２つ入力され、このスピーカ入力信号の数は遠端側の話者（Ｓ３およびＳ４）の数と一致する。したがって、これらの条件で学習部１０２が分離行列Ｗ（４、４）を学習し、当該学習が収束した場合、分離行列Ｗ（４、４）は、検出信号（ｍ１（ω）、ｍ２（ω））とスピーカ入力信号（ｓｐ１（ω）、ｓｐ２（ω））から、ｓ１（ω）〜ｓ４（ω）それぞれを分離することができる行列となる。つまり、学習が収束した分離行列Ｗ（４、４）に基づいて分離部１０１が算出した出力信号ｙ１には、検出信号（ｍ１（ω）、ｍ２（ω））に含まれていたｓ１（ω）のみが含まれ、出力信号ｙ２には、検出信号（ｍ１（ω）、ｍ２（ω））に含まれていたｓ２（ω）のみが含まれることになる。同様に、出力信号ｙ３には、スピーカ入力信号（ｓｐ１（ω）、ｓｐ２（ω））に含まれていたｓ３（ω）のみが含まれ、出力信号ｙ４には、スピーカ入力信号（ｓｐ１（ω）、ｓｐ２（ω））に含まれていたｓ４（ω）のみが含まれることになる。 Here, the speakers S1 to S4 are all different speakers and are independent sound sources. Therefore, it can be said that s1 (ω) to s4 (ω) in the equation (4) are independent from each other and have uncorrelated sounds. Further, two detection signals (m1 (ω), m2 (ω)) are input, and the number of detection signals matches the number of speakers (S1 and S2) on the near end side. Also, two speaker input signals (sp1 (ω), sp2 (ω)) are input, and the number of speaker input signals matches the number of far-end speakers (S3 and S4). Therefore, when the learning unit 102 learns the separation matrix W (4, 4) under these conditions, and the learning converges, the separation matrix W (4, 4) becomes the detection signals (m1 (ω), m2 (ω )) And the loudspeaker input signals (sp1 (ω), sp2 (ω)), a matrix capable of separating each of s1 (ω) to s4 (ω). That is, the output signal y1 calculated by the separation unit 101 based on the separation matrix W (4, 4) where learning has converged is included in the detection signal (m1 (ω), m2 (ω)). ), And the output signal y2 includes only s2 (ω) included in the detection signals (m1 (ω), m2 (ω)). Similarly, the output signal y3 includes only s3 (ω) included in the speaker input signals (sp1 (ω), sp2 (ω)), and the output signal y4 includes the speaker input signal (sp1 (ω). ), Sp2 (ω)), only s4 (ω) included in sp2 (ω)) is included.

なお、実際には、近端側において話者Ｓ１およびＳ２以外の独立した音源からの音として、近端側の環境ノイズなどがある。遠端側についても、同様である。しかしながら、これらの環境ノイズは、一般的に話者の音声に比べてガウス分布に近い信号である。このため、式（６）による学習、つまり独立成分分析による学習では、非ガウス性の大きい話者の音声を優先的に処理することになる。つまり、学習部１０２では、ｓ１（ω）〜ｓ４（ω）を優先的な処理対象とするので、検出信号（ｍ１（ω）、ｍ２（ω））とスピーカ入力信号（ｓｐ１（ω）、ｓｐ２（ω））から、ｓ１（ω）〜ｓ４（ω）それぞれを分離することができる分離行列が学習されることになる。 Actually, as sounds from independent sound sources other than the speakers S1 and S2 on the near end side, there are environmental noises on the near end side. The same applies to the far end side. However, these environmental noises are generally signals closer to a Gaussian distribution than the speaker's voice. For this reason, in the learning based on the equation (6), that is, the learning based on the independent component analysis, the speech of a speaker having a large non-Gaussian property is preferentially processed. That is, since the learning unit 102 preferentially processes s1 (ω) to s4 (ω), the detection signal (m1 (ω), m2 (ω)) and the speaker input signal (sp1 (ω), sp2 (Ω)), a separation matrix capable of separating each of s1 (ω) to s4 (ω) is learned.
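The contrast between near-Gaussian noise and strongly non-Gaussian speech can be illustrated with excess kurtosis, a common non-Gaussianity measure. This is only an illustration under stated assumptions: the source does not name kurtosis, and the Laplacian distribution is used here as a conventional stand-in for speech rather than as real speech data.

```python
import numpy as np

def excess_kurtosis(x):
    """Close to zero for Gaussian samples; positive for heavier-tailed signals."""
    x = x - x.mean()
    return np.mean(x**4) / np.mean(x**2)**2 - 3.0

rng = np.random.default_rng(0)
noise_like = rng.standard_normal(200_000)   # Gaussian stand-in for environmental noise
speech_like = rng.laplace(size=200_000)     # Laplacian stand-in for speech (assumption)

k_noise = excess_kurtosis(noise_like)
k_speech = excess_kurtosis(speech_like)
```

The speech-like signal yields a clearly larger value than the noise-like one, which is the property that lets the learning favor the voices over the background noise.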

このように、学習部１０２が独立成分分析に従って分離行列Ｗ（４、４）を学習することで、分離部１０１は、検出信号（ｍ１（ω）、ｍ２（ω））から近端音響信号を出力信号ｙ１およびｙ２として分離することができるとともに、スピーカ入力信号（ｓｐ１（ω）、ｓｐ２（ω））から遠端音響信号を出力信号ｙ３およびｙ４として分離することができる。なお、遠端音響信号である出力信号ｙ３およびｙ４は、音源分離部１００からは出力されず、学習部１０２の学習にのみ用いられる。 In this manner, the learning unit 102 learns the separation matrix W (4, 4) according to the independent component analysis, so that the separation unit 101 obtains the near-end acoustic signal from the detection signals (m1 (ω), m2 (ω)). The output signals y1 and y2 can be separated, and the far-end acoustic signal can be separated from the speaker input signals (sp1 (ω), sp2 (ω)) as output signals y3 and y4. Note that the output signals y3 and y4, which are far-end acoustic signals, are not output from the sound source separation unit 100 and are used only for learning by the learning unit 102.

What the output signals y1 to y4 become is described below using Expression (7). Expression (7) is obtained by substituting Expression (4) into Expression (5) and shows the signals input to the separation unit 101 in more detail. In Expression (7), the (ω) notation used in Expression (4) is omitted.

分離行列Ｗ（４、４）の学習が収束した状態では、分離部１０１から出力される出力信号ｙ１において、検出信号ｍ１に含まれる話者Ｓ３およびＳ４の音声（ｓ３・ａ３１、ｓ３・ａ３２、ｓ４・ａ４１、ｓ４・ａ４２）がエコーとしてキャンセルされ、検出信号ｍ１に含まれる話者Ｓ２の音声（ｓ２・ａ２１）がキャンセルされ、検出信号ｍ２に含まれる話者Ｓ１の音声（ｓ１・ａ１２）が加算されることとなる。そして最終的には、出力信号ｙ１は、ｓ１のみを含む信号となり、話者Ｓ１の音声しか含まない信号となる。同様に、出力信号ｙ２は、ｓ２のみを含む信号となり、話者Ｓ２の音声しか含まない信号となる。また、出力信号ｙ３は、ｓ３のみを含む信号となり、話者Ｓ３の音声しか含まない信号となる。また、出力信号ｙ４は、ｓ４のみを含む信号となり、話者Ｓ４の音声しか含まない信号となる。 In the state where learning of the separation matrix W (4, 4) has converged, in the output signal y1 output from the separation unit 101, the voices of the speakers S3 and S4 included in the detection signal m1 (s3 · a31, s3 · a32, s4 · a41, s4 · a42) are canceled as echoes, the voice of the speaker S2 (s2 · a21) included in the detection signal m1 is canceled, and the voice of the speaker S1 included in the detection signal m2 (s1 · a12) Will be added. Finally, the output signal y1 is a signal including only s1, and a signal including only the voice of the speaker S1. Similarly, the output signal y2 is a signal including only s2, and includes only the voice of the speaker S2. The output signal y3 is a signal including only s3, and includes only the voice of the speaker S3. The output signal y4 is a signal including only s4, and includes only the voice of the speaker S4.

Here, for the voices of the speakers S3 and S4 contained in the detection signal m1 to be canceled as echoes in the output signal y1, for example, w13 must equal the negative of the transfer characteristic h11 from the speaker 10 to the microphone 11 (−h11), and w14 must equal the negative of the transfer characteristic h21 from the speaker 20 to the microphone 11 (−h21). Therefore, once learning of the separation matrix W(4, 4) has converged, the transfer characteristics h11 and h21 from the speakers 10 and 20 to the microphone 11 can be said to have been estimated correctly, even if the speaker input signals (sp1, sp2) contain mutually correlated voices (s3, s4).

なお、分離行列Ｗ（４、４）を構成する各行列要素のうち、（ｗ１１、ｗ１２、ｗ２１、ｗ２２）は、近端側の話者Ｓ１およびＳ２からマイクロホン１１および２１までの各伝達特性に関するものである。（ｗ１１、ｗ１２、ｗ２１、ｗ２２）は、検出信号（ｍ１、ｍ２）に含まれる話者Ｓ１の音声と話者Ｓ２の音声とを、出力信号ｙ１およびｙ２として分離するために用いられる。また、（ｗ１３、ｗ１４、ｗ２３、ｗ２４）は、近端側のスピーカ１０および２０からマイクロホン１１および２１までの各伝達特性に関するものである。（ｗ１３、ｗ１４、ｗ２３、ｗ２４）は、検出信号（ｍ１、ｍ２）からエコー成分である話者Ｓ３およびＳ４の音声をキャンセルするために用いられる。また、（ｗ３３、ｗ３４、ｗ４３、ｗ４４）は、遠端側の話者Ｓ３およびＳ４からマイクロホン３１および４１までの各伝達特性に関するものである。（ｗ３３、ｗ３４、ｗ４３、ｗ４４）は、スピーカ入力信号（ｓｐ１、ｓｐ２）に含まれる話者Ｓ３の音声と話者Ｓ４の音声とを、出力信号ｙ３およびｙ４として分離するために用いられる。（ｗ３１、ｗ３２、ｗ４１、ｗ４２）は、検出信号（ｍ１、ｍ２）を用いて、スピーカ入力信号（ｓｐ１、ｓｐ２）に含まれる話者Ｓ３の音声と話者Ｓ４の音声とを、出力信号ｙ３およびｙ４として分離するために用いられる。 Of the matrix elements constituting the separation matrix W (4, 4), (w11, w12, w21, w22) relate to the transfer characteristics from the speakers S1 and S2 on the near end side to the microphones 11 and 21. Is. (W11, w12, w21, w22) are used to separate the speech of the speaker S1 and the speech of the speaker S2 included in the detection signals (m1, m2) as output signals y1 and y2. Further, (w13, w14, w23, w24) relates to transfer characteristics from the speakers 10 and 20 on the near end side to the microphones 11 and 21. (W13, w14, w23, w24) are used to cancel the voices of the speakers S3 and S4, which are echo components, from the detection signals (m1, m2). Further, (w33, w34, w43, w44) relates to transfer characteristics from the far-end speakers S3 and S4 to the microphones 31 and 41. (W33, w34, w43, w44) are used to separate the speech of the speaker S3 and the speech of the speaker S4 included in the speaker input signals (sp1, sp2) as output signals y3 and y4. (W31, w32, w41, w42) uses the detection signals (m1, m2) to output the voice of the speaker S3 and the voice of the speaker S4 included in the speaker input signals (sp1, sp2) as the output signal y3. And y4 to separate.
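The roles of the four 2 × 2 blocks can be checked numerically for one frequency bin. The sketch assumes the mixing model m = A·(s1, s2) + H·(sp1, sp2) and (sp1, sp2) = B·(s3, s4), with random complex values standing in for the transfer characteristics; the array names A, B, H and the helper `crand` are hypothetical. It first verifies that a row with w13 = −h11 and w14 = −h21 removes the echo from m1, then builds one ideal separation matrix from the blocks.

```python
import numpy as np

rng = np.random.default_rng(0)

def crand(*shape):
    """Random complex values used as placeholder signals and transfer characteristics."""
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

s = crand(4)      # voices s1..s4 at one frequency bin
A = crand(2, 2)   # row 0: (a11, a21) into microphone 11; row 1: (a12, a22) into microphone 21
B = crand(2, 2)   # far-end talkers to far-end microphones
H = crand(2, 2)   # row 0: (h11, h21) into microphone 11; row 1: (h12, h22) into microphone 21

sp = B @ s[2:]               # speaker input signals (sp1, sp2)
m = A @ s[:2] + H @ sp       # detection signals (m1, m2), including the echo
x = np.concatenate([m, sp])  # assumed input vector (m1, m2, sp1, sp2)

# Echo cancellation alone: w13 = -h11 and w14 = -h21 remove s3 and s4 from m1
w_row1 = np.array([1.0, 0.0, -H[0, 0], -H[0, 1]])
y1_echo_free = w_row1 @ x

# One ideal separation matrix: upper-left block separates the near-end talkers,
# upper-right block cancels the echo, lower-right block separates the far-end talkers
A_inv, B_inv = np.linalg.inv(A), np.linalg.inv(B)
W = np.block([[A_inv, -A_inv @ H],
              [np.zeros((2, 2)), B_inv]])
y = W @ x
```

Independent component analysis in general recovers sources only up to scaling and ordering, so the matrix built here is one representative of the solutions a converged learning process could reach, not necessarily the one it would find.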

以上のように、本実施形態では、音源分離部１００は、検出信号（ｍ１、ｍ２）に含まれる遠端音響信号をキャンセルすることによって、検出信号（ｍ１、ｍ２）に含まれる近端音響信号と遠端音響信号とを分離する。そして、音源分離部１００は、分離した近端音響信号のみを出力信号ｙ１およびｙ２として出力する。これにより、スピーカ入力信号（ｓｐ１、ｓｐ２）に相関のある音声が含まれているか否かに関係なく、検出信号（ｍ１、ｍ２）に含まれる遠端音響信号をエコーとしてキャンセルすることができる。その結果、本実施形態では、マルチチャンネル再生時において、解の不定性を解決しつつ、音質劣化を生じさせることなく常に安定したエコーキャンセルを行うことができる。 As described above, in the present embodiment, the sound source separation unit 100 cancels the far-end acoustic signal included in the detection signal (m1, m2), so that the near-end acoustic signal included in the detection signal (m1, m2). And the far-end acoustic signal are separated. Then, the sound source separation unit 100 outputs only the separated near-end acoustic signals as output signals y1 and y2. Thereby, the far-end acoustic signal included in the detection signal (m1, m2) can be canceled as an echo regardless of whether or not the correlated sound is included in the speaker input signals (sp1, sp2). As a result, in the present embodiment, it is possible to always perform stable echo cancellation without causing deterioration in sound quality while solving indefiniteness of solutions during multi-channel reproduction.

また、本実施形態では、従来のような適応フィルタを用いていないので、ダブルトーク時やシングルトーク時に関係なく正しい伝達特性を推定することができる。 Further, in the present embodiment, since a conventional adaptive filter is not used, it is possible to estimate a correct transfer characteristic regardless of double talk or single talk.

なお、上述では、音源分離部１００からの出力信号ｙ１およびｙ２には、近端音響信号として、話者Ｓ１およびＳ２の音声そのものが含まれるとしたが、これに限定されない。出力信号ｙ１およびｙ２には、近端音響信号として、話者Ｓ１およびＳ２の音声の特徴を示す統計量が含まれてもよい。つまり、近端音響信号は、話者Ｓ１およびＳ２の音声ではなく、話者Ｓ１およびＳ２の音声の特徴を示す統計量で構成される音響信号であってもよい。 In the above description, the output signals y1 and y2 from the sound source separation unit 100 include the voices of the speakers S1 and S2 as near-end acoustic signals. However, the present invention is not limited to this. The output signals y1 and y2 may include a statistic indicating the characteristics of the voices of the speakers S1 and S2 as the near-end acoustic signal. That is, the near-end acoustic signal may be an acoustic signal composed of statistics indicating characteristics of the voices of the speakers S1 and S2 instead of the voices of the speakers S1 and S2.

また、上述では、ダブルトーク時の処理について説明したが、シングルトーク時（話者Ｓ３およびＳ４のみが会話している時）においてもダブルトーク時と同様の処理を行うことによって、エコーがキャンセルされることは言うまでもない。但し、シングルトーク時においては、検出信号（ｍ１、ｍ２）には話者Ｓ１およびＳ２の音声が含まれないので、音源分離部１００は、出力信号ｙ１およびｙ２を無音信号として出力することになる。実際には、話者Ｓ３およびＳ４に対して独立した音源からの音である近端側の環境ノイズなどが無音信号として出力される。 In the above description, the processing at the time of double talk has been described. However, even at the time of single talk (when only the speakers S3 and S4 are talking), echo is canceled by performing the same processing as at the time of double talk. Needless to say. However, during single talk, since the detection signals (m1, m2) do not include the voices of the speakers S1 and S2, the sound source separation unit 100 outputs the output signals y1 and y2 as silence signals. . Actually, the near-end environmental noise that is a sound from a sound source independent of the speakers S3 and S4 is output as a silence signal.

また、上述では、近端側に話者Ｓ１およびＳ２の２名が存在するとしたが、これに限定されない。近端側に存在する話者は１名であってもよいし、３名以上であってもよい。 In the above description, there are two speakers S1 and S2 on the near end side, but the present invention is not limited to this. There may be one speaker on the near end side or three or more speakers.

First, consider the case of a single near-end speaker, for example where only speaker S1 is present. For the near-end acoustic signal, the sound source separation unit 100 separates as many signals as there are input microphone detection signals. In this embodiment, two microphone detection signals, m1 and m2, are input to the sound source separation unit 100. In this case, therefore, the sound source separation unit 100 outputs the output signal y1 as a signal containing only the voice of speaker S1, and the output signal y2 as a silent signal containing only near-end environmental noise. Because the voice of speaker S1 and the environmental noise are mutually independent, the output signal y1 containing only the voice of speaker S1 and the output signal y2 containing only the environmental noise are independent of each other. The output signals y1 and y2 are also independent of the output signals y3 and y4. Even in this case, therefore, the sound source separation unit 100 can separate the mutually independent output signals y1 to y4 from the input signals, and can separate the near-end acoustic signal and the far-end acoustic signal contained in the detection signals (m1, m2).

Next, consider the case of three near-end speakers, for example where a speaker S5 is additionally present. If speaker S5 is located close to speaker S1, the sound source separation unit 100 outputs the output signal y1 as a signal containing only the voices of speakers S1 and S5, and the output signal y2 as a signal containing only the voice of speaker S2. When speaker S5 is close to speaker S1, the transfer characteristic from speaker S1 to microphone 11 approximates that from speaker S5 to microphone 11, and the transfer characteristic from speaker S1 to microphone 21 approximates that from speaker S5 to microphone 21. The voice of speaker S5 therefore ends up in the output signal y1, which contains the voice of speaker S1 with the similar transfer characteristics. Because the voices of speakers S1, S2, and S5 are mutually independent, the output signal y1 containing only the voices of speakers S1 and S5 and the output signal y2 containing only the voice of speaker S2 are independent of each other. The output signals y1 and y2 are also independent of the output signals y3 and y4. The sound source separation unit 100 can therefore separate the mutually independent output signals y1 to y4 from the input signals, and can separate the near-end acoustic signal and the far-end acoustic signal contained in the detection signals (m1, m2).

In the above description, two speakers, S3 and S4, are present on the far-end side, but the invention is not limited to this. There may be one speaker on the far-end side, or three or more.

First, consider the case of a single far-end speaker, for example where only speaker S3 is present. For the far-end acoustic signal, the sound source separation unit 100 separates as many signals as there are input loudspeaker input signals. In this embodiment, two loudspeaker input signals, sp1 and sp2, are input to the sound source separation unit 100. In this case, therefore, the sound source separation unit 100 outputs the output signal y3 as a signal containing only the voice of speaker S3, and the output signal y4 as a silent signal containing only near-end environmental noise. Because the voice of speaker S3 and the environmental noise are mutually independent, the output signal y3 containing only the voice of speaker S3 and the output signal y4 containing only the environmental noise are independent of each other. The output signals y1 and y2 are also independent of the output signals y3 and y4. Even in this case, therefore, the sound source separation unit 100 can separate the mutually independent output signals y1 to y4 from the input signals, and can separate the near-end acoustic signal and the far-end acoustic signal contained in the detection signals (m1, m2).

Next, consider the case of three far-end speakers, for example where a speaker S6 is additionally present. If speaker S6 is located close to speaker S3, the sound source separation unit 100 outputs the output signal y3 as a signal containing only the voices of speakers S3 and S6, and the output signal y4 as a signal containing only the voice of speaker S4. When speaker S6 is close to speaker S3, the transfer characteristic from speaker S3 to microphone 31 approximates that from speaker S6 to microphone 31, and the transfer characteristic from speaker S3 to microphone 41 approximates that from speaker S6 to microphone 41. The voice of speaker S6 therefore ends up in the output signal y3, which contains the voice of speaker S3 with the similar transfer characteristics. Because the voices of speakers S3, S4, and S6 are mutually independent, the output signal y3 containing only the voices of speakers S3 and S6 and the output signal y4 containing only the voice of speaker S4 are independent of each other. The output signals y1 and y2 are also independent of the output signals y3 and y4. The sound source separation unit 100 can therefore separate the mutually independent output signals y1 to y4 from the input signals, and can separate the near-end acoustic signal and the far-end acoustic signal contained in the detection signals (m1, m2).

In the acoustic system shown in FIG. 1, the multichannel echo canceller according to this embodiment is, as an example, provided only on the near-end side, but it goes without saying that it may also be installed on the far-end side.

(Second Embodiment)
The configuration of a multichannel echo canceller according to a second embodiment of the present invention is described with reference to FIG. 3. FIG. 3 shows a configuration example of the multichannel echo canceller according to the second embodiment as used in an acoustic system. In the acoustic system of FIG. 3, speaker S1 is present on the near-end side as a sound source (near-end sound source), and speakers S3 and S4 are present on the far-end side as a plurality of mutually different sound sources (far-end sound sources). The near-end side is provided with loudspeakers 10 and 20 for reproducing the far-end acoustic signal, which consists of the voices of the far-end speakers S3 and S4, and with microphones 11 and 21 for detecting the near-end acoustic signal, which consists of the voice of the near-end speaker S1. The far-end side is provided with loudspeakers 30 and 40 for reproducing the near-end acoustic signal and with microphones 31 and 41 for detecting the far-end acoustic signal. The loudspeakers (10, 20, 30, 40) and microphones (11, 21, 31, 41) are the same as those described with reference to FIG. 8 and carry the same reference numerals. In the acoustic system of FIG. 3, the multichannel echo canceller according to this embodiment is, as an example, provided only on the near-end side. Also as an example, the system is assumed to be in a double-talk state, in which the near-end speaker S1 and the far-end speakers S3 and S4 are talking at the same time.

In FIG. 3, the multichannel echo canceller according to this embodiment consists of an echo cancellation unit 2. The echo cancellation unit 2 comprises a first sound source separation unit 210, a second sound source separation unit 220, conversion units 230 to 235, and inverse conversion units 240 and 241.

In the echo cancellation unit 1 according to the first embodiment described above, a single sound source separation unit 100 is provided for microphones 11 and 21. In the echo cancellation unit 2 according to this embodiment, by contrast, the first sound source separation unit 210 and the second sound source separation unit 220 are provided so as to correspond to microphones 11 and 21, respectively. In other words, this embodiment provides one sound source separation unit per near-end microphone. The conversion units 230 to 235 operate in the same way as the conversion units 110 to 113 of the echo cancellation unit 1, but are given different reference numerals in FIG. 3 for convenience. Likewise, the inverse conversion units 240 and 241 operate in the same way as the inverse conversion units 120 and 121 of the echo cancellation unit 1, but are given different reference numerals for convenience. The following description focuses on the differences from the first embodiment.

The first sound source separation unit 210 receives as inputs the loudspeaker input signal sp2(ω) converted into the frequency domain (ω) by the conversion unit 230, the loudspeaker input signal sp1(ω) converted into the frequency domain by the conversion unit 231, and the detection signal m1(ω) converted into the frequency domain by the conversion unit 232. The first sound source separation unit 210 applies sound source separation processing based on independent component analysis to these inputs. This processing separates the near-end acoustic signal and the far-end acoustic signal contained in the detection signal m1(ω). The separation processing is almost the same as in the first embodiment and is described in detail later. The first sound source separation unit 210 outputs only the separated near-end acoustic signal, as the output signal y1a(ω). Here, the far-end acoustic signal corresponds to the echo. Because only the near-end acoustic signal is output from the first sound source separation unit 210, the far-end acoustic signal contained in the detection signal m1(ω) is cancelled as echo. The output signal y1a(ω) from the first sound source separation unit 210 is converted into a time-domain (t) signal by the inverse conversion unit 240. The time-domain output signal y1a(t) is output to loudspeaker 30 and reproduced by it.
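The signal path through the first sound source separation unit 210 can be sketched for a single frame as follows. This is only a minimal illustration: the function name `separate_frame`, the use of a plain real FFT without windowing or overlap, the input ordering (m1, sp1, sp2), and the per-bin matrix layout `W[k]` are all assumptions made for the example and are not specified in the text.

```python
import numpy as np

def separate_frame(m1, sp1, sp2, W):
    """Return the time-domain near-end estimate y1a(t) for one frame.

    m1, sp1, sp2 : real 1-D arrays of equal length (one frame each).
    W            : complex array of shape (n_bins, 3, 3), one separation
                   matrix per frequency bin (hypothetical layout).
    """
    n = len(m1)
    # Conversion units: time domain -> frequency domain, stacked as (3, n_bins).
    X = np.stack([np.fft.rfft(m1), np.fft.rfft(sp1), np.fft.rfft(sp2)])
    # Separation: y(w) = W(w) x(w), evaluated independently in every bin.
    Y = np.einsum("kij,jk->ik", W, X)
    # Only y1a (row 0) leaves the unit; y2a and y3a would feed the learning step.
    return np.fft.irfft(Y[0], n=n)

# Initial state: identity separation matrix in every bin.
frame_len = 8
n_bins = frame_len // 2 + 1
W_init = np.tile(np.eye(3, dtype=complex), (n_bins, 1, 1))

rng = np.random.default_rng(0)
m1, sp1, sp2 = rng.standard_normal((3, frame_len))
y1a = separate_frame(m1, sp1, sp2, W_init)
```

With the identity matrix as the initial separation matrix, the output frame simply equals the microphone frame; echo is removed only after the separation matrix has been learned.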

The second sound source separation unit 220 receives as inputs the loudspeaker input signal sp2(ω) converted into the frequency domain (ω) by the conversion unit 233, the loudspeaker input signal sp1(ω) converted into the frequency domain by the conversion unit 234, and the detection signal m2(ω) converted into the frequency domain by the conversion unit 235. The second sound source separation unit 220 applies sound source separation processing based on independent component analysis to these inputs. This processing separates the near-end acoustic signal and the far-end acoustic signal contained in the detection signal m2(ω). The separation processing is the same as in the first sound source separation unit 210. The second sound source separation unit 220 outputs only the separated near-end acoustic signal, as the output signal y1b(ω). Here, the far-end acoustic signal corresponds to the echo. Because only the near-end acoustic signal is output from the second sound source separation unit 220, the far-end acoustic signal contained in the detection signal m2(ω) is cancelled as echo. The output signal y1b(ω) from the second sound source separation unit 220 is converted into a time-domain (t) signal by the inverse conversion unit 241. The time-domain output signal y1b(t) is output to loudspeaker 40 and reproduced by it.

The sound source separation processing based on independent component analysis performed by the first and second sound source separation units 210 and 220 is described in detail below, taking the first sound source separation unit 210 as an example. First, the detection signal m1(ω) and the loudspeaker input signals (sp1(ω), sp2(ω)) input to the first sound source separation unit 210 are described. The detection signal m1(ω) and the loudspeaker input signals (sp1(ω), sp2(ω)) are expressed as in Equation (8).

In Equation (8), the voice of speaker S1 is s1(ω), the voice of speaker S3 is s3(ω), and the voice of speaker S4 is s4(ω). The transfer characteristic from speaker S1 to microphone 11 is a11(ω), from speaker S1 to microphone 21 is a12(ω), from speaker S3 to microphone 31 is a31(ω), from speaker S3 to microphone 41 is a32(ω), from speaker S4 to microphone 31 is a41(ω), and from speaker S4 to microphone 41 is a42(ω).
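Equation (8) itself is not reproduced in this text. From the symbol definitions above and the converged coefficients given later in this section (w12 = −h11γ, w13 = −h21γ), it presumably takes the following form, where h11 and h21 are taken to be the transfer characteristics from loudspeakers 10 and 20 to microphone 11 and the argument (ω) is omitted on the right-hand side. This is a reconstruction under those assumptions, not the original equation:

```latex
\begin{aligned}
m_1(\omega)  &= s_1 a_{11} + sp_1 h_{11} + sp_2 h_{21} \\
sp_1(\omega) &= s_3 a_{31} + s_4 a_{41} \\
sp_2(\omega) &= s_3 a_{32} + s_4 a_{42}
\end{aligned}
```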

Next, the detailed configuration of the first sound source separation unit 210 is described with reference to FIG. 4. FIG. 4 shows the detailed configuration of the first sound source separation unit 210. In FIG. 4, the first sound source separation unit 210 comprises a separation unit 211 and a learning unit 212.

A separation matrix W(3, 3) composed of matrix elements wij (where the row index i and the column index j are integers from 1 to 3) is set in the separation unit 211. In the initial state, the identity matrix, for example, is set as the separation matrix W(3, 3). The separation unit 211 receives the detection signal m1(ω) and the loudspeaker input signals (sp1(ω), sp2(ω)). The separation unit 211 calculates and outputs the output signals y1a to y3a according to Equation (9), which is based on the separation matrix W(3, 3) currently set. Specifically, as shown in Equation (9), the separation unit 211 multiplies the input vector composed of the detection signal m1(ω) and the loudspeaker input signals (sp1(ω), sp2(ω)) by the separation matrix W(3, 3) to obtain the output vector composed of the output signals y1a(ω) to y3a(ω).
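Equation (9) is not reproduced in this text. Based on the description above, it is presumably the matrix-vector product below. The ordering of the input vector (m1 first, then sp1 and sp2) is an assumption, chosen so that w12 multiplies sp1 and w13 multiplies sp2, in line with the converged values stated later in this section:

```latex
\begin{bmatrix} y_{1a}(\omega) \\ y_{2a}(\omega) \\ y_{3a}(\omega) \end{bmatrix}
=
\begin{bmatrix}
w_{11} & w_{12} & w_{13} \\
w_{21} & w_{22} & w_{23} \\
w_{31} & w_{32} & w_{33}
\end{bmatrix}
\begin{bmatrix} m_1(\omega) \\ sp_1(\omega) \\ sp_2(\omega) \end{bmatrix}
```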

The learning unit 212 receives the output signals y1a(ω) to y3a(ω) and learns the separation matrix W(3, 3) by independent component analysis. Specifically, the learning unit 212 learns the separation matrix W(3, 3) so that the output signals y1a(ω) to y3a(ω) become mutually independent. The learning unit 212 then replaces the separation matrix W(3, 3) set in the separation unit 211 with the learned one.

The learning method of the learning unit 212 is described more specifically below. A learning rule commonly used for frequency-domain independent component analysis with the gradient method is given by Equation (10). As in the first embodiment, the learning rule used for the independent component analysis is not limited to Equation (10); another learning rule may be used.

In Equation (10), the elements of the output signals y1a(ω) to y3a(ω) are complex frequency-domain signals, and the elements of the separation matrices W(3, 3)_i and W(3, 3)_{i-1} are complex coefficients. I denotes the 3 × 3 identity matrix, ε{·} denotes a time average, and * denotes the complex conjugate. φ(·) denotes a nonlinear function. The nonlinear function is preferably one that corresponds to the derivative of the logarithm of the signal's probability density function; tanh(·) is commonly used. α is a step-size parameter that controls the learning speed. i denotes the iteration count, and learning proceeds by using the newly obtained W(3, 3)_i as W(3, 3)_{i-1} in the next iteration. The matrix inside the braces of ε{·} is a higher-order correlation matrix.
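The learning step can be sketched as follows for one frequency bin. Because Equation (10) is not reproduced in this text, the sketch assumes the widely used natural-gradient form W_i = W_{i-1} + α[I − ε{φ(y) y^H}] W_{i-1}; the function names and the polar-form tanh nonlinearity for complex data are likewise assumptions made for illustration.

```python
import numpy as np

def phi(y):
    """Polar-form tanh nonlinearity for complex data (an assumed choice)."""
    return np.tanh(np.abs(y)) * np.exp(1j * np.angle(y))

def update_separation_matrix(W_prev, Y, alpha=0.1):
    """One gradient step for a single frequency bin.

    W_prev : (3, 3) complex separation matrix from the previous iteration.
    Y      : (3, n_frames) complex outputs y1a..y3a of that bin over time.
    alpha  : step-size parameter controlling the learning speed.
    """
    n_frames = Y.shape[1]
    # Time average of phi(y) y^H: the higher-order correlation matrix.
    R = phi(Y) @ Y.conj().T / n_frames
    identity = np.eye(W_prev.shape[0])
    # Assumed rule: W_i = W_{i-1} + alpha * (I - R) W_{i-1}.
    return W_prev + alpha * (identity - R) @ W_prev

# Usage: start from the identity matrix and take one step on synthetic data.
rng = np.random.default_rng(1)
X = rng.standard_normal((3, 200)) + 1j * rng.standard_normal((3, 200))
W = np.eye(3, dtype=complex)
W_next = update_separation_matrix(W, W @ X, alpha=0.1)
W_same = update_separation_matrix(W, W @ X, alpha=0.0)
```

The update vanishes when the averaged matrix equals the identity, that is, when the off-diagonal higher-order correlations between the outputs are zero, which is the condition the learning drives towards.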

Here, speakers S1, S3, and S4 are all different people and are mutually independent sound sources. Consequently, s1(ω), s3(ω), and s4(ω) in Equation (8) are mutually independent and can be regarded as uncorrelated voices. One detection signal, m1(ω), is input, and this number matches the number of near-end speakers (S1). Two loudspeaker input signals (sp1(ω), sp2(ω)) are input, and this number matches the number of far-end speakers (S3 and S4). When the learning unit 212 learns the separation matrix W(3, 3) under these conditions and the learning converges, the separation matrix W(3, 3) becomes a matrix that can separate s1(ω), s3(ω), and s4(ω) from the detection signal m1(ω) and the loudspeaker input signals (sp1(ω), sp2(ω)). In other words, the output signal y1a calculated by the separation unit 211 from the converged separation matrix W(3, 3) contains only the s1(ω) that was contained in the detection signal m1(ω). Similarly, the output signal y2a contains only the s3(ω) that was contained in the loudspeaker input signals (sp1(ω), sp2(ω)), and the output signal y3a contains only the s4(ω) that was contained in those signals.

Because the learning unit 212 learns the separation matrix W(3, 3) by independent component analysis in this way, the separation unit 211 can separate the near-end acoustic signal from the detection signal m1(ω) as the output signal y1a, and can separate the far-end acoustic signals from the loudspeaker input signals (sp1(ω), sp2(ω)) as the output signals y2a and y3a. The output signals y2a and y3a, which are far-end acoustic signals, are not output from the first sound source separation unit 210; they are used only for learning in the learning unit 212.

The second sound source separation unit 220 performs the same sound source separation processing as the first sound source separation unit 210. As a result, the second sound source separation unit 220 outputs the output signal y1b, which contains only the s1(ω) that was contained in the detection signal m2(ω).

The form that the output signals y1a to y3a take is explained below using Equation (11). Equation (11) is obtained by substituting Equation (8) into Equation (9) and writing out the signals input to the separation unit 211 in more detail. The argument (ω) shown in Equation (8) is omitted in Equation (11).
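Equation (11) is not reproduced in this text. Assuming that the detection signal has the form m1 = s1·a11 + sp1·h11 + sp2·h21 with sp1 = s3·a31 + s4·a41 and sp2 = s3·a32 + s4·a42 (h11 and h21 being the transfer characteristics from loudspeakers 10 and 20 to microphone 11), the substitution described above would give the following. This is a reconstruction under those assumptions, not the original equation:

```latex
\begin{bmatrix} y_{1a} \\ y_{2a} \\ y_{3a} \end{bmatrix}
=
\begin{bmatrix}
w_{11} & w_{12} & w_{13} \\
w_{21} & w_{22} & w_{23} \\
w_{31} & w_{32} & w_{33}
\end{bmatrix}
\begin{bmatrix}
s_1 a_{11} + (s_3 a_{31} + s_4 a_{41})\, h_{11} + (s_3 a_{32} + s_4 a_{42})\, h_{21} \\
s_3 a_{31} + s_4 a_{41} \\
s_3 a_{32} + s_4 a_{42}
\end{bmatrix}
```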

Once learning of the separation matrix W(3, 3) has converged, w11 = γ (an arbitrary real number), w12 = −h11γ, and w13 = −h21γ, so that the output signal y1a ultimately becomes y1a = s1·a11. In other words, the voices of speakers S3 and S4 contained in the detection signal m1 (s3·a31, s3·a32, s4·a41, s4·a42) are cancelled as echo in the output signal y1a. Also in the converged state, w21 = w31 = 0, and the matrix elements (w22, w23, w32, w33) take on transfer characteristics capable of separating s3(ω) and s4(ω). As a result, the output signal y2a ultimately contains only s3, and the output signal y3a contains only s4. With the separation matrix W(3, 3) converged in this way, the transfer characteristics h11 and h21 from loudspeakers 10 and 20 to microphone 11 can be said to be correctly estimated even though the loudspeaker input signals (sp1, sp2) contain the correlated voices (s3, s4).
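The cancellation in the first row can be checked numerically. The sketch below assumes the mixing model m1 = s1·a11 + sp1·h11 + sp2·h21 with sp1 = s3·a31 + s4·a41 and sp2 = s3·a32 + s4·a42, and the input ordering (m1, sp1, sp2); all numeric values are arbitrary test data, and the helper `crand` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)

def crand():
    """Draw one arbitrary complex test value (hypothetical helper)."""
    return complex(rng.standard_normal(), rng.standard_normal())

# Source spectra and transfer characteristics at one frequency bin.
s1, s3, s4 = crand(), crand(), crand()
a11, a31, a32, a41, a42 = (crand() for _ in range(5))
h11, h21 = crand(), crand()
gamma = 0.7  # arbitrary real scale factor

# Assumed mixing model for the loudspeaker inputs and the microphone signal.
sp1 = s3 * a31 + s4 * a41
sp2 = s3 * a32 + s4 * a42
m1 = s1 * a11 + sp1 * h11 + sp2 * h21

# Converged first row of the separation matrix.
w_row1 = np.array([gamma, -h11 * gamma, -h21 * gamma])
y1a = w_row1 @ np.array([m1, sp1, sp2])
```

Under these assumptions the echo terms cancel exactly and y1a equals γ·s1·a11, that is, the near-end component s1·a11 up to the scale factor γ.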

Among the elements of the converged separation matrix W(3, 3), (w11) relates to the transfer characteristic from the near-end speaker S1 to microphone 11 and is used to set the signal level of s1·a11. (w12, w13) relate to the transfer characteristics from the near-end loudspeakers 10 and 20 to microphone 11 and are used to cancel the voices of speakers S3 and S4, which are the echo components, from the detection signal m1. (w22, w23, w32, w33) relate to the transfer characteristics from the far-end speakers S3 and S4 to microphones 31 and 41 and are used to separate the voice of speaker S3 and the voice of speaker S4 contained in the loudspeaker input signals (sp1, sp2) into the output signals y2a and y3a. (w21, w31) are used to separate the voice of speaker S3 and the voice of speaker S4 contained in the loudspeaker input signals (sp1, sp2) into the output signals y2a and y3a by means of the detection signal m1.

As described above, in this embodiment the first sound source separation unit 210 separates the near-end acoustic signal and the far-end acoustic signal contained in the detection signal m1 and outputs only the separated near-end acoustic signal as the output signal y1a. Likewise, the second sound source separation unit 220 separates the near-end acoustic signal and the far-end acoustic signal contained in the detection signal m2 and outputs only the separated near-end acoustic signal as the output signal y1b. The far-end acoustic signal contained in the detection signals (m1, m2) can thus be cancelled as echo regardless of whether the loudspeaker input signals (sp1, sp2) contain correlated voices. As a result, during multichannel reproduction this embodiment resolves the indeterminacy of the solution and always performs stable echo cancellation without degrading sound quality.

In this embodiment, moreover, the first sound source separation unit 210 and the second sound source separation unit 220 are provided so as to correspond to microphones 11 and 21, respectively. The first sound source separation unit 210 therefore outputs the output signal y1a, which contains only the voice s1 of speaker S1 that was contained in the detection signal m1, and the second sound source separation unit 220 outputs the output signal y1b, which contains only the voice s1 of speaker S1 that was contained in the detection signal m2. Because the output signal y1a contains only the voice s1 as it was contained in the detection signal m1, it carries the sense of direction from speaker S1 towards microphone 11. Similarly, because the output signal y1b contains only the voice s1 as it was contained in the detection signal m2, it carries the sense of direction from speaker S1 towards microphone 21. When these output signals y1a and y1b are reproduced by the far-end loudspeakers 30 and 40, speakers S3 and S4 can therefore perceive a sense of direction in the reproduced voice of speaker S1.

The above description covers processing during double talk, but it goes without saying that echo is also cancelled during single talk (when only speakers S3 and S4 are talking) by performing the same processing. During single talk, however, the detection signal m1 does not contain the voice of speaker S1, so the first sound source separation unit 210 outputs the output signal y1a as a silent signal. In practice, sound from sources that are independent of speakers S3 and S4, such as near-end environmental noise, is output as the silent signal.

In the above description, a single speaker S1 is assumed to be present on the near-end side, but the present invention is not limited to this. Two or more speakers may be present on the near-end side.

As an example with two near-end speakers, consider the case where a speaker S2 is also present. For near-end acoustic signals, the first sound source separation unit 210 and the second sound source separation unit 220 each separate as many signals as the number of microphone detection signals they receive. In the present embodiment, the first sound source separation unit 210 receives one microphone detection signal, m1, and the second sound source separation unit 220 receives one microphone detection signal, m2. In this case, therefore, the first sound source separation unit 210 outputs an output signal y1a containing only the voices of the speakers S1 and S2 that were included in the detection signal m1, and the second sound source separation unit 220 outputs an output signal y1b containing only the voices of the speakers S1 and S2 that were included in the detection signal m2. Because the voices of the speakers S1 to S4 are mutually independent, the output signal y1a containing only the voices of the speakers S1 and S2, the output signal y2a containing only the voice of the speaker S3, and the output signal y3a containing only the voice of the speaker S4 are also mutually independent. The first sound source separation unit 210 can therefore separate the mutually independent output signals y1a to y3a from its input signals, and can separate the near-end acoustic signal and the far-end acoustic signal contained in the detection signal m1. The same applies to the second sound source separation unit 220.

In the above description, two speakers S3 and S4 are assumed to be present on the far-end side, but the present invention is not limited to this. There may be one speaker on the far-end side, or three or more. This case is the same as in the first embodiment described above, so its description is omitted.

In the above description, the conversion units 230 to 235 are provided one for each signal input to the first sound source separation unit 210 and the second sound source separation unit 220. However, as shown in FIG. 5, some of the conversion units may be shared. FIG. 5 is a diagram showing a case where some conversion units are shared. In FIG. 5, the conversion unit 233 converts the speaker input signal sp2 into the frequency domain (ω) and outputs it to both the first sound source separation unit 210 and the second sound source separation unit 220. The conversion unit 234 converts the speaker input signal sp1 into the frequency domain (ω) and outputs it to both the first sound source separation unit 210 and the second sound source separation unit 220. Thus, in FIG. 5, the conversion units 233 and 234 are shared by the first sound source separation unit 210 and the second sound source separation unit 220. Sharing the conversion units 233 and 234 in this way reduces the processing load of the multichannel echo canceller as a whole.

In the acoustic system shown in FIG. 3, the multichannel echo canceller according to the present embodiment is, as an example, provided only on the near-end side, but it goes without saying that it may also be installed on the far-end side.

(Third embodiment)
The first sound source separation unit 210 and the second sound source separation unit 220 described above are configured to update all of the matrix elements of the separation matrix. Alternatively, some of the matrix elements of the separation matrix may be constrained (set to 0). Hereinafter, with reference to FIG. 6, a case in which some matrix elements of the separation matrices of the first sound source separation unit 210 and the second sound source separation unit 220 are constrained will be described as a third embodiment. FIG. 6 is a diagram showing the configuration of a first sound source separation unit 210a, in which part of the separation matrix set in the first sound source separation unit 210 is constrained.

In FIG. 6, the first sound source separation unit 210a includes a constrained separation unit 211a and a constrained learning unit 212a. A separation matrix Wa(3, 3) composed of matrix elements wij (row index i and column index j are integers from 1 to 3) is set in the constrained separation unit 211a. In the initial state, for example, the identity matrix is set as the separation matrix Wa(3, 3). The detection signal m1(ω) and the speaker input signals (sp1(ω), sp2(ω)) are input to the constrained separation unit 211a. The constrained separation unit 211a calculates the output signals y1a to y3a according to Expression (12), which is based on the currently set separation matrix Wa(3, 3), and outputs the calculated output signals y1a to y3a. Specifically, as shown in Expression (12), the constrained separation unit 211a multiplies the input vector composed of the detection signal m1(ω) and the speaker input signals (sp1(ω), sp2(ω)) by the separation matrix Wa(3, 3) to calculate the output vector composed of the output signals y1a(ω) to y3a(ω).

In the separation matrix Wa(3, 3) shown in Expression (12), the matrix elements (w21, w31) and (w23, w32) are constrained to 0. Of the output signals y1a(ω) to y3a(ω) output from the constrained separation unit 211a, only the output signal y1a(ω), which is the near-end acoustic signal, is output from the first sound source separation unit 210a.
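Expression (12) itself is not reproduced in this text. As a reading aid, the form below is a reconstruction from the description above (input vector made of m1, sp1 and sp2; elements w21, w31, w23 and w32 fixed at 0). The ordering of sp1 and sp2 within the input vector is an assumption.

```latex
\begin{pmatrix} y_{1a}(\omega) \\ y_{2a}(\omega) \\ y_{3a}(\omega) \end{pmatrix}
=
\begin{pmatrix}
  w_{11} & w_{12} & w_{13} \\
  0      & w_{22} & 0      \\
  0      & 0      & w_{33}
\end{pmatrix}
\begin{pmatrix} m_{1}(\omega) \\ sp_{1}(\omega) \\ sp_{2}(\omega) \end{pmatrix}
```

Under this reading, y1a = w11·m1 + w12·sp1 + w13·sp2, while y2a and y3a are simply scaled copies of the speaker input signals, so only the first row carries the echo-cancelling coefficients.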

The constrained learning unit 212a receives the output signals y1a(ω) to y3a(ω) and performs independent component analysis to learn the separation matrix Wa(3, 3) set in the constrained separation unit 211a. Specifically, the constrained learning unit 212a learns the separation matrix Wa(3, 3) according to Expression (13). The constrained learning unit 212a then replaces the separation matrix Wa(3, 3) set in the constrained separation unit 211a with the learned separation matrix Wa(3, 3).

According to Expression (13), the constrained learning unit 212a updates only the non-zero matrix elements of the separation matrix Wa(3, 3). Learning the separation matrix Wa(3, 3) in this way can also separate the near-end acoustic signal and the far-end acoustic signal contained in the input signals.
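The idea of updating only the non-zero elements can be sketched as follows. This is a minimal illustration, not Expression (13) itself, which is not reproduced in this text: it assumes a generic natural-gradient style ICA step, ΔW = α(I − ε{φ(y)y^H})W, followed by a mask that holds the constrained elements at 0. The names `MASK`, `phi`, `separate`, `constrained_step`, the step size `alpha` and the tanh nonlinearity are all hypothetical choices.

```python
import cmath
import math

# 1 = element is learned, 0 = element is held at zero (w21, w31, w23, w32).
MASK = [[1, 1, 1],
        [0, 1, 0],
        [0, 0, 1]]

def phi(y):
    """Assumed nonlinearity: tanh applied to the magnitude, phase unchanged."""
    return cmath.rect(math.tanh(abs(y)), cmath.phase(y))

def separate(W, x):
    """Output vector y = W x for one frame of one frequency bin."""
    return [sum(W[i][j] * x[j] for j in range(len(x))) for i in range(len(W))]

def constrained_step(W, frames, alpha=0.05):
    """One masked update: W + alpha * (I - E{phi(y) y^H}) W, then re-apply MASK."""
    n = len(W)
    ys = [separate(W, x) for x in frames]
    # Sample estimate of the higher-order correlation matrix E{phi(y_i) y_j*}.
    R = [[sum(phi(y[i]) * y[j].conjugate() for y in ys) / len(ys)
          for j in range(n)] for i in range(n)]
    G = [[(1.0 if i == j else 0.0) - R[i][j] for j in range(n)] for i in range(n)]
    dW = [[sum(G[i][k] * W[k][j] for k in range(n)) for j in range(n)]
          for i in range(n)]
    # Constrained elements are never updated; they stay exactly zero.
    return [[(W[i][j] + alpha * dW[i][j]) if MASK[i][j] else 0j
             for j in range(n)] for i in range(n)]

# Identity start, two made-up frames of (m1, sp1, sp2) for a single bin.
W0 = [[1 + 0j, 0j, 0j], [0j, 1 + 0j, 0j], [0j, 0j, 1 + 0j]]
frames = [[0.3 + 0.1j, 0.5 - 0.2j, -0.4 + 0.3j],
          [-0.2 + 0.4j, 0.1 + 0.6j, 0.7 - 0.1j]]
W_next = constrained_step(W0, frames)
```

After the step, the first row (w11, w12, w13) and the diagonal elements w22 and w33 have moved, while the four constrained elements are still exactly 0.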

The following explains the purpose of constraining some matrix elements of the separation matrix, and why the near-end acoustic signal and the far-end acoustic signal can still be separated when they are constrained. The second embodiment described above did not specifically address what happens, once the separation matrix has converged, when a far-end sound source moves or when there are many far-end sound sources. In those cases the actual transfer characteristics (a31(ω), a32(ω), and so on) fluctuate. As a result, in a configuration that updates all coefficients of the separation matrix as in the second embodiment, the matrix elements (w12, w13) used to cancel the echo may fluctuate temporarily through learning. When the matrix elements (w12, w13) fluctuate temporarily, the separation of the near-end acoustic signal from the far-end acoustic signal becomes incomplete, and the echo cancellation effect temporarily deteriorates.

The matrix elements (w21, w31) are the elements that use the detection signal m1 to separate the voice of the speaker S3 and the voice of the speaker S4, contained in the speaker input signals (sp1, sp2), as the output signals y2a and y3a. Here, the output signals y2a and y3a, which are far-end acoustic signals, are not output from the first sound source separation unit 210 and are used only for learning by the learning unit 212. For this reason, the detection signal m1 contains no signal that contributes to the separation of the far-end output signals y2a and y3a. Once learning of the separation matrix has converged, therefore, w21 = w31 = 0 should hold. However, in a configuration that updates all matrix elements of the separation matrix as in the second embodiment, when the actual transfer characteristics (a31(ω), a32(ω), and so on) fluctuate, for example because a far-end sound source moves, the matrix elements (w21, w31) also fluctuate temporarily through learning. When the matrix elements (w21, w31) fluctuate temporarily, the matrix element (w11) also fluctuates temporarily through the next learning step. As a result, the separation of the near-end acoustic signal from the far-end acoustic signal becomes incomplete, and the echo cancellation effect temporarily deteriorates.

To prevent this temporary deterioration of the echo cancellation effect, the present embodiment constrains some matrix elements of the separation matrix.

The phenomenon in which the echo cancellation effect temporarily deteriorates depending on the state of the far-end sound sources is explained below in terms of the equations. Expression (14) is obtained by expanding the first term on the right side of the update equation of Expression (10).

The first term on the right side of Expression (14) represents the update amount ΔW in learning the separation matrix W. For i ≠ j, the matrix element ε{φ(yi) yj*} satisfies ε{φ(yi) yj*} ≈ 0 once the output signals yi and yj become mutually independent. When learning of the separation matrix W has converged, the update amount ΔW oscillates in the vicinity of 0 (zero); in other words, all matrix elements of the update amount ΔW are effectively 0.

Now consider the case where the far-end transfer characteristics (a31(ω), a32(ω), and so on) change after learning of the separation matrix W has converged and echo cancellation is working well. In this case, the estimates given by the matrix elements (w22, w23, w32, w33) of the converged separation matrix W no longer match the actual transfer characteristics. The separation between the far-end output signals y2a and y3a therefore becomes incomplete. That is, the independence between the output signals y2a and y3a decreases and they become correlated with each other. In terms of Expression (14), ε{φ(y2a) y3a*} and ε{φ(y3a) y2a*} take non-zero values. In particular, when there are many far-end sound sources, the far-end transfer characteristics change constantly, so ε{φ(y2a) y3a*} and ε{φ(y3a) y2a*} are always non-zero. The matrix elements in the second and third rows of the first term on the right side of Expression (14) contain ε{φ(y2a) y3a*} and ε{φ(y3a) y2a*}. Fluctuation of ε{φ(y2a) y3a*} and ε{φ(y3a) y2a*} therefore means that the matrix elements in the second and third rows of the first term on the right side of Expression (14) fluctuate.

When the matrix elements in the second and third rows of the first term on the right side of Expression (14) fluctuate, the matrix elements in the second and third rows of the separation matrix W learned from them (w21 to w23, w31 to w33) also fluctuate. When the next learning step is performed based on the fluctuation of the matrix elements (w23, w32), the matrix elements (w12, w13) of the separation matrix W fluctuate. Likewise, when the next learning step is performed based on the fluctuation of the matrix elements (w21, w31), the matrix element (w11) of the separation matrix W fluctuates. These fluctuations of the first-row matrix elements of the separation matrix W temporarily degrade the echo cancellation effect.
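The propagation described above can be made concrete with a short worked derivation. Expressions (10) and (14) are not reproduced in this text, so the form below is an assumption: a common natural-gradient update with step size α (a hypothetical symbol) and R_ij = ε{φ(y_i) y_j*}, where the indices 1, 2, 3 stand for y1a, y2a, y3a.

```latex
\Delta W = \alpha\,(I - R)\,W, \qquad R_{ij} = \varepsilon\{\varphi(y_i)\,y_j^{*}\}

% Rows 2 and 3: a non-zero R_{23} or R_{32} perturbs every element of the row.
\Delta w_{2j} = \alpha\left[(1 - R_{22})\,w_{2j} - R_{21}\,w_{1j} - R_{23}\,w_{3j}\right]
\Delta w_{3j} = \alpha\left[(1 - R_{33})\,w_{3j} - R_{31}\,w_{1j} - R_{32}\,w_{2j}\right]

% Row 1: the next update reads the perturbed rows 2 and 3.
\Delta w_{11} = \alpha\left[(1 - R_{11})\,w_{11} - R_{12}\,w_{21} - R_{13}\,w_{31}\right]
\Delta w_{12} = \alpha\left[(1 - R_{11})\,w_{12} - R_{12}\,w_{22} - R_{13}\,w_{32}\right]
\Delta w_{13} = \alpha\left[(1 - R_{11})\,w_{13} - R_{12}\,w_{23} - R_{13}\,w_{33}\right]
```

Under this assumed form, w21 and w31 enter only the update of w11, while w32 and w23 enter the updates of w12 and w13, which matches the dependency described in the text.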

In the present embodiment, therefore, the matrix elements (w32, w23) and the matrix elements (w21, w31) of the separation matrix are each constrained to 0. As a result, even if the far-end transfer characteristics fluctuate, fluctuation of the first-row matrix elements of the separation matrix W through learning can be prevented, and temporary deterioration of the echo cancellation effect can be prevented.

Next, consider the learning equation for the case where the matrix elements (w32, w23) and the matrix elements (w21, w31) are each constrained to 0, as in the separation matrix Wa. Simply applying the learning equation of Expression (10) to the separation matrix Wa of Expression (12) yields Expression (15).

Expanding the first term on the right side of Expression (15) and rearranging it gives the same equation as Expression (13). In other words, Expression (13) is the learning equation obtained by constraining some matrix elements of the separation matrix.

In Expression (13), when learning of the separation matrix Wa has converged, the update amount ΔW becomes the zero matrix. This means that ε{φ(y1a) y2a*} = ε{φ(y2a) y1a*} = ε{φ(y1a) y3a*} = ε{φ(y3a) y1a*} = 0, and that 1 − ε{φ(y1a) y1a*} = 1 − ε{φ(y2a) y2a*} = 1 − ε{φ(y3a) y3a*} = 0. It follows that, when learning of the separation matrix Wa has converged, the near-end output signal y1a is independent of the far-end output signal y2a, and the near-end output signal y1a is independent of the far-end output signal y3a. That is, when learning based on Expression (13) has converged, the near-end acoustic signal and the far-end acoustic signal are separated.

Note that the values of ε{φ(y2a) y3a*} and ε{φ(y3a) y2a*} have no bearing on whether the update amount ΔW becomes the zero matrix. At convergence, therefore, the far-end output signals y2a and y3a may or may not be independent of each other. In other words, for the far-end acoustic signal, the output signals y2a and y3a are not necessarily output as mutually independent signals.

As described above, constraining some matrix elements of the separation matrix prevents the first-row matrix elements of the separation matrix W from fluctuating through learning even when the far-end transfer characteristics fluctuate, and so prevents temporary deterioration of the echo cancellation effect. In addition, constraining some matrix elements of the separation matrix reduces the amount of computation compared with the unconstrained case.

Although Expression (13) is used as the learning equation in the present embodiment, Expression (16) may be used instead. The near-end acoustic signal and the far-end acoustic signal can also be separated using Expression (16).

The present embodiment has been described with respect to the separation matrices set in the first sound source separation unit 210 and the second sound source separation unit 220, but it is not limited to these. The same effect as in the present embodiment is obtained by constraining some coefficients of the separation matrix W(4, 4) set in the sound source separation unit 100 described above. Expressions (17) to (19) give constraint conditions that apply in common to separation matrices with different numbers of rows and columns. The number of rows and the number of columns of the separation matrix are each (M + K), where M is the number of near-end microphones that input detection signals to the sound source separation unit and K is the number of near-end loudspeakers whose speaker input signals are input to the sound source separation unit. The input vector multiplied by the separation matrix is arranged so that the detection signals of the near-end microphones correspond to rows 1 to M of the separation matrix and the speaker input signals correspond to rows M + 1 to M + K. In Expressions (17) to (19), i (i = 1 to M + K) denotes the row index and j (j = 1 to M + K) denotes the column index.

Expression (17) gives the constraint condition for the matrix elements related to the transfer characteristics from the far-end speakers (S3 and S4) to the far-end microphones (31 and 41) (w22, w23, w32 and w33 in the 3 × 3 matrix).

The constraint condition of Expression (17) is effective when the far-end acoustic signal does not need to be further separated into a plurality of mutually independent output signals. Even in this case, the near-end acoustic signal and the far-end acoustic signal can be separated.

Expression (18) gives the constraint condition for the matrix elements (w21 and w31 in the 3 × 3 matrix) that use the detection signals of the near-end microphones (m1, m2, and so on) to separate the voices of the far-end speakers (S3 and S4) contained in the speaker input signals (sp1, sp2) as the respective output signals (y3, y4, and so on).

The constraint condition of Expression (18) concerns matrix elements that, in terms of system processing, are unrelated to the sound source separation. Applying the constraint condition of Expression (18) therefore still allows the near-end acoustic signal and the far-end acoustic signal to be separated.

Expression (19) gives the constraint condition for the matrix elements related to the transfer characteristics from the near-end speakers (S1, S2, and so on) to the near-end microphones (11 and 21) (w11 in the 3 × 3 matrix).

The constraint condition of Expression (19) is effective when the near-end acoustic signal does not need to be further separated into a plurality of mutually independent output signals. Even in this case, the near-end acoustic signal and the far-end acoustic signal can be separated.
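Expressions (17) to (19) are not reproduced in this text, so the sketch below encodes one possible reading of them as a 0/1 mask over an (M + K) × (M + K) separation matrix: (17) zeroes the off-diagonal elements of the far-end block, (18) zeroes the block that maps microphone signals into far-end outputs, and (19) zeroes the off-diagonal elements of the near-end block. The function name `build_mask` and its flags are hypothetical; the only check available from the text is that M = 1, K = 2 with (17) and (18) applied reproduces the zero pattern of Wa (w21, w31, w23, w32).

```python
def build_mask(M, K, eq17=True, eq18=True, eq19=False):
    """Return a mask where 1 marks a learned element and 0 a constrained one.

    Rows/columns 0..M-1 correspond to microphone detection signals,
    rows/columns M..M+K-1 to speaker input signals (0-based indices).
    """
    n = M + K
    mask = [[1] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            far_row, far_col = i >= M, j >= M
            # Assumed reading of (17): far-end block, off-diagonal elements.
            if eq17 and far_row and far_col and i != j:
                mask[i][j] = 0
            # Assumed reading of (18): far-end rows, microphone columns.
            if eq18 and far_row and not far_col:
                mask[i][j] = 0
            # Assumed reading of (19): near-end block, off-diagonal elements.
            if eq19 and not far_row and not far_col and i != j:
                mask[i][j] = 0
    return mask

mask_3x3 = build_mask(M=1, K=2)             # pattern of Wa(3, 3)
mask_4x4 = build_mask(M=2, K=2, eq19=True)  # W(4, 4) with all three applied
```

With M = 1 there is no off-diagonal element in the near-end block, so the assumed reading of (19) changes nothing in the 3 × 3 case and only takes effect for two or more microphones.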

(Fourth embodiment)
The fourth embodiment describes a case where the sound source separation processing of the sound source separation unit 100 described above is realized on a computer system using a computer program. The computer system includes a microprocessor, ROM, RAM, and the like. A computer program is stored in the RAM. The sound source separation processing of the sound source separation unit 100 is realized by the microprocessor operating according to the computer program. The computer program is composed of a combination of instruction codes that give commands to the computer system in order to realize the sound source separation processing of the sound source separation unit 100. The sound source separation processing of the first sound source separation unit 210 and the second sound source separation unit 220 may likewise be realized on a computer system using a computer program.

A program processing flow that realizes the sound source separation processing of the sound source separation unit 100 will be described with reference to FIG. 7. FIG. 7 is a diagram showing a program processing flow that realizes the sound source separation processing of the sound source separation unit 100. In FIG. 7, an initial matrix, for example the identity matrix, is set in the separation unit 101 of the sound source separation unit 100 (step S1). After step S1, the separation unit 101 receives the detection signals (m1(ω), m2(ω)) and the speaker input signals (sp1(ω), sp2(ω)) as an input vector (step S2). After step S2, the separation unit 101 calculates the output signals y1 to y4, which form the output vector, according to Expression (5) based on the currently set separation matrix W (step S3). After step S3, the learning unit 102 learns the separation matrix W according to Expression (6) based on the output signals y1 to y4 calculated in step S3 (step S4). Specifically, the learning unit 102 calculates the higher-order correlations between the output signals y1 to y4 (for example, ε{φ(y3) y2*}) to obtain a higher-order correlation matrix, and uses that matrix to learn the separation matrix to be set next. After step S4, the learning unit 102 replaces the separation matrix W currently set in the separation unit 101 with the separation matrix W learned in step S4 (step S5). After step S5, the learning unit 102 determines whether the update has been performed N times, where N is an integer of 1 or more (step S6). If the update has not been performed N times (No in step S6), the process returns to step S2. If the update has been performed N times (Yes in step S6), the separation unit 101 calculates the output signals y1 to y4, which form the output vector, according to Expression (5) based on the updated separation matrix W (step S7). Step S7 separates the near-end acoustic signal and the far-end acoustic signal contained in the detection signals (m1(ω), m2(ω)). After step S7, the separation unit 101 outputs only the output signals y1 and y2, which are the near-end acoustic signals (step S8). Step S8 cancels, as echo, the far-end acoustic signal contained in the detection signals (m1(ω), m2(ω)).
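The flow of steps S1 to S8 can be sketched as below for a single frequency bin. This is a hedged illustration only: Expressions (5) and (6) are not reproduced in this text, so the separation is taken to be y = W x and the learning step is an assumed natural-gradient style update. The names `phi`, `separate`, `learn`, `run_flow`, the step size `alpha`, and the choice to feed one block of frames per update are all hypothetical.

```python
import cmath
import math

def phi(y):
    """Assumed nonlinearity: tanh applied to the magnitude, phase unchanged."""
    return cmath.rect(math.tanh(abs(y)), cmath.phase(y))

def separate(W, x):
    """Assumed form of the separation step: y = W x."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]

def learn(W, ys, alpha):
    """Assumed learning step: W + alpha * (I - E{phi(y) y^H}) W."""
    n = len(W)
    R = [[sum(phi(y[i]) * y[j].conjugate() for y in ys) / len(ys)
          for j in range(n)] for i in range(n)]
    G = [[(1.0 if i == j else 0.0) - R[i][j] for j in range(n)] for i in range(n)]
    return [[W[i][j] + alpha * sum(G[i][k] * W[k][j] for k in range(n))
             for j in range(n)] for i in range(n)]

def run_flow(blocks, n_updates, alpha=0.05, n_mics=2):
    """blocks: list of blocks, each a list of input vectors (m1, m2, sp1, sp2)."""
    n = len(blocks[0][0])
    W = [[1 + 0j if i == j else 0j for j in range(n)] for i in range(n)]  # S1
    for k in range(n_updates):                      # S6: repeat N times
        block = blocks[k % len(blocks)]             # S2: take the input vectors
        ys = [separate(W, x) for x in block]        # S3: compute y1..y4
        W = learn(W, ys, alpha)                     # S4 + S5: learn and update W
    ys = [separate(W, x) for x in blocks[-1]]       # S7: separate with final W
    return [y[:n_mics] for y in ys], W              # S8: output y1 and y2 only

# Made-up input data: one block of two frames.
blocks = [[[0.3 + 0.1j, 0.2 - 0.1j, 0.5 - 0.2j, -0.4 + 0.3j],
           [-0.2 + 0.4j, 0.6 + 0.2j, 0.1 + 0.6j, 0.7 - 0.1j]]]
near_end, W_final = run_flow(blocks, n_updates=5)
```

Only the first `n_mics` outputs are returned, which mirrors step S8: the remaining outputs exist solely to drive the learning step.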

By performing the program processing shown in FIG. 7, the sound source separation processing of the sound source separation unit 100 described above can be realized on a computer system.

The first sound source separation unit 210a and the like, in which some matrix elements are constrained, can also be realized on a computer system using a computer program. The program processing in this case differs from that shown in FIG. 7 in that the predetermined matrix elements given by Expressions (17) to (19) are held at 0 throughout the processing. That is, in steps S4 and S5, the learning unit 102 processes only the matrix elements other than the predetermined matrix elements, and keeps the predetermined matrix elements constrained to 0.

(Other variations)
The multichannel echo canceller according to the present invention has been described in the first to third embodiments above, but it is not limited to what is described in those embodiments. The multichannel echo canceller according to the present invention may also take the following forms.

(1) Some or all of the components of the multichannel echo canceller according to the first to third embodiments described above may be implemented as a single system LSI (Large Scale Integration). A system LSI is a super-multifunctional LSI that can be manufactured by integrating a plurality of components on a single chip. A computer system including, for example, a microprocessor, ROM, and RAM can also be realized in the system LSI. A computer program is stored in the RAM. The system LSI achieves its function as a computer system by the microprocessor operating according to the computer program.

(2) Some or all of the components of the multichannel echo canceller according to the first to third embodiments described above may be implemented as an IC card that can be attached to and detached from the multichannel echo canceller, or as a stand-alone module. The IC card or module can also realize a computer system including a microprocessor, ROM, RAM, and the like. A computer program is stored in the RAM. The IC card or module achieves its function as a computer system by the microprocessor operating according to the computer program. The IC card or module may include the super-multifunctional LSI of (1) above. The IC card or module may also be tamper resistant.

（３）本発明は、上述した第１〜第３の実施形態に基づくマルチチャンネルエコーキャンセル方法であってもよい。また、本発明は、マルチチャンネルエコーキャンセル方法をコンピュータ上で実現させるためのコンピュータプログラムであってもよいし、当該コンピュータプログラムからなるデジタル信号であってもよい。また、本発明は、上記コンピュータプログラムまたはデジタル信号を、コンピュータ読み取り可能な記録媒体（例えば、フレキシブルディスク、ハードディスク、ＣＤ−ＲＯＭ、ＭＯ、ＤＶＤ、ＤＶＤ−ＲＯＭ、ＤＶＤ−ＲＡＭ、ＢＤ（Ｂｌｕ−ｒａｙＤｉｓｃ）、半導体メモリなど）に記録したものとしてもよい。また、本発明は、上記コンピュータプログラムまたはデジタル信号を、電気通信回線（無線通信回線、有線通信回線、インターネットを代表とするネットワーク回線、データ放送回線など）を経由して伝送されるものであってもよい。また、本発明は、マイクロプロセッサとメモリを備えたコンピュータシステム上で実現されるものであって、メモリに記憶されたコンピュータプログラムにしたがってマイクロプロセッサが動作することで実現されてもよい。また、本発明は、上記コンピュータプログラムまたはデジタル信号を記録媒体に記録して移送することにより（または、ネットワーク等を経由して移送することにより）、独立した他のコンピュータシステム上で実現されてもよい。 (3) The present invention may be a multi-channel echo cancellation method based on the first to third embodiments described above. The present invention may also be a computer program for realizing the multi-channel echo cancellation method on a computer, or a digital signal composed of that computer program. The present invention may also be the computer program or digital signal recorded on a computer-readable recording medium (for example, a flexible disk, a hard disk, a CD-ROM, an MO, a DVD, a DVD-ROM, a DVD-RAM, a BD (Blu-ray Disc), or a semiconductor memory). The present invention may also be the computer program or digital signal transmitted via a telecommunication line (a wireless communication line, a wired communication line, a network line typified by the Internet, a data broadcasting line, or the like). The present invention may also be realized on a computer system including a microprocessor and a memory, with the microprocessor operating in accordance with a computer program stored in the memory. Further, the present invention may be realized on another, independent computer system by recording the computer program or digital signal on a recording medium and transferring it, or by transferring it via a network or the like.

（４）上述した第１〜第３の実施形態と上述した（１）〜（３）の変形例とを適宜組み合わせてもよい。 (4) The first to third embodiments described above and the modifications (1) to (3) described above may be combined as appropriate.

本発明に係るマルチチャンネルエコーキャンセラは、マルチチャンネル再生時において音質劣化が生じることなく常に安定したエコーキャンセルを行うことができ、ダブルトーク時やシングルトーク時に関係なく安定したエコーキャンセルを行うことを可能にするものであり、会議システムやハンズフリー電話の他、ガイドアナウンス再生時や音楽再生時における音声認識システム等にも適用される。 The multi-channel echo canceller according to the present invention can always perform stable echo cancellation without deterioration of sound quality during multi-channel playback, and can perform stable echo cancellation regardless of double talk or single talk. In addition to a conference system and a hands-free phone, it is applied to a voice recognition system at the time of guide announcement playback and music playback.

音響システムに用いられる第１の実施形態に係るマルチチャンネルエコーキャンセラの構成例を示す図 A diagram showing a configuration example of the multi-channel echo canceller according to the first embodiment, as used in an acoustic system.
音源分離部１００の詳細な構成を示す図 A diagram showing the detailed configuration of the sound source separation unit 100.
音響システムに用いられる第２の実施形態に係るマルチチャンネルエコーキャンセラの構成例を示す図 A diagram showing a configuration example of the multi-channel echo canceller according to the second embodiment, as used in an acoustic system.
第１の音源分離部２１０の詳細な構成を示す図 A diagram showing the detailed configuration of the first sound source separation unit 210.
一部の変換部を共用した場合を示す図 A diagram showing a case where some of the conversion units are shared.
第１の音源分離部２１０に設定された分離行列の一部を拘束した第１の音源分離部２１０ａの構成を示す図 A diagram showing the configuration of the first sound source separation unit 210a, in which part of the separation matrix set in the first sound source separation unit 210 is constrained.
音源分離部１００の音源分離処理を実現するプログラム処理フローを示す図 A diagram showing a program processing flow that implements the sound source separation processing of the sound source separation unit 100.
従来の適応フィルタを用いたマルチチャンネルエコーキャンセラ９の構成を示す図 A diagram showing the configuration of a conventional multi-channel echo canceller 9 using adaptive filters.

符号の説明 Explanation of symbols

１、２ エコーキャンセル部 1, 2: Echo cancellation unit
１０、２０、３０、４０ スピーカ 10, 20, 30, 40: Speaker
１１、２１、３１、４１ マイクロホン 11, 21, 31, 41: Microphone
１００ 音源分離部 100: Sound source separation unit
１０１、２１１、２１１ａ 分離部 101, 211, 211a: Separation unit
１０２、２１２、２１２ａ 学習部 102, 212, 212a: Learning unit
１１０〜１１３、２３０〜２３５ 変換部 110-113, 230-235: Conversion unit
１２０、１２１、２４０、２４１ 逆変換部 120, 121, 240, 241: Inverse conversion unit
２１０、２１０ａ 第１の音源分離部 210, 210a: First sound source separation unit
２２０ 第２の音源分離部 220: Second sound source separation unit

Claims

第１の場所と第２の場所との間で相互に音響信号を伝送し、当該第１の場所に存在する１つ以上の音源から発せられる第１の音響信号は前記第１の場所に設けられる複数のマイクロホンで検出され、当該第２の場所に存在する１つ以上の音源から発せられる第２の音響信号は前記第２の場所に設けられる複数のマイクロホンで検出され、前記第１の場所から前記第２の場所へ伝送された音響信号は前記第２の場所に設けられた複数のスピーカから拡声され、前記第２の場所から前記第１の場所へ伝送された音響信号は前記第１の場所に設けられた複数のスピーカから拡声される音響システムに用いられ、前記第１の場所に設けられるマルチチャンネルエコーキャンセラであって、
前記第２の場所から前記第１の場所へ伝送されたスピーカ入力信号が前記第１の場所に設けられた複数のスピーカで拡声された音響信号と、前記第１の音響信号とを検出する、前記第１の場所に設けられた複数のマイクロホンの検出信号と、前記スピーカ入力信号とを取得する取得部と、
各前記検出信号に含まれる第１の音響信号と、各前記検出信号に含まれる前記拡声された音響信号に含まれる前記第２の音響信号とを分離するための分離行列であって、前記第１の場所に設けられた複数のスピーカから当該第１の場所に設けられた複数のマイクロホンまでの各伝達特性に関する複数の行列要素で構成される第１の部分行列と、
前記第２の場所に存在する１以上の音源から当該第２の場所に設けられた複数のマイクロホンまでの各伝達特性に関する複数の行列要素で構成される第２の部分行列と、
各前記検出信号から当該検出信号に含まれる前記拡声された音響信号に含まれる前記第２の音響信号を分離する複数の行列要素で構成される第３の部分行列と、
前記第１の場所に存在する１以上の音源から当該第１の場所に設けられた複数のマイクロホンまでの各伝達特性に関する複数の行列要素で構成される第４の部分行列と
を含む分離行列に対して、
各前記検出信号および各前記スピーカ入力信号により構成される入力ベクトルを前記分離行列に対して乗算することにより、各前記検出信号に含まれる第１の音響信号と第２の音響信号とを分離し、当該分離した第１の音響信号を出力するエコーキャンセル部と
を備え、
前記分離行列は、
前記第２の部分行列を、対角以外の構成要素が０である対角行列として独立成分分析に従って学習される
ことを特徴するマルチチャンネルエコーキャンセラ。
A multi-channel echo canceller provided at a first location and used in an acoustic system in which acoustic signals are transmitted between the first location and a second location, a first acoustic signal emitted from one or more sound sources present at the first location is detected by a plurality of microphones provided at the first location, a second acoustic signal emitted from one or more sound sources present at the second location is detected by a plurality of microphones provided at the second location, the acoustic signal transmitted from the first location to the second location is amplified by a plurality of speakers provided at the second location, and the acoustic signal transmitted from the second location to the first location is amplified by a plurality of speakers provided at the first location, the multi-channel echo canceller comprising:
an acquisition unit that acquires detection signals of the plurality of microphones provided at the first location, which detect the first acoustic signal and an acoustic signal produced when speaker input signals transmitted from the second location to the first location are amplified by the plurality of speakers provided at the first location, and that acquires the speaker input signals; and
an echo cancellation unit that uses a separation matrix for separating the first acoustic signal included in each detection signal from the second acoustic signal included in the amplified acoustic signal included in each detection signal, the separation matrix including
a first sub-matrix composed of a plurality of matrix elements relating to the transfer characteristics from the plurality of speakers provided at the first location to the plurality of microphones provided at the first location,
a second sub-matrix composed of a plurality of matrix elements relating to the transfer characteristics from the one or more sound sources present at the second location to the plurality of microphones provided at the second location,
a third sub-matrix composed of a plurality of matrix elements that separate, from each detection signal, the second acoustic signal included in the amplified acoustic signal included in that detection signal, and
a fourth sub-matrix composed of a plurality of matrix elements relating to the transfer characteristics from the one or more sound sources present at the first location to the plurality of microphones provided at the first location,
the echo cancellation unit multiplying the separation matrix by an input vector composed of the detection signals and the speaker input signals, thereby separating the first acoustic signal and the second acoustic signal included in each detection signal, and outputting the separated first acoustic signal,
wherein the separation matrix is learned according to independent component analysis with the second sub-matrix constrained to be a diagonal matrix whose off-diagonal elements are 0.
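The separation recited in this claim amounts to one matrix-vector product. The following is a minimal sketch, assuming two microphones and two speaker channels, real-valued instantaneous mixing (the embodiments may instead operate per frequency bin after the conversion units), and an input vector ordered as detection signals followed by speaker input signals. The block layout is inferred from the element indices named in claims 6 to 8. The function name `separate`, the room response `H`, and all numeric values are hypothetical and are not taken from the patent.

```python
import numpy as np

def separate(W, mic, spk):
    """Multiply the separation matrix W by the input vector [mic; spk].

    Returns (first, second): estimates of the near-end (first) acoustic
    signal and of the far-end (second) acoustic signal.
    """
    x = np.concatenate([mic, spk])      # input vector, shape (4,)
    y = W @ x                           # separated signals, shape (4,)
    n = len(mic)
    return y[:n], y[n:]

# Toy data: each microphone picks up a near-end talker plus an echo of the
# speaker input signals through a hypothetical room response H.
H = np.array([[0.5, 0.2],
              [0.1, 0.4]])              # speaker -> microphone paths (assumed)
near = np.array([1.0, -2.0])            # first acoustic signal at the mics
spk = np.array([0.3, 0.7])              # speaker input signals sp1, sp2
mic = near + H @ spk                    # detection signals m1, m2

# An idealized separation matrix in the assumed block form [[W4, W1], [W3, W2]]:
#   W4 = I  (fourth sub-matrix)
#   W1 = -H (first sub-matrix, cancels the speaker-to-microphone paths)
#   W3 = 0  (third sub-matrix)
#   W2 = I  (second sub-matrix, diagonal as the claim requires)
W = np.block([[np.eye(2), -H],
              [np.zeros((2, 2)), np.eye(2)]])

first, second = separate(W, mic, spk)
print(first)    # near-end estimate with the echo removed
print(second)   # far-end component
```

In practice the matrix is not known in advance; the claim has it learned by independent component analysis, and only the rows that yield the first acoustic signal are passed on to the second location.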

第１の場所と第２の場所との間で相互に音響信号を伝送し、当該第１の場所に存在する１つ以上の音源から発せられる第１の音響信号は前記第１の場所に設けられる複数のマイクロホンで検出され、当該第２の場所に存在する１つ以上の音源から発せられる第２の音響信号は前記第２の場所に設けられる複数のマイクロホンで検出され、前記第１の場所から前記第２の場所へ伝送された音響信号は前記第２の場所に設けられた複数のスピーカから拡声され、前記第２の場所から前記第１の場所へ伝送された音響信号は前記第１の場所に設けられた複数のスピーカから拡声される音響システムに用いられ、前記第１の場所に設けられるマルチチャンネルエコーキャンセラであって、 A multi-channel echo canceller provided at a first location and used in an acoustic system in which acoustic signals are transmitted between the first location and a second location, a first acoustic signal emitted from one or more sound sources present at the first location is detected by a plurality of microphones provided at the first location, a second acoustic signal emitted from one or more sound sources present at the second location is detected by a plurality of microphones provided at the second location, the acoustic signal transmitted from the first location to the second location is amplified by a plurality of speakers provided at the second location, and the acoustic signal transmitted from the second location to the first location is amplified by a plurality of speakers provided at the first location, the multi-channel echo canceller comprising:
前記第２の場所から前記第１の場所へ伝送されたスピーカ入力信号が前記第１の場所に設けられた複数のスピーカで拡声された音響信号と、前記第１の音響信号とを検出する、前記第１の場所に設けられた複数のマイクロホンと、Detecting an acoustic signal in which a speaker input signal transmitted from the second location to the first location is amplified by a plurality of speakers provided in the first location, and the first acoustic signal; A plurality of microphones provided in the first location;
前記第１の場所に設けられた複数のマイクロホンで検出された検出信号と、前記スピーカ入力信号とを取得する取得部と、An acquisition unit for acquiring detection signals detected by a plurality of microphones provided in the first place, and the speaker input signal;
各前記検出信号に含まれる第１の音響信号と、各前記検出信号に含まれる前記拡声された音響信号に含まれる前記第２の音響信号とを分離するための分離行列であって、A separation matrix for separating a first acoustic signal included in each of the detection signals and a second acoustic signal included in the amplified acoustic signal included in each of the detection signals,
前記第１の場所に設けられた複数のスピーカから当該第１の場所に設けられた複数のマイクロホンまでの各伝達特性に関する複数の行列要素で構成さる第１の部分行列と、A first submatrix composed of a plurality of matrix elements relating to respective transfer characteristics from a plurality of speakers provided at the first location to a plurality of microphones provided at the first location;
前記第２の場所に存在する１以上の音源から当該第２の場所に設けられた複数のマイクロホンまでの各伝達特性に関する複数の行列要素で構成される第２の部分行列と、A second sub-matrix composed of a plurality of matrix elements relating to each transfer characteristic from one or more sound sources present at the second location to a plurality of microphones provided at the second location;
各前記検出信号から当該検出信号に含まれる前記拡声された音響信号に含まれる前記第２の音響信号を分離する複数の行列要素で構成される第３の部分行列と、A third sub-matrix composed of a plurality of matrix elements for separating the second acoustic signal included in the amplified acoustic signal included in the detection signal from each of the detection signals;
前記第１の場所に存在する１以上の音源から当該第１の場所に設けられた複数のマイクロホンまでの各伝達特性に関する複数の行列要素で構成される第４の部分行列とA fourth sub-matrix composed of a plurality of matrix elements relating to each transfer characteristic from one or more sound sources present at the first location to a plurality of microphones provided at the first location;
を含む分離行列に対して、For a separation matrix containing
各前記検出信号および各前記スピーカ入力信号により構成される入力ベクトルを前記分離行列に対して乗算することにより、各前記検出信号に含まれる第１の音響信号と第２の音響信号とを分離するエコーキャンセル部と、The first acoustic signal and the second acoustic signal included in each detection signal are separated by multiplying the separation matrix by an input vector constituted by each detection signal and each speaker input signal. An echo canceling unit;
前記分離した第１の音響信号を出力する出力部とAn output unit for outputting the separated first acoustic signal;
を備え、With
前記分離行列は、前記第２の部分行列を、対角以外の構成要素が０である対角行列として独立成分分析に従って学習されることを特徴するマルチチャンネルエコーキャンセラ。The multi-channel echo canceller is characterized in that the separation matrix is learned according to an independent component analysis, with the second sub-matrix being a diagonal matrix whose components other than the diagonal are zero.

第１の場所と第２の場所との間で相互に音響信号を伝送し、当該第１の場所に存在する１つ以上の音源から発せられる第１の音響信号は前記第１の場所に設けられる複数のマイクロホンで検出され、当該第２の場所に存在する１つ以上の音源から発せられる第２の音響信号は前記第２の場所に設けられる複数のマイクロホンで検出され、前記第１の場所から前記第２の場所へ伝送された音響信号は前記第２の場所に設けられた複数のスピーカから拡声され、前記第２の場所から前記第１の場所へ伝送された音響信号は前記第１の場所に設けられた複数のスピーカから拡声される音響システムに用いられ、前記第１の場所に設けられるマルチチャンネルエコーキャンセラであって、 A multi-channel echo canceller provided at a first location and used in an acoustic system in which acoustic signals are transmitted between the first location and a second location, a first acoustic signal emitted from one or more sound sources present at the first location is detected by a plurality of microphones provided at the first location, a second acoustic signal emitted from one or more sound sources present at the second location is detected by a plurality of microphones provided at the second location, the acoustic signal transmitted from the first location to the second location is amplified by a plurality of speakers provided at the second location, and the acoustic signal transmitted from the second location to the first location is amplified by a plurality of speakers provided at the first location, the multi-channel echo canceller comprising:
前記第２の場所から前記第１の場所へ伝送されたスピーカ入力信号を拡声する、前記第１の場所に設けられた複数のスピーカと、A plurality of speakers provided at the first location for amplifying speaker input signals transmitted from the second location to the first location;
前記第１の場所に設けられた複数のスピーカで拡声された音響信号と、前記第１の音響信号とを検出する、前記第１の場所に設けられた複数のマイクロホンと、A plurality of microphones provided at the first location for detecting an acoustic signal amplified by a plurality of speakers provided at the first location and the first acoustic signal;
前記第１の場所に設けられた複数のマイクロホンで検出された検出信号と、前記スピーカ入力信号とを取得する取得部と、An acquisition unit for acquiring detection signals detected by a plurality of microphones provided in the first place, and the speaker input signal;
各前記検出信号に含まれる第１の音響信号と、各前記検出信号に含まれる前記拡声された音響信号に含まれる前記第２の音響信号とを分離するための分離行列であって、A separation matrix for separating a first acoustic signal included in each of the detection signals and a second acoustic signal included in the amplified acoustic signal included in each of the detection signals,
前記第１の場所に設けられた複数のスピーカから当該第１の場所に設けられた複数のマイクロホンまでの各伝達特性に関する複数の行列要素で構成さる第１の部分行列と、A first submatrix composed of a plurality of matrix elements relating to respective transfer characteristics from a plurality of speakers provided at the first location to a plurality of microphones provided at the first location;
前記第２の場所に存在する１以上の音源から当該第２の場所に設けられた複数のマイクロホンまでの各伝達特性に関する複数の行列要素で構成される第２の部分行列と、A second sub-matrix composed of a plurality of matrix elements relating to each transfer characteristic from one or more sound sources present at the second location to a plurality of microphones provided at the second location;
各前記検出信号から当該検出信号に含まれる前記拡声された音響信号に含まれる前記第２の音響信号を分離する複数の行列要素で構成される第３の部分行列と、A third sub-matrix composed of a plurality of matrix elements for separating the second acoustic signal included in the amplified acoustic signal included in the detection signal from each of the detection signals;
前記第１の場所に存在する１以上の音源から当該第１の場所に設けられた複数のマイクロホンまでの各伝達特性に関する複数の行列要素で構成される第４の部分行列とA fourth sub-matrix composed of a plurality of matrix elements relating to each transfer characteristic from one or more sound sources present at the first location to a plurality of microphones provided at the first location;
を含む分離行列に対して、For a separation matrix containing
各前記検出信号および各前記スピーカ入力信号により構成される入力ベクトルを前記分離行列に対して乗算することにより、各前記検出信号に含まれる第１の音響信号と第２の音響信号とを分離するエコーキャンセル部と、The first acoustic signal and the second acoustic signal included in each detection signal are separated by multiplying the separation matrix by an input vector constituted by each detection signal and each speaker input signal. An echo canceling unit;
前記分離した第１の音響信号を出力する出力部とAn output unit for outputting the separated first acoustic signal;
を備え、With
前記分離行列は、前記第２の部分行列を、対角以外の構成要素が０である対角行列として独立成分分析に従って学習されることを特徴するマルチチャンネルエコーキャンセラ。The multi-channel echo canceller is characterized in that the separation matrix is learned according to an independent component analysis, with the second sub-matrix being a diagonal matrix whose components other than the diagonal are zero.

前記分離行列は、
前記第３の部分行列を、各構成要素が全て０であるゼロ行列として独立成分分析に従って学習されることを特徴する、請求項１から請求項３までのいずれか１項に記載のマルチチャンネルエコーキャンセラ。
The multi-channel echo canceller according to any one of claims 1 to 3, wherein the separation matrix is learned according to independent component analysis with the third sub-matrix constrained to be a zero matrix in which every element is 0.

前記分離行列は、
前記第４の部分行列を、対角以外の構成要素が０である対角行列として独立成分分析に従って学習されることを特徴する、請求項１から請求項４までのいずれか１項に記載のマルチチャンネルエコーキャンセラ。
The multi-channel echo canceller according to any one of claims 1 to 4, wherein the separation matrix is learned according to independent component analysis with the fourth sub-matrix constrained to be a diagonal matrix whose off-diagonal elements are 0.

第１の場所と第２の場所との間で相互に音響信号を伝送し、当該第１の場所に存在する１つ以上の音源から発せられる第１の音響信号は前記第１の場所に設けられる２つのマイクロホンで検出され、当該第２の場所に存在する１つ以上の音源から発せられる第２の音響信号は前記第２の場所に設けられる２つのマイクロホンで検出され、前記第１の場所から前記第２の場所へ伝送された音響信号は前記第２の場所に設けられた２つのスピーカから拡声され、前記第２の場所から前記第１の場所へ伝送された音響信号は前記第１の場所に設けられた複数のスピーカから拡声される音響システムに用いられ、前記第１の場所に設けられるマルチチャンネルエコーキャンセラであって、 A multi-channel echo canceller provided at a first location and used in an acoustic system in which acoustic signals are transmitted between the first location and a second location, a first acoustic signal emitted from one or more sound sources present at the first location is detected by two microphones provided at the first location, a second acoustic signal emitted from one or more sound sources present at the second location is detected by two microphones provided at the second location, the acoustic signal transmitted from the first location to the second location is amplified by two speakers provided at the second location, and the acoustic signal transmitted from the second location to the first location is amplified by a plurality of speakers provided at the first location, the multi-channel echo canceller comprising:
前記第２の場所から前記第１の場所へ伝送された２つのスピーカ入力信号ｓｐ１、ｓｐ２が前記第１の場所に設けられた２つのスピーカで拡声された音響信号と、前記第１の音響信号とを検出する、前記第１の場所に設けられた２つのマイクロホンの検出信号ｍ１、ｍ２と、An acoustic signal in which two speaker input signals sp1 and sp2 transmitted from the second location to the first location are amplified by two speakers provided in the first location, and the first acoustic signal Detecting signals m1, m2 of two microphones provided in the first place,
前記スピーカ入力信号とThe speaker input signal and
を取得する取得部と、An acquisition unit for acquiring
前記２つのマイクロホンの検出信号に含まれる第１の音響信号ｙ１、ｙ２と、前記２つのマイクロホンの検出信号に含まれる前記拡声された音響信号に含まれる前記第２の音響信号ｙ３、ｙ４とを分離するための分離行列Ｗであって、First acoustic signals y1 and y2 included in the detection signals of the two microphones, and second acoustic signals y3 and y4 included in the amplified acoustic signal included in the detection signals of the two microphones. A separation matrix W for separation,

で表される分離行列Ｗと、A separation matrix W represented by:
前記２つのマイクロホンの検出信号および前記２つのスピーカ入力信号により構成される入力ベクトルＩであって、An input vector I composed of detection signals of the two microphones and the two speaker input signals,

で表される入力ベクトルＩとを用いて、前記２つのマイクロホンの検出信号に含まれる第１の音響信号ｙ１、ｙ２と第２の音響信号ｙ３、ｙ４とをAnd the first acoustic signals y1 and y2 and the second acoustic signals y3 and y4 included in the detection signals of the two microphones.

で表される行列演算により分離し、当該分離した第１の音響信号ｙ１、ｙ２を出力するエコーキャンセル部とAnd an echo canceling unit that outputs the separated first acoustic signals y1 and y2 by a matrix operation represented by
を備え、With
前記分離行列は、Ｗ３４＝Ｗ４３＝０として独立成分分析に従って学習されるThe separation matrix is learned according to independent component analysis with W34 = W43 = 0.
ことを特徴するマルチチャンネルエコーキャンセラ。Multi-channel echo canceller characterized by that.

前記分離行列は、Ｗ３１＝Ｗ３２＝Ｗ４１＝Ｗ４２＝０として独立成分分析に従って学習されることを特徴する、請求項６に記載のマルチチャンネルエコーキャンセラ。The multi-channel echo canceller according to claim 6, wherein the separation matrix is learned according to independent component analysis as W31 = W32 = W41 = W42 = 0.

前記分離行列は、Ｗ１２＝Ｗ２１＝０として独立成分分析に従って学習されることを特徴する、請求項６または７に記載のマルチチャンネルエコーキャンセラ。The multi-channel echo canceller according to claim 6 or 7, wherein the separation matrix is learned according to independent component analysis with W12 = W21 = 0.
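The expressions for the separation matrix W, the input vector I, and the matrix operation referenced in claim 6 did not survive in this text. A reconstruction consistent with the element indices named in claims 6 to 8 would be the following. The ordering of the input vector (detection signals first, then speaker input signals) is an assumption inferred from those indices, not a quotation of the original formulas.

```latex
W =
\begin{pmatrix}
W_{11} & W_{12} & W_{13} & W_{14} \\
W_{21} & W_{22} & W_{23} & W_{24} \\
W_{31} & W_{32} & W_{33} & W_{34} \\
W_{41} & W_{42} & W_{43} & W_{44}
\end{pmatrix},
\qquad
I =
\begin{pmatrix}
m_1 \\ m_2 \\ sp_1 \\ sp_2
\end{pmatrix},
\qquad
\begin{pmatrix}
y_1 \\ y_2 \\ y_3 \\ y_4
\end{pmatrix}
= W I
```

Under this layout, W34 = W43 = 0 (claim 6) makes the lower-right block diagonal, W31 = W32 = W41 = W42 = 0 (claim 7) zeroes the lower-left block so that y3 and y4 depend only on sp1 and sp2, and W12 = W21 = 0 (claim 8) makes the upper-left block diagonal.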

第１の場所と第２の場所との間で相互に音響信号を伝送し、当該第１の場所に存在する１つ以上の音源から発せられる第１の音響信号は前記第１の場所に設けられる複数のマイクロホンで検出され、当該第２の場所に存在する１つ以上の音源から発せられる第２の音響信号は前記第２の場所に設けられる複数のマイクロホンで検出され、前記第１の場所から前記第２の場所へ伝送された音響信号は前記第２の場所に設けられた複数のスピーカから拡声され、前記第２の場所から前記第１の場所へ伝送された音響信号は前記第１の場所に設けられた複数のスピーカから拡声される音響システムに用いられ、前記第１の場所に設けられるマルチチャンネルエコーキャンセラであって、 A multi-channel echo canceller provided at a first location and used in an acoustic system in which acoustic signals are transmitted between the first location and a second location, a first acoustic signal emitted from one or more sound sources present at the first location is detected by a plurality of microphones provided at the first location, a second acoustic signal emitted from one or more sound sources present at the second location is detected by a plurality of microphones provided at the second location, the acoustic signal transmitted from the first location to the second location is amplified by a plurality of speakers provided at the second location, and the acoustic signal transmitted from the second location to the first location is amplified by a plurality of speakers provided at the first location, the multi-channel echo canceller comprising:
前記第２の場所から前記第１の場所へ伝送されたスピーカ入力信号が前記第１の場所に設けられた複数のスピーカで拡声された音響信号と、前記第１の音響信号とを検出する、前記第１の場所に設けられた複数のマイクロホンの検出信号と、前記スピーカ入力信号とを取得する取得部と、Detecting an acoustic signal in which a speaker input signal transmitted from the second location to the first location is amplified by a plurality of speakers provided in the first location, and the first acoustic signal; An acquisition unit for acquiring detection signals of a plurality of microphones provided in the first location and the speaker input signal;
各前記検出信号に含まれる第１の音響信号と、各前記検出信号に含まれる前記拡声された音響信号に含まれる前記第２の音響信号とを分離するための分離行列であって、前記第１の場所に設けられた複数のスピーカから当該第１の場所に設けられた複数のマイクロホンまでの各伝達特性に関する複数の行列要素で構成される第１の部分行列と、A separation matrix for separating a first acoustic signal included in each of the detection signals and a second acoustic signal included in the amplified acoustic signal included in each of the detection signals, A first submatrix composed of a plurality of matrix elements relating to respective transfer characteristics from a plurality of speakers provided at one place to a plurality of microphones provided at the first place;
前記第２の場所に存在する１以上の音源から当該第２の場所に設けられた複数のマイクロホンまでの各伝達特性に関する複数の行列要素で構成される第２の部分行列と、A second sub-matrix composed of a plurality of matrix elements relating to each transfer characteristic from one or more sound sources present at the second location to a plurality of microphones provided at the second location;
各前記検出信号から当該検出信号に含まれる前記拡声された音響信号に含まれる前記第２の音響信号を分離する複数の行列要素で構成される第３の部分行列と、A third sub-matrix composed of a plurality of matrix elements for separating the second acoustic signal included in the amplified acoustic signal included in the detection signal from each of the detection signals;
前記第１の場所に存在する１以上の音源から当該第１の場所に設けられた複数のマイクロホンまでの各伝達特性に関する複数の行列要素で構成される第４の部分行列とA fourth sub-matrix composed of a plurality of matrix elements relating to each transfer characteristic from one or more sound sources present at the first location to a plurality of microphones provided at the first location;
を含む分離行列に対して、For a separation matrix containing
各前記検出信号および各前記スピーカ入力信号により構成される入力ベクトルを前記分離行列に対して乗算することにより、各前記検出信号に含まれる第１の音響信号と第２の音響信号とを分離し、当該分離した第１の音響信号を出力するエコーキャンセル部とThe first acoustic signal and the second acoustic signal included in each detection signal are separated by multiplying the separation matrix by an input vector constituted by each detection signal and each speaker input signal. An echo canceling unit for outputting the separated first acoustic signal;
を備え、With
前記分離行列は、The separation matrix is
前記第３の部分行列を、各構成要素が全て０であるゼロ行列として独立成分分析に従って学習されることを特徴するマルチチャンネルエコーキャンセラ。A multi-channel echo canceller, wherein the third sub-matrix is learned in accordance with independent component analysis as a zero matrix in which each component is all zero.

第１の場所と第２の場所との間で相互に音響信号を伝送し、当該第１の場所に存在する１つ以上の音源から発せられる第１の音響信号は前記第１の場所に設けられる複数のマイクロホンで検出され、当該第２の場所に存在する１つ以上の音源から発せられる第２の音響信号は前記第２の場所に設けられる複数のマイクロホンで検出され、前記第１の場所から前記第２の場所へ伝送された音響信号は前記第２の場所に設けられた複数のスピーカから拡声され、前記第２の場所から前記第１の場所へ伝送された音響信号は前記第１の場所に設けられた複数のスピーカから拡声される音響システムに用いられ、前記第１の場所に設けられるマルチチャンネルエコーキャンセラであって、 A multi-channel echo canceller provided at a first location and used in an acoustic system in which acoustic signals are transmitted between the first location and a second location, a first acoustic signal emitted from one or more sound sources present at the first location is detected by a plurality of microphones provided at the first location, a second acoustic signal emitted from one or more sound sources present at the second location is detected by a plurality of microphones provided at the second location, the acoustic signal transmitted from the first location to the second location is amplified by a plurality of speakers provided at the second location, and the acoustic signal transmitted from the second location to the first location is amplified by a plurality of speakers provided at the first location, the multi-channel echo canceller comprising:
前記第２の場所から前記第１の場所へ伝送されたスピーカ入力信号が前記第１の場所に設けられた複数のスピーカで拡声された音響信号と、前記第１の音響信号とを検出する、前記第１の場所に設けられた複数のマイクロホンの検出信号と、前記スピーカ入力信号とを取得する取得部と、Detecting an acoustic signal in which a speaker input signal transmitted from the second location to the first location is amplified by a plurality of speakers provided in the first location, and the first acoustic signal; An acquisition unit for acquiring detection signals of a plurality of microphones provided in the first location and the speaker input signal;
各前記検出信号に含まれる第１の音響信号と、各前記検出信号に含まれる前記拡声された音響信号に含まれる前記第２の音響信号とを分離するための分離行列であって、前記第１の場所に設けられた複数のスピーカから当該第１の場所に設けられた複数のマイクロホンまでの各伝達特性に関する複数の行列要素で構成される第１の部分行列と、A separation matrix for separating a first acoustic signal included in each of the detection signals and a second acoustic signal included in the amplified acoustic signal included in each of the detection signals, A first submatrix composed of a plurality of matrix elements relating to respective transfer characteristics from a plurality of speakers provided at one place to a plurality of microphones provided at the first place;
前記第２の場所に存在する１以上の音源から当該第２の場所に設けられた複数のマイクロホンまでの各伝達特性に関する複数の行列要素で構成される第２の部分行列と、A second sub-matrix composed of a plurality of matrix elements relating to each transfer characteristic from one or more sound sources present at the second location to a plurality of microphones provided at the second location;
各前記検出信号から当該検出信号に含まれる前記拡声された音響信号に含まれる前記第２の音響信号を分離する複数の行列要素で構成される第３の部分行列と、A third sub-matrix composed of a plurality of matrix elements for separating the second acoustic signal included in the amplified acoustic signal included in the detection signal from each of the detection signals;
前記第１の場所に存在する１以上の音源から当該第１の場所に設けられた複数のマイクロホンまでの各伝達特性に関する複数の行列要素で構成される第４の部分行列とA fourth sub-matrix composed of a plurality of matrix elements relating to each transfer characteristic from one or more sound sources present at the first location to a plurality of microphones provided at the first location;
を含む分離行列に対して、For a separation matrix containing
各前記検出信号および各前記スピーカ入力信号により構成される入力ベクトルを前記分離行列に対して乗算することにより、各前記検出信号に含まれる第１の音響信号と第２の音響信号とを分離し、当該分離した第１の音響信号を出力するエコーキャンセル部とThe first acoustic signal and the second acoustic signal included in each detection signal are separated by multiplying the separation matrix by an input vector constituted by each detection signal and each speaker input signal. An echo canceling unit for outputting the separated first acoustic signal;
を備え、With
前記分離行列は、The separation matrix is
前記第４の部分行列を、対角以外の構成要素が０である対角行列として独立成分分析に従って学習されることを特徴するマルチチャンネルエコーキャンセラ。The multi-channel echo canceller, wherein the fourth sub-matrix is learned according to independent component analysis as a diagonal matrix in which components other than the diagonal are zero.
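The independent claims differ mainly in which sub-matrix is constrained during learning. As a rough illustration, the sketch below builds a 0/1 mask over a 4x4 separation matrix for each constraint and counts how many elements remain free to adapt. It assumes the two-microphone, two-speaker layout of claim 6 and the block positions implied by the indices in claims 6 to 8; the function `build_mask` and its parameter names are hypothetical.

```python
import numpy as np

def build_mask(n_mic=2, n_spk=2, second_diag=False,
               third_zero=False, fourth_diag=False):
    """Return a 0/1 matrix: 1 = element is learned, 0 = element is fixed at 0."""
    n = n_mic + n_spk
    mask = np.ones((n, n), dtype=int)
    if second_diag:   # lower-right block: keep only the diagonal (W34 = W43 = 0)
        mask[n_mic:, n_mic:] = np.eye(n_spk, dtype=int)
    if third_zero:    # lower-left block: all zero (W31 = W32 = W41 = W42 = 0)
        mask[n_mic:, :n_mic] = 0
    if fourth_diag:   # upper-left block: keep only the diagonal (W12 = W21 = 0)
        mask[:n_mic, :n_mic] = np.eye(n_mic, dtype=int)
    return mask

unconstrained = int(build_mask().sum())
second_only = int(build_mask(second_diag=True).sum())
second_third = int(build_mask(second_diag=True, third_zero=True).sum())
all_three = int(build_mask(second_diag=True, third_zero=True,
                           fourth_diag=True).sum())
print(unconstrained, second_only, second_third, all_three)
```

Each added constraint reduces the number of elements the learning unit has to estimate, which is one plausible reading of why the claims recite these constraints separately and in combination.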

第１の場所と第２の場所との間で相互に音響信号を伝送し、当該第１の場所に存在する１つ以上の音源から発せられる第１の音響信号は前記第１の場所に設けられる複数のマイクロホンで検出され、当該第２の場所に存在する１つ以上の音源から発せられる第２の音響信号は前記第２の場所に設けられる複数のマイクロホンで検出され、前記第１の場所から前記第２の場所へ伝送された音響信号は前記第２の場所に設けられた複数のスピーカから拡声され、前記第２の場所から前記第１の場所へ伝送された音響信号は前記第１の場所に設けられた複数のスピーカから拡声される音響システムに用いられる、第１の場所に対するマルチチャンネルエコーキャンセルを行うマルチチャンネルエコーキャンセル方法であって、
前記第２の場所から前記第１の場所へ伝送されたスピーカ入力信号が前記第１の場所に設けられた複数のスピーカで拡声された音響信号と、前記第１の音響信号とを検出する、前記第１の場所に設けられた複数のマイクロホンの検出信号と、前記スピーカ入力信号とを取得する取得ステップと、
各前記検出信号に含まれる第１の音響信号と、各前記検出信号に含まれる前記拡声された音響信号に含まれる前記第２の音響信号とを分離するための分離行列であって、前記第１の場所に設けられた複数のスピーカから当該第１の場所に設けられた複数のマイクロホンまでの各伝達特性に関する複数の行列要素で構成される第１の部分行列と、
前記第２の場所に存在する１以上の音源から当該第２の場所に設けられた複数のマイクロホンまでの各伝達特性に関する複数の行列要素で構成される第２の部分行列と、
各前記検出信号から当該検出信号に含まれる前記拡声された音響信号に含まれる前記第２の音響信号を分離する複数の行列要素で構成される第３の部分行列と、
前記第１の場所に存在する１以上の音源から当該第１の場所に設けられた複数のマイクロホンまでの各伝達特性に関する複数の行列要素で構成される第４の部分行列と
を含む分離行列に対して、
各前記検出信号および各前記スピーカ入力信号により構成される入力ベクトルを前記分離行列に対して乗算することにより、各前記検出信号に含まれる第１の音響信号と第２の音響信号とを分離するステップと、
当該分離した第１の音響信号を出力することによって、各前記検出信号に含まれる第２の音響信号をエコーとしてキャンセルするキャンセルステップと、
前記分離行列の前記第２の部分行列を、対角以外の構成要素が０である対角行列として独立成分分析に従って学習する学習ステップとを有することを特徴するマルチチャンネルエコーキャンセル方法。 A multi-channel echo cancellation method for performing multi-channel echo cancellation for a first location, used in an acoustic system in which acoustic signals are transmitted mutually between the first location and a second location, a first acoustic signal emitted from one or more sound sources present at the first location is detected by a plurality of microphones provided at the first location, a second acoustic signal emitted from one or more sound sources present at the second location is detected by a plurality of microphones provided at the second location, the acoustic signal transmitted from the first location to the second location is reproduced from a plurality of speakers provided at the second location, and the acoustic signal transmitted from the second location to the first location is reproduced from a plurality of speakers provided at the first location, the method comprising:
an acquisition step of acquiring speaker input signals transmitted from the second location to the first location, and detection signals of the plurality of microphones provided at the first location, the microphones detecting the first acoustic signal together with the acoustic signal produced when the speaker input signals are reproduced by the plurality of speakers provided at the first location;
a step of separating the first acoustic signal and the second acoustic signal included in each of the detection signals by multiplying a separation matrix by an input vector composed of each of the detection signals and each of the speaker input signals, the separation matrix being a matrix for separating the first acoustic signal included in each of the detection signals from the second acoustic signal included in the reproduced acoustic signal included in each of the detection signals, and including:
a first sub-matrix composed of a plurality of matrix elements relating to the respective transfer characteristics from the plurality of speakers provided at the first location to the plurality of microphones provided at the first location;
a second sub-matrix composed of a plurality of matrix elements relating to the respective transfer characteristics from the one or more sound sources present at the second location to the plurality of microphones provided at the second location;
a third sub-matrix composed of a plurality of matrix elements for separating, from each of the detection signals, the second acoustic signal included in the reproduced acoustic signal included in that detection signal; and
a fourth sub-matrix composed of a plurality of matrix elements relating to the respective transfer characteristics from the one or more sound sources present at the first location to the plurality of microphones provided at the first location;
a cancellation step of canceling, as an echo, the second acoustic signal included in each of the detection signals by outputting the separated first acoustic signal; and
a learning step of learning, according to independent component analysis, the second sub-matrix of the separation matrix as a diagonal matrix whose off-diagonal elements are 0.
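The separation step recited above can be illustrated with a minimal NumPy sketch. The block layout is an assumption made only for illustration: the input vector stacks the detection signals above the speaker input signals, the fourth and first sub-matrices form the upper block row, and the third and second sub-matrices form the lower block row. The function names `build_separation_matrix` and `separate` are hypothetical, and real-valued instantaneous mixing is assumed for simplicity.

```python
import numpy as np


def build_separation_matrix(fourth, first, third, second_diag):
    """Assemble a block separation matrix (the layout is an assumption).

    fourth:      (M, M) block applied to the microphone detection signals
    first:       (M, K) block applied to the speaker input signals
    third:       (K, M) block applied to the microphone detection signals
    second_diag: (K,)   diagonal entries of the second sub-matrix
    """
    return np.block([[fourth, first], [third, np.diag(second_diag)]])


def separate(W, detection, speaker_inputs):
    """Multiply the separation matrix by the stacked input vector."""
    x = np.concatenate([detection, speaker_inputs])
    y = W @ x
    n_mics = len(detection)
    # The first M outputs estimate the near-end (first) acoustic signal and
    # are the only ones sent onward; the remaining K outputs correspond to
    # the far-end (second) acoustic signal and are not transmitted.
    return y[:n_mics], y[n_mics:]


# Toy example: two microphones, two speakers, known mixing (illustrative values).
A = np.array([[1.0, 0.5], [0.3, 1.0]])   # near-end sources -> microphones
H = np.array([[0.4, 0.1], [0.2, 0.6]])   # speakers -> microphones (echo paths)
s = np.array([0.7, -1.2])                # near-end source samples
sp = np.array([0.5, 0.9])                # speaker input samples
m = A @ s + H @ sp                       # microphone detection signals

A_inv = np.linalg.inv(A)
W = build_separation_matrix(A_inv, -A_inv @ H, np.zeros((2, 2)), np.ones(2))
near_end, far_end = separate(W, m, sp)
```

With this idealized matrix the upper block row removes the echo exactly; in practice the matrix would be learned rather than computed from known transfer characteristics.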

第１の場所と第２の場所との間で相互に音響信号を伝送し、当該第１の場所に存在する１つ以上の音源から発せられる第１の音響信号は前記第１の場所に設けられる複数のマイクロホンで検出され、当該第２の場所に存在する１つ以上の音源から発せられる第２の音響信号は前記第２の場所に設けられる複数のマイクロホンで検出され、前記第１の場所から前記第２の場所へ伝送された音響信号は前記第２の場所に設けられた複数のスピーカから拡声され、前記第２の場所から前記第１の場所へ伝送された音響信号は前記第１の場所に設けられた複数のスピーカから拡声される音響システムに用いられる、第１の場所に対するマルチチャンネルエコーキャンセルを行うための、コンピュータに実行させるプログラムであって、
前記コンピュータに、
前記第２の場所から前記第１の場所へ伝送されたスピーカ入力信号が前記第１の場所に設けられた複数のスピーカで拡声された音響信号と、前記第１の音響信号とを検出する、前記第１の場所に設けられた複数のマイクロホンの検出信号と、前記スピーカ入力信号とを取得する取得ステップと、
各前記検出信号に含まれる第１の音響信号と、各前記検出信号に含まれる前記拡声された音響信号に含まれる前記第２の音響信号とを分離するための分離行列であって、前記第１の場所に設けられた複数のスピーカから当該第１の場所に設けられた複数のマイクロホンまでの各伝達特性に関する複数の行列要素で構成される第１の部分行列と、
前記第２の場所に存在する１以上の音源から当該第２の場所に設けられた複数のマイクロホンまでの各伝達特性に関する複数の行列要素で構成される第２の部分行列と、
各前記検出信号から当該検出信号に含まれる前記拡声された音響信号に含まれる前記第２の音響信号を分離する複数の行列要素で構成される第３の部分行列と、
前記第１の場所に存在する１以上の音源から当該第１の場所に設けられた複数のマイクロホンまでの各伝達特性に関する複数の行列要素で構成される第４の部分行列と
を含む分離行列に対して、
各前記検出信号および各前記スピーカ入力信号により構成される入力ベクトルを前記分離行列に対して乗算することにより、各前記検出信号に含まれる第１の音響信号と第２の音響信号とを分離するステップと、
当該分離した第１の音響信号を出力することによって、各前記検出信号に含まれる第２の音響信号をエコーとしてキャンセルするキャンセルステップと、
前記分離行列の前記第２の部分行列を、対角以外の構成要素が０である対角行列として独立成分分析に従って学習する学習ステップとを実行させるプログラム。 A program to be executed by a computer for performing multi-channel echo cancellation for a first location, used in an acoustic system in which acoustic signals are transmitted mutually between the first location and a second location, a first acoustic signal emitted from one or more sound sources present at the first location is detected by a plurality of microphones provided at the first location, a second acoustic signal emitted from one or more sound sources present at the second location is detected by a plurality of microphones provided at the second location, the acoustic signal transmitted from the first location to the second location is reproduced from a plurality of speakers provided at the second location, and the acoustic signal transmitted from the second location to the first location is reproduced from a plurality of speakers provided at the first location,
the program causing the computer to execute:
an acquisition step of acquiring speaker input signals transmitted from the second location to the first location, and detection signals of the plurality of microphones provided at the first location, the microphones detecting the first acoustic signal together with the acoustic signal produced when the speaker input signals are reproduced by the plurality of speakers provided at the first location;
a step of separating the first acoustic signal and the second acoustic signal included in each of the detection signals by multiplying a separation matrix by an input vector composed of each of the detection signals and each of the speaker input signals, the separation matrix being a matrix for separating the first acoustic signal included in each of the detection signals from the second acoustic signal included in the reproduced acoustic signal included in each of the detection signals, and including:
a first sub-matrix composed of a plurality of matrix elements relating to the respective transfer characteristics from the plurality of speakers provided at the first location to the plurality of microphones provided at the first location;
a second sub-matrix composed of a plurality of matrix elements relating to the respective transfer characteristics from the one or more sound sources present at the second location to the plurality of microphones provided at the second location;
a third sub-matrix composed of a plurality of matrix elements for separating, from each of the detection signals, the second acoustic signal included in the reproduced acoustic signal included in that detection signal; and
a fourth sub-matrix composed of a plurality of matrix elements relating to the respective transfer characteristics from the one or more sound sources present at the first location to the plurality of microphones provided at the first location;
a cancellation step of canceling, as an echo, the second acoustic signal included in each of the detection signals by outputting the separated first acoustic signal; and
a learning step of learning, according to independent component analysis, the second sub-matrix of the separation matrix as a diagonal matrix whose off-diagonal elements are 0.
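The learning step with the diagonal constraint might be sketched as follows. This is not the update rule of the claims: it swaps in a generic natural-gradient ICA update with a tanh score function, then zeroes the off-diagonal elements of the second sub-matrix after each update. The function name `ica_update`, the step size, and the placement of the second sub-matrix in the lower-right block are all assumptions made for illustration.

```python
import numpy as np


def ica_update(W, x, n_mics, step=0.01):
    """One natural-gradient ICA step followed by the diagonal constraint.

    W:      (N, N) current separation matrix
    x:      (N, T) block of input vectors (detection signals stacked on
            speaker input signals; this stacking order is an assumption)
    n_mics: number of microphone detection signals M
    """
    n_total, n_frames = x.shape
    y = W @ x
    # Generic natural-gradient rule: W <- W + step * (I - E[phi(y) y^T]) W,
    # with phi = tanh as an illustrative score function.
    grad = np.eye(n_total) - (np.tanh(y) @ y.T) / n_frames
    W_new = W + step * grad @ W
    # Constraint named in the claim: the second sub-matrix (assumed to be the
    # lower-right block) keeps only its diagonal elements.
    second = W_new[n_mics:, n_mics:]
    W_new[n_mics:, n_mics:] = np.diag(np.diag(second))
    return W_new


# Usage with synthetic data (values are illustrative only).
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 200))
W0 = np.eye(4) + 0.1 * rng.standard_normal((4, 4))
W1 = ica_update(W0, x, n_mics=2)
```

Applying the constraint as a projection after every update keeps the sketch simple; an implementation could equally restrict the update to the diagonal entries of that block.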

第１の場所と第２の場所との間で相互に音響信号を伝送し、当該第１の場所に存在する１つ以上の音源から発せられる第１の音響信号は前記第１の場所に設けられる複数のマイクロホンで検出され、当該第２の場所に存在する１つ以上の音源から発せられる第２の音響信号は前記第２の場所に設けられる複数のマイクロホンで検出され、前記第１の場所から前記第２の場所へ伝送された音響信号は前記第２の場所に設けられた複数のスピーカから拡声され、前記第２の場所から前記第１の場所へ伝送された音響信号は前記第１の場所に設けられた複数のスピーカから拡声される音響システムに用いられる、第１の場所に対するマルチチャンネルエコーキャンセルを行う集積回路であって、
前記第２の場所から前記第１の場所へ伝送されたスピーカ入力信号が前記第１の場所に設けられた複数のスピーカで拡声された音響信号と、前記第１の音響信号とを検出する、前記第１の場所に設けられた複数のマイクロホンの検出信号と、前記スピーカ入力信号とを取得する取得部と、
各前記検出信号に含まれる第１の音響信号と、各前記検出信号に含まれる前記拡声された音響信号に含まれる前記第２の音響信号とを分離するための分離行列であって、前記第１の場所に設けられた複数のスピーカから当該第１の場所に設けられた複数のマイクロホンまでの各伝達特性に関する複数の行列要素で構成される第１の部分行列と、
前記第２の場所に存在する１以上の音源から当該第２の場所に設けられた複数のマイクロホンまでの各伝達特性に関する複数の行列要素で構成される第２の部分行列と、
各前記検出信号から当該検出信号に含まれる前記拡声された音響信号に含まれる前記第２の音響信号を分離する複数の行列要素で構成される第３の部分行列と、
前記第１の場所に存在する１以上の音源から当該第１の場所に設けられた複数のマイクロホンまでの各伝達特性に関する複数の行列要素で構成される第４の部分行列と
を含む分離行列に対して、
各前記検出信号および各前記スピーカ入力信号により構成される入力ベクトルを前記分離行列に対して乗算することにより、各前記検出信号に含まれる第１の音響信号と第２の音響信号とを分離し、当該分離した第１の音響信号を出力するエコーキャンセル部と
を備え、
前記分離行列は、
前記第２の部分行列を、対角以外の構成要素が０である対角行列として独立成分分析に従って学習される、集積回路。 An integrated circuit that performs multi-channel echo cancellation for a first location, used in an acoustic system in which acoustic signals are transmitted mutually between the first location and a second location, a first acoustic signal emitted from one or more sound sources present at the first location is detected by a plurality of microphones provided at the first location, a second acoustic signal emitted from one or more sound sources present at the second location is detected by a plurality of microphones provided at the second location, the acoustic signal transmitted from the first location to the second location is reproduced from a plurality of speakers provided at the second location, and the acoustic signal transmitted from the second location to the first location is reproduced from a plurality of speakers provided at the first location, the integrated circuit comprising:
an acquisition unit that acquires speaker input signals transmitted from the second location to the first location, and detection signals of the plurality of microphones provided at the first location, the microphones detecting the first acoustic signal together with the acoustic signal produced when the speaker input signals are reproduced by the plurality of speakers provided at the first location; and
an echo cancellation unit that separates the first acoustic signal and the second acoustic signal included in each of the detection signals by multiplying a separation matrix by an input vector composed of each of the detection signals and each of the speaker input signals, and that outputs the separated first acoustic signal, the separation matrix being a matrix for separating the first acoustic signal included in each of the detection signals from the second acoustic signal included in the reproduced acoustic signal included in each of the detection signals, and including:
a first sub-matrix composed of a plurality of matrix elements relating to the respective transfer characteristics from the plurality of speakers provided at the first location to the plurality of microphones provided at the first location;
a second sub-matrix composed of a plurality of matrix elements relating to the respective transfer characteristics from the one or more sound sources present at the second location to the plurality of microphones provided at the second location;
a third sub-matrix composed of a plurality of matrix elements for separating, from each of the detection signals, the second acoustic signal included in the reproduced acoustic signal included in that detection signal; and
a fourth sub-matrix composed of a plurality of matrix elements relating to the respective transfer characteristics from the one or more sound sources present at the first location to the plurality of microphones provided at the first location,
wherein the separation matrix is learned according to independent component analysis with the second sub-matrix as a diagonal matrix whose off-diagonal elements are 0.