JP5698110B2

JP5698110B2 - Multi-channel echo cancellation method, multi-channel echo cancellation apparatus, and program

Info

Publication number: JP5698110B2
Application number: JP2011261375A
Authority: JP
Inventors: 江村 暁 (Satoru Emura); 羽田 陽一 (Yoichi Haneda)
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2011-11-30
Filing date: 2011-11-30
Publication date: 2015-04-08
Anticipated expiration: 2031-11-30
Also published as: JP2013115681A

Description

The present invention relates to a multi-channel echo cancellation method, a multi-channel echo cancellation apparatus, and a program for canceling acoustic echo in a multi-channel loudspeaker communication system.

In recent years, multi-channel playback technology has increased its number of channels, moving from stereo to 5.1 channels. However, the listening area in which sound is reproduced with a strong sense of spatial depth is narrow, forming a so-called sweet spot, and the spatial impression is known to degrade considerably outside it. Telepresence video conferences that use multi-channel playback require reproduction over a wide listening area, so that ten or more participants all receive an equally strong spatial sound impression. Wave Field Synthesis (hereinafter WFS) is being actively researched and developed as such a multi-channel reproduction technique (see Non-Patent Document 1).

In a two-way multi-channel loudspeaker conferencing system such as a telepresence conference, the received voice is reproduced by the loudspeakers and picked up by the microphones, which produces acoustic echo. If this echo is transmitted as-is, it disrupts the conversation and causes discomfort. To provide a comfortable communication environment, an acoustic echo canceller may be provided. It removes from the microphone signals the signal components that travel acoustically from the loudspeakers to the microphones.

When the multi-channel conferencing system consists of an M-channel (M ≥ 2) reproduction system and an N-channel (N ≥ 1) sound pickup system, the acoustic echo canceller performs echo cancellation with the configuration shown in FIG. 1. The operation of the conventional multi-channel echo canceller 10 is described with reference to FIG. 1.

The conventional multi-channel echo canceller 10 includes:
- M loudspeakers 2_1 to 2_M (indexed 1 ≤ m ≤ M);
- N microphones 3_1 to 3_N (indexed 1 ≤ n ≤ N);
- a received signal vector conversion unit 100;
- N echo replica generation units 200_1 to 200_N;
- an echo cancellation unit 500.

The received signals are reproduced as sound by the loudspeakers 2_1 to 2_M and reach the microphones 3_1 to 3_N through the acoustic echo paths. The received signal vector conversion unit 100 also vectorizes the received signals. The echo replica generation units 200_1 to 200_N then generate echo replicas from the vectorized received signals and the echo path estimates.

The echo cancellation unit 500 cancels the echo by subtracting the echo replicas from the microphone signals. Its output error signals are fed back to the echo replica generation units 200_1 to 200_N, which update their echo path estimates from the received signals and the error signals. When the echo paths are estimated accurately, each echo signal and its echo replica are nearly equal, and the error signals contain almost no echo.
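The time-domain loop of FIG. 1 can be sketched for one microphone. The loop consists of replica generation, subtraction, and the path-estimate update. The text does not name the adaptive rule used by the conventional apparatus 10, so this sketch assumes multichannel NLMS. The class name `MultiChannelNLMS` and its parameters are hypothetical.

```python
from typing import List


class MultiChannelNLMS:
    """One echo replica generator (microphone n) driven by M loudspeaker signals.

    Assumption: the update rule is multichannel NLMS; the source does not specify it.
    """

    def __init__(self, num_speakers: int, taps: int, mu: float = 0.5, delta: float = 1e-6):
        self.h = [[0.0] * taps for _ in range(num_speakers)]    # echo path estimates h_{m,n}
        self.buf = [[0.0] * taps for _ in range(num_speakers)]  # recent x_m(k), newest first
        self.mu, self.delta = mu, delta

    def step(self, x_now: List[float], y_now: float) -> float:
        """Consume one sample per loudspeaker and one microphone sample; return e_n(k)."""
        for m, x in enumerate(x_now):
            self.buf[m] = [x] + self.buf[m][:-1]
        # Echo replica: sum over loudspeakers of each FIR path estimate applied to x_m.
        y_hat = sum(sum(hc * xc for hc, xc in zip(h, b)) for h, b in zip(self.h, self.buf))
        e = y_now - y_hat  # echo cancellation by subtraction
        norm = sum(xc * xc for b in self.buf for xc in b) + self.delta
        for h, b in zip(self.h, self.buf):
            for i in range(len(h)):
                h[i] += self.mu * e * b[i] / norm  # normalized gradient step
        return e
```

Each call processes one sample. The per-sample cost grows with M × taps for every microphone, which is the cost that the frequency-domain approach described next aims to lower.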

To reduce computation, an algorithm has been proposed that performs echo calculation and filter coefficient updating in the frequency domain rather than the time domain (see Non-Patent Document 2). FIG. 2 is a block diagram of a multi-channel echo canceller 11 obtained by applying the algorithm of Non-Patent Document 2 to multiple channels. FIG. 3 is a flowchart of its operation. The operation of the multi-channel echo canceller 11 is described in detail below with reference to FIGS. 2 and 3. In the following description, k denotes time (sample index), f denotes frequency (bin index), and j denotes the frame number.

The conventional multi-channel echo canceller 11 includes:
- M loudspeakers 2_1 to 2_M;
- N microphones 3_1 to 3_N;
- a received signal vector conversion unit 110;
- N echo replica generation units 210_1 to 210_N;
- an inverse FFT unit 400;
- an echo cancellation unit 510;
- an FFT unit 600.

The received signal vector conversion unit 110 divides each of the M received signals x_m(k) into blocks of L samples and forms frames of 2L samples. It transforms each frame to the frequency domain with the fast Fourier transform (FFT), generating the received signal X_m(f, j) as in Equation (1) (S110). Here L is a natural number, and the frame division number D is a natural number that divides L. L is often chosen as a power of 2 to simplify and speed up the FFT.
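The framing of Equation (1) can be sketched as follows. Equation (1) itself is not reproduced in this text, so the sketch makes several assumptions:
- the hop is L samples (the D = 1 case);
- frame j covers the 2L most recent samples;
- a plain DFT stands in for the FFT.

The function names are hypothetical.

```python
import cmath
from typing import List, Sequence


def dft(frame: Sequence[complex]) -> List[complex]:
    """Plain O(n^2) DFT standing in for the FFT (a power-of-two FFT would be used in practice)."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * cmath.pi * f * t / n) for t in range(n))
            for f in range(n)]


def received_spectrum(x_m: Sequence[float], j: int, L: int) -> List[complex]:
    """X_m(., j): DFT of the 2L samples ending at sample (j + 1) * L - 1 (hop of L samples).

    Samples before the start of the signal are taken as zero (an assumption of this sketch).
    """
    end = (j + 1) * L
    frame = [x_m[t] if 0 <= t < len(x_m) else 0.0 for t in range(end - 2 * L, end)]
    return dft(frame)
```

Consecutive frames overlap by L samples, which is what allows the overlap-save output step of Equation (3) to discard half of each inverse transform.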

For each frequency f, the echo replica generation unit 210_n filters the received signals by multiplying each X_m(f, j) by the filter coefficient H_{m,n}(f, j). It then sums the results over the M channels, as in Equation (2), which yields the echo replica Ŷ_n(f, j) (S210).
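Equation (2) is a per-bin multiply-accumulate over loudspeakers. A minimal sketch follows, assuming spectra are stored as Python lists indexed by bin. The names `echo_replica_spectrum`, `H_n`, and `X` are hypothetical.

```python
from typing import List, Sequence


def echo_replica_spectrum(H_n: Sequence[Sequence[complex]],
                          X: Sequence[Sequence[complex]]) -> List[complex]:
    """Ŷ_n(f, j) = sum over m of H_{m,n}(f, j) * X_m(f, j), computed bin by bin.

    H_n[m][f] holds the frequency-domain coefficients from loudspeaker m to microphone n;
    X[m][f] holds the received spectrum of loudspeaker m for the current frame.
    """
    num_bins = len(X[0])
    return [sum(H_n[m][f] * X[m][f] for m in range(len(X))) for f in range(num_bins)]
```

Running this for every n costs M multiplications per bin per microphone, so M × N per bin overall. This is the product the invention later attacks.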

The inverse FFT unit 400 transforms the echo replica Ŷ_n(f, j) to the time domain with the inverse FFT and obtains the echo replica ŷ_n(j) as in Equation (3) (S400).

Here, 0_L is the L × L zero matrix and I_L is the L × L identity matrix.
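The role of [0_L I_L] in Equation (3) can be sketched as an overlap-save step: invert the 2L-point spectrum and keep only the last L samples. The exact form of Equation (3) is not shown in this text. This reading, the plain inverse DFT, and taking the real part (which assumes a conjugate-symmetric spectrum) are all assumptions. The function names are hypothetical.

```python
import cmath
from typing import List, Sequence


def idft(spec: Sequence[complex]) -> List[complex]:
    """Plain O(n^2) inverse DFT standing in for the inverse FFT."""
    n = len(spec)
    return [sum(spec[f] * cmath.exp(2j * cmath.pi * f * t / n) for f in range(n)) / n
            for t in range(n)]


def time_domain_replica(Y_hat_n: Sequence[complex], L: int) -> List[float]:
    """y_hat_n(j): inverse transform of the 2L-bin replica, keeping only the last L samples.

    Multiplying by [0_L I_L] discards the first L samples, which contain circular
    wrap-around; the real part is taken assuming Y_hat_n is conjugate-symmetric.
    """
    if len(Y_hat_n) != 2 * L:
        raise ValueError("expected 2L frequency bins")
    y = idft(Y_hat_n)
    return [v.real for v in y[L:]]
```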

The echo cancellation unit 510 obtains the error signal e_n(j) in the time domain (S510). It does so from the N-channel transmitted signals y_n(j) picked up by the microphones 3_1 to 3_N and the echo replicas ŷ_n(j).

The FFT unit 600 transforms the error signal e_n(j) to the frequency domain with the FFT and obtains the error signal E_n(f, j) as in Equation (4) (S600).
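Steps S510 and S600 can be sketched together: subtract the replica in the time domain, then transform the error. Prepending L zeros before the transform mirrors the usual unconstrained frequency-domain filter and is an assumption about the form of Equation (4). The function names are hypothetical.

```python
import cmath
from typing import List, Sequence, Tuple


def dft(frame: Sequence[complex]) -> List[complex]:
    """Plain O(n^2) DFT standing in for the FFT."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * cmath.pi * f * t / n) for t in range(n))
            for f in range(n)]


def error_spectrum(y_n: Sequence[float],
                   y_hat_n: Sequence[float]) -> Tuple[List[float], List[complex]]:
    """Return e_n(j) (S510) and its 2L-bin spectrum E_n(., j) (S600)."""
    L = len(y_n)
    e = [y - y_hat for y, y_hat in zip(y_n, y_hat_n)]  # subtract the echo replica
    padded = [0.0] * L + e                             # assumed form [0_L; e_n(j)] of Eq. (4)
    return e, dft(padded)
```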

From the error signal E_n(f, j) and the received signal X_m(f, j), the echo replica generation unit 210_n computes the filter coefficient correction amount dH_{m,n}(f, j) as in Equation (5).

Here, the superscript * denotes the complex conjugate.

Next, the filter coefficients H_{m,n}(f, j) of each channel are updated as in Equation (6) (S700).

Here, p(f, j) is obtained for each frequency component by summing the transmitted signal powers over the N channels, as in Equation (7). It is used to normalize the correction amount dH_{m,n}(f, j).

The constants are defined as follows:
- μ is a step size between 0 and 1;
- δ is a small positive constant that keeps the denominator from becoming 0;
- β is a smoothing constant between 0 and 1, used to take a short-time average in the power calculation.
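Equations (5) to (7) amount to a normalized, per-bin gradient step. The exact forms of Equations (6) and (7) are not shown here, so the sketch below assumes two recursions:
- smoothed power: p(f, j) = β p(f, j − 1) + (1 − β) Σ |S(f, j)|²;
- coefficient update: H ← H + μ dH / (p + δ).

The caller chooses which channel spectra feed the power estimate. The text specifies the N transmitted channels, whereas many frequency-domain NLMS variants use the received signals instead. The function names are hypothetical.

```python
from typing import List, Sequence


def smoothed_power(p_prev: Sequence[float], spectra: Sequence[Sequence[complex]],
                   beta: float) -> List[float]:
    """p(f, j) = beta * p(f, j-1) + (1 - beta) * sum_c |S_c(f, j)|^2 (one-pole smoothing)."""
    return [beta * p_prev[f] + (1.0 - beta) * sum(abs(s[f]) ** 2 for s in spectra)
            for f in range(len(p_prev))]


def update_filters(H: List[List[List[complex]]], X: Sequence[Sequence[complex]],
                   E: Sequence[Sequence[complex]], p: Sequence[float],
                   mu: float, delta: float) -> None:
    """In place: H[m][n][f] += mu * E[n][f] * conj(X[m][f]) / (p[f] + delta)."""
    for m, X_m in enumerate(X):
        for n, E_n in enumerate(E):
            for f in range(len(p)):
                dH = E_n[f] * X_m[f].conjugate()  # correction amount dH_{m,n}(f, j), Eq. (5)
                H[m][n][f] += mu * dH / (p[f] + delta)
```

Dividing by the smoothed power makes the effective step size roughly independent of signal level in each bin, which is the purpose of the normalization described above.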

Non-Patent Document 1: A. J. Berkhout, D. de Vries, and P. Vogel, "Acoustic control by wave field synthesis," Journal of the Acoustical Society of America, vol. 93, no. 5, pp. 2764-2778 (1993).
Non-Patent Document 2: D. Mansour and A. H. Gray, "Unconstrained Frequency-Domain Adaptive Filter," IEEE Trans. on Acoustics, Speech, and Signal Processing, vol. ASSP-30, no. 5, pp. 726-734 (1982).

However, the WFS of Non-Patent Document 1 requires several tens of microphones or more and several tens of loudspeakers or more, because it captures the sound wavefront at one location and resynthesizes it at another. When WFS is introduced into two-way video conferencing, the number of acoustic paths between loudspeakers and microphones therefore grows as the square of the number of channels. The number of acoustic paths the echo canceller must estimate increases just as rapidly.

Non-Patent Document 2 lowers the echo canceller's computation by performing echo calculation and filter coefficient updating in the frequency domain. Even so, the cost of echo replica generation and filter coefficient updating still grows with the product of the numbers of input and output channels, that is, M × N.

The present invention was made in view of these points. Its object is to provide a multi-channel echo canceller that reduces the computation of multi-channel echo cancellation, which removes acoustic echo in a multi-channel loudspeaker communication system.

To solve the above problems, the multi-channel echo canceller of the present invention includes:
- M loudspeakers (1 ≤ m ≤ M);
- N microphones (N ≥ 3) arranged at equal intervals on a straight line;
- a received signal vector conversion unit;
- an echo replica generation unit;
- an echo replica spatial interpolation unit;
- an inverse FFT unit;
- an echo cancellation unit;
- an FFT unit.

Let k denote time, f frequency, j the frame number, i the tap number of the adaptive filter, m the loudspeaker number, and n the microphone number. Let SetN(f) denote a set of microphone numbers selected from the N equally spaced microphones such that the lower the frequency f, the wider the spatial sampling interval.

The units operate as follows:
- The received signal vector conversion unit transforms the M-channel received signals x_m(k) output from the loudspeakers to the frequency domain, channel by channel, generating the received signals X_m(f, j).
- When the received signals X_m(f, j) are input, the echo replica generation unit generates the spatially thinned echo replica Ŷ_{SetN(f)}(f, j). It does this for each frequency f and each microphone n in SetN(f), using filter coefficients H_{m,n,i}(f, j) with I taps (1 ≤ i ≤ I, I ≥ 1).
- The echo replica spatial interpolation unit spatially interpolates Ŷ_{SetN(f)}(f, j) for each frequency f, generating the echo replicas Ŷ_n(f, j).
- The inverse FFT unit transforms Ŷ_n(f, j) to the time domain, generating the echo replicas ŷ_n(j).
- The echo cancellation unit generates the error signals e_n(k) from the N-channel transmitted signals y_n(j) picked up by the microphones and the echo replicas ŷ_n(j).
- The FFT unit transforms e_n(k) to the frequency domain, generating the error signals E_n(f, j).
- When the error signals E_n(f, j) are input, the echo replica generation unit computes the correction amount dH_{m,n,i}(f, j) from E_n(f, j) and X_m(f, j), for each frequency f and each microphone n in SetN(f). It then updates the filter coefficients H_{m,n,i}(f, j) using that correction amount.

The multi-channel echo canceller of the present invention exploits the spatial shape of the echo signal at each frequency and generates the echo replicas by spatial interpolation at low frequencies. This reduces the number of operations for echo replica generation and filter coefficient updating. It therefore reduces the overall computation of echo cancellation in a multi-channel loudspeaker communication system.

FIG. 1 is a block diagram showing the configuration of the conventional multi-channel echo canceller 10.
FIG. 2 is a block diagram showing the configuration of the conventional multi-channel echo canceller 11.
FIG. 3 is a flowchart showing the operation of the conventional multi-channel echo canceller 11.
FIG. 4 shows the spatial shape of the echo signal at each frequency.
FIG. 5 shows the thinning pattern for each frequency.
FIG. 6 is a block diagram showing the configuration of the multi-channel echo canceller 20 of Embodiment 1.
FIG. 7 is a flowchart showing the operation of the multi-channel echo canceller 20 of Embodiment 1.
FIG. 8 shows the spatial interpolation method for echo replicas.
FIG. 9 shows the method of combining error signals.
FIG. 10 shows the experimental results of Embodiment 1.
FIG. 11 is a block diagram showing the configuration of the multi-channel echo canceller 30 of Embodiment 2.
FIG. 12 is a flowchart showing the operation of the multi-channel echo canceller 30 of Embodiment 2.
FIG. 13 shows the experimental results of Embodiment 2.
FIG. 14 is a block diagram showing the configuration of the multi-channel echo canceller 40 of Embodiment 3.
FIG. 15 is a block diagram showing the configuration of the band-division voice switch unit 700 of Embodiment 3.
FIG. 16 is a flowchart showing the operation of the multi-channel echo canceller 40 of Embodiment 3.

Embodiments of the present invention are described in detail below. Components with the same function are given the same reference numerals, and duplicate descriptions are omitted.

[Principle of the Present Invention]
Before the embodiments are described, the principle of the multi-channel echo cancellation method of the present invention is explained. The invention examines the spatial shape of the echo signal at each frequency and generates echo replicas by spatial interpolation at low frequencies. This suppresses the rapid growth of the echo canceller's computation.

When the microphones are arranged in a straight line at equal intervals, the spatial shape of the echo signal at each frequency becomes smoother as the frequency decreases. The reason is that the wavelength λ and frequency f of a sound wave satisfy λ × f = c, where the speed of sound c is constant; the lower the frequency f, the longer the wavelength λ. Furthermore, by the Nyquist sampling theorem applied in space, the spatial sampling interval at low frequencies can be much larger than the microphone spacing. In other words, at low frequencies the spatial sampling interval can be widened, that is, the spatial samples can be thinned out.

As an example of a WFS pickup and reproduction system, consider a linear loudspeaker array and a linear microphone array operating at a sampling frequency of 16 kHz. By the Nyquist theorem, the microphone spacing (that is, the spatial sampling interval of the wave) must be at most λ/2. Taking the speed of sound as 330 m/s, sampling waves up to the band limit of 8 kHz therefore requires a microphone spacing of about 2 cm.

The lower the frequency, however, the smoother the spatial waveform, as FIG. 4 illustrates. It plots the spatial shapes of the echo signal at 4 kHz, 2 kHz, 1 kHz, 500 Hz, 250 Hz, and 125 Hz, obtained by applying the FFT to one frame of echo signals picked up by 33 microphones. Solid lines show the real part and dotted lines the imaginary part.

Spatial sampling of such smooth waveforms can be thinned out. At low frequencies, echo cancellation can be performed on only a subset of the microphones rather than all of them. The echo-canceled signals of the remaining microphones are then obtained by spatial interpolation from the echo-canceled signals of that subset, without adaptive filters for the remaining microphones.
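The spatial sampling condition d ≤ λ/2 = c/(2f) can be checked numerically. This helper (a hypothetical name) reproduces the figure of about 2 cm at 8 kHz and shows how quickly the allowed spacing grows at low frequencies.

```python
def max_mic_spacing(f_hz: float, c: float = 330.0) -> float:
    """Largest microphone spacing (m) meeting the spatial sampling condition d <= c / (2 f)."""
    if f_hz <= 0:
        raise ValueError("frequency must be positive")
    return c / (2.0 * f_hz)
```

For example, `max_mic_spacing(8000)` gives 0.020625 m and `max_mic_spacing(125)` gives 1.32 m. At 125 Hz, a 2 cm array is therefore oversampled by a factor of about 66.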

FIG. 5 shows the thinning pattern SetN(f) of the transmitted signals for each frequency f, with 9 microphones and 8 frequency bands. The horizontal axis is microphone position and the vertical axis is frequency; a larger f means a higher frequency. For f = 3 and 4, spatial sampling is thinned to 1/2. For f = 1 and 2, it is thinned to 1/4.

The thinning pattern SetN(f) of the transmitted signals is determined from the wavelength λ(f) = c/f at frequency f, where c is the speed of sound. When the microphones are equally spaced with spacing S_N, find the largest natural number Q satisfying S_N × Q ≤ λ(f)/2, and set the thinning interval Q(f) to Q or less. For example, with N = 9 microphones and Q(f) = 4, SetN(f) = {1, 5, 9}; with Q(f) = 2, SetN(f) = {1, 3, 5, 7, 9}.
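The rule for Q(f) and SetN(f) can be sketched as follows. The text only requires Q(f) ≤ Q. Taking Q(f) as the largest divisor of N − 1 not exceeding Q, so that both end microphones are always retained, is our assumption. The function names are hypothetical.

```python
import math
from typing import List


def max_decimation(f_hz: float, spacing_m: float, c: float = 330.0) -> int:
    """Largest natural number Q with spacing_m * Q <= lambda(f) / 2, where lambda(f) = c / f."""
    return max(1, math.floor(c / (2.0 * f_hz * spacing_m)))


def set_n(num_mics: int, q: int) -> List[int]:
    """1-based indices of the microphones kept when only every q(f)-th microphone is used.

    Assumption: q is lowered to the largest divisor of num_mics - 1 that does not exceed it,
    so that the first and last microphones are always kept.
    """
    q = max(d for d in range(1, q + 1) if (num_mics - 1) % d == 0)
    return list(range(1, num_mics + 1, q))
```

With the 2 cm spacing used later, `max_decimation(1000, 0.02)` is 8 and `max_decimation(4000, 0.02)` is 2. `set_n(9, 4)` returns [1, 5, 9], as in the example above.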

Based on this principle, computation can be reduced by thinning out echo replica generation and filter coefficient updating at low frequencies.

The operation of the multi-channel echo canceller 20 according to Embodiment 1 of the present invention is described in detail with reference to FIGS. 6 and 7. FIG. 6 is a block diagram of the multi-channel echo canceller 20, and FIG. 7 is a flowchart of its operation. This embodiment uses, as the frequency-domain adaptive algorithm, the method described in E. Moulines, O. A. Amrane, and Y. Grenier, "The Generalized Multidelay Adaptive Filter: Structure and Convergence Analysis," IEEE Trans. on SP, vol. 43, no. 1 (1995). In the following, the adaptive filter is assumed to be partitioned into I blocks (I ≥ 1) in the time direction.

The processing is described below in the order in which it is actually performed. The multi-channel echo canceller 20 of this embodiment includes:
- M loudspeakers 2_1 to 2_M;
- N microphones 3_1 to 3_N (N ≥ 3);
- a received signal vector conversion unit 110;
- N echo replica generation units 220_1 to 220_N;
- an echo replica spatial interpolation unit 300;
- an inverse FFT unit 400;
- an echo cancellation unit 510;
- an FFT unit 600.

The received signal vector conversion unit 110 divides each of the M received signals x_m(k) into blocks of L samples and forms frames of 2L samples. It transforms each frame to the frequency domain with the FFT and generates the received signal X_m(f, j) as in Equation (1) above (S110).

The echo replica generation unit 220_n operates on frequencies f ≤ L + 1 (the bins from DC up to half the sampling frequency). For each such frequency f and each n in SetN(f), it filters the received signals by multiplying X_m(f, j) by the filter coefficients H_{m,n,i}(f, j), as in Equation (8), and sums over the M channels. This yields the spatially thinned echo replica Ŷ_{SetN(f)}(f, j) (S220).
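Equation (8) can be read as a multidelay (partitioned) filter evaluated only for microphones in SetN(f). The exact Equation (8) is not shown in this text. The sketch assumes that partition i multiplies the received spectrum delayed by i frames, as in the multidelay filter of Moulines et al. The function name and the dictionary layout of `H` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Coeffs = Dict[Tuple[int, int, int], List[complex]]  # (m, n, i) -> H_{m,n,i}(f, j) over bins f


def thinned_replica(H: Coeffs, X_hist: Sequence[Sequence[Sequence[complex]]],
                    set_n_f: Sequence[int], f: int) -> Dict[int, complex]:
    """Ŷ_n(f, j) for the microphones n in SetN(f) only, at one bin f.

    X_hist[i][m][f] is X_m(f, j - i) (0-based partition i = 0..I-1, loudspeaker m = 0..M-1);
    n are the 1-based microphone numbers taken from SetN(f).
    """
    num_partitions = len(X_hist)
    num_speakers = len(X_hist[0])
    return {n: sum(H[(m, n, i)][f] * X_hist[i][m][f]
                   for m in range(num_speakers) for i in range(num_partitions))
            for n in set_n_f}
```

Coefficients of microphones outside SetN(f) are never read at bin f. This is where the saving in replica generation comes from.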

For frequencies f ≤ L + 1, the echo replica spatial interpolation unit 300 obtains the echo replicas Ŷ_n(f, j) for each frequency f (S300). It spatially interpolates the thinned echo replica Ŷ_{SetN(f)}(f, j) generated by the echo replica generation units 220_n. More specifically, for each channel n skipped by the thinning (that is, each n not in SetN(f)), the echo replica Ŷ_n(f, j) is obtained by spatially interpolating the echo replicas of the channels in SetN(f). When the microphones are equally spaced on a straight line, the interpolation can use the thinned echo replica Ŷ_{SetN(f)}(f, j) and the sinc function, as in Equation (9).

In practice, the sinc function must be truncated to a finite length. The truncation range can be set from the thinning interval Q(f), as in Equation (10) or Equation (11). In that case, only 4 or 6 echo replicas need to be multiplied by the sinc function.
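The sinc interpolation of Equations (9) to (11) can be sketched at a single bin. Equations (9) to (11) are not shown in this text. The truncation rule below keeps the retained microphones within half_width × Q(f) of the target, giving up to 4 or 6 terms. It is an assumed stand-in, and the names are hypothetical.

```python
import math
from typing import Dict, List


def sinc(x: float) -> float:
    """Normalized sinc: sin(pi x) / (pi x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def interpolate_replicas(kept: Dict[int, complex], num_mics: int, q: int,
                         half_width: int = 2) -> List[complex]:
    """Echo replicas for microphones 1..num_mics at one bin, from those in SetN(f).

    kept maps each retained 1-based microphone index (spaced q apart) to its replica.
    A skipped microphone n uses the retained n' with |n - n'| < half_width * q, which gives
    up to 4 (half_width=2) or 6 (half_width=3) terms; this truncation is an assumption.
    """
    out: List[complex] = []
    for n in range(1, num_mics + 1):
        if n in kept:
            out.append(kept[n])  # retained microphones keep their own replica
            continue
        acc = 0j
        for n_k, y_k in kept.items():
            if abs(n - n_k) < half_width * q:
                acc += y_k * sinc((n - n_k) / q)
        out.append(acc)
    return out
```

Because sinc vanishes at nonzero integers, the interpolant passes exactly through the retained replicas. Only the skipped positions receive weighted sums.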

Next, for frequencies f > L + 1, the echo replica spatial interpolation unit 300 obtains the echo replica Ŷ_n(f, j) for each frequency f (S300). It uses the conjugate symmetry of the FFT of a real signal, as in Equation (12), where conj denotes the complex conjugate.
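Equation (12) relies on the conjugate symmetry of the DFT of a real signal. With 1-based bins of a 2L-point transform (bin 1 is DC and bin L + 1 is half the sampling rate), the upper bins satisfy Y(f) = conj(Y(2L + 2 − f)). This indexing convention is inferred from the f ≤ L + 1 split in the text. The function name is hypothetical.

```python
from typing import List, Sequence


def fill_upper_half(Y_low: Sequence[complex], L: int) -> List[complex]:
    """Extend bins f = 1..L+1 (1-based) of a 2L-point DFT of a real signal to all 2L bins.

    For 1-based f = L+2..2L, Y(f) = conj(Y(2L + 2 - f)); in 0-based Python indices,
    bin k is the conjugate of bin 2L - k.
    """
    if len(Y_low) != L + 1:
        raise ValueError("expected L + 1 bins")
    full = list(Y_low)
    for k in range(L + 1, 2 * L):  # 0-based indices of the missing upper bins
        full.append(Y_low[2 * L - k].conjugate())
    return full
```

Only about half the bins therefore need adaptive filtering or interpolation at all.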

The inverse FFT unit 400 transforms the echo replica Ŷ_n(f, j) to the time domain with the inverse FFT and obtains the echo replica ŷ_n(j) as in Equation (3) above (S400).

The FFT unit 600 transforms the error signal e_n(j) to the frequency domain with the FFT and obtains the error signal E_n(f, j) as in Equation (4) above (S600).

If the frame division number D is set to D ≥ 2 when the received signals x_m(k) are framed, the error signal obtained at frame j and the one obtained at the previous frame j − 1 are windowed and combined before output. FIG. 9 shows this combination for D = 2. Specifically, the error signal e_n(k_0 + t, j) obtained at frame j is written as in Equation (13), and W_H denotes a Hanning window of length 2L/D. The combined error signal e_n'(k_0 + t) can then be expressed as in Equation (14).
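The D = 2 combination can be sketched as a windowed overlap-add. Equations (13) and (14) are not shown here, so the sketch assumes two things:
- consecutive length-L error blocks start L/2 samples apart;
- each block is weighted by a periodic Hann window of length 2L/D = L.

Such windows sum to one at that hop. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def periodic_hann(n: int) -> List[float]:
    """Periodic Hann window; copies shifted by n / 2 sum to exactly one."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * t / n) for t in range(n)]


def overlap_add(blocks: Sequence[Sequence[float]]) -> List[float]:
    """Crossfade consecutive length-L error blocks that overlap by L/2 (the D = 2 case)."""
    L = len(blocks[0])
    hop = L // 2
    w = periodic_hann(L)
    out = [0.0] * (hop * (len(blocks) - 1) + L)
    for j, block in enumerate(blocks):
        for t, v in enumerate(block):
            out[j * hop + t] += w[t] * v  # windowed contribution of frame j
    return out
```

The crossfade smooths the transition between the error outputs of adjacent frames. Away from the first and last half-blocks, a constant input passes through unchanged.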

Only for n in SetN(f), the echo replica generation unit 220_n computes the filter coefficient correction amount dH_{m,n,i}(f, j) from the error signal E_n(f, j) and the received signal X_m(f, j), as in Equation (15).

Next, only for n in SetN(f), the filter coefficients H_{m,n,i}(f, j) of each channel are updated as in Equation (16) (S710).

Here, p(f, j) is obtained for each frequency component by summing the transmitted signal powers over the N channels, as in Equation (17). It is used to normalize the correction amount dH_{m,n,i}(f, j).

As before, the constants are defined as follows:
- μ is a step size between 0 and 1;
- δ is a small positive constant that keeps the denominator from becoming 0;
- β is a smoothing constant between 0 and 1, used to take a short-time average in the power calculation.
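The restricted update of Equations (15) and (16) can be sketched at one bin. The correction dH = E_n(f, j) · conj(X_m(f, j − i)) and the normalized step μ / (p + δ) are assumed forms, since the equations themselves are not shown. The function name and the dictionary layout are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Coeffs = Dict[Tuple[int, int, int], List[complex]]  # (m, n, i) -> H_{m,n,i}(f, j) over bins f


def update_thinned(H: Coeffs, X_hist: Sequence[Sequence[Sequence[complex]]],
                   E: Dict[int, Sequence[complex]], set_n_f: Sequence[int], f: int,
                   p_f: float, mu: float, delta: float) -> None:
    """Update H_{m,n,i}(f, j) in place for microphones n in SetN(f) only.

    X_hist[i][m][f] is X_m(f, j - i) and E[n][f] is E_n(f, j).
    """
    step = mu / (p_f + delta)
    for n in set_n_f:  # microphones outside SetN(f) are never touched
        for i, X_i in enumerate(X_hist):
            for m, X_im in enumerate(X_i):
                dH = E[n][f] * X_im[f].conjugate()  # correction amount dH_{m,n,i}(f, j)
                H[(m, n, i)][f] += step * dH
```

Error spectra are needed only for the retained microphones at each bin.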

As is clear from the above, this embodiment handles each n not in SetN(f) without adaptive filtering. Its echo replica at frequency f is obtained by interpolation, and neither the correction amount dH_{m,n,i}(f, j) nor the filter coefficients H_{m,n,i}(f, j) are computed for it. The overall computation of multi-channel echo cancellation is therefore reduced.
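A rough operation count shows where the savings come from. The count rests on two assumptions:
- each retained (n, f) pair costs M × I complex multiplications for the replica and another M × I for the correction amount;
- the few sinc products of the interpolation are ignored.

Under these assumptions, the FIG. 5 pattern needs 52/72, about 72%, of the full cost. That pattern uses 9 microphones and 8 bands, retaining 3 microphones in bands 1 and 2, 5 in bands 3 and 4, and all 9 above. The function name is hypothetical.

```python
from typing import Sequence


def multiplies_per_frame(set_sizes: Sequence[int], num_speakers: int,
                         num_partitions: int) -> int:
    """Complex multiplications per frame for replica generation plus correction amounts.

    set_sizes[f] is |SetN(f)| for each band f; each retained (n, f) pair is assumed to cost
    M * I products for Eq. (8) and another M * I for Eq. (15). Interpolation is ignored.
    """
    return 2 * num_speakers * num_partitions * sum(set_sizes)


fig5_pattern = [3, 3, 5, 5, 9, 9, 9, 9]  # bands f = 1..8 of FIG. 5 with N = 9
ratio = multiplies_per_frame(fig5_pattern, 1, 1) / multiplies_per_frame([9] * 8, 1, 1)
```

The ratio does not depend on M or I, so with many loudspeakers and partitions the absolute saving scales with M × I.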

[Experimental Results of Embodiment 1]
A simulation was performed to confirm the effect of this embodiment. In a room with a reverberation time of 200 ms, a linear loudspeaker array (33 elements, 2 cm spacing) and a linear microphone array (33 elements, 2 cm spacing) were placed in parallel 2 m apart. The impulse responses of all echo paths between loudspeakers and microphones were generated with a simulator. The sampling frequency was 16 kHz. The echo replicas were obtained by spatial interpolation using Equation (10) above, based on the thinning interval Q(f).

The received signals were generated separately by simulating pickup by 33 microphones, with pink noise as the sound source. FIG. 10 shows the echo cancellation results of this embodiment for channels 1, 5, 11, 17, and 19 of the 33 channels. The level of the transmitted signal is plotted as a solid line and that of the error signal as a broken line. The gap between the two lines is the amount of echo removed, and the echo is canceled well on every channel shown.

[Modification]
In this embodiment, the frequency-domain filter coefficients H_{m,n,i}(f, j) have I taps at every frequency f. As a modification, the number of taps could be reduced below I in the high-frequency part of the filter. For example, a single tap could be used at frequencies above 4 kHz, where the human voice has relatively little energy. This further reduces the total amount of computation.
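The following minimal sketch builds a tap-count profile for this modification and compares the total coefficient count per loudspeaker-microphone pair with the uniform I-tap case. It assumes a 16 kHz sampling rate, a 512-point FFT (257 bins from DC to Nyquist), I = 4 taps, and a 4 kHz cutoff. The FFT size, the value of I, and the function name are illustrative assumptions, not values from the source.

```python
def taps_per_bin(num_bins: int, sample_rate: float, full_taps: int,
                 cutoff_hz: float = 4000.0) -> list[int]:
    """Tap count for each frequency bin (bin 0 = DC, last bin = Nyquist).

    Bins whose frequency exceeds cutoff_hz get a single tap;
    all other bins keep full_taps taps.
    """
    fft_size = 2 * (num_bins - 1)
    taps = []
    for b in range(num_bins):
        freq_hz = b * sample_rate / fft_size
        taps.append(1 if freq_hz > cutoff_hz else full_taps)
    return taps


# Hypothetical example: 512-point FFT (257 bins), 16 kHz sampling, I = 4.
profile = taps_per_bin(num_bins=257, sample_rate=16000.0, full_taps=4)
uniform_total = 257 * 4          # I taps in every bin
reduced_total = sum(profile)     # I taps up to 4 kHz, 1 tap above
print(uniform_total, reduced_total)  # 1028 644
```

Under these assumptions, the coefficient count for each (m, n) pair drops by about 37%. The cost of generating replicas and updating coefficients scales in the same proportion.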

To reduce computation further, the received signals input to the adaptive filter may also be thinned out at low frequencies. The thinning pattern SetM(f) of the received signals for each frequency f is obtained from the wavelength λ(f) = c/f, in the same way as SetN(f). When the loudspeakers are equally spaced with spacing S_M, the largest natural number Q satisfying S_M × Q ≤ λ(f)/2 is found, and the thinning interval Q_M(f) is set to a value no greater than Q.
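The following minimal sketch shows how a thinning pattern such as SetM(f) (or SetN(f)) could be built from this rule. The excerpt does not say which elements are kept for a given interval. This sketch therefore assumes 0-based indices, keeps every Q-th element starting from the first, and always appends the last element so both array ends are covered. It also assumes Q_M(f) is set equal to the upper bound Q and that c = 340 m/s. Both function names are hypothetical.

```python
import math


def max_thinning_interval(freq_hz: float, spacing_m: float,
                          speed_of_sound: float = 340.0) -> int:
    """Largest natural number Q with spacing_m * Q <= wavelength / 2.

    Falls back to 1 (no thinning) if even Q = 1 violates the bound.
    """
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    half_wavelength = speed_of_sound / freq_hz / 2.0
    # Small tolerance guards against round-off when the ratio is an integer.
    q = math.floor(half_wavelength / spacing_m + 1e-9)
    return max(q, 1)


def thinning_set(num_elements: int, interval: int) -> list[int]:
    """0-based indices kept: every `interval`-th element plus the last one."""
    kept = list(range(0, num_elements, interval))
    if kept[-1] != num_elements - 1:
        kept.append(num_elements - 1)
    return kept


# Hypothetical example: 33 loudspeakers spaced 2 cm apart, as in the simulation.
for f in (500.0, 1000.0, 4000.0):
    q = max_thinning_interval(f, spacing_m=0.02)
    print(f, q, thinning_set(33, q))
```

With these assumptions, 500 Hz keeps only indices 0, 17, and 32; 1 kHz keeps 5 of the 33 elements; and 4 kHz keeps 17.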

The operation of the multi-channel echo cancellation apparatus 30 according to Example 2 of the present invention is described in detail with reference to FIGS. 11 and 12. FIG. 11 is a block diagram showing the configuration of the apparatus 30, and FIG. 12 is a flowchart showing its operation. The apparatus 30 of Example 2 differs from the apparatus 20 of Example 1 in that it includes an echo replica generation unit 230_n in place of the echo replica generation unit 220_n.

For each frequency f ≤ L + 1, the echo replica generation unit 230_n extracts the received signals X_m(f, j) whose index m is included in SetM(f) (S231). Next, for each frequency f ≤ L + 1 and each n included in SetN(f), it filters the received signals X_m(f, j) by multiplying them by the filter coefficients H_{m,n,i}(f, j) as in equation (18), then sums the results over the selected channels m. This yields the spatially thinned echo replica Ŷ_{SetN(f)}(f, j) (S232).
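Equation (18) is not reproduced in this excerpt. The following minimal sketch therefore assumes a common multi-tap subband form, Ŷ_n(f, j) = Σ_{m} Σ_{i} H_{m,n,i}(f, j) X_m(f, j − i + 1), in which the I taps span the current frame and the I − 1 previous frames of the same bin. It evaluates this sum for one bin, restricted to the selected loudspeakers and microphones. In Example 1 the loudspeaker set is all M channels; in Example 2 it is SetM(f). The function name and array layout are hypothetical.

```python
import numpy as np


def thinned_echo_replica(x_hist, h, set_m, set_n):
    """Spatially thinned echo replica for one frequency bin f and frame j.

    x_hist : complex (M, I) array; x_hist[m, i] is X_m(f, j - i), so column 0
             is the current frame (taps are 0-based here, 1-based in the text).
    h      : complex (M, N, I) array of filter taps H[m, n, i].
    set_m  : 0-based loudspeaker indices (all M in Example 1, SetM(f) in Example 2).
    set_n  : 0-based microphone indices in SetN(f).
    Returns a complex (|set_n|,) array, one replica value per selected microphone.
    """
    h_sub = h[np.ix_(set_m, set_n)]   # (|set_m|, |set_n|, I)
    x_sub = x_hist[set_m]             # (|set_m|, I)
    # Filter each selected received signal and sum over loudspeakers and taps.
    return np.einsum("mni,mi->n", h_sub, x_sub)


# Hypothetical usage: 33 x 33 array, I = 4 taps, 5 elements kept on each side.
rng = np.random.default_rng(0)
M, N, I = 33, 33, 4
x_hist = rng.standard_normal((M, I)) + 1j * rng.standard_normal((M, I))
h = rng.standard_normal((M, N, I)) + 1j * rng.standard_normal((M, N, I))
set_m = [0, 8, 16, 24, 32]
set_n = [0, 8, 16, 24, 32]
y_thin = thinned_echo_replica(x_hist, h, set_m, set_n)
print(y_thin.shape)  # (5,)
```

Only |SetM(f)| × |SetN(f)| × I products are formed per bin. The missing microphones are filled in afterwards by spatial interpolation.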

The echo replica generation unit 230_n also computes the filter-coefficient correction amount dH_{m,n,i}(f, j) from the error signal E_n(f, j) and the received signal X_m(f, j), as in equation (15) above. It does so only when m is included in SetM(f) and n is included in SetN(f). Then, again only for such m and n, it updates the filter coefficients H_{m,n,i}(f, j) of each channel as in equation (16) above (S720). Here, the factor p(f, j) applied to the correction amount dH_{m,n,i}(f, j) is calculated as in equation (19).
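Equations (15), (16), and (19) are not reproduced in this excerpt. The following minimal sketch substitutes a generic normalized-LMS form and labels it as an assumption. The correction is taken as dH_{m,n,i} = E_n(f, j) · conj(X_m(f, j − i + 1)), and p(f, j) is assumed to be the received power over the selected channels and taps. The coefficients are updated only for m ∈ SetM(f) and n ∈ SetN(f). The step size mu, the regularizer delta, and the function name are hypothetical.

```python
import numpy as np


def update_thinned_filters(h, x_hist, err, set_m, set_n, mu=0.5, delta=1e-6):
    """NLMS-style coefficient update for one frequency bin, applied in place.

    h      : complex (M, N, I) taps H[m, n, i]
    x_hist : complex (M, I); x_hist[m, i] is X_m(f, j - i)
    err    : complex (N,); err[n] is E_n(f, j); only entries in set_n are read
    Only pairs with m in set_m and n in set_n are modified.
    """
    x_sub = x_hist[set_m]                                   # (|SetM|, I)
    # Assumed normaliser p(f, j): received power over selected channels and taps.
    p = np.sum(np.abs(x_sub) ** 2) + delta
    # Assumed correction dH[m, n, i] = E_n(f, j) * conj(X_m(f, j - i)).
    dh = err[set_n][None, :, None] * np.conj(x_sub)[:, None, :]
    h[np.ix_(set_m, set_n)] += mu * dh / p                  # assumed form of eq. (16)
    return h
```

With mu = 1 and a small delta, one update drives the a posteriori error of each selected microphone close to zero for the current frame. All coefficients outside SetM(f) × SetN(f) stay untouched, which is where the computational saving comes from.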

As is clear from the above, in this embodiment the echo replica Ŷ_n(f, j) and the correction amount dH_{m,n,i}(f, j) are calculated, and the filter coefficients H_{m,n,i}(f, j) are updated, only for m ∈ SetM(f) and n ∈ SetN(f). None of these quantities is calculated for any other m and n, so the amount of computation is reduced further.
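Counting complex multiply-accumulates per frame for echo replica generation makes the saving concrete. This is a rough count. It assumes every bin f = 1, ..., L + 1 uses I taps and ignores the cost of interpolation and the FFTs. The same counts apply to the coefficient updates.

```latex
\begin{aligned}
C_{\text{no thinning}} &= (L+1)\,M\,N\,I,\\
C_{\text{Example 1}}   &= \sum_{f=1}^{L+1} M\,\lvert \mathrm{SetN}(f)\rvert\, I,\\
C_{\text{Example 2}}   &= \sum_{f=1}^{L+1} \lvert \mathrm{SetM}(f)\rvert\,\lvert \mathrm{SetN}(f)\rvert\, I.
\end{aligned}
```

For example, take M = N = 33 and a bin in which both sets keep 5 elements. That bin costs 33 · 33 · I = 1089 I without thinning, 33 · 5 · I = 165 I in Example 1, and 5 · 5 · I = 25 I in Example 2.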

Note that in this embodiment the received-side signals are thinned out. As a result, the filter coefficients no longer correspond one-to-one to the echo-path characteristics between a specific loudspeaker and a specific microphone.

[Experimental Results of Example 2]
A simulation was performed with the same settings as in Example 1 to confirm the effect of this embodiment. FIG. 13 shows the result of echo cancellation with the configuration of this embodiment. For channels 1, 5, 11, 17, and 19 of the 33 channels, FIG. 13 plots the level of the transmitted signal as a solid line and the level of the error signal as a broken line. The difference between the two lines is the amount of echo cancelled. The echo is cancelled well in every channel shown.

The operation of the multi-channel echo cancellation apparatus 40 according to Example 3 of the present invention is described in detail with reference to FIGS. 14, 15, and 16. FIG. 14 is a block diagram showing the configuration of the apparatus 40, and FIG. 16 is a flowchart showing its operation. The apparatus 40 of Example 3 differs from the apparatus 20 of Example 1 in that it additionally includes a band-split voice switch unit 800. FIG. 15 is a block diagram showing the configuration of the band-split voice switch unit 800. This unit divides the received and transmitted signals into low-frequency and high-frequency components. Echo in the high-frequency components is controlled by attenuating them according to the talk state (transmitting or receiving). Echo in the low-frequency components is controlled by cancelling it with the adaptive filter.

The band-split voice switch unit 800 includes a talk-state determination unit 810, M receive-side high-frequency attenuation units 820_1 to 820_M, and N transmit-side high-frequency attenuation units 830_1 to 830_N. Each receive-side high-frequency attenuation unit 820_m includes a high-pass filter (hereinafter HPF) 8210, a low-pass filter (hereinafter LPF) 8220, a signal attenuator 8230, and a signal adder 8240. Each transmit-side high-frequency attenuation unit 830_n includes an HPF 8310, an LPF 8320, a signal attenuator 8330, and a signal adder 8340.

The receive-side high-frequency attenuation unit 820_m applies the HPF 8210 and the LPF 8220 to the received signal, splitting it into a high-frequency signal and a low-frequency signal. The signal attenuator 8230 attenuates the high-frequency signal output by the HPF 8210 by a specified amount. The signal adder 8240 adds the output of the signal attenuator 8230 to the output of the LPF 8220. Likewise, the transmit-side high-frequency attenuation unit 830_n applies the HPF 8310 and the LPF 8320 to the transmitted signal, splitting it into high- and low-frequency signals. The signal attenuator 8330 attenuates the high-frequency signal output by the HPF 8310 by a specified amount. The signal adder 8340 adds the output of the signal attenuator 8330 to the output of the LPF 8320.

The talk-state determination unit 810 determines the talk state using the M-channel signals output by the LPFs 8220 and the N-channel signals output by the LPFs 8320 (S810). When it determines the receiving state, only the receive-side high-frequency signals are attenuated (S820). When it determines the transmitting state, only the transmit-side high-frequency signals are attenuated (S830). The attenuation specified for the signal attenuators 8230 and 8330 is set in the range of 3 to 40 dB so that the residual echo is not noticeable.
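The source does not specify the decision rule, so the following minimal sketch of the talk-state decision and high-band attenuation assumes a simple one. The state is "receiving" when the mean low-band received power exceeds the mean low-band transmitted power by a margin, and "transmitting" otherwise. The margin, the 20 dB default attenuation, and the function names are hypothetical. As in the text, the attenuation is converted from dB to a linear gain and applied only to the high band of the side selected by the talk state, and the bands are then recombined.

```python
import numpy as np


def db_to_gain(atten_db: float) -> float:
    """Linear amplitude gain for an attenuation in dB (3-40 dB in the text)."""
    if not 3.0 <= atten_db <= 40.0:
        raise ValueError("attenuation outside the 3-40 dB range")
    return 10.0 ** (-atten_db / 20.0)


def voice_switch(rx_low, rx_high, tx_low, tx_high, atten_db=20.0, margin_db=6.0):
    """Recombine low and high bands after state-dependent high-band attenuation.

    rx_low, rx_high : (M, K) arrays, LPF / HPF outputs of the received signals
    tx_low, tx_high : (N, K) arrays, LPF / HPF outputs of the transmitted signals
    Returns (state, rx_out, tx_out).
    """
    eps = 1e-12
    rx_pow = np.mean(rx_low ** 2) + eps   # low-band received power
    tx_pow = np.mean(tx_low ** 2) + eps   # low-band transmitted power
    gain = db_to_gain(atten_db)
    if 10.0 * np.log10(rx_pow / tx_pow) > margin_db:
        # Receiving state: attenuate only the receive-side high band.
        return "receiving", rx_low + gain * rx_high, tx_low + tx_high
    # Transmitting state: attenuate only the transmit-side high band.
    return "transmitting", rx_low + rx_high, tx_low + gain * tx_high
```

A practical detector would smooth the powers over time and handle double talk separately. Those details are outside this sketch.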

In this embodiment, the echo replica generation unit 220_n does not use the filter coefficients H_{m,n,i}(f, j) corresponding to the high-frequency components when generating the echo replica, so the overall amount of computation is reduced. However, the frame division number D must be set to D ≥ 2 when the received signal vector conversion unit 110 divides the received signal into frames. This ensures that no audible artifacts occur between frames.

<Program, recording medium>
The various processes described above need not be executed in time series in the order described. They may also be executed in parallel or individually, according to the processing capability of the apparatus executing them or as needed. Other modifications can of course be made as appropriate without departing from the spirit of the present invention.

When the above configuration is implemented by a computer, a program describes the processing performed by the functions of each apparatus. Executing this program on the computer realizes those processing functions on the computer.

The program describing this processing can be recorded on a computer-readable recording medium of any kind, for example a magnetic recording device, an optical disc, a magneto-optical recording medium, or a semiconductor memory.

The program is distributed, for example, by selling, transferring, or lending a portable recording medium such as a DVD or CD-ROM on which the program is recorded. The program may also be distributed by storing it in a storage device of a server computer and transferring it from the server computer to other computers over a network.

A computer that executes such a program first stores the program in its own storage device, for example a program recorded on a portable recording medium or transferred from a server computer. When executing the processing, the computer reads the program stored in its own recording medium and executes processing according to it. In another form of execution, the computer may read the program directly from the portable recording medium and execute processing according to it. It may also execute processing according to the received program each time a program is transferred to it from the server computer. Alternatively, the above processing may be executed by a so-called ASP (Application Service Provider) type service. In this form, the server computer does not transfer the program to the computer, and the processing functions are realized only through execution instructions and result acquisition. The program in this embodiment includes information that is used for processing by an electronic computer and is equivalent to a program, such as data that is not a direct command to the computer but has properties that define the computer's processing.

In this embodiment, the apparatus is configured by executing a predetermined program on a computer. However, at least part of the processing may instead be implemented in hardware.

The multi-channel echo cancellation apparatus according to the present invention can be used in multi-channel teleconferencing systems that comprise a multi-channel reproduction system and a multi-channel sound pickup system.

10, 20, 30, 40 Multi-channel echo cancellation apparatus
2 Loudspeaker
3 Microphone
100, 110 Received signal vector conversion unit
200, 210, 220, 230 Echo replica generation unit
300 Echo replica spatial interpolation unit
400 Inverse FFT unit
500, 510 Echo cancellation unit
600 FFT unit
800 Band-split voice switch unit
810 Talk-state determination unit
820 Receive-side high-frequency attenuation unit
830 Transmit-side high-frequency attenuation unit
8210, 8310 High-pass filter
8220, 8320 Low-pass filter
8230, 8330 Signal attenuator
8240, 8340 Signal adder

Claims

A multi-channel echo cancellation method, where k represents time, f represents frequency, j represents a frame number, i represents the tap number of an adaptive filter, m represents a loudspeaker number, n represents a microphone number, and SetN(f) represents a set of microphone numbers selected from N (3 ≤ n ≤ N) microphones arranged linearly at equal intervals such that the lower the frequency f, the wider the spatial sampling interval, the method comprising:
a received signal vector conversion step in which a received signal vector conversion unit converts, for each channel m, the M-channel received signals x_m(k) output from M (1 ≤ m ≤ M) loudspeakers into the frequency domain to generate received signals X_m(f, j);
an echo replica generation step in which an echo replica generation unit generates from the received signals X_m(f, j), for each frequency f and for the microphones n included in SetN(f), a spatially thinned echo replica Ŷ_{SetN(f)}(f, j), using filter coefficients H_{m,n,i}(f, j) having I taps (1 ≤ i ≤ I, I ≥ 1);
an echo replica spatial interpolation step in which an echo replica spatial interpolation unit performs spatial interpolation, for each frequency f, on the spatially thinned echo replica Ŷ_{SetN(f)}(f, j) to generate echo replicas Ŷ_n(f, j);
an inverse FFT step in which an inverse FFT unit converts the echo replicas Ŷ_n(f, j) into the time domain to generate echo replicas ŷ_n(j);
an echo cancellation step in which an echo cancellation unit generates error signals e_n(k) from the echo replicas ŷ_n(j) and the N-channel transmitted signals y_n(j) picked up by N microphones arranged at equal intervals on a straight line;
an FFT step in which an FFT unit converts the error signals e_n(k) into the frequency domain to generate error signals E_n(f, j); and
a filter coefficient update step in which the echo replica generation unit obtains, for each frequency f and for the microphones n included in SetN(f), a correction amount dH_{m,n,i}(f, j) from the error signals E_n(f, j) and the received signals X_m(f, j), and updates the filter coefficients H_{m,n,i}(f, j) using the correction amount dH_{m,n,i}(f, j).

A multi-channel echo cancellation method, where k represents time, f represents frequency, j represents a frame number, i represents the tap number of an adaptive filter, m represents a loudspeaker number, SetM(f) represents a set of loudspeaker numbers selected from M (3 ≤ m ≤ M) loudspeakers arranged linearly at equal intervals such that the lower the frequency f, the wider the spatial sampling interval, n represents a microphone number, and SetN(f) represents a set of microphone numbers selected from N (3 ≤ n ≤ N) microphones arranged linearly at equal intervals such that the lower the frequency f, the wider the spatial sampling interval, the method comprising:
a received signal vector conversion step in which a received signal vector conversion unit converts, for each channel m, the M-channel received signals x_m(k) output from the M loudspeakers arranged at equal intervals on a straight line into the frequency domain to generate received signals X_m(f, j);
an echo replica generation step in which an echo replica generation unit generates from the received signals X_m(f, j), for each frequency f and for the combinations of the microphones n included in SetN(f) and the loudspeakers m included in SetM(f), a spatially thinned echo replica Ŷ_{SetN(f)}(f, j), using filter coefficients H_{m,n,i}(f, j) having I taps (1 ≤ i ≤ I, I ≥ 1);
an echo replica spatial interpolation step in which an echo replica spatial interpolation unit performs spatial interpolation, for each frequency f, on the spatially thinned echo replica Ŷ_{SetN(f)}(f, j) to generate echo replicas Ŷ_n(f, j);
an inverse FFT step in which an inverse FFT unit converts the echo replicas Ŷ_n(f, j) into the time domain to generate echo replicas ŷ_n(j);
an echo cancellation step in which an echo cancellation unit generates error signals e_n(k) from the echo replicas ŷ_n(j) and the N-channel transmitted signals y_n(j) picked up by N microphones arranged at equal intervals on a straight line;
an FFT step in which an FFT unit converts the error signals e_n(k) into the frequency domain to generate error signals E_n(f, j); and
a filter coefficient update step in which the echo replica generation unit obtains, for each frequency f and for the combinations of the microphones n included in SetN(f) and the loudspeakers m included in SetM(f), a correction amount dH_{m,n,i}(f, j) from the error signals E_n(f, j) and the received signals X_m(f, j), and updates the filter coefficients H_{m,n,i}(f, j) using the correction amount dH_{m,n,i}(f, j).

The multi-channel echo cancellation method according to claim 1 or 2, wherein, with S representing the microphone spacing and λ(f) representing the wavelength at frequency f, SetN(f) is selected based on a thinning interval Q(f). Q(f) is determined to be no greater than the largest natural number Q satisfying S × Q ≤ λ(f)/2.

The multi-channel echo cancellation method according to any one of claims 1 to 3, wherein, with S representing the microphone spacing and λ(f) representing the wavelength at frequency f, the echo replica spatial interpolation step determines, for each frequency f, the largest natural number Q satisfying S × Q ≤ λ(f)/2 and a thinning interval Q(f) no greater than that Q, and performs spatial interpolation as follows:

[Equation not reproduced in the source text]

The multi-channel echo cancellation method according to any one of claims 1 to 4, wherein the filter coefficients H_{m,n,i}(f, j) are variable for each frequency f. The number of filter-coefficient taps for frequencies above a predetermined frequency is set smaller than the number of taps for frequencies at or below the predetermined frequency.

The multi-channel echo cancellation method according to any one of claims 1 to 5, further comprising:
a receive-side high-frequency attenuation step in which a band-split voice switch unit attenuates the high-frequency component of the received signal x_m(k) for each channel m;
a transmit-side high-frequency attenuation step in which the band-split voice switch unit attenuates the high-frequency component of the error signal e_n(k) for each channel n; and
a talk-state determination step in which the band-split voice switch unit determines the talk state using the low-frequency components of the received signals x_m(k) and the low-frequency components of the error signals e_n(k),
wherein the receive-side high-frequency attenuation step is executed when the receiving state is determined in the talk-state determination step, and the transmit-side high-frequency attenuation step is executed when the transmitting state is determined.

A multi-channel echo cancellation apparatus, where k represents time, f represents frequency, j represents a frame number, i represents the tap number of an adaptive filter, m represents a loudspeaker number, n represents a microphone number, and SetN(f) represents a set of microphone numbers selected from N (3 ≤ n ≤ N) microphones arranged linearly at equal intervals such that the lower the frequency f, the wider the spatial sampling interval, the apparatus comprising:
M (1 ≤ m ≤ M) loudspeakers;
N microphones arranged at equal intervals on a straight line;
a received signal vector conversion unit that converts, for each channel m, the M-channel received signals x_m(k) output from the loudspeakers into the frequency domain to generate received signals X_m(f, j);
an echo replica generation unit with two functions. When the received signals X_m(f, j) are input, it generates from them, for each frequency f and for the microphones n included in SetN(f), a spatially thinned echo replica Ŷ_{SetN(f)}(f, j), using filter coefficients H_{m,n,i}(f, j) having I taps (1 ≤ i ≤ I, I ≥ 1). When the error signals E_n(f, j) are input, it obtains, for each frequency f and for the microphones n included in SetN(f), a correction amount dH_{m,n,i}(f, j) from the error signals E_n(f, j) and the received signals X_m(f, j), and updates the filter coefficients H_{m,n,i}(f, j) using the correction amount dH_{m,n,i}(f, j);
an echo replica spatial interpolation unit that performs spatial interpolation, for each frequency f, on the spatially thinned echo replica Ŷ_{SetN(f)}(f, j) to generate echo replicas Ŷ_n(f, j);
an inverse FFT unit that converts the echo replicas Ŷ_n(f, j) into the time domain to generate echo replicas ŷ_n(j);
an echo cancellation unit that generates error signals e_n(k) from the N-channel transmitted signals y_n(j) picked up by the microphones and the echo replicas ŷ_n(j); and
an FFT unit that converts the error signals e_n(k) into the frequency domain to generate the error signals E_n(f, j).

A multi-channel echo cancellation apparatus, where k represents time, f represents frequency, j represents a frame number, i represents the tap number of an adaptive filter, m represents a loudspeaker number, SetM(f) represents a set of loudspeaker numbers selected from M (3 ≤ m ≤ M) loudspeakers arranged linearly at equal intervals such that the lower the frequency f, the wider the spatial sampling interval, n represents a microphone number, and SetN(f) represents a set of microphone numbers selected from N (3 ≤ n ≤ N) microphones arranged linearly at equal intervals such that the lower the frequency f, the wider the spatial sampling interval, the apparatus comprising:
M loudspeakers arranged at equal intervals on a straight line;
N microphones arranged at equal intervals on a straight line;
a received signal vector conversion unit that converts, for each channel m, the M-channel received signals x_m(k) output from the loudspeakers into the frequency domain to generate received signals X_m(f, j);
an echo replica generation unit with two functions. When the received signals X_m(f, j) are input, it generates from them, for each frequency f and for the combinations of the microphones n included in SetN(f) and the loudspeakers m included in SetM(f), a spatially thinned echo replica Ŷ_{SetN(f)}(f, j), using filter coefficients H_{m,n,i}(f, j) having I taps (1 ≤ i ≤ I, I ≥ 1). When the error signals E_n(f, j) are input, it obtains, for each frequency f and for the same combinations, a correction amount dH_{m,n,i}(f, j) from the error signals E_n(f, j) and the received signals X_m(f, j), and updates the filter coefficients H_{m,n,i}(f, j) using the correction amount dH_{m,n,i}(f, j);
an echo replica spatial interpolation unit that performs spatial interpolation, for each frequency f, on the spatially thinned echo replica Ŷ_{SetN(f)}(f, j) to generate echo replicas Ŷ_n(f, j);
an inverse FFT unit that converts the echo replicas Ŷ_n(f, j) into the time domain to generate echo replicas ŷ_n(j);
an echo cancellation unit that generates error signals e_n(k) from the N-channel transmitted signals y_n(j) picked up by the microphones and the echo replicas ŷ_n(j); and
an FFT unit that converts the error signals e_n(k) into the frequency domain to generate the error signals E_n(f, j).

A program for causing a computer to execute each step of the multi-channel echo cancellation method according to any one of claims 1 to 6.