JP4478045B2

JP4478045B2 - Echo erasing device, echo erasing method, echo erasing program and recording medium therefor

Info

Publication number: JP4478045B2
Application number: JP2005062995A
Authority: JP
Inventors: 勝宏福井; 末廣島内
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2005-03-07
Filing date: 2005-03-07
Publication date: 2010-06-09
Anticipated expiration: 2025-03-07
Also published as: JP2006246397A

Description

The present invention relates to an echo cancellation apparatus, an echo cancellation method, an echo cancellation program, and a recording medium therefor. They are applied to, for example, a teleconferencing system having a multi-channel sound reproduction system, and cancel the acoustic echo that causes howling (acoustic feedback) and impairs listening.

As shown in FIG. 1, a conventional multi-channel echo cancellation apparatus generates an output signal e(k) from which the echo of the N-channel reproduction signals x_1(k) to x_N(k) (N being an integer of 2 or more), travelling from the speakers 1_1 to 1_N into the microphone 2, has been cancelled. Using the technique described in Non-Patent Document 1, it implements pseudo echo paths with adaptive filters 3_1 to 3_N. These pseudo echo paths hold pseudo characteristics ĥ_1(k) to ĥ_N(k) of the vectors h_1 to h_N, each of length (that is, number of taps) L, whose elements are the impulse responses of the echo paths between the speakers 1_1 to 1_N and the microphone 2. Here, k is a number indicating discrete time at a predetermined interval (the sample index). Sampling replaces the values of a variable over a certain interval with a single representative value in order to convert an analog audio signal into a digital signal, and is performed at, for example, a sampling frequency of 16 kHz (16,000 times per second). The signals supplied to the speakers 1_1 to 1_N and the signal picked up by the microphone 2 are analog, whereas the following description deals with digital signals, so conversion by D/A and A/D converters is required; this is self-evident, and the converters are not illustrated.

The adaptive filters 3_1 to 3_N generate pseudo echo signals d'_1(k) to d'_N(k) by convolving the reproduction signals x_1(k) to x_N(k) with the pseudo characteristics ĥ_1(k) to ĥ_N(k). Subtracting these from the collected sound signal y(k) of the microphone 2 (also called the "pre-echo-cancellation signal"), which contains the actual echo, yields the output signal e(k) of the echo cancellation apparatus (also called the "echo cancellation signal"). The pseudo characteristics ĥ_1(k) to ĥ_N(k) are updated continually using the reproduction signals x_1(k) to x_N(k) and the output signal e(k), and are set in the adaptive filters 3_1 to 3_N.

When, for example, the learning identification algorithm (the normalized LMS algorithm) is used in the adaptive filters 3_1 to 3_N, the pseudo characteristics ĥ_1(k) to ĥ_N(k) are estimated by

[Equation (1) omitted]

Here, the channel n is a natural number from 1 to N, x_n(k) = [x_n(k), x_n(k−1), …, x_n(k−L+1)]^T, and ψ is the step size that determines the update width of the coefficients, a real number between 0 and 2. δ is a small constant that prevents the denominator from becoming zero. As Equation (1) shows, the current pseudo characteristic ĥ_1(k) is obtained by adding an update amount to the previous pseudo characteristic ĥ_1(k−1).

Non-Patent Document 1: Tetsuro Fujii, Shoji Shimada, "Multi-channel adaptive digital filter," IEICE Transactions, '86/10, Vol. J69-A, No. 10.
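
The conventional update can be sketched as follows to make its cost visible: every sample touches all N × L coefficients. This is a minimal illustration, not the form given in Equation (1); the joint normalization over all channels, the time indexing, and the function name `nlms_multichannel_step` are assumptions based on the common multi-channel NLMS formulation.

```python
import numpy as np

def nlms_multichannel_step(h_hat, x_vecs, y_k, psi=0.5, delta=1e-6):
    """One update of N adaptive filters that share a single error signal.

    h_hat  : (N, L) array, current pseudo characteristics, one row per channel
    x_vecs : (N, L) array, row n is [x_n(k), x_n(k-1), ..., x_n(k-L+1)]
    y_k    : microphone sample y(k)
    """
    # Pseudo echo d'_n(k) of each channel, then the echo-cancelled output e(k).
    d_hat = np.sum(h_hat * x_vecs, axis=1)
    e_k = y_k - np.sum(d_hat)
    # Assumed normalization: total input power over all channels plus delta.
    norm = np.sum(x_vecs * x_vecs) + delta
    # Every one of the N * L coefficients is rewritten at every sample.
    h_new = h_hat + psi * e_k * x_vecs / norm
    return h_new, e_k
```

With psi = 1 and delta = 0, a single step drives the error for the current input to zero, which is a quick sanity check on the normalization.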

In Equation (1), as many pseudo characteristics ĥ_1(k) to ĥ_N(k) as there are channels of the reproduction signals x_1(k) to x_N(k) are updated continually, so the amount of computation increases dramatically. In addition, because the adaptive filters 3_1 to 3_N need a certain time to converge, estimation errors occur during learning and the echo estimation accuracy degrades. An object of the present invention is to reduce the hardware scale by reducing the amount of computation, and to remedy the performance degradation of the echo cancellation apparatus caused by the above problems by suppressing the echo instantaneously.

In the present invention, the frequency components of the sum of the reproduction signals and the frequency components of the collected sound signal are analyzed, and the frequency components are divided into groups. For each group, the amplitude spectrum of the echo is estimated from an amplitude ratio. The frequency components of the echo cancellation signal are then calculated from the per-frequency amplitude ratio between the collected sound signal and the estimated echo amplitude spectrum, converted into the time domain, and output.

According to the present invention, the frequency components of the echo cancellation signal can be calculated using only the amplitude ratio between the frequency components of the collected sound signal and the estimated echo amplitude spectrum, so the multi-channel adaptive filter computation of the prior art, with its enormous computational load, can be avoided. Moreover, since the estimated echo amplitude spectrum can be calculated almost instantaneously, the conventional problem that echo cancellation accuracy depends heavily on the convergence accuracy of the adaptive filter is solved.

Embodiments of the present invention are described below with reference to the drawings. Corresponding parts in the drawings are given the same reference numerals, and duplicate description is omitted.
[First Embodiment]
FIG. 2 shows an example of the functional configuration of the echo cancellation apparatus 100 of the present invention, and FIG. 3 shows its processing flow. The echo cancellation apparatus 100 comprises a summation unit 4A, a frequency analysis unit 101 for the reproduction signal, a frequency analysis unit 102 for the collected sound signal, an echo amplitude spectrum calculation unit 103, a target component selection calculation unit 104, and a frequency synthesis unit 105. The description below refers to FIG. 2 and FIG. 3.

Step S4A
The summation unit 4A receives the reproduction signals x_1(k) to x_N(k) of a plurality of channels (N channels, N being an integer of 2 or more) and outputs the summed reproduction signal x(k) = Σ_{n=1}^{N} x_n(k), obtained by adding the reproduction signals of the channels sample by sample. Here, k is a number indicating discrete time at a predetermined interval (the sample index). Sampling is performed at, for example, a sampling frequency of 16 kHz (16,000 times per second).
Step S101
The frequency analysis unit 101 receives the summed reproduction signal x(k) and outputs the amplitude spectrum of each frequency component as the summed reproduction signal amplitude spectrum |X_ω|. Here, ω is the index of the frequency component of the amplitude spectrum obtained at a predetermined frequency interval. For example, 512 samples of the summed reproduction signal, x(k−511), …, x(k), sampled at 16 kHz are taken as one frame, and x(k) is converted frame by frame into the summed reproduction signal amplitude spectrum |X_ω| (ω = 1, …, 256), which represents the frequency band up to 8 kHz with 256 points.
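
Steps S4A and S101 can be sketched as below. The text does not say which transform, window, or bin selection is used, so the real FFT without windowing and the choice to drop the 0 Hz bin in order to obtain 256 values are assumptions; `summed_amplitude_spectrum` is a hypothetical name.

```python
import numpy as np

FRAME_LEN = 512  # samples per frame at 16 kHz, as in the example above

def summed_amplitude_spectrum(frames):
    """frames: (N, FRAME_LEN) array, one row of samples per reproduction channel.

    Returns |X_omega| for omega = 1..256 as a length-256 array.
    """
    frames = np.asarray(frames, dtype=float)
    x_sum = frames.sum(axis=0)       # step S4A: sample-wise sum over channels
    spectrum = np.fft.rfft(x_sum)    # 257 bins covering 0 Hz to 8 kHz
    return np.abs(spectrum[1:])      # assumption: drop the DC bin, keep 256
```

For two identical channels carrying a tone that is exactly periodic in the frame, the peak of the returned spectrum sits at the tone's bin with twice the single-channel amplitude.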

Step S102
The frequency analysis unit 102 receives the collected sound signal y(k) and outputs the collected sound signal amplitude spectrum |Y_ω| and the phase spectrum arg(Y_ω) of each frequency component. k and ω are as described for step S101, and arg(Y_ω) is a real number not less than 0 and less than 2π.
Step S103
The echo amplitude spectrum calculation unit 103 outputs the estimated echo amplitude spectrum |D̂_ω| from the input summed reproduction signal amplitude spectrum |X_ω| and the collected sound signal amplitude spectrum |Y_ω|. Steps S1031 to S1034 are performed for each of M groups of frequency components (M being an integer of 2 or more); when processing has finished for all m (1 ≦ m ≦ M), the process proceeds to step S1035 (steps S1037, S1038, and S1039 implement the loop over the frequency groups). For example, when ω = 1, …, 256 and M = 32, the eight frequency components ω = 8m−7, …, 8m form one group. The specific internal procedure is as follows.
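
The grouping in the example (ω = 8m−7, …, 8m) generalizes to any bin count that divides evenly into M groups. The helper below is a small sketch of that index arithmetic; the name `group_bins` and the equal-width assumption are illustrative only.

```python
def group_bins(m, num_bins=256, num_groups=32):
    """Return the 1-based bin numbers omega that belong to group m (1 <= m <= M)."""
    if num_bins % num_groups != 0:
        raise ValueError("num_bins must be a multiple of num_groups")
    if not 1 <= m <= num_groups:
        raise ValueError("m must satisfy 1 <= m <= num_groups")
    size = num_bins // num_groups
    # For size = 8 this is omega = 8m-7, ..., 8m.
    return list(range(size * (m - 1) + 1, size * m + 1))
```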

Step S1031
First, the maximum value among the plural |Y_ω(j)| in the frequency group (for example, |Y_{8m−7}(j)|, …, |Y_{8m}(j)| when one group consists of eight values of ω) and the index ω_Ym at which that maximum occurs are obtained.
Next, from the plural |X_ω(j)| in the frequency group (for example, |X_{8m−7}(j)|, …, |X_{8m}(j)| when one group consists of eight values of ω), the reverberation-added reproduction signal amplitude spectrum at frame j

[equation omitted]

is obtained, where

[equation omitted]

Here, ξ represents the proportion in which the past reverberation-added reproduction signal amplitude spectrum is added to the summed reproduction signal amplitude spectrum |X_ω(j)|, taking into account the reverberation time of the place where the echo cancellation apparatus 100 is used; it is set in the range 0 to 1 (for example, 0.7). As an example, the initial value of the reverberation-added reproduction signal amplitude spectrum is 0 and the deformation correction amount c_m(0) is 1.
The maximum value among the reverberation-added reproduction signal amplitude spectra thus obtained for all ω in the frequency group m, and the index ω_Xm at which that maximum occurs, are then obtained.
Next, the provisional correction amount z_m(j) is calculated by

[equation omitted]

Here, j is the frame number, ν is a weighting coefficient (for example, 0.4) for adjusting the change in the deformation correction amount, and δ is a small value added so that the denominator does not become zero, as in Equation (1). |Y_ω(j)| is the collected sound signal amplitude spectrum at frame j.
Step S1032
Next, the deformation correction amount c_m(j) used in step S1035 is determined, for example, by a decision based on the combination of the following two conditions (condition 1 and condition 2).

Condition 1: In group m (1 ≦ m ≦ M), the index ω_Xm at which the reverberation-added reproduction signal amplitude spectrum is maximum coincides with the index ω_Ym at which the collected sound signal amplitude spectrum is maximum, and both amplitude spectra at that ω are equal to or greater than a predetermined threshold.
The predetermined threshold depends on, among other things, the noise of the environment in which the echo cancellation apparatus 100 is used, and is a value large enough for the component to be recognized as sound (for example, 60 dBm, 1000, and so on).
Condition 2: The provisional correction amount z_m(j) lies within a predetermined range relative to c_m(j−1) (for example, 0.5·c_m(j−1) < z_m(j) < 2·c_m(j−1)).

The range is restricted for the following reason. Because the present invention does not detect (calculate) the phase difference between channels, two or more sounds may reinforce or weaken one another. The restriction avoids malfunction at such specific frequencies, and also prevents c_m(j) from becoming too large or too small because of the speaker's own component when the near-end speaker (the speaker at the microphone 2) talks.
If both conditions (condition 1 and condition 2) are satisfied, the process proceeds to step S1033; if either one is not satisfied, the process proceeds to step S1034.
Step S1033
When the conditions of step S1032 are satisfied, the provisional correction amount z_m(j) obtained in step S1031 is used as the deformation correction amount c_m(j). The deformation correction amount c_m(j) obtained here is stored so that it can be used as the previous frame's deformation correction amount in steps S1032 and S1034 of the next frame.
Step S1034
When the conditions of step S1032 are not satisfied, the deformation correction amount c_m(j−1) used in the previous frame is used as the deformation correction amount c_m(j). The deformation correction amount c_m(j) obtained here is stored so that it can be used as the previous frame's deformation correction amount in steps S1032 and S1034 of the next frame.
Steps S1031 to S1034 are performed for each group m of frequency components. When processing has finished for all m (1 ≦ m ≦ M), the process proceeds to step S1035.
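
The decision in steps S1032 to S1034 for a single group can be sketched as follows. The provisional correction amount z_m(j) and the reverberation-added spectrum are taken as inputs because their defining equations are not reproduced here; the function name `update_correction` and the default range factors 0.5 and 2 (taken from the example in condition 2) are illustrative assumptions.

```python
import numpy as np

def update_correction(y_group, xr_group, z_m, c_prev, threshold,
                      lower=0.5, upper=2.0):
    """Return c_m(j) for one frequency group.

    y_group  : |Y_omega(j)| for the bins of group m
    xr_group : reverberation-added reproduction amplitude spectrum, same bins
    z_m      : provisional correction amount z_m(j), computed elsewhere
    c_prev   : c_m(j-1) stored from the previous frame
    """
    y_group = np.asarray(y_group, dtype=float)
    xr_group = np.asarray(xr_group, dtype=float)
    w_y = int(np.argmax(y_group))    # position of omega_Ym within the group
    w_x = int(np.argmax(xr_group))   # position of omega_Xm within the group
    # Condition 1: same peak bin, and both peak amplitudes reach the threshold.
    cond1 = (w_y == w_x and y_group[w_y] >= threshold
             and xr_group[w_x] >= threshold)
    # Condition 2: z_m(j) stays within the allowed range around c_m(j-1).
    cond2 = lower * c_prev < z_m < upper * c_prev
    # S1033 adopts z_m(j); S1034 keeps the previous value.
    return z_m if (cond1 and cond2) else c_prev
```

Holding the previous value whenever either condition fails is what keeps near-end speech, which moves the peak of |Y_ω| away from the peak of the reproduction spectrum, from disturbing the estimate.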

Step S1035
The estimated echo amplitude spectrum |D̂_ω| for each frequency component ω (that is, |D̂_1|, …, |D̂_256|) is obtained by

[equation omitted]

This processing is performed for each frequency component ω, using the deformation correction amount c_m(j) obtained in step S1033 or S1034 for the group m to which that frequency component belongs.
Step S104
The target component selection calculation unit 104 obtains and outputs the echo cancellation signal amplitude spectrum |E_ω| for each frequency component ω by

[Equation (4) omitted]

where

[Equation (5) omitted]

Here, β is a threshold set in advance to reduce malfunction caused by the near-end speaker when the estimated echo amplitude spectrum |D̂_ω| is estimated to be smaller than it actually is; this threshold is set slightly smaller than 1 (for example, β = 2.5).
Alternatively, when an existing reproduction-signal single-talk detection device 200 detects the single-talk state of the reproduction signal (the state in which the near-end speaker is not talking) (step S200), the echo cancellation signal amplitude spectrum |E_ω| of Equation (4) may be set to 0 regardless of the amplitude ratio ΔA_ω.

Step S105
The frequency synthesis unit 105 resynthesizes and outputs the time-domain signal e(k) from the echo cancellation signal amplitude spectrum |E_ω| of each frequency component ω obtained in step S104 and the phase spectrum arg(Y_ω) obtained in step S102.
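
Step S105 reuses the microphone phase unchanged and replaces only the amplitude. A minimal sketch for one 512-sample frame follows; the inverse real FFT, the zeroed 0 Hz bin, and the name `synthesize_frame` are assumptions, and frame overlap or windowing (not specified in the text) is left out.

```python
import numpy as np

def synthesize_frame(e_mag, y_phase, frame_len=512):
    """Rebuild one time-domain frame from |E_omega| and arg(Y_omega).

    e_mag, y_phase : length frame_len // 2 arrays for omega = 1..frame_len/2
    """
    e_mag = np.asarray(e_mag, dtype=float)
    y_phase = np.asarray(y_phase, dtype=float)
    spectrum = np.zeros(frame_len // 2 + 1, dtype=complex)
    # Assumption: bin 0 (DC) is not part of omega = 1..256 and is left at zero.
    spectrum[1:] = e_mag * np.exp(1j * y_phase)
    return np.fft.irfft(spectrum, n=frame_len)
```

Feeding in |Y_ω| itself as the amplitude returns the microphone frame with its mean removed, which confirms that amplitude and phase are recombined consistently.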
[Second Embodiment]
In the first embodiment, frequency analysis is performed after the plural reproduction signals x_1(k) to x_N(k) are added. The present embodiment differs in that each reproduction signal is frequency-analyzed first and the results are then added for each frequency component ω. Performing the frequency analysis per reproduction signal first increases the number of frequency analysis units, but avoids the effect of reinforcement or cancellation caused by phase differences between the reproduction signals. FIG. 4 shows an example of the functional configuration of the echo cancellation apparatus 100', and FIG. 5 shows its processing flow.

Only the differences from the first embodiment are described below.
The echo cancellation apparatus 100' has a plurality of frequency analysis units 101_1 to 101_N and a summation unit 4B in place of the summation unit 4A and the frequency analysis unit 101 of the echo cancellation apparatus 100 of the first embodiment.
Steps S101 and S1017 to S1019
When the frequency analysis units are implemented in hardware, there are N frequency analysis units 101_1 to 101_N, which frequency-analyze the reproduction signals x_1(k) to x_N(k) respectively to obtain the reproduction signal amplitude spectra |X_1ω| to |X_Nω|. When the frequency analysis is implemented in software, the N reproduction signal amplitude spectra |X_1ω| to |X_Nω| are obtained from the N reproduction signals x_1(k) to x_N(k) by N iterations. The processing flow of FIG. 5 shows the software case, in which the frequency analysis is performed N times by repeating steps S1017 to S1019.

Step S4B
The summation unit 4B receives the reproduction signal amplitude spectra |X_1ω| to |X_Nω|, adds the amplitude spectra for each frequency, and obtains and outputs the summed reproduction signal amplitude spectrum |X_ω| as

|X_ω| = Σ_{n=1}^{N} |X_nω|

The remaining processing is the same as in the first embodiment.
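
The difference between the two orders of operation is easiest to see with two channels in antiphase. In the sketch below, the function names are hypothetical, the real FFT is an assumed choice of frequency analysis, and bins are indexed as returned by `rfft` (bin 0 is DC).

```python
import numpy as np

def spectrum_of_sum(frames):
    """First embodiment: add the channels in time, then analyze."""
    return np.abs(np.fft.rfft(np.sum(frames, axis=0)))

def sum_of_spectra(frames):
    """Second embodiment: analyze each channel, then add the amplitudes."""
    return np.sum(np.abs(np.fft.rfft(frames, axis=1)), axis=0)

n = np.arange(512)
tone = np.cos(2 * np.pi * 16 * n / 512)   # exactly periodic in the frame
frames = np.stack([tone, -tone])          # two channels in antiphase

cancelled = spectrum_of_sum(frames)[16]   # near 0: the channels cancel
preserved = sum_of_spectra(frames)[16]    # near 512: 256 from each channel
```

Both speakers still radiate the tone into the room, so an amplitude reference of zero would leave the echo estimate with nothing to scale; adding amplitudes after analysis keeps the reference at the cost of N transforms per frame.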
[Third Embodiment]
During single talk of the reproduction signal, the collected sound signal y(k) consists only of the echo of the reproduction signals x_1(k) to x_N(k) and noise, so the amplitude ratio ΔA_ω of Equation (5) should be close to 1. If there is a frequency component for which the amplitude ratio ΔA_ω is less than 1/β, the deformation correction amount c_m has been set too small because of an erroneous estimate of the estimated echo amplitude spectrum |D̂_ω|. A deformation correction amount c_m that is too small at such a specific frequency component causes musical noise, so the present embodiment adds processing that increases the deformation correction amount c_m. Specifically, the processing in the target component selection calculation unit 104 of the echo cancellation apparatus 100 or 100' is changed as follows. FIG. 6 shows the processing flow of the present embodiment. Although FIG. 6 shows the change as applied to the first embodiment, it can be applied to the second embodiment in the same way.

Addition of step S1042
The target component selection calculation unit 104 checks whether there is a frequency component for which the amplitude ratio ΔA_ω is less than 1/β. If there is no such frequency component, the process proceeds to step S104; if there is a frequency component that satisfies the condition, the process proceeds to step S1043.
Addition of step S1043
The target component selection calculation unit 104 increases the deformation correction amount c_m(j) by, for example,

[equation omitted]

Here, c_m(j) is the deformation correction amount after the increase, c_m'(j) is the deformation correction amount of frame j obtained in step S1033 or S1034 (the deformation correction amount before the increase), and ε is a preset coefficient that slightly increases or decreases the updated value of the deformation correction amount c_m(j). Unlike step S1032, this step does not decide whether to update the deformation correction amount c_m(j) but increases it unconditionally; to avoid large changes, ε is therefore set to a value such as 0.2.

The deformation correction amount c_m(j) increased in this way is stored and used in the calculation of the provisional correction amount z_m(j+1) for the next frame and in the processing of step S1034.
The remaining processing is the same as in the first and second embodiments.
[Fourth Embodiment]
Even a slight estimation error can cause problems such as musical noise or a muffled near-end speaker's voice. The present embodiment shows the case in which the commonly used technique of adding the original sound is applied to solve such problems. FIG. 7 shows the part of the echo cancellation apparatus 100 or 100' that is changed. This original-sound addition can be combined with any of the first to third embodiments; FIG. 8 shows the processing flow when it is combined with the second embodiment.

Addition of step S5A
The multiplier 5A multiplies the collected sound signal amplitude spectrum |Y_ω| by (1−α). Here, α is a preset value that determines the ratio between the echo cancellation signal amplitude spectrum |E_ω| and the collected sound signal amplitude spectrum |Y_ω|, for example α = 0.99.
Addition of step S5B
The multiplier 5B multiplies the echo cancellation signal amplitude spectrum |E_ω| by α.
Addition of step S6
The adder 6 adds the output of the multiplier 5A and the output of the multiplier 5B.
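
Steps S5A, S5B, and S6 together form a weighted blend of the two amplitude spectra. A minimal sketch, assuming α is confined to [0, 1] (the text only gives α = 0.99 as an example) and using the hypothetical name `add_original_sound`:

```python
import numpy as np

def add_original_sound(e_mag, y_mag, alpha=0.99):
    """Blend |E_omega| with a small share of |Y_omega| (steps S5A, S5B, S6)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha is expected to lie in [0, 1]")
    e_mag = np.asarray(e_mag, dtype=float)
    y_mag = np.asarray(y_mag, dtype=float)
    from_5a = (1.0 - alpha) * y_mag   # multiplier 5A
    from_5b = alpha * e_mag           # multiplier 5B
    return from_5a + from_5b          # adder 6
```

With α = 0.99, a bin whose |E_ω| has been driven to zero still carries 1 % of the microphone amplitude, that is, a floor 40 dB below |Y_ω|, so isolated bins never switch fully on and off between frames.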

The remaining processing is the same as in the first, second, and third embodiments.
[Fifth Embodiment]
In the present embodiment, a determination means 1041 that decides whether the reproduction signal is in the single-talk state is added to the target component selection calculation unit 104'. FIGS. 9 and 10 show examples of this functional configuration. With this method, the single-talk detection device 200 shown in FIG. 2 and FIG. 4 is unnecessary. FIG. 9 shows the change as applied to the first embodiment, and FIG. 10 shows it as applied to the second embodiment. FIG. 11 shows the processing flow for the functional configuration of FIG. 9 (the change from the first embodiment). For the functional configuration of FIG. 10 the change is the same: only the single-talk detection of the reproduction signal (step S1041) is added.

Step S1041
The single-talk determination means 1041 of the target component selection calculation unit 104' judges that the reproduction signal is in the single-talk state when the amplitude ratio ΔA_ω of every frequency component is 1/β' or more, and outputs information indicating the single-talk state. Here, β' is a preset value that is larger than β, for example β' = 10.
Steps S1042 and S104, performed by the target component selection calculation unit 104', use this information indicating whether or not the single-talk state holds.
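
The two threshold tests on the amplitude ratio can be sketched as follows. The amplitude ratios ΔA_ω are taken as given, since Equation (5) is not reproduced here; the function names are hypothetical, and the defaults are the example values β' = 10 and β = 2.5.

```python
import numpy as np

def is_single_talk(delta_a, beta_prime=10.0):
    """Step S1041: True when every Delta A_omega is at least 1 / beta'."""
    delta_a = np.asarray(delta_a, dtype=float)
    return bool(np.all(delta_a >= 1.0 / beta_prime))

def underestimated_bins(delta_a, beta=2.5):
    """Step S1042: 1-based bin numbers whose Delta A_omega is below 1 / beta."""
    delta_a = np.asarray(delta_a, dtype=float)
    return [int(i) + 1 for i in np.flatnonzero(delta_a < 1.0 / beta)]
```

Because β' is larger than β, the single-talk test uses the looser threshold (0.1 against 0.4 with the example values), so a frame can be judged single talk while individual bins are still flagged for the increase of step S1043.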

The remaining processing is the same as in the first, second, third, and fourth embodiments.
In every embodiment of the present invention, all or part of the above processing procedure can also be executed by a computer and a program that runs on the computer. The program can be recorded on a computer-readable recording medium and read and executed by the computer as needed.
[Experimental example]
In the experiment, an echo cancellation apparatus in which the changes of the third to fifth embodiments were applied to the second embodiment was compared with the conventional method. FIG. 12 shows the processing flow of the echo processing apparatus used in the experiment.

The sampling frequency was 16 kHz, and impulse responses measured in a room with a reverberation time of 200 ms were truncated to 2048 points and used as the echo paths. For the echo cancellation apparatus of the present invention, the number of frequency analysis points was set to 512 and the number of frequency band groups to 32, with ξ = 0.7, ν = 0.4, β = 2.5, and β' = 10. The adaptive filter used the learning identification algorithm with a step size of 0.5 and L = 2048 taps, and adaptation was halted during intervals in which transmitted speech was present.
FIG. 13 shows the time waveform of each signal, and FIG. 14 shows the echo suppression obtained by converting each signal into a power envelope. In FIG. 13, A is the echo signal, B is the transmitted signal, C is the echo cancellation signal produced by the adaptive filter, and D is the echo cancellation signal produced by the echo cancellation method of the present invention. In FIG. 14, the dotted line is the collected sound signal, the thin line is the echo cancellation ratio of the adaptive filter, and the thick line is the echo cancellation ratio of the echo cancellation method of the present invention. Interval (1) is a receive single-talk state, interval (2) is a transmit single-talk state, interval (3) is a double-talk state, and interval (4) is a receive single-talk state in which the left and right reproduction signals were swapped in order to check the effect of adaptive filter misconvergence caused by correlation between the stereo signals.

FIG. 14 shows that in section (1) the echo suppression achieved by the present invention reaches about 40 dB, reducing the echo by at least about 30 dB more than the adaptive filter. In section (2), the output waveform is almost identical to the transmitted signal waveform, confirming that the transmitted speech is not adversely affected. In section (4), the echo cancellation method of the present invention suppresses the echo by about 40 dB almost instantly, showing that it is robust to changes in the echo path, whereas the adaptive filter leaves a large residual echo at the beginning of the section. In section (3) of FIG. 13, the echo cancellation method of the present invention almost completely restores the waveform of the transmitted signal. Subjective listening also confirmed that there was almost no musical noise. These results show that the echo cancellation method of the present invention responds immediately to changes such as those in the environment of the room in use, and suppresses echo while preserving the power of the transmitted speech.
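To make the decibel figures concrete, the following sketch computes an echo suppression amount from power envelopes. The text does not say how the power envelope in FIG. 14 was computed, so the block-wise mean power over non-overlapping windows, the window length, the `floor` constant, and the function names are all assumptions for illustration.

```python
import math

def power_envelope(signal, win):
    """Mean power over consecutive non-overlapping windows of `win` samples."""
    return [sum(s * s for s in signal[i:i + win]) / win
            for i in range(0, len(signal) - win + 1, win)]

def suppression_db(mic, out, win, floor=1e-12):
    """Per-window ratio of collected-signal power to output power, in dB."""
    p_mic = power_envelope(mic, win)
    p_out = power_envelope(out, win)
    return [10.0 * math.log10((pm + floor) / (po + floor))
            for pm, po in zip(p_mic, p_out)]

# An output whose amplitude is 1/100 of the collected signal gives about 40 dB.
values = suppression_db([1.0] * 320, [0.01] * 320, win=160)
```

Under this definition, a 40 dB suppression corresponds to the residual amplitude falling to one hundredth of the collected echo amplitude.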

FIG. 1 is a diagram showing an example functional configuration of a conventional multichannel echo canceller.
FIG. 2 is a diagram showing an example functional configuration of the echo canceller 100 of the first embodiment.
FIG. 3 is a diagram showing the processing flow of the echo canceller 100 of the first embodiment.
FIG. 4 is a diagram showing an example functional configuration of the echo canceller 100' of the second embodiment.
FIG. 5 is a diagram showing the processing flow of the echo canceller 100' of the second embodiment.
FIG. 6 is a diagram showing the processing flow of the third embodiment.
FIG. 7 is a diagram showing an example functional configuration of the fourth embodiment.
FIG. 8 is a diagram showing the processing flow of the fourth embodiment combined with the second embodiment.
FIG. 9 is a diagram showing an example functional configuration of the echo canceller 100'' of the fifth embodiment.
FIG. 10 is a diagram showing an example functional configuration of the echo canceller 100''' of the fifth embodiment.
FIG. 11 is a diagram showing the processing flow of the echo canceller 100'' of the fifth embodiment.
FIG. 12 is a diagram showing the processing flow of the echo processing apparatus used in the experiment.
FIG. 13 is a diagram showing the time waveform of each signal.
FIG. 14 is a diagram showing the echo suppression amount obtained by converting each signal into a power envelope.

Claims

An echo canceller comprising:
a summing unit that adds input reproduction signals of a plurality of channels and outputs an added reproduction signal;
a first frequency analysis unit that converts the added reproduction signal into the frequency domain, analyzes its frequency components, and outputs an added reproduction signal amplitude spectrum;
a second frequency analysis unit that converts an input collected sound signal into the frequency domain, analyzes its frequency components, and outputs collected sound signal frequency components;
an echo amplitude spectrum calculation unit that divides the added reproduction signal amplitude spectrum and the collected sound signal frequency components each into groups of a plurality of components and, for each group, estimates and outputs an estimated echo amplitude spectrum, which is the amplitude spectrum of the echo in that group, from the amplitude ratio between the largest value of the added reproduction signal amplitude spectrum within the group and the largest value of the collected sound signal frequency components within the group;
a target component selection calculation unit that outputs echo-cancelled signal frequency components based on the amplitude ratio, for each frequency component, between the collected sound signal frequency components and the estimated echo amplitude spectrum; and
a frequency synthesis unit that converts the echo-cancelled signal frequency components into the time domain and outputs an output signal.
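The group-wise estimation and component selection recited in the claim above can be sketched as follows, operating on spectra that have already been computed for one frame. This is only one possible reading: scaling the added reproduction amplitude spectrum by the per-group ratio of maxima, the hard keep-or-zero decision, the `threshold` value, and all names are assumptions for illustration and are not taken from the claim text.

```python
def estimate_echo_amplitude(x_amp, y_amp, n_groups, eps=1e-12):
    """Estimate the echo amplitude spectrum group by group.

    x_amp : added reproduction signal amplitude spectrum (one value per bin)
    y_amp : amplitudes of the collected sound signal frequency components
    Assumes the number of bins is divisible by n_groups.
    """
    n = len(x_amp)
    if n != len(y_amp) or n % n_groups != 0:
        raise ValueError("spectra must have equal length divisible by n_groups")
    size = n // n_groups
    echo = [0.0] * n
    for g in range(n_groups):
        lo, hi = g * size, (g + 1) * size
        # Ratio of the largest collected amplitude to the largest reproduction
        # amplitude within this group.
        ratio = max(y_amp[lo:hi]) / (max(x_amp[lo:hi]) + eps)
        for i in range(lo, hi):
            echo[i] = ratio * x_amp[i]
    return echo

def select_target(y_spec, echo_amp, threshold=1.5):
    """Keep a bin only if its amplitude exceeds threshold times the estimated echo."""
    return [y if abs(y) > threshold * e else 0j
            for y, e in zip(y_spec, echo_amp)]

# Toy frame with 8 bins in 2 groups. In the second group the collected signal
# peaks at a different bin than the reproduction signal.
x_amp = [1.0, 2.0, 4.0, 1.0, 0.5, 0.5, 1.0, 0.25]
y_spec = [0.5 + 0j, 1.0 + 0j, 2.0 + 0j, 0.5 + 0j, 3.0 + 0j, 0.1 + 0j, 0.1 + 0j, 0.1 + 0j]
y_amp = [abs(y) for y in y_spec]
echo_amp = estimate_echo_amplitude(x_amp, y_amp, n_groups=2)
selected = select_target(y_spec, echo_amp)
```

In this toy frame the first group is a scaled copy of the reproduction spectrum and is removed entirely. In the second group, only the bin whose amplitude stands well above the estimated echo is kept.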

An echo canceller comprising:
a first frequency analysis unit that converts input reproduction signals of a plurality of channels into the frequency domain channel by channel, analyzes their frequency components, and outputs reproduction signal amplitude spectra;
a summing unit that adds the reproduction signal amplitude spectra of the plurality of channels and outputs an added reproduction signal amplitude spectrum;
a second frequency analysis unit that converts an input collected sound signal into the frequency domain, analyzes its frequency components, and outputs collected sound signal frequency components;
an echo amplitude spectrum calculation unit that divides the added reproduction signal amplitude spectrum and the collected sound signal frequency components each into groups of a plurality of components and, for each group, estimates and outputs an estimated echo amplitude spectrum, which is the amplitude spectrum of the echo in that group, from the amplitude ratio between the largest value of the added reproduction signal amplitude spectrum within the group and the largest value of the collected sound signal frequency components within the group;
a target component selection calculation unit that outputs echo-cancelled signal frequency components based on the amplitude ratio, for each frequency component, between the collected sound signal frequency components and the estimated echo amplitude spectrum; and
a frequency synthesis unit that converts the echo-cancelled signal frequency components into the time domain and outputs an output signal.

The echo canceller according to claim 1 or 2, wherein
the target component selection calculation unit can also receive a signal indicating whether or not the reproduction signal is in a single-talk state and, when the reproduction signal is in the single-talk state, outputs the echo-cancelled signal frequency components as 0.

The echo canceller according to claim 1 or 2, wherein
the target component selection calculation unit has means for determining, from the amplitude ratio for each frequency component between the collected sound signal frequency components and the estimated echo amplitude spectrum, whether or not the reproduction signal is in a single-talk state and, when the reproduction signal is in the single-talk state, outputs the echo-cancelled signal frequency components as 0.

An echo canceller comprising:
a summing unit that adds input reproduction signals of a plurality of channels and outputs an added reproduction signal;
a first frequency analysis unit that converts the added reproduction signal into the frequency domain in units of frames of a predetermined time length, analyzes its frequency components, and outputs an added reproduction signal amplitude spectrum;
a second frequency analysis unit that converts an input collected sound signal into the frequency domain in units of frames, analyzes its frequency components, and outputs collected sound signal frequency components;
an echo amplitude spectrum calculation unit that divides the added reproduction signal amplitude spectrum and the collected sound signal frequency components each into groups of a plurality of components, calculates, for each group, a correction amount for the current frame from the correction amount of a past frame and the amplitude ratio of the largest value of the collected sound signal frequency components within the group to the largest value of the added reproduction signal amplitude spectrum within the group, and estimates and outputs an estimated echo amplitude spectrum by multiplying the added reproduction signal amplitude spectrum by the correction amount for the current frame;
a target component selection calculation unit that outputs echo-cancelled signal frequency components based on the amplitude ratio, for each frequency component, between the collected sound signal frequency components and the estimated echo amplitude spectrum and that, when the reproduction signal is in a single-talk state and the amplitude ratio between the collected sound signal frequency components and the estimated echo amplitude spectrum is less than a predetermined value, sets an increased version of the correction amount for the current frame as the correction amount used to calculate the correction amounts for the next and subsequent frames; and
a frequency synthesis unit that converts the echo-cancelled signal frequency components into the time domain and outputs an output signal.

An echo canceller comprising:
a first frequency analysis unit that converts input reproduction signals of a plurality of channels into the frequency domain channel by channel in units of frames of a predetermined time length, analyzes their frequency components, and outputs reproduction signal amplitude spectra;
a summing unit that adds the reproduction signal amplitude spectra of the plurality of channels and outputs an added reproduction signal amplitude spectrum;
a second frequency analysis unit that converts an input collected sound signal into the frequency domain in units of frames, analyzes its frequency components, and outputs collected sound signal frequency components;
an echo amplitude spectrum calculation unit that divides the added reproduction signal amplitude spectrum and the collected sound signal frequency components each into groups of a plurality of components, calculates, for each group, a correction amount for the current frame from the correction amount of a past frame and the amplitude ratio of the largest value of the collected sound signal frequency components within the group to the largest value of the added reproduction signal amplitude spectrum within the group, and estimates and outputs an estimated echo amplitude spectrum by multiplying the added reproduction signal amplitude spectrum by the correction amount for the current frame;
a target component selection calculation unit that outputs echo-cancelled signal frequency components based on the amplitude ratio, for each frequency component, between the collected sound signal frequency components and the estimated echo amplitude spectrum and that, when the reproduction signal is in a single-talk state and the amplitude ratio between the collected sound signal frequency components and the estimated echo amplitude spectrum is less than a predetermined value, sets an increased version of the correction amount for the current frame as the correction amount used to calculate the correction amounts for the next and subsequent frames; and
a frequency synthesis unit that converts the echo-cancelled signal frequency components into the time domain and outputs an output signal.

The echo canceller according to any one of claims 1 to 6, further comprising:
a first multiplication unit that multiplies the echo-cancelled signal frequency components by a predetermined first coefficient;
a second multiplication unit that multiplies the collected sound signal frequency components by a predetermined second coefficient; and
an addition unit that adds the output of the first multiplication unit and the output of the second multiplication unit,
wherein the frequency synthesis unit converts the output of the addition unit into the time domain and outputs an output signal.
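The weighted combination recited in the claim above amounts to a per-bin linear mix of the echo-cancelled components and the collected sound components before synthesis. The sketch below shows that arithmetic only; the function name and the coefficient values 0.9 and 0.1 are hypothetical, since the claim states only that the two coefficients are predetermined.

```python
def mix_components(e_spec, y_spec, first_coeff, second_coeff):
    """Per-bin weighted sum of echo-cancelled and collected frequency components."""
    return [first_coeff * e + second_coeff * y
            for e, y in zip(e_spec, y_spec)]

# Bin 0 was zeroed by the echo canceller; bin 1 was passed through unchanged.
mixed = mix_components([0j, 3 + 0j], [1 + 1j, 3 + 0j], 0.9, 0.1)
```

With a non-zero second coefficient, a bin that the echo canceller set to zero still carries an attenuated copy of the collected component (here 0.1 + 0.1j), while a bin that was passed through unchanged keeps its original value when the two coefficients sum to 1.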

An echo cancellation method comprising:
adding, in a summing unit, input reproduction signals of a plurality of channels and outputting an added reproduction signal;
converting, in a first frequency analysis unit, the added reproduction signal into the frequency domain, analyzing its frequency components, and outputting an added reproduction signal amplitude spectrum;
converting, in a second frequency analysis unit, an input collected sound signal into the frequency domain, analyzing its frequency components, and outputting collected sound signal frequency components;
dividing, in an echo amplitude spectrum calculation unit, the added reproduction signal amplitude spectrum and the collected sound signal frequency components each into groups of a plurality of components and, for each group, estimating and outputting an estimated echo amplitude spectrum, which is the amplitude spectrum of the echo in that group, from the amplitude ratio between the largest value of the added reproduction signal amplitude spectrum within the group and the largest value of the collected sound signal frequency components within the group;
outputting, in a target component selection calculation unit, echo-cancelled signal frequency components based on the amplitude ratio, for each frequency component, between the collected sound signal frequency components and the estimated echo amplitude spectrum; and
converting, in a frequency synthesis unit, the echo-cancelled signal frequency components into the time domain and outputting an output signal.

An echo cancellation method comprising:
converting, in a first frequency analysis unit, input reproduction signals of a plurality of channels into the frequency domain channel by channel, analyzing their frequency components, and outputting reproduction signal amplitude spectra;
adding, in a summing unit, the reproduction signal amplitude spectra of the plurality of channels and outputting an added reproduction signal amplitude spectrum;
converting, in a second frequency analysis unit, an input collected sound signal into the frequency domain, analyzing its frequency components, and outputting collected sound signal frequency components;
dividing, in an echo amplitude spectrum calculation unit, the added reproduction signal amplitude spectrum and the collected sound signal frequency components each into groups of a plurality of components and, for each group, estimating and outputting an estimated echo amplitude spectrum, which is the amplitude spectrum of the echo in that group, from the amplitude ratio between the largest value of the added reproduction signal amplitude spectrum within the group and the largest value of the collected sound signal frequency components within the group;
outputting, in a target component selection calculation unit, echo-cancelled signal frequency components based on the amplitude ratio, for each frequency component, between the collected sound signal frequency components and the estimated echo amplitude spectrum; and
converting, in a frequency synthesis unit, the echo-cancelled signal frequency components into the time domain and outputting an output signal.

The echo cancellation method according to claim 8 or 9, wherein
the target component selection calculation unit receives a signal indicating whether or not the reproduction signal is in a single-talk state and, when the reproduction signal is in the single-talk state, outputs the echo-cancelled signal frequency components as 0.

The echo cancellation method according to claim 8 or 9, wherein
the target component selection calculation unit determines, from the amplitude ratio for each frequency component between the collected sound signal frequency components and the estimated echo amplitude spectrum, whether or not the reproduction signal is in a single-talk state and, when the reproduction signal is in the single-talk state, outputs the echo-cancelled signal frequency components as 0.

An echo cancellation method comprising:
adding, in a summing unit, input reproduction signals of a plurality of channels and outputting an added reproduction signal;
converting, in a first frequency analysis unit, the added reproduction signal into the frequency domain in units of frames of a predetermined time length, analyzing its frequency components, and outputting an added reproduction signal amplitude spectrum;
converting, in a second frequency analysis unit, an input collected sound signal into the frequency domain in units of frames, analyzing its frequency components, and outputting collected sound signal frequency components;
dividing, in an echo amplitude spectrum calculation unit, the added reproduction signal amplitude spectrum and the collected sound signal frequency components each into groups of a plurality of components, calculating, for each group, a correction amount for the current frame from the correction amount of a past frame and the amplitude ratio of the largest value of the collected sound signal frequency components within the group to the largest value of the added reproduction signal amplitude spectrum within the group, and estimating and outputting an estimated echo amplitude spectrum by multiplying the added reproduction signal amplitude spectrum by the correction amount for the current frame;
outputting, in a target component selection calculation unit, echo-cancelled signal frequency components based on the amplitude ratio, for each frequency component, between the collected sound signal frequency components and the estimated echo amplitude spectrum and, when the reproduction signal is in a single-talk state and the amplitude ratio between the collected sound signal frequency components and the estimated echo amplitude spectrum is less than a predetermined value, setting an increased version of the correction amount for the current frame as the correction amount used to calculate the correction amounts for the next and subsequent frames; and
converting, in a frequency synthesis unit, the echo-cancelled signal frequency components into the time domain and outputting an output signal.

An echo cancellation method comprising:
converting, in a first frequency analysis unit, input reproduction signals of a plurality of channels into the frequency domain channel by channel in units of frames of a predetermined time length, analyzing their frequency components, and outputting reproduction signal amplitude spectra;
adding, in a summing unit, the reproduction signal amplitude spectra of the plurality of channels and outputting an added reproduction signal amplitude spectrum;
converting, in a second frequency analysis unit, an input collected sound signal into the frequency domain in units of frames, analyzing its frequency components, and outputting collected sound signal frequency components;
dividing, in an echo amplitude spectrum calculation unit, the added reproduction signal amplitude spectrum and the collected sound signal frequency components each into groups of a plurality of components, calculating, for each group, a correction amount for the current frame from the correction amount of a past frame and the amplitude ratio of the largest value of the collected sound signal frequency components within the group to the largest value of the added reproduction signal amplitude spectrum within the group, and estimating and outputting an estimated echo amplitude spectrum by multiplying the added reproduction signal amplitude spectrum by the correction amount for the current frame;
outputting, in a target component selection calculation unit, echo-cancelled signal frequency components based on the amplitude ratio, for each frequency component, between the collected sound signal frequency components and the estimated echo amplitude spectrum and, when the reproduction signal is in a single-talk state and the amplitude ratio between the collected sound signal frequency components and the estimated echo amplitude spectrum is less than a predetermined value, setting an increased version of the correction amount for the current frame as the correction amount used to calculate the correction amounts for the next and subsequent frames; and
converting, in a frequency synthesis unit, the echo-cancelled signal frequency components into the time domain and outputting an output signal.

The echo cancellation method according to any one of claims 8 to 13, wherein
a first multiplication unit multiplies the echo-cancelled signal frequency components by a predetermined first coefficient,
a second multiplication unit multiplies the collected sound signal frequency components by a predetermined second coefficient,
an addition unit adds the output of the first multiplication unit and the output of the second multiplication unit, and
the frequency synthesis unit converts the output of the addition unit into the time domain and outputs an output signal.

An echo cancellation program that causes a computer to implement the echo canceller according to any one of claims 1 to 7.

A computer-readable recording medium on which the echo cancellation program according to claim 15 is recorded.