JP2009033549A - Speech processor and echo removing method

Info

Publication number: JP2009033549A
Application number: JP2007196235A
Authority: JP
Inventors: Satoshi Ishigaki; 智石垣
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 2007-07-27
Filing date: 2007-07-27
Publication date: 2009-02-12
Anticipated expiration: 2027-07-27
Also published as: JP5100234B2

Abstract

PROBLEM TO BE SOLVED: To provide a speech processor and an echo removing method capable of resolving the trade-off between the learning speed of a filter and its prediction accuracy.

SOLUTION: The speech processor is provided with: a first adaptive filter for generating a first predictive echo signal on the basis of a reception signal; a second adaptive filter for learning at a higher speed than the first adaptive filter on the basis of the reception signal; an adder for reducing an echo signal by subtracting the first predictive echo signal from a transmission signal that includes the echo signal, the echo signal resulting from leakage of the reception signal from the reception signal path into the transmission signal path; and a correction part for updating the coefficients of the first and second adaptive filters based on the residual that is not removed by the subtraction. When the mean and variance of the ratio of the coefficients of the first and second adaptive filters before the subtraction satisfy a prescribed relation, the correction part corrects the coefficients of the first adaptive filter based on the coefficients of the second adaptive filter.

COPYRIGHT: (C)2009, JPO&INPIT

Description

The present invention relates to a speech processing device and an echo removal method applied to, for example, a voice communication device having a hands-free function in which a call is made using a speaker and a microphone instead of a handset. In particular, it relates to a speech processing device and an echo removal method that can prevent the performance of an echo canceller from deteriorating even when the gain is changed somewhere on the path that runs from the point where the echo canceller acquires the received signal to the point where the reproduced sound is picked up by the microphone and input to the echo canceller.

In a hands-free call system, acoustic echo arises when sound reproduced from the speaker is picked up by the microphone. A technique for removing this acoustic echo is called an echo canceller and has long been studied (see, for example, Patent Document 1).

In a voice call using a speaker and a microphone, acoustic echo arises when sound reproduced from the speaker is recorded by the microphone and retransmitted to the transmitting side. An acoustic echo canceller, which suppresses acoustic echo, learns the transfer function of the room transfer system from the speaker output to the microphone recording, and removes the acoustic echo by subtracting the echo predicted from the learning result from the recorded sound.

In a general echo canceller, when an external gain such as that of the speaker or the microphone is operated, the transfer function between the speaker and the microphone changes, so relearning is required and the echo removal performance drops temporarily. A general echo canceller trains the filter that predicts the echo so that it approaches the room transfer system. A filter with a fast learning speed has poor echo prediction accuracy, whereas a filter with a slow learning speed takes a long time to learn but has good prediction accuracy.

The echo canceller described in Patent Document 1 is a learning and updating device for filter coefficients that can quickly follow changes in characteristics. It sets the length of the time constant of the filter coefficient calculation means by comparing the calculated power ratio with the current power ratio ("characterized in that when the power ratio calculated by the calculation means exceeds the current power ratio, the time constant of the calculation means is set short, and when it falls below the current power ratio, a long time constant is set"). However, Patent Document 1 does not disclose a technique by which such a setting resolves the trade-off between the learning speed of the filter and the prediction accuracy.
Patent Document 1: JP-A-8-65215

An object of the present invention is to provide a speech processing device and an echo removal method that resolve the trade-off between the learning speed of a filter and its prediction accuracy.

To achieve the above object, the first invention provides a speech processing device comprising: a first adaptive filter that generates a first predicted echo signal based on a received signal; a second adaptive filter that learns faster than the first adaptive filter based on the received signal; an adder that reduces an echo signal based on subtraction of the first and second predicted echo signals from a transmission signal including the echo signal, the echo signal being generated when the received signal leaks from a reception signal path into a transmission signal path; a correction unit that updates the coefficients of the first and second adaptive filters based on a residual that is not reduced by the subtraction; and an adaptive filter control unit that corrects the coefficients of the first adaptive filter based on the coefficients of the second adaptive filter when the mean and variance of the ratio of the coefficients of the first and second adaptive filters satisfy a predetermined relationship.

According to the speech processing device and the echo removal method of the present invention, the trade-off between the learning speed of the filter and the prediction accuracy can be resolved.

Embodiments of the present invention are described below.

A first embodiment of the present invention is described with reference to FIGS. 1 to 6.
FIG. 1 shows a voice communication device to which the speech processing device according to an embodiment of the present invention is applied. The voice communication device 10 is configured by connecting speakers 16A and 16B and a microphone 18 to a PC (personal computer). It includes a lower housing 11 having a keyboard 12, a transmission gain knob 13, and the like; an upper housing 14 attached to the lower housing 11 so that it can be opened and closed and having a display unit such as an LCD (liquid crystal display) 15; the pair of left and right speakers 16A and 16B, one of which has a reception gain knob 17, the microphone 18, and a mouse 19, all connected to the lower housing 11; and an echo canceller 31, described later, provided in the lower housing 11. The device has a so-called hands-free call function in which a call is made via a network 20 using the speakers 16A and 16B and the microphone 18 instead of a handset.

FIG. 2 shows the speech processing system of the voice communication device 10. The voice communication device 10 includes: a reception signal path 22A from an input terminal 21A to the speakers 16A and 16B; a transmission signal path 22B from the microphone 18 to an output terminal 21B; a reception gain adjustment unit 23A that is provided on the reception signal path 22A, adjusts the gain of the received signal x(k) based on the adjustment amount of the reception gain knob 17, and outputs the result to the speakers 16A and 16B; a transmission gain adjustment unit 23B that is provided on the transmission signal path 22B and adjusts the gain of the transmission input signal s(k) input from the microphone 18 based on the adjustment amount of the transmission gain knob 13; and the echo canceller 31, which outputs from the output terminal 21B a transmission output signal in which the acoustic echo components from the speakers 16A and 16B that entered the microphone 18 have been cancelled.

FIG. 4 shows the internal configuration of the echo canceller 31. The echo canceller 31 predicts and removes the echo generated when the received signal leaks from the reception signal path 22A into the transmission signal path 22B.

First, the delay calculation unit 321 compares the reproduction signal x(k) with the recording signal y(k) to obtain the delay between them. Next, the delay removal unit 322 removes the delay from the reproduction signal x(k), and the DFT unit 323 performs a Fourier transform to obtain the frequency components Xk of the delay-removed signal. The DFT unit 324 likewise performs a Fourier transform on the recording signal y(k) to obtain its frequency components Yk. The adaptive filters 310 and 312 obtain predicted echo signals by multiplying Xk by the filter coefficients Hk and H'k, and the adders 311 and 313 subtract these from the frequency components Yk of the recording signal to obtain the residual echoes Ek and E'k. The filter coefficients Hk and H'k are updated so that the residual echoes Ek and E'k approach zero. The IDFT unit 325 performs an inverse Fourier transform on the residual echo of the adaptive filter 310 to obtain the output signal e(k) after echo cancellation.

The echo canceller 31 further includes an adaptive filter control unit 32 that compares the filter coefficients Hk and H'k of the adaptive filters 310 and 312 and corrects the coefficients of the adaptive filter 310 when, for example, the gain has been adjusted with the transmission gain knob 13 or the reception gain knob 17.

Before describing the echo canceller 31 in detail, the operation of an echo canceller to which the improvement of this embodiment can be applied is explained in general terms.
FIG. 3 is a block diagram of a general echo canceller.
Let x(k) be the reproduction signal at time k, y(k) the recording signal, and e(k) the output obtained by removing the echo from the recording signal.
First, the delay calculation unit 314 compares the reproduction signal x(k) with the recording signal y(k) to obtain the delay between them.
For each unit time N, the delay calculation unit obtains the cross-correlation C_kN(l) between the microphone input and the reference signal at time kN by the following equation.

The value of l that maximizes C_kN(l) is taken as the delay d(kN).
In a system in which the delay does not change, or in which almost no delay occurs, d(kN) may be a constant.
Next, the delay removal unit 315 obtains the signal x_k by removing the delay from the reproduction signal x(k), and the DFT unit 316 performs a Fourier transform on x_k to obtain the frequency components X_k of the delay-removed signal.
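The delay search can be sketched as follows. The cross-correlation equation itself is not reproduced in this text, so the plain sum of products used here, the function names, and the test signal are all assumptions made for illustration rather than the patent's exact formula.

```python
def cross_correlation(x, y, lag):
    """Assumed form: sum of y[n] * x[n - lag] over samples where both exist."""
    return sum(y[n] * x[n - lag] for n in range(lag, len(y)))

def estimate_delay(x, y, max_lag):
    """Return the lag in [0, max_lag] that maximizes the cross-correlation."""
    return max(range(max_lag + 1), key=lambda lag: cross_correlation(x, y, lag))

# Hypothetical signals: y is x delayed by 3 samples and attenuated by half.
x = [0.0, 1.0, -2.0, 3.0, 0.5, -1.5, 2.0, 0.0, 0.0, 0.0, 0.0]
y = [0.0, 0.0, 0.0] + [0.5 * v for v in x[:-3]]
delay = estimate_delay(x, y, max_lag=5)
```

The attenuation does not affect the result, because scaling y scales every lag's correlation by the same factor.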

At time k, the delay-removed signal is x_k = [x(k-d(k)), x(k-d(k)-1), …, x(k-d(k)-2L+1)]. A discrete Fourier transform of x_k gives the frequency components of x_k:

X_k = [X_k(0), X_k(1), …, X_k(L-1)]

X_k(l) is a complex number representing the l-th frequency band of the signal x_k.
Likewise, for the recording signal y(k), the DFT unit 317 performs a discrete Fourier transform of
y_k = [y(k), y(k-1), …, y(k-2L+1)]
to obtain the frequency components of y_k:

Y_k = [Y_k(0), Y_k(1), …, Y_k(L-1)]
The adaptive filter 310 obtains the frequency components of the audio data after echo cancellation,
E_k = [E_k(0), E_k(1), …, E_k(L-1)],
from the adaptive filter H_k = [H_k(0), H_k(1), …, H_k(L-1)] together with X_k and Y_k, as

E_k(l) = Y_k(l) - H_k(l) X_k(l)   (1)

H_k(l) is the filter coefficient of frequency band l, a complex number representing the amplitude ratio and phase difference between the reference signal and the echo.
The absolute value and phase of the complex number H_k(l) are written |H_k(l)| and ∠H_k(l), respectively. |H_k(l)| represents the gain of frequency band l, and ∠H_k(l) represents the phase difference.
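Equation (1) can be illustrated with a small sketch. The naive DFT, the choice to keep the first L bins of a 2L-sample frame, and the sample values are assumptions made for this example; the text does not say how the L bands are selected.

```python
import cmath

def dft(frame):
    """Naive discrete Fourier transform of a list of samples."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * cmath.pi * f * t / n) for t in range(n))
        for f in range(n)
    ]

def residual_spectrum(Y, H, X):
    """Equation (1): E_k(l) = Y_k(l) - H_k(l) X_k(l) for each band l."""
    return [y - h * x for y, h, x in zip(Y, H, X)]

L = 4                                                    # number of bands (assumed)
x_frame = [1.0, -0.5, 0.25, 2.0, -1.0, 0.0, 0.5, -2.0]   # 2L reference samples
y_frame = [0.5 * v for v in x_frame]                     # echo modeled as a gain of 0.5
X = dft(x_frame)[:L]                                     # keep L bands (assumption)
Y = dft(y_frame)[:L]
H = [0.5 + 0j] * L                                       # filter already matching the echo path
E = residual_spectrum(Y, H, X)
max_residual = max(abs(e) for e in E)                    # close to zero when H fits
```

When the filter matches the echo path in every band, the residual vanishes up to rounding error, which is the state the coefficient update aims for.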

The IDFT unit 318 performs an inverse Fourier transform on E_k to obtain the output signal.
Letting the inverse Fourier transform of E_k be
e_k = [e_k(0), e_k(1), …, e_k(2L-1)],
the output signal is given by
e(k-l) = e_k(l)   (0 ≦ l < L)
The adaptive filter 310 also updates the filter H_k by the following equation.

(2)

Here, α is a learning coefficient satisfying 0 < α < 1. The closer α is to 1, the faster the learning but the worse the echo prediction accuracy; the closer α is to 0, the slower the learning but the better the echo prediction accuracy after learning.
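The role of the learning coefficient α can be shown with a sketch. Equation (2) is not reproduced in this text, so the per-band normalized LMS step below is a commonly used stand-in chosen for illustration, not the patent's update rule; `update_filter`, `eps`, and the sample values are hypothetical.

```python
def update_filter(H, X, E, alpha, eps=1e-12):
    """Assumed normalized-LMS step per band: H += alpha * E * conj(X) / |X|^2."""
    return [
        h + alpha * e * x.conjugate() / (abs(x) ** 2 + eps)
        for h, x, e in zip(H, X, E)
    ]

X = [1 + 1j, 2 - 1j, -0.5 + 0.25j]          # reference spectrum for one frame
G = [0.5j, 0.3 + 0j, -0.2 + 0.1j]           # hypothetical true echo path
Y = [g * x for g, x in zip(G, X)]           # noiseless echo
H0 = [0j, 0j, 0j]                           # untrained filter
E0 = [y - h * x for y, h, x in zip(Y, H0, X)]   # residual as in equation (1)

H_large = update_filter(H0, X, E0, alpha=0.9)   # moves 90% of the way to G
H_small = update_filter(H0, X, E0, alpha=0.1)   # moves 10% of the way to G
```

With this assumed rule, one noiseless frame moves the filter a fraction α of the remaining distance to the true path. A larger α therefore converges in fewer frames but also follows noise in E more closely.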

Next, the configuration and operation of this embodiment are described with reference to the block diagrams of FIGS. 4 and 5.
In this embodiment, learning is performed using a filter 312 with a fast learning speed in addition to the adaptive filter 310.
The coefficients of the adaptive filter 312 are denoted H'_k (H'_k = [H'_k(0), H'_k(1), …, H'_k(L-1)]).

The coefficients of the two filters are controlled by the adaptive filter control unit 32.
Using the coefficients H'_k, the adaptive filter 312 obtains the frequency components E'_k = [E'_k(0), E'_k(1), …, E'_k(L-1)] of the audio data after echo cancellation by the following equation, in the same way as equation (1).

E'_k(l) = Y_k(l) - H'_k(l) X_k(l)   (3)

E'_k is not used to obtain the output signal; it is used only for learning H'_k. H'_k is updated by the following equation.

(4)

Here, α' satisfies α < α' < 1.
Because α < α', H'_k learns faster than H_k, but its echo prediction accuracy is lower.
After the filters H_k and H'_k have been updated by equations (2) and (4), the adaptive filter control unit 32 detects a gain change by comparing the two filters.
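The difference in tracking speed between the two filters can be sketched with a short simulation. The update rule (a per-band normalized LMS step), the values of α and α', and the echo path are assumptions for illustration, since the update equations are not reproduced in this text.

```python
def nlms_step(H, X, Y, alpha):
    """One assumed normalized-LMS update per band using the residual Y - H X."""
    out = []
    for h, x, y in zip(H, X, Y):
        e = y - h * x                          # residual as in equations (1) and (3)
        out.append(h + alpha * e * x.conjugate() / (abs(x) ** 2))
    return out

ALPHA, ALPHA_FAST = 0.05, 0.5                  # hypothetical values, alpha < alpha' < 1
path = [0.4 + 0.2j, -0.3 + 0.1j, 0.1 - 0.5j]   # hypothetical echo path before the change
X = [1 + 0.5j, -2 + 1j, 0.5 - 1.5j]            # reference spectrum, reused every frame

H = list(path)                                 # both filters start fully converged
H_fast = list(path)
new_path = [2.0 * g for g in path]             # external gain doubled
Y = [g * x for g, x in zip(new_path, X)]
for _ in range(5):                             # five frames after the gain change
    H = nlms_step(H, X, Y, ALPHA)
    H_fast = nlms_step(H_fast, X, Y, ALPHA_FAST)

err_slow = max(abs(h - g) for h, g in zip(H, new_path))
err_fast = max(abs(h - g) for h, g in zip(H_fast, new_path))
```

In this noiseless setting the fast filter is already close to the new path after five frames while the slow filter has barely moved. That gap between the two coefficient sets is what the comparison in the adaptive filter control unit exploits.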

When the external gain of the speaker, microphone, or the like is changed, let H_b = [H_bk(0), H_bk(1), …, H_bk(L-1)] be the sufficiently learned filter before the change and H_a = [H_ak(0), H_ak(1), …, H_ak(L-1)] be the sufficiently learned filter after the change. Because only the absolute value of each frequency band changes, and by the same ratio in every band, before and after the external gain change, the following hold.

(5)
∠H_bk(l) = ∠H_ak(l)   (6)

Because H'_k learns faster than H_k, immediately after the external gain change H_k remains close to H_b while H'_k approaches H_a. The relationship between H_k and H'_k is therefore close to the relationship between H_b and H_a in equations (5) and (6).

Therefore, by substituting the coefficients of H'_k into H_k when H_k and H'_k satisfy a relationship close to equations (5) and (6), the filter can adapt quickly to a change in gain. Because H'_k learns quickly, its prediction accuracy is inferior, but the accuracy can be raised by letting H_k continue learning after H'_k has been substituted into it.

The operation of the adaptive filter control unit 32 is described with reference to FIG. 5, which is a block diagram of the adaptive filter control unit.
First, the ratio D_k = [D_k(0), D_k(1), …, D_k(L-1)] of H'_k to H_k is obtained by the following equation.

When equations (5) and (6) are satisfied, |D_k(l)| = a and ∠D_k(l) = 0.
The average value calculation unit 32A obtains the average D_m of D_k(l) by the following equation.
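A short worked derivation may make this step easier to follow. The equations for (5), D_k, and D_m are not reproduced in this text, so the forms below are assumptions inferred from the surrounding description: (5) is taken to state a common magnitude ratio a, D_k(l) is taken to be the quotient H'_k(l)/H_k(l), and D_m is taken to be the arithmetic mean over the L bands.

```latex
\begin{align*}
|H_{ak}(l)| &= a\,|H_{bk}(l)|, \qquad \angle H_{ak}(l) = \angle H_{bk}(l)
  && \text{(assumed form of (5), and (6))} \\
\Rightarrow\quad H_{ak}(l) &= a\,H_{bk}(l) \\
D_k(l) &= \frac{H'_k(l)}{H_k(l)} \approx \frac{H_{ak}(l)}{H_{bk}(l)} = a
  && \text{(just after the gain change)} \\
D_m &= \frac{1}{L}\sum_{l=0}^{L-1} D_k(l) \approx a
  && \text{(assumed arithmetic mean)}
\end{align*}
```

Under these assumptions every band's ratio is the same positive real number a, so its magnitude is a and its phase is 0, and the mean inherits both properties.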

After the average value has been calculated, the first determination unit 32C determines whether the average value D_m satisfies the following conditions.
-θmin < ∠D_m < θmin   (7)
|D_m| < Mmin or Mmax < |D_m|   (8)
θmin is a constant representing the allowable deviation of the phase from 0, with 0 < θmin.
Mmin and Mmax are constants representing the minimum gain change for which H'_k is substituted, and satisfy 0 < Mmin < 1 and 1 < Mmax. When H_k and H'_k have both been sufficiently learned, equation (7) and equation (9) below are satisfied but |D_m| = 1, so equation (8) is used to confirm that the external gain has actually been changed.

When the condition of the first determination unit 32C is satisfied, the variance calculation unit 32B obtains the variance V_Dk by the following equation.

After the variance has been calculated, the second determination unit 32D determines whether the variance satisfies the following condition.
V_Dk < Vmin   (9)
Vmin is a constant representing the allowable variance of D_k(l), with 0 < Vmin. Equation (9) confirms that the change in H'_k is due to a change in external gain and not to erroneous learning.
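The statistics behind conditions (7) to (9) can be sketched as follows. The formulas for the ratio, mean, and variance are not reproduced in this text; the element-wise quotient, arithmetic mean, and mean squared deviation used here are assumptions, and the coefficient values are hypothetical.

```python
def ratio_stats(H, H_fast):
    """Return the mean D_m and variance V of the per-band ratios H'_k(l) / H_k(l)."""
    D = [hf / h for h, hf in zip(H, H_fast)]
    D_m = sum(D) / len(D)
    V = sum(abs(d - D_m) ** 2 for d in D) / len(D)
    return D_m, V

H = [0.4 + 0.2j, -0.3 + 0.1j, 0.1 - 0.5j, 0.25 + 0j]
uniform = [2.0 * h for h in H]                              # same gain in every band
uneven = [h * g for h, g in zip(H, [2.0, 0.5, 1j, -1.0])]   # band-dependent change
D_m_uniform, V_uniform = ratio_stats(H, uniform)
D_m_uneven, V_uneven = ratio_stats(H, uneven)
```

A uniform gain change gives a real mean with near-zero variance, whereas a band-dependent change, such as one caused by mislearning, spreads the ratios and raises the variance well above any small Vmin.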

When the condition of the second determination unit is satisfied, the external gain is regarded as having changed. The coefficient correction unit 32E then substitutes H'_k into H_k, bringing H_k close to the state after the gain change.

FIG. 6 is a flowchart showing the operation of the speech processing device after gain adjustment.
The average value calculation unit 32A obtains the average value D_m of the ratio D_k(l) from H_k and H'_k (S1).

The variance calculation unit 32B obtains the variance V_Dk based on D_k(l) and the average value D_m from the average value calculation unit 32A (S1).

The first determination unit 32C checks that -θmin < ∠D_m < θmin and that |D_m| < Mmin or Mmax < |D_m| (S2).

If both hold, the second determination unit 32D determines whether V_Dk < Vmin based on the result from the variance calculation unit 32B (S3).

If V_Dk < Vmin, the coefficient correction unit 32E judges that the gain has been changed and corrects the coefficients of the adaptive filter 310 so that they become those of the adaptive filter 312 (S4).
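Steps S1 to S4 can be put together in one sketch. The function name, the threshold values, and the formulas for the ratio, mean, and variance are assumptions made for illustration; only the conditions (7) to (9) and the substitution of the fast filter's coefficients come from the description.

```python
import cmath

def control_step(H, H_fast, theta_min, m_min, m_max, v_min):
    """Steps S1 to S4: return (coefficients for the slow filter, gain_changed)."""
    # S1: per-band ratio, its mean and its variance (assumed formulas).
    D = [hf / h for h, hf in zip(H, H_fast)]
    D_m = sum(D) / len(D)
    V = sum(abs(d - D_m) ** 2 for d in D) / len(D)
    # S2: conditions (7) and (8) on the phase and magnitude of the mean.
    phase_ok = -theta_min < cmath.phase(D_m) < theta_min
    magnitude_ok = abs(D_m) < m_min or abs(D_m) > m_max
    if not (phase_ok and magnitude_ok):
        return list(H), False
    # S3: condition (9) on the variance.
    if not V < v_min:
        return list(H), False
    # S4: substitute the fast filter's coefficients into the slow filter.
    return list(H_fast), True

LIMITS = dict(theta_min=0.2, m_min=0.8, m_max=1.25, v_min=0.01)  # hypothetical
H = [0.4 + 0.2j, -0.3 + 0.1j, 0.1 - 0.5j, 0.25 + 0j]
doubled = [2.0 * h for h in H]                                  # uniform gain change
uneven = [h * g for h, g in zip(H, [3.0, 1.0, 2.5, 1.5])]       # passes S2, fails S3

H_after_gain, changed_gain = control_step(H, doubled, **LIMITS)
H_after_same, changed_same = control_step(H, list(H), **LIMITS)
H_after_uneven, changed_uneven = control_step(H, uneven, **LIMITS)
```

The sketch divides by H_k(l) directly, so a practical version would need to guard bands where the slow filter's coefficient is close to zero.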

When there is no gain change, the coefficients of the fast-learning filter are almost the same as those of the slow-learning filter. The transfer function of the room transfer system comprises the audio delay and the gain and phase difference of each frequency band; when an external gain is changed, the delay and the phase difference do not change, and only the gain increases or decreases. Because the fast-learning filter adapts quickly to the change in the transfer function, it then differs from the slow-learning filter only in gain.

When the fast-learning filter differs from the slow-learning filter only in gain, the gain of the speaker or the microphone is regarded as having been changed, and the coefficients of the fast-learning filter are copied to the slow-learning filter, which makes it possible to respond quickly to a change in the speaker or microphone gain.

The improvement of this embodiment is applicable not only to the echo canceller of the type described above but also to echo cancellers of other types that predict the echo with a filter expressing the frequency-domain amplitude ratio and phase difference as complex numbers.

The present invention makes it possible to realize an echo canceller that follows changes in external gain, such as that of a speaker or microphone, at high speed and whose echo removal performance degrades little even when the external gain is changed.

In general, using a fast-learning filter lowers echo prediction performance, but because the present invention normally uses the slow-learning filter, performance is not lowered.
[Other embodiments]
The present invention is not limited to the embodiments described above, and various modifications are possible without departing from its gist. A configuration provided with only one of the reception gain adjustment unit and the transmission gain adjustment unit may also be used.

FIG. 1 is an external perspective view of a voice communication device to which the speech processing device according to an embodiment of the present invention is applied.
FIG. 2 is a diagram showing the speech processing system of the voice communication device according to the embodiment of the present invention.
FIG. 3 is a diagram showing a general echo canceller.
FIG. 4 is a diagram showing the echo canceller 31 of the speech processing system according to the embodiment of the present invention.
FIG. 5 is a block diagram of the adaptive filter control unit according to the embodiment of the present invention.
FIG. 6 is a flowchart showing the operation of the speech processing device according to the embodiment of the present invention.

Explanation of symbols

10 Voice communication device
11 Lower housing
12 Keyboard
13 Transmission gain knob
14 Upper housing
16, 16A, 16B Speaker
17 Reception gain knob
18 Microphone
19 Mouse
20 Network
21A Input terminal
21B Output terminal
22A Reception signal path
22B Transmission signal path
23A Reception gain adjustment unit
23B Transmission gain adjustment unit
31 Echo canceller
32 Adaptive filter control unit
32A Average value calculation unit
32B Variance calculation unit
32C First determination unit
32D Second determination unit
32E Coefficient correction unit
310 Adaptive filter
311 Adder
312 Adaptive filter
313 Adder

Claims

A speech processing device comprising:
a first adaptive filter that generates a first predicted echo signal based on a received signal;
a second adaptive filter that learns faster than the first adaptive filter based on the received signal;
an adder that reduces an echo signal based on subtraction of the first and second predicted echo signals from a transmission signal including the echo signal, the echo signal being generated when the received signal leaks from a reception signal path into a transmission signal path; and
a correction unit that updates the coefficients of the first and second adaptive filters based on a residual that is not reduced by the subtraction.

The speech processing device according to claim 1, wherein the correction unit corrects the coefficients of the first adaptive filter based on the coefficients of the second adaptive filter when the mean and variance of the ratio of the coefficients of the first and second adaptive filters before the subtraction satisfy a predetermined relationship.

An echo removal method comprising:
generating a first predicted echo signal and a second predicted echo signal based on a received signal;
reducing the first echo signal based on subtraction of the first predicted echo signal from a transmission signal including an echo signal generated when the received signal leaks from a reception signal path into a transmission signal path; and
updating coefficients for generating the first and second predicted echo signals based on a residual that is not reduced by the subtraction.