CN110931032A - Dynamic echo cancellation method and device

Info

Publication number: CN110931032A
Application number: CN201911133606.1A
Authority: CN
Inventors: 黄绍锋; 靳冠军
Original assignee: Xi'an Aaan Acoustics Technology Co Ltd
Current assignee: Xi'an Aaan Acoustics Technology Co Ltd
Priority date: 2019-11-19
Filing date: 2019-11-19
Publication date: 2020-03-27
Anticipated expiration: 2039-11-19
Also published as: CN110931032B

Abstract

The invention discloses a dynamic echo cancellation method and device. The method comprises the following steps: acquiring a near-end signal picked up by a microphone and a far-end signal played by a loudspeaker; determining the frequency-domain sub-band signals of the acquired far-end and near-end signals; updating the corresponding sub-band thresholds according to the sub-band signals of the far-end signal and the near-end signal; determining delay values from the sub-band signals and sub-band thresholds of the far-end signal and the near-end signal; collecting the delay values over a period of time and smoothing them to determine a current delay value; and adjusting the adaptive filter according to the delay-compensated far-end signal and the near-end signal to obtain an estimate of the echo signal, which is subtracted from the signal at the receiving end, thereby realizing dynamic echo cancellation. The invention can improve the performance of the algorithm, prevent abrupt changes in the delay, ensure the accuracy of the delay estimation, and improve the echo cancellation effect.

Description

Dynamic echo cancellation method and device

Technical Field

The present invention relates to echo cancellation methods and apparatuses, and in particular, to a dynamic echo cancellation method and apparatus.

Background

Acoustic echo results from coupling between the microphone and the loudspeaker. As intelligent devices develop, echo interference arises in application scenarios such as remote audio conference systems, Bluetooth headsets, smart speakers, and smartphones, so echo cancellation is of great significance for voice devices.

An echo cancellation algorithm builds an echo estimation model for the far-end signal, based mainly on the correlation between the known far-end signal of the device and the multi-path echo signals it generates; it approximates the optimal echo signal with an adaptive filter and subtracts the estimated echo signal from the near-end signal to cancel the echo. The algorithm also compares the signal received by the microphone with the history of the far-end signal in order to cancel acoustic echoes that arrive with various delays after multiple reflections. Because historical values of the far-end signal must be stored during the calculation, the computational load is large, and the algorithm is ill-suited to embedded devices with limited computing capacity. Sudden changes in the delay parameter can also occur and degrade the echo cancellation effect. Existing echo cancellation methods take the point of maximum similarity between the near-end signal and the far-end signal as the delay estimate, shift the far-end signal accordingly, and update the adaptive filter coefficients to obtain an echo signal close to the echo in the near-end signal, which is then subtracted to cancel the echo. Such schemes still suffer from abrupt changes in the delay estimate, limited accuracy of the delay value, and low algorithmic efficiency. Since the accuracy of the far-end delay estimate strongly influences the echo cancellation effect, the accuracy of delay estimation in echo cancellation still needs to be improved.

Disclosure of Invention

The present invention provides a dynamic echo cancellation method and device to solve the problem, found in prior-art echo cancellation methods, of poor echo cancellation caused by inaccurate estimation of the far-end signal delay.

In order to realize the task, the invention adopts the following technical scheme:

a dynamic echo cancellation method, said method being performed according to the steps of:

step 1, obtaining a marker sequence T(m) of the current m-th frame signal X(m), wherein m is a positive integer with an initial value of 1, the marker sequence T(1) does not exist, and the current-frame marker sequence T(m) comprises a current-frame near-end marker sequence T_n(m) and a current-frame far-end marker sequence T_f(m);

inputting the current-frame near-end marker sequence T_n(m) to step 3, and, after saving the current-frame far-end marker sequence T_f(m), performing step 2;

obtaining the marker sequence T(m) of the current m-th frame signal X(m) specifically comprises:

step 1.1, obtaining the current m-th frame signal X(m) and performing a short-time Fourier transform on it to obtain the current-frame frequency-domain sub-band signal Y(m), wherein the current m-th frame signal X(m) comprises the current-frame near-end signal X_n(m) and the current-frame far-end signal X_f(m); the current-frame frequency-domain sub-band signal Y(m) comprises the current-frame near-end frequency-domain sub-band signal Y_n(m) and the current-frame far-end frequency-domain sub-band signal Y_f(m), where n denotes the near end and f denotes the far end;

when m = 1, after the current-frame frequency-domain sub-band signal Y(1) is stored, setting m = m+1 and returning to step 1.1;

step 1.2, obtaining the current-frame frequency-domain sub-band threshold P(m) from the L historical-frame frequency-domain sub-band signals {Y(m-1), Y(m-2), …, Y(m-l), …, Y(m-L) | l = 1, 2, …, L} and the current-frame frequency-domain sub-band signal Y(m), wherein the current-frame frequency-domain sub-band threshold comprises the current-frame near-end frequency-domain sub-band threshold P_n(m) and the current-frame far-end frequency-domain sub-band threshold P_f(m);

wherein 1 ≤ L < m, and Y(m-l) denotes the frequency-domain sub-band signal obtained in the l-th execution of step 1.1 before the current execution of step 1.1;

step 1.3, obtaining the current-frame marker sequence T(m), specifically comprising:

taking the current-frame near-end frequency-domain sub-band signal Y_n(m) obtained in step 1.1 as the first sequence and the current-frame near-end frequency-domain sub-band threshold P_n(m) obtained in step 1.2 as the second sequence, and inputting them to the difference marking method to obtain the current-frame near-end marker sequence T_n(m);

taking the current-frame far-end frequency-domain sub-band signal Y_f(m) obtained in step 1.1 as the first sequence and the current-frame far-end frequency-domain sub-band threshold P_f(m) obtained in step 1.2 as the second sequence, and inputting them to the difference marking method to obtain the current-frame far-end marker sequence T_f(m);

the difference marking method specifically comprises the following steps:

subtracting the second sequence from the first sequence element by element to obtain a difference sequence;

binarizing the difference sequence to obtain a marker sequence;

in the binarization, if an element of the difference sequence is greater than or equal to 0, the element at the same position in the marker sequence is 1; otherwise it is 0;

step 2, obtaining the far-end marker sequences of the S historical frames saved before the current execution of step 1, {T_f(m-1), T_f(m-2), …, T_f(m-s), …, T_f(m-S) | s = 1, 2, …, S}, and performing step 3, wherein 1 ≤ S < m and T_f(m-s) denotes the far-end marker sequence obtained in the s-th execution of step 1 before the current execution of step 1;

step 3, performing a bitwise exclusive-NOR operation between the current-frame near-end marker sequence T_n(m) obtained in step 1 and each of the S historical-frame far-end marker sequences {T_f(m-1), T_f(m-2), …, T_f(m-s), …, T_f(m-S) | s = 1, 2, …, S} obtained in step 2, to obtain S binary sequences;

counting the number of 1s in each binary sequence, and taking the historical-frame far-end marker sequence whose binary sequence contains the most 1s as the marker signal sequence;

calculating the time difference between the marker signal sequence and the current-frame near-end signal sequence to obtain, and store, the preliminary delay value of the current frame;

step 4, obtaining the preliminary delay values of the N historical frames stored before the current execution of step 3, and taking the most frequently occurring value among them as the delay value estimate n1, wherein n1 ≤ S;

step 5, computing the time-domain correlation between the current-frame near-end signal X_n(m) obtained in step 1.1 and the far-end signal sequence with delay value n1, to obtain the correlation value of the current execution of step 5;

comparing the correlation value obtained this time with the correlation value obtained in the previous execution of step 5, and selecting the far-end signal corresponding to the larger correlation value as the aligned far-end signal of the current frame;

step 6, performing echo cancellation using the aligned far-end signal of the current frame obtained in step 5 and the current-frame near-end signal X_n(m) obtained in step 1.1 to obtain the echo-cancelled signal of the current frame, then setting m = m+1 and returning to step 1, until the current frame signal is the last frame signal, at which point the method ends.

Further, step 1.2 uses formula I to obtain the current-frame frequency-domain sub-band threshold P(m):

where win () represents a weighting function, 0< win () < 1.

Further, step 1.2 specifically comprises:

step A, obtaining a first threshold P¹(m) using formula II, the first threshold P¹(m) comprising a near-end first threshold or a far-end first threshold, where win() represents a weighting function, 0 < win() < 1;

step B, obtaining a second threshold P²(m) using formula III, the second threshold P²(m) comprising a near-end second threshold or a far-end second threshold:

P²(m) = P²(m-1) × gamma₁ + Y(m) × (1 - gamma₁)    Formula III

wherein P²(m-1) is the second threshold of the (m-1)-th frame frequency-domain sub-band signal, comprising the near-end second threshold or the far-end second threshold of the (m-1)-th frame frequency-domain sub-band signal, and gamma₁ is a first weighting value, 0 < gamma₁ < 1;

step C, obtaining the current-frame frequency-domain sub-band threshold P(m) using formula IV, wherein P(m) comprises the current-frame near-end frequency-domain sub-band threshold P_n(m) or the current-frame far-end frequency-domain sub-band threshold P_f(m):

P(m) = P¹(m) × gamma₂ + P²(m) × (1 - gamma₂)    Formula IV

wherein gamma₂ is a second weighting value, 0 < gamma₂ < 1.

Further, step 1.2 specifically comprises:

step I, obtaining a first threshold P¹(m) using formula II, the first threshold P¹(m) comprising a near-end first threshold or a far-end first threshold, where win() represents a weighting function, 0 < win() < 1;

step II, obtaining a second threshold P²(m) using formula III, the second threshold P²(m) comprising a near-end second threshold or a far-end second threshold:

P²(m) = P²(m-1) × gamma₁ + Y(m) × (1 - gamma₁)    Formula III

wherein P²(m-1) is the second threshold of the (m-1)-th frame frequency-domain sub-band signal, comprising the second threshold of the (m-1)-th frame near-end frequency-domain sub-band signal or of the (m-1)-th frame far-end frequency-domain sub-band signal, and gamma₁ is a first weighting value, 0 < gamma₁ < 1;

step III, obtaining the current-frame frequency-domain sub-band threshold P(m) using formula V, wherein P(m) comprises the current-frame near-end frequency-domain sub-band threshold P_n(m) or the current-frame far-end frequency-domain sub-band threshold P_f(m):

P(m) = max(P¹(m) × gamma₃, P²(m) × gamma₄)    Formula V

where max() denotes taking the maximum value, gamma₃ is a third weighting value, 0 < gamma₃ < 1, and gamma₄ is a fourth weighting value, 0 < gamma₄ < 1.

A dynamic echo cancellation device, comprising a current-frame marker sequence obtaining module, a historical-frame marker sequence obtaining module, a preliminary delay value obtaining module, a current delay value obtaining module, a delay value judging module, and an echo cancellation module;

the current-frame marker sequence obtaining module comprises a signal obtaining sub-module, a threshold obtaining sub-module, and a marker sequence obtaining sub-module;

the signal obtaining sub-module is used for obtaining the current m-th frame signal X(m), wherein m is an integer with an initial value of 1, and performing a short-time Fourier transform on the current frame signal to obtain the current-frame frequency-domain sub-band signal Y(m), wherein the current-frame frequency-domain sub-band signal Y(m) comprises the current-frame near-end frequency-domain sub-band signal Y_n(m) and the current-frame far-end frequency-domain sub-band signal Y_f(m), where n denotes the near end and f denotes the far end;

when m = 1, after the current-frame frequency-domain sub-band signal Y(1) is stored, setting m = m+1 and returning to the signal obtaining sub-module;

the threshold obtaining sub-module is used for obtaining the current-frame frequency-domain sub-band threshold P(m) from the L historical-frame frequency-domain sub-band signals {Y(m-1), Y(m-2), …, Y(m-l), …, Y(m-L) | l = 1, 2, …, L} and the current-frame frequency-domain sub-band signal Y(m), wherein 1 ≤ L < m and Y(m-l) denotes the frequency-domain sub-band signal obtained for the (m-l)-th frame; the current-frame frequency-domain sub-band threshold comprises the current-frame near-end frequency-domain sub-band threshold P_n(m) and the current-frame far-end frequency-domain sub-band threshold P_f(m);

the marker sequence obtaining sub-module is used for obtaining the current-frame marker sequence T(m),

wherein obtaining the current-frame marker sequence T(m) specifically comprises:

taking the current-frame near-end frequency-domain sub-band signal Y_n(m) as the first sequence and the current-frame near-end frequency-domain sub-band threshold P_n(m) as the second sequence, and inputting them to the difference marking method to obtain the current-frame near-end marker sequence T_n(m);

taking the current-frame far-end frequency-domain sub-band signal Y_f(m) as the first sequence and the current-frame far-end frequency-domain sub-band threshold P_f(m) as the second sequence, and inputting them to the difference marking method to obtain the current-frame far-end marker sequence T_f(m);

in the difference marking method:

when binarization is carried out, if an element of the difference sequence is greater than or equal to 0, the element at the same position in the marker sequence is 1; otherwise it is 0;

the historical-frame marker sequence obtaining module is used for obtaining the far-end marker sequences of the S previously saved historical frames, {T_f(m-1), T_f(m-2), …, T_f(m-s), …, T_f(m-S) | s = 1, 2, …, S}, and step 3 is performed, wherein 1 ≤ S < m and T_f(m-s) denotes the far-end marker sequence obtained the s-th time before the current execution;

the preliminary delay value obtaining module is used for performing a bitwise exclusive-NOR operation between the current-frame near-end marker sequence T_n(m) and each of the S historical-frame far-end marker sequences {T_f(m-1), T_f(m-2), …, T_f(m-s), …, T_f(m-S) | s = 1, 2, …, S}, to obtain S binary sequences;

among the S binary sequences, finding the one containing the most 1s, and taking the historical-frame far-end signal sequence that corresponds to its historical-frame far-end marker sequence as the marker signal sequence;

calculating the time difference between the marker signal sequence and the current-frame near-end signal sequence to obtain, and store, the preliminary delay value of the current frame;

the current delay value obtaining module is used for obtaining the previously stored preliminary delay values of N historical frames, and taking the most frequently occurring value among the preliminary delay value of the current frame and the preliminary delay values of the N historical frames as the current-frame delay value n1, wherein n1 ≤ S;

the delay value judging module is used for computing the time-domain correlation between the current-frame near-end signal X_n(m) and the far-end signal sequence corresponding to the current-frame delay value, to obtain a current correlation value;

comparing the current correlation value with the previous correlation value, and selecting the far-end signal corresponding to the larger correlation value as the aligned far-end signal of the current frame;

the echo cancellation module is used for obtaining adaptive filter coefficients from the aligned far-end signal of the current frame and the current-frame near-end signal X_n(m);

obtaining an echo signal from the adaptive filter coefficients and the aligned far-end signal of the current frame;

subtracting the echo signal from the current-frame near-end signal X_n(m) to obtain the echo-cancelled signal of the current frame, then setting m = m+1 and returning to the current-frame marker sequence obtaining module, until the current frame signal is the last frame signal, at which point processing ends.

Further, the threshold obtaining sub-module obtains the current-frame frequency-domain sub-band threshold P(m) using formula I:

where win () represents a weighting function, 0< win () < 1.

Further, the threshold obtaining sub-module comprises a first threshold obtaining unit, a second threshold obtaining unit, and a threshold obtaining unit;

the first threshold obtaining unit is used for obtaining a first threshold P¹(m) using formula II, the first threshold P¹(m) comprising a near-end first threshold or a far-end first threshold, where win() represents a weighting function, 0 < win() < 1;

the second threshold obtaining unit is used for obtaining a second threshold P²(m) using formula III, the second threshold P²(m) comprising a near-end second threshold or a far-end second threshold:

P²(m) = P²(m-1) × gamma₁ + Y(m) × (1 - gamma₁)    Formula III

wherein gamma₁ is a first weighting value, 0 < gamma₁ < 1;

the threshold obtaining unit is used for obtaining the current-frame frequency-domain sub-band threshold P(m) using formula IV, wherein P(m) comprises the current-frame near-end frequency-domain sub-band threshold P_n(m) or the current-frame far-end frequency-domain sub-band threshold P_f(m):

P(m) = P¹(m) × gamma₂ + P²(m) × (1 - gamma₂)    Formula IV

wherein gamma₂ is a second weighting value, 0 < gamma₂ < 1.

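Alternatively, the first threshold P¹(m) comprises a near-end first threshold or a far-end first threshold, where win() represents a weighting function, 0 < win() < 1;

the second threshold obtaining unit is used for obtaining a second threshold P²(m) using formula III, the second threshold P²(m) comprising a near-end second threshold or a far-end second threshold:

P²(m) = P²(m-1) × gamma₁ + Y(m) × (1 - gamma₁)    Formula III

wherein gamma₁ is a first weighting value, 0 < gamma₁ < 1;

and the current-frame frequency-domain sub-band threshold P(m) is obtained using formula V:

P(m) = max(P¹(m) × gamma₃, P²(m) × gamma₄)    Formula V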

Compared with the prior art, the invention has the following technical effects:

1. after obtaining the frequency-domain sub-band signals, the dynamic echo cancellation method and device use only the frequency range of interest, which improves the accuracy of delay detection and removes interference from other frequency components;

2. in the dynamic echo cancellation method and device, the sub-band threshold is obtained by weighted summation, averaging, taking the maximum, or similar operations on the sub-band signal energy of the spectra of several frames or of two adjacent frames, which effectively mitigates abrupt energy changes and jitter, makes the spectral and sub-band energy more stable and more robust to interference, and improves the accuracy of subsequent delay detection;

3. the dynamic echo cancellation method and device introduce a decision on whether to update the delay value, so that the delay value is smoothed and abrupt changes and jitter of the delay value are avoided;

4. the dynamic echo cancellation method and device provide a calculation of the intra-frame margin, which effectively improves the precision of delay alignment.

Drawings

Fig. 1 is a schematic diagram of a dynamic echo cancellation scenario provided by the present invention.

Detailed Description

The present invention will be described in detail below with reference to the accompanying drawings and examples, so that those skilled in the art can better understand it. Note that in the following description, detailed descriptions of known functions and designs are omitted where they might obscure the subject matter of the present invention.

The following definitions or conceptual connotations relating to the present invention are provided for illustration:

Example one

In this embodiment, a dynamic echo cancellation method is disclosed, which is performed according to the following steps:

step 1, obtaining a marker sequence T(m) of the current m-th frame signal X(m), wherein m is a positive integer with an initial value of 1, the marker sequence T(1) does not exist, and the current-frame marker sequence T(m) comprises a current-frame near-end marker sequence T_n(m) and a current-frame far-end marker sequence T_f(m);

inputting the current-frame near-end marker sequence T_n(m) to step 3, and, after saving the current-frame far-end marker sequence T_f(m), performing step 2;

step 1.1, obtaining the current m-th frame signal X(m), wherein m is an integer with an initial value of 1, and performing a short-time Fourier transform on the current frame signal to obtain the current-frame frequency-domain sub-band signal Y(m), wherein the current-frame frequency-domain sub-band signal Y(m) comprises the current-frame near-end frequency-domain sub-band signal Y_n(m) and the current-frame far-end frequency-domain sub-band signal Y_f(m), where n denotes the near end and f denotes the far end;

when the current frame signal is the first frame signal X(1), after the current-frame frequency-domain sub-band signal Y(1) is stored, setting m = m+1 and returning to step 1;

in this embodiment, first, a current frame near-end signal and a current frame far-end signal are obtained;

In this embodiment, FIG. 1 illustrates the source of the echo in one scenario. Far-end speech played by the loudspeaker is reflected by indoor walls or other obstacles, and the near-end microphone often picks up these reflections. Such reflected signals are generally referred to as echoes, as shown at 420 in FIG. 1. The time taken for the far-end signal to arrive at the near end as an echo signal is recorded as the delay of the far-end signal.

In the step, a near-end signal picked up by a microphone and a far-end signal played by a loudspeaker are obtained; the near-end signal comprises a target output signal and an echo signal delayed with respect to the far-end signal.

In this embodiment, a short-time Fourier transform is performed on the far-end signal and on the near-end signal, respectively, to obtain their frequency-domain sub-band signals.
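The transform of a single frame into sub-band energies can be sketched as follows. This is only an illustrative sketch under stated assumptions: the description does not specify the analysis window, the frame length, or how frequency bins are grouped into sub-bands, so the Hann window, the naive DFT, the equal-width grouping, and the function name `subband_energies` are all hypothetical choices.

```python
import cmath
import math


def subband_energies(frame, num_subbands):
    """Return the energy of each sub-band for one Hann-windowed frame.

    Assumptions (not from the source): Hann window, naive DFT over the
    bins below Nyquist, and equal-width grouping of bins into sub-bands.
    """
    n = len(frame)
    half = n // 2  # bins 0 .. n/2 - 1
    width = half // num_subbands if num_subbands > 0 else 0
    if width == 0:
        raise ValueError("frame too short for the requested number of sub-bands")
    # Apply a periodic Hann window to limit spectral leakage.
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / n))
                for i, x in enumerate(frame)]
    # Power of each frequency bin via a direct DFT (O(n^2), for clarity only).
    power = []
    for k in range(half):
        acc = sum(windowed[i] * cmath.exp(-2j * math.pi * k * i / n)
                  for i in range(n))
        power.append(abs(acc) ** 2)
    # Sum bin powers inside each sub-band; leftover top bins are ignored.
    return [sum(power[b * width:(b + 1) * width]) for b in range(num_subbands)]
```

The same routine would be applied to the near-end frame and to the far-end frame to give Y_n(m) and Y_f(m); a practical implementation would use an FFT rather than the direct sum.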

wherein 1 ≤ L < m, and Y(m-l) denotes the frequency-domain sub-band signal obtained in the l-th execution of step 1.1 before the current execution of step 1;

Three methods of obtaining the frequency-domain sub-band threshold are provided in the present invention.

Optionally, in step 1.2, the current-frame frequency-domain sub-band threshold P(m) is obtained using formula I:

where win() represents a weighting function, 0 < win() < 1.

In this embodiment, taking the near-end signal as an example, the calculation is as follows:

the historical values are directly weighted and summed to obtain the sub-band threshold;

the historical near-end frequency-domain information Y_n(m-l), l ∈ [1, L], is weighted to form the sub-band threshold:

wherein P_n(m) is the sub-band threshold of the near-end signal, and win(l) is the weight value, l ∈ [1, L];

The threshold obtaining method provided in the embodiment sufficiently considers the influence of the historical frame data on the current frame data, so as to obtain a more accurate threshold.
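The first method can be sketched as below. Formula I itself is not reproduced in this text, so the sketch assumes the reading suggested by the description, namely a direct weighted sum P_n(m) = win(1) × Y_n(m-1) + … + win(L) × Y_n(m-L) computed per sub-band; whether the original formula also normalizes the weights is not known. The function name `weighted_history_threshold` is hypothetical.

```python
def weighted_history_threshold(history, weights):
    """Per-sub-band weighted sum of historical sub-band signals.

    history[l-1] holds Y(m-l) as a list of sub-band energies and
    weights[l-1] holds win(l). Assumed reading of formula I:
    P(m)[k] = sum over l of win(l) * Y(m-l)[k].
    """
    if not history or len(history) != len(weights):
        raise ValueError("need one weight per historical frame")
    num_bands = len(history[0])
    return [sum(w * frame[k] for w, frame in zip(weights, history))
            for k in range(num_bands)]
```

For instance, with two historical frames `[[1.0, 2.0], [3.0, 4.0]]` and weights `[0.5, 0.25]`, the threshold for the first sub-band is 0.5 × 1.0 + 0.25 × 3.0 = 1.25.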

Optionally, step 1.2 specifically includes:

step A, obtaining a first threshold P¹(m) using formula II, the first threshold P¹(m) comprising a near-end first threshold or a far-end first threshold, where win() represents a weighting function, 0 < win() < 1;

step B, obtaining a second threshold P²(m) using formula III, the second threshold P²(m) comprising a near-end second threshold or a far-end second threshold:

P²(m) = P²(m-1) × gamma₁ + Y(m) × (1 - gamma₁)    Formula III

wherein P²(m-1) is the second threshold of the previous frame's frequency-domain sub-band signal, comprising the near-end second threshold or the far-end second threshold of the previous frame's frequency-domain sub-band signal, and gamma₁ is a first weighting value, 0 < gamma₁ < 1;

step C, obtaining the current-frame frequency-domain sub-band threshold P(m) using formula IV:

P(m) = P¹(m) × gamma₂ + P²(m) × (1 - gamma₂)    Formula IV

wherein gamma₂ is a second weighting value, 0 < gamma₂ < 1.

In this embodiment, the sub-band threshold is obtained by weighting the weighted sum of the historical values together with the smoothed value of the current signal. Taking the near-end signal as an example:

step A, the historical near-end frequency-domain information Y_n(m-l), l ∈ [1, L], is weighted to form the near-end first threshold (formula II);

step B, the current near-end frequency-domain information Y_n(m) is smoothed to obtain the near-end second threshold (formula III), wherein P²(m-1) is the near-end second threshold of the previous frame's frequency-domain sub-band signal and gamma₁ is a smoothing factor;

step C, the near-end first threshold and the near-end second threshold are weighted to obtain the near-end signal sub-band threshold P_n(m) (formula IV), wherein gamma₂ is a weighting value.

The threshold obtaining method provided in this embodiment can fully reflect the difference between the current signal and the threshold when the signal component is complex, so as to obtain a more accurate threshold.
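Formulas III and IV can be sketched per sub-band as follows. The first threshold P¹(m) is taken as an input here, because formula II is not reproduced in this text; the function names and the list-of-floats representation are illustrative assumptions.

```python
def smooth_threshold(prev_p2, current, gamma1):
    """Formula III: P2(m) = P2(m-1) * gamma1 + Y(m) * (1 - gamma1), per sub-band."""
    return [p * gamma1 + y * (1.0 - gamma1) for p, y in zip(prev_p2, current)]


def blend_thresholds(p1, p2, gamma2):
    """Formula IV: P(m) = P1(m) * gamma2 + P2(m) * (1 - gamma2), per sub-band."""
    return [a * gamma2 + b * (1.0 - gamma2) for a, b in zip(p1, p2)]
```

A larger gamma₁ makes the second threshold follow the current frame more slowly, and gamma₂ sets how much the history-based first threshold contributes relative to the smoothed second threshold.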

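Optionally, step 1.2 specifically comprises:

obtaining a first threshold P¹(m), comprising a near-end first threshold or a far-end first threshold, where win() represents a weighting function, 0 < win() < 1;

obtaining a second threshold P²(m), comprising a near-end second threshold or a far-end second threshold, using formula III:

P²(m) = P²(m-1) × gamma₁ + Y(m) × (1 - gamma₁)    Formula III

wherein gamma₁ is a first weighting value, 0 < gamma₁ < 1;

and obtaining the current-frame frequency-domain sub-band threshold P(m) using formula V:

P(m) = max(P¹(m) × gamma₃, P²(m) × gamma₄)    Formula V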

In this embodiment, the third method follows the same calculation as the second method in step A and step B. Taking the near-end signal as an example, in the final step the near-end first threshold and the near-end second threshold are each weighted, and the larger of the two weighted values is taken as the sub-band threshold of the near-end signal;

wherein max denotes the larger of the two numbers, and gamma₃ and gamma₄ are weighting values.
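Formula V can be sketched per sub-band as follows; the function name is hypothetical, and the two thresholds are assumed to be given as lists of equal length.

```python
def max_threshold(p1, p2, gamma3, gamma4):
    """Formula V: P(m) = max(P1(m) * gamma3, P2(m) * gamma4), per sub-band."""
    return [max(a * gamma3, b * gamma4) for a, b in zip(p1, p2)]
```

Compared with the weighted sum of formula IV, taking the maximum lets whichever of the two weighted thresholds is currently larger set the sub-band threshold.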

In this step, the sub-band threshold is obtained by weighted summation, averaging, taking the maximum, or similar operations on the sub-band signal energy of the spectra of several frames or of two adjacent frames. This effectively mitigates abrupt energy changes and jitter, makes the spectral and sub-band energy more stable and more robust to interference, and improves the stability and accuracy of the subsequent calculation.

The difference marking method specifically comprises the following steps:

In this step, similarity is assessed in the frequency domain by comparing the energy of each spectral sub-band signal with the sub-band threshold. Taking the near-end signal as an example, the calculation is as follows:

the marker of each frequency point is determined by subtracting the sub-band threshold from, or dividing it into, the current spectral sub-band signal energy of the near-end signal: if the difference is greater than zero, or the ratio is greater than 1, the sub-band marker is 1; otherwise the marker is 0.

The marker is calculated as follows:

T_n(m, k) = 1 if Y_n(m, k) - P_n(m, k) > 0, otherwise T_n(m, k) = 0

or

T_n(m, k) = 1 if Y_n(m, k) / P_n(m, k) > 1, otherwise T_n(m, k) = 0

wherein T_n(m, k) is the sub-band marker, m is the frame index, and k is the frequency-domain index. T_n(m, k) is a sequence of 0s and 1s.
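The marking rule can be sketched as follows. The strict comparison follows this embodiment (the method steps earlier in the document use "greater than or equal to 0" for the difference form). The function name and the fallback to the difference form when a threshold is not positive are assumptions added to keep the ratio form well defined.

```python
def mark_subbands(energies, thresholds, use_ratio=False):
    """Binarize sub-band energies against sub-band thresholds.

    Returns 1 where the energy exceeds the threshold and 0 elsewhere.
    With use_ratio=True the ratio test (energy / threshold > 1) is used;
    for a non-positive threshold the code falls back to the difference
    test to avoid dividing by zero (an assumption, not from the source).
    """
    marks = []
    for y, p in zip(energies, thresholds):
        if use_ratio and p > 0:
            exceeded = y / p > 1
        else:
            exceeded = y - p > 0
        marks.append(1 if exceeded else 0)
    return marks
```

For positive thresholds the two forms give the same markers, so the choice between them is mainly one of numerical convenience.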

In this embodiment, S = 3 is taken as an example:

the current-frame near-end marker sequence is:

[1,0,1,1,0,1];

the historical far-end marker sequence of the frame one frame before the current frame is:

[0,0,1,1,1,0];

the historical far-end marker sequence of the frame two frames before the current frame is:

[1,0,0,0,0,1];

the historical far-end marker sequence of the frame three frames before the current frame is:

[1,1,1,1,1,0];

first, a bitwise exclusive-NOR operation is performed on the current-frame near-end marker sequence and the historical far-end marker sequence of the frame one frame before the current frame, that is, on [1,0,1,1,0,1] and [0,0,1,1,1,0], giving sequence 1: [0,1,1,1,0,0];

similarly, a bitwise exclusive-NOR operation on the current-frame near-end marker sequence and the historical far-end marker sequence of the frame two frames before the current frame gives sequence 2: [1,1,0,0,1,1];

similarly, a bitwise exclusive-NOR operation on the current-frame near-end marker sequence and the historical far-end marker sequence of the frame three frames before the current frame gives sequence 3: [1,0,1,1,0,0];

comparing sequence 1, sequence 2 and sequence 3, the number of 1s is largest in sequence 2 (four, against three in each of the other two), so the far-end signal sequence of the historical frame two frames before the current frame, which corresponds to sequence 2, is taken as the marker signal sequence, from which the preliminary delay value is obtained.
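The worked example above can be reproduced with a short sketch. The function name and the tie-breaking rule (the smallest lag wins when two lags match equally well) are assumptions; the description does not say how ties are resolved.

```python
def best_matching_lag(near_marks, far_history):
    """Find the historical far-end frame that best matches the near-end markers.

    far_history[s-1] holds T_f(m-s). A bitwise exclusive-NOR yields 1 where
    two markers agree, so counting agreements equals counting the 1s in the
    XNOR result. Returns (lag in frames, list of match counts).
    """
    counts = []
    for far_marks in far_history:
        counts.append(sum(1 for a, b in zip(near_marks, far_marks) if a == b))
    best = counts.index(max(counts))  # first maximum wins on ties (assumption)
    return best + 1, counts


near = [1, 0, 1, 1, 0, 1]
history = [
    [0, 0, 1, 1, 1, 0],  # one frame before the current frame
    [1, 0, 0, 0, 0, 1],  # two frames before
    [1, 1, 1, 1, 1, 0],  # three frames before
]
lag, counts = best_matching_lag(near, history)
print(lag, counts)  # 2 [3, 4, 3]
```

The returned lag of two frames corresponds to sequence 2 in the example, and converting it into a time difference gives the preliminary delay value.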

Step 4, obtaining the preliminary delay values of the N historical frames stored before the current execution of step 3, and averaging the preliminary delay value of the current frame together with the preliminary delay values of the N historical frames to obtain the delay value of the current frame, wherein N is less than or equal to S;

The current delay value is determined by smoothing the historical delay values in order to prevent abrupt delay changes from degrading the accuracy of the subsequent echo cancellation result. A sliding window is used for the smoothing, so that the delay value of each current frame can be obtained in real time; the window width is N, and on each update the new value is placed at the end of the window while the oldest value in the window is discarded. The sliding window keeps the delay value updated in real time while suppressing delay jitter and abrupt delay changes.
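The sliding-window smoothing can be sketched as follows. The document describes two ways of reducing the window to a single value: the most frequently occurring delay (in the method steps) and the average (in this embodiment). The sketch offers both through a flag; the class name, the flag, and the tie-breaking behaviour of the most-frequent variant are assumptions.

```python
from collections import Counter, deque


class DelaySmoother:
    """Keep the last `window` preliminary delays and return a smoothed value."""

    def __init__(self, window, use_mean=False):
        # deque(maxlen=...) drops the oldest entry automatically on overflow.
        self.values = deque(maxlen=window)
        self.use_mean = use_mean

    def update(self, preliminary_delay):
        self.values.append(preliminary_delay)
        if self.use_mean:
            return sum(self.values) / len(self.values)
        # Most frequent delay in the window; on ties, the value that
        # entered the window first is returned (Counter ordering).
        return Counter(self.values).most_common(1)[0][0]
```

With a window of three and the inputs 2, 2, 7, 7, the most-frequent variant returns 2, 2, 2, 7: a single outlier does not move the delay, but a persistent change does.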

Step 5, computing the time-domain correlation between the current-frame near-end signal X_n(m) obtained in step 1.1 and the far-end signal sequence corresponding to the current-frame delay value, to obtain the correlation value for the current execution of step 5;

In this embodiment, to further ensure the accuracy of the delay, the invention uses a time-domain correlation method to decide whether to update the current delay value. The time-domain correlation between the current-frame near-end signal and the far-end signal corresponding to the current-frame delay is computed, as is the time-domain correlation between the voice signal of the current frame and the voice signal corresponding to the previous-frame delay. The two correlation values are then compared, and the delay with the larger correlation value is taken as the current delay value.
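The comparison can be illustrated as follows. This is a sketch under assumptions that the text does not state: delays are expressed in samples, the correlation is a zero-lag normalized correlation, and the previous delay is kept on a tie. The names `normalized_correlation` and `choose_delay` are hypothetical.

```python
import math


def normalized_correlation(a, b):
    """Zero-lag normalized correlation of two sample blocks."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return num / den if den > 0 else 0.0


def choose_delay(near_frame, far_signal, frame_start, new_delay, old_delay):
    """Keep whichever delay aligns the far end better with the near-end frame."""
    def corr_for(delay):
        start = frame_start - delay
        if start < 0:
            return 0.0
        segment = far_signal[start:start + len(near_frame)]
        return normalized_correlation(near_frame, segment)

    # Strict comparison: the previous delay is kept on a tie (assumption).
    return new_delay if corr_for(new_delay) > corr_for(old_delay) else old_delay


# Toy data: the near end is the far end delayed by 5 samples and attenuated.
far = [math.sin(0.37 * i) + 0.5 * math.sin(1.3 * i) for i in range(64)]
frame_start, frame_len = 32, 16
near_frame = [0.6 * far[i - 5] for i in range(frame_start, frame_start + frame_len)]
chosen = choose_delay(near_frame, far, frame_start, new_delay=5, old_delay=3)
print(chosen)
```

Here the candidate delay of 5 samples aligns the two signals exactly, so it replaces the previous value of 3; if the roles are swapped, the previous value of 5 is retained.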

Step 6, using the aligned current-frame far-end signal obtained in step 5 and the current-frame near-end signal X_n(m) obtained in step 1.1 to obtain the adaptive filter coefficients;

subtracting the echo signal from the current-frame near-end signal X_n(m) obtained in step 1.1 to obtain the echo-cancelled signal of the current frame, setting m = m + 1, and returning to step 1 until the current frame signal is the last frame signal, at which point the method ends.

In this embodiment, a residual intra-frame delay may cause the filter coefficients to converge poorly and thereby degrade the echo cancellation effect. The intra-frame delay margin is therefore estimated from the filter coefficients: the filter is divided into segments of equal length, the variance of the coefficients in each segment is computed, and the segment with the largest variance marks the position corresponding to the intra-frame delay.
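The segment-variance search can be sketched as below. The function name `intra_frame_delay_segment`, the use of the population variance, and the toy coefficient values are assumptions made for illustration; the text specifies only equal division and selection of the largest variance.

```python
from statistics import pvariance


def intra_frame_delay_segment(coeffs, num_segments):
    """Return (index, variances) for the equal-length segment with the largest variance."""
    seg_len = len(coeffs) // num_segments
    if seg_len == 0:
        raise ValueError("num_segments exceeds the filter length")
    variances = [
        pvariance(coeffs[i * seg_len:(i + 1) * seg_len])
        for i in range(num_segments)
    ]
    return variances.index(max(variances)), variances


# Hypothetical filter coefficients with most of their energy in the third quarter.
coeffs = [0.0, 0.01, -0.01, 0.0,   # segment 0
          0.02, -0.01, 0.0, 0.01,  # segment 1
          0.1, 0.8, -0.5, 0.2,     # segment 2
          0.03, -0.02, 0.01, 0.0]  # segment 3
index, variances = intra_frame_delay_segment(coeffs, num_segments=4)
print(index)
```

For these coefficients segment 2 is selected, indicating that the dominant echo path lies in the third quarter of the filter.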

The coefficients of the adaptive filter are adjusted according to the processed far-end and near-end signals, and the estimated echo signal is subtracted from the signal at the receiving end, which realizes dynamic echo cancellation of the output signal. After the delay has been removed, the far-end signal, now aligned with the echo contained in the near-end signal, is fed to the adaptive filter, which estimates that echo. The convergence speed of the adaptive filter is improved by measures such as resetting the filter coefficients and moving the sliding window. Dynamic echo cancellation is then achieved by subtracting the echo estimated by the adaptive filter from the near-end signal.
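The text does not name a specific adaptation rule. Purely as an illustration, the sketch below uses the normalized least-mean-squares (NLMS) update, a common choice that is swapped in here as an assumption; the function name `nlms_cancel`, the step size, the filter length and the toy echo path are all hypothetical.

```python
import random


def nlms_cancel(far, near, taps=4, mu=0.5, eps=1e-8):
    """Return (residual, weights) after NLMS adaptation over the whole signal."""
    w = [0.0] * taps
    buf = [0.0] * taps              # most recent far-end samples, newest first
    residual = []
    for x, d in zip(far, near):
        buf = [x] + buf[:-1]
        echo_est = sum(wi * xi for wi, xi in zip(w, buf))
        e = d - echo_est            # near-end sample minus estimated echo
        norm = sum(xi * xi for xi in buf) + eps
        w = [wi + mu * e * xi / norm for wi, xi in zip(w, buf)]
        residual.append(e)
    return residual, w


random.seed(0)
far = [random.uniform(-1.0, 1.0) for _ in range(2000)]
true_path = [0.5, -0.3, 0.1]        # hypothetical echo path, already delay-aligned
near = [
    sum(h * far[i - k] for k, h in enumerate(true_path) if i - k >= 0)
    for i in range(len(far))
]
residual, weights = nlms_cancel(far, near)
tail_echo = sum(v * v for v in near[-200:])
tail_resid = sum(v * v for v in residual[-200:])
print(tail_resid / tail_echo)
```

In this noise-free toy case the near-end signal contains only echo, so the residual energy after convergence is a small fraction of the echo energy and the learned weights approach the assumed echo path.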

Example two

the signal obtaining sub-module is used for obtaining the current m-th frame signal X(m), where m is an integer with an initial value of 1, and performing a short-time Fourier transform on the current frame signal to obtain the current-frame frequency-domain sub-band signal Y(m), wherein Y(m) comprises the current-frame near-end frequency-domain sub-band signal Y_n(m) and the current-frame far-end frequency-domain sub-band signal Y_f(m), where n denotes the near end and f denotes the far end;

the threshold obtaining sub-module is used for obtaining the current-frame frequency-domain sub-band threshold P(m) from the L historical-frame frequency-domain sub-band signals {Y(m-1), Y(m-2), …, Y(m-l), …, Y(m-L) | l = 1, 2, …, L} and the current-frame frequency-domain sub-band signal Y(m), where 1 ≤ L < m and Y(m-l) denotes the (m-l)-th frequency-domain sub-band signal obtained previously, wherein the current-frame frequency-domain sub-band threshold comprises the current-frame near-end frequency-domain sub-band threshold P_n(m) and the current-frame far-end frequency-domain sub-band threshold P_f(m);

The tag sequence obtaining sub-module is used for obtaining a current frame tag sequence T (m),

The difference marking method specifically comprises the following steps:

the historical frame marking sequence obtaining module is used for obtaining the previously saved far-end marker sequences of S historical frames {T_f(m-1), T_f(m-2), …, T_f(m-s), …, T_f(m-S) | s = 1, 2, …, S}, after which step 3 is performed, where 1 ≤ S < m and T_f(m-s) denotes the far-end marker sequence obtained s executions before the current one;

the current delay value obtaining module is used for obtaining the previously stored preliminary delay values of N historical frames and averaging the preliminary delay value of the current frame together with the preliminary delay values of the N historical frames to obtain the delay value of the current frame, wherein N is less than or equal to S;

subtracting the echo signal from the current-frame near-end signal X_n(m) to obtain the echo-cancelled signal of the current frame, setting m = m + 1, and returning to the current frame marking sequence obtaining module until the current frame signal is the last frame signal, at which point processing ends.

Optionally, the threshold obtaining sub-module obtains the current-frame frequency-domain sub-band threshold P(m) by using formula I:

where win() represents a weighting function, 0 < win() < 1.

Optionally, the threshold obtaining sub-module includes a first threshold obtaining unit, a second threshold obtaining unit, and a threshold obtaining unit;

a first threshold obtaining unit for obtaining a first threshold P¹(m) by using formula II, wherein the first threshold P¹(m) includes a near-end first threshold

or a far-end first threshold

where win() represents a weighting function, 0 < win() < 1;

a second threshold obtaining unit for obtaining a second threshold P²(m) by using formula III, wherein P²(m) includes a near-end second threshold

or a far-end second threshold

P²(m) = P²(m-1) × gamma₁ + Y(m) × (1 - gamma₁)    Formula III

where gamma₁ is a first weighting value, 0 < gamma₁ < 1;

The threshold obtaining unit is used for obtaining the current-frame frequency-domain sub-band threshold P(m) by using formula IV, wherein P(m) includes the current-frame near-end frequency-domain sub-band threshold P_n(m) or the current-frame far-end frequency-domain sub-band threshold P_f(m):

P(m) = P¹(m) × gamma₂ + P²(m) × (1 - gamma₂)    Formula IV

where gamma₂ is a second weighting value, 0 < gamma₂ < 1.
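Formula III and formula IV can be sketched per sub-band as follows. The sketch assumes that Y(m) is represented as a list of real, non-negative sub-band magnitudes and that the first threshold P¹(m) is supplied from elsewhere, since formula II is not reproduced in this passage; the function names and the numeric values are hypothetical.

```python
def update_second_threshold(p2_prev, y, gamma1):
    """Formula III per sub-band: P2(m) = P2(m-1)*gamma1 + Y(m)*(1 - gamma1)."""
    return [p * gamma1 + v * (1.0 - gamma1) for p, v in zip(p2_prev, y)]


def combine_thresholds(p1, p2, gamma2):
    """Formula IV per sub-band: P(m) = P1(m)*gamma2 + P2(m)*(1 - gamma2)."""
    return [a * gamma2 + b * (1.0 - gamma2) for a, b in zip(p1, p2)]


p2_prev = [1.0, 2.0, 4.0]   # P2(m-1), three sub-bands (toy values)
y = [3.0, 2.0, 0.0]         # Y(m), assumed to be sub-band magnitudes
p1 = [2.0, 2.0, 2.0]        # P1(m), assumed given (formula II not shown here)
p2 = update_second_threshold(p2_prev, y, gamma1=0.75)
p = combine_thresholds(p1, p2, gamma2=0.5)
print(p2, p)
```

A larger gamma₁ makes the second threshold follow the sub-band signal more slowly, and gamma₂ sets the balance between the two thresholds in the final value.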

a first threshold obtaining unit for obtaining a first threshold P¹(m) by using formula II, wherein the first threshold P¹(m) includes a near-end first threshold

or a far-end first threshold

where win() represents a weighting function, 0 < win() < 1;

or a far-end second threshold

P²(m) = P²(m-1) × gamma₁ + Y(m) × (1 - gamma₁)    Formula III

where gamma₁ is a first weighting value, 0 < gamma₁ < 1;

P(m) = max(P¹(m) × gamma₃, P²(m) × gamma₄)    Formula V
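Formula V replaces the weighted sum of formula IV with a per-sub-band maximum. A minimal sketch follows, assuming list-valued thresholds; the function name `max_threshold` and the values chosen for gamma₃ and gamma₄ are hypothetical, since their ranges are not stated in this passage.

```python
def max_threshold(p1, p2, gamma3, gamma4):
    """Formula V per sub-band: P(m) = max(P1(m)*gamma3, P2(m)*gamma4)."""
    return [max(a * gamma3, b * gamma4) for a, b in zip(p1, p2)]


p1 = [2.0, 2.0, 2.0]   # P1(m), toy values
p2 = [1.5, 2.0, 3.0]   # P2(m), toy values
p = max_threshold(p1, p2, gamma3=0.5, gamma4=0.5)
print(p)
```

With equal weights the larger of the two thresholds wins in each sub-band, so the result here follows P¹(m) in the first sub-band and P²(m) in the last.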

From the above description of the embodiments, those skilled in the art will clearly understand that the present invention may be implemented by software plus the necessary general-purpose hardware, or by hardware alone, although in many cases the former is the preferred implementation. On this understanding, the technical solution of the present invention, in essence or in the part that contributes over the prior art, may be embodied in the form of a software product. The computer software product is stored in a readable storage medium, such as a floppy disk, a hard disk, or an optical disk of a computer, and includes several instructions that cause a computer device (which may be a personal computer, a server, or a network device) to execute the method according to the embodiments of the present invention.

Claims

1. A dynamic echo cancellation method, characterized in that said method is performed according to the following steps:

The difference marking method specifically comprises the following steps:

step 6, using the aligned current-frame far-end signal obtained in step 5 and the current-frame near-end signal X_n(m) obtained in step 1.1 to perform echo cancellation, and, after the echo-cancelled signal of the current frame is obtained, setting m = m + 1 and returning to step 1 until the current frame signal is the last frame signal, at which point the method ends.

2. The dynamic echo cancellation method according to claim 1, wherein said step 1.2 uses formula I to obtain the current-frame frequency-domain sub-band threshold P(m):

where win() represents a weighting function, 0 < win() < 1.

3. The dynamic echo cancellation method according to claim 1, wherein said step 1.2 specifically comprises:

or a first threshold of the far-end signal

where win() represents a weighting function, 0 < win() < 1;

or a far-end second threshold

P²(m) = P²(m-1) × gamma₁ + Y(m) × (1 - gamma₁)    Formula III

where gamma₁ is a first weighting value, 0 < gamma₁ < 1;

P(m) = P¹(m) × gamma₂ + P²(m) × (1 - gamma₂)    Formula IV

where gamma₂ is a second weighting value, 0 < gamma₂ < 1.

4. The dynamic echo cancellation method according to claim 1, wherein said step 1.2 specifically comprises:

or a far-end first threshold

where win() represents a weighting function, 0 < win() < 1;

or a far-end second threshold

P²(m) = P²(m-1) × gamma₁ + Y(m) × (1 - gamma₁)    Formula III

where gamma₁ is a first weighting value, 0 < gamma₁ < 1;

P(m) = max(P¹(m) × gamma₃, P²(m) × gamma₄)    Formula V

5. A dynamic echo cancellation device is characterized by comprising a current frame marking sequence obtaining module, a historical frame marking sequence obtaining module, a preliminary delay value obtaining module, a current delay value obtaining module, a delay value judging module and an echo cancellation module;

the signal obtaining sub-module is used for obtaining the current m-th frame signal X(m), where m is an integer with an initial value of 1, and performing a short-time Fourier transform on the current frame signal to obtain the current-frame frequency-domain sub-band signal Y(m), wherein Y(m) comprises the current-frame near-end frequency-domain sub-band signal Y_n(m) and the current-frame far-end frequency-domain sub-band signal Y_f(m), where n denotes the near end and f denotes the far end;

The difference marking method specifically comprises the following steps:

the historical frame marking sequence obtaining module is used for obtaining the previously saved far-end marker sequences of S historical frames {T_f(m-1), T_f(m-2), …, T_f(m-s), …, T_f(m-S) | s = 1, 2, …, S}, after which step 3 is performed, where 1 ≤ S < m and T_f(m-s) denotes the far-end marker sequence obtained s executions before the current one;

the echo cancellation module is used for obtaining the adaptive filter coefficients by using the obtained aligned current-frame far-end signal and the current-frame near-end signal X_n(m);

6. The dynamic echo cancellation device of claim 5, wherein said threshold obtaining sub-module obtains the current-frame frequency-domain sub-band threshold P(m) by using formula I:

where win() represents a weighting function, 0 < win() < 1.

7. The dynamic echo cancellation device of claim 5, wherein said threshold acquisition sub-module comprises a first threshold acquisition unit, a second threshold acquisition unit, and a threshold acquisition unit;

or a far-end first threshold

where win() represents a weighting function, 0 < win() < 1;

or a far-end second threshold

P²(m) = P²(m-1) × gamma₁ + Y(m) × (1 - gamma₁)    Formula III

where gamma₁ is a first weighting value, 0 < gamma₁ < 1;

P(m) = P¹(m) × gamma₂ + P²(m) × (1 - gamma₂)    Formula IV

where gamma₂ is a second weighting value, 0 < gamma₂ < 1.

8. The dynamic echo cancellation device of claim 5, wherein said threshold acquisition sub-module comprises a first threshold acquisition unit, a second threshold acquisition unit, and a threshold acquisition unit;

the first threshold obtaining unit is used for obtaining a first threshold P¹(m) by using formula II, wherein the first threshold P¹(m) includes a near-end first threshold

or a far-end first threshold

where win() represents a weighting function, 0 < win() < 1;

or a far-end second threshold

P²(m) = P²(m-1) × gamma₁ + Y(m) × (1 - gamma₁)    Formula III

where gamma₁ is a first weighting value, 0 < gamma₁ < 1;

P(m) = max(P¹(m) × gamma₃, P²(m) × gamma₄)    Formula V