CN110431624A - Residual echo detection method, residual echo detection device, speech processing chip and electronic equipment

Info

Publication number: CN110431624A
Application number: CN201980001068.2A
Authority: CN
Inventors: 郭红敬; 李国梁; 王鑫山; 韩文凯; 朱虎
Original assignee: Shenzhen Huiding Technology Co Ltd
Current assignee: Shenzhen Goodix Technology Co Ltd
Priority date: 2019-06-17
Filing date: 2019-06-17
Publication date: 2019-11-08
Anticipated expiration: 2039-06-17
Also published as: CN110431624B; WO2020252629A1

Abstract

A residual echo detection method, a residual echo detection device, a speech processing chip and an electronic device. The residual echo detection method includes: determining a residual echo detection factor according to the correlation power between a far-end speech signal and a near-end speech signal; and detecting, according to the residual echo detection factor, whether residual echo is present, thereby providing a scheme for detecting residual echo.

Description

Residual echo detection method, residual echo detection device, speech processing chip and electronic device

Technical field

The present application relates to the field of speech technology, and in particular to a residual echo detection method, a residual echo detection device, a speech processing chip and an electronic device.

Background art

With the rapid development of communication technology, artificial intelligence, voice interaction and related technologies, increasingly high demands are placed on communication quality, on the user experience of wearable devices, and on the reliability of voice interaction. Whatever the application scenario, wherever a voice call takes place, echo is inevitably present. Echo therefore needs to be removed by acoustic echo cancellation (AEC) to improve speech quality and thus the user experience.

Echo cancellation mostly targets acoustic echo. Acoustic echo cancellation is mainly divided into two parts: linear echo cancellation and residual echo cancellation. In linear echo cancellation, an adaptive filter estimates the echo path so as to approximate the real sound field as closely as possible, an echo signal is then estimated, and the estimated echo signal is subtracted from the speech signal actually captured by the microphone to cancel the echo. However, because of factors such as the limited order of the adaptive filter, the data characteristics and the nonlinear characteristics of the loudspeaker, the echo cannot be removed completely and residual echo remains. Residual echo can seriously degrade call speech quality and user experience, so it needs to be removed by residual echo cancellation processing.

In the course of realizing the present application, the inventors found that residual echo cancellation presupposes accurate detection of the residual echo: the more accurate the detection result, the more effectively the residual echo can be cancelled. A solution for detecting residual echo is therefore urgently needed.

Summary of the invention

In view of this, the embodiments of the present application provide a residual echo detection method, a residual echo detection device, a speech processing chip and an electronic device, so as to solve at least the above problems in the prior art.

An embodiment of the present application provides a residual echo detection method, including:

determining a residual echo detection factor according to the correlation power between a far-end speech signal and a near-end speech signal; and

detecting, according to the residual echo detection factor, whether residual echo is present.

An embodiment of the present application provides a residual echo detection device, including:

a detection factor calculation unit, configured to determine a residual echo detection factor according to the correlation power between a far-end speech signal and a near-end speech signal; and

a residual echo detection unit, configured to detect, according to the residual echo detection factor, whether residual echo is present.

An embodiment of the present application provides a speech processing chip including a residual echo detection device. The residual echo detection device includes: a detection factor calculation unit, configured to determine a residual echo detection factor according to the correlation power between a far-end speech signal and a near-end speech signal; and a residual echo detection unit, configured to detect, according to the residual echo detection factor, whether residual echo is present.

An embodiment of the present application provides an electronic device including the speech processing chip of any embodiment of the present application.

As can be seen from the above technical solutions, in the embodiments of the present application a residual echo detection factor is determined according to the correlation power between the far-end speech signal and the near-end speech signal, and whether residual echo is present is detected according to the residual echo detection factor, thereby providing a scheme for detecting residual echo.

Brief description of the drawings

Fig. 1 is a schematic structural diagram of an echo cancellation system to which the residual echo detection scheme of the present application can be applied;

Fig. 2 is a schematic structural diagram of an echo cancellation system to which the residual echo detection device of the present application is applied;

Fig. 3 is a schematic flowchart of a residual echo detection method according to an embodiment of the present application;

Fig. 4 is a schematic flowchart of another residual echo detection method according to an embodiment of the present application;

Fig. 5 is a schematic flowchart of yet another residual echo detection method according to an embodiment of the present application;

Fig. 6 is a schematic flowchart of a residual echo cancellation method according to an embodiment of the present application.

Detailed description of the embodiments

To help those skilled in the art better understand the technical solutions in the embodiments of the present application, these technical solutions are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by those of ordinary skill in the art based on the described embodiments shall fall within the scope of protection of the embodiments of the present application.

Fig. 1 is a schematic structural diagram of an echo cancellation system to which the residual echo detection scheme of the present application can be applied. As shown in Fig. 1, the echo cancellation device specifically includes a voice activity detection module 106, a double-talk detector 108 and an adaptive filter 110. In addition, the echo cancellation device may further include a voice acquisition module 102, a voice playback module 104 and an addition module 112. The voice acquisition module 102 is communicatively connected to the voice activity detection module 106, the double-talk detector 108 and the addition module 112, respectively; the voice playback module 104 is communicatively connected to the voice activity detection module 106 and the adaptive filter 110, respectively; the voice activity detection module 106 is communicatively connected to the double-talk detector 108; and the double-talk detector 108 is communicatively connected to the adaptive filter 110 and the addition module 112, respectively.

The voice acquisition module 102 is configured to capture the near-end analog speech signal y(t) so as to generate the near-end digital speech signal. In this embodiment, the voice acquisition module may specifically be a microphone. The captured near-end analog speech signal y(t) may include the analog speech signal s(t) of the near-end speaker, and may also include the echo analog speech signal d(t) caused by the voice playback module 104 playing the far-end analog speech signal x(t).

The voice playback module 104 is configured to play the received far-end analog speech signal x(t). In this embodiment, the voice playback module 104 may specifically be a loudspeaker.

The voice activity detection module 106 is configured to detect whether the echo analog speech signal d(t) is present. In this embodiment, the voice activity detection module 106 may also be referred to as a voice activity detector (VAD).

The double-talk detector 108 is configured to detect whether the echo analog speech signal d(t) and the analog speech signal s(t) of the near-end speaker are present at the same time, that is, to distinguish the single-talk state from the double-talk state, so as to determine the updating of the filter coefficients.

The adaptive filter 110 is configured to generate an estimated echo analog speech signal, written here as $\hat{d}(t)$, according to the filter coefficients and the far-end analog speech signal x(t), so as to cancel the echo analog speech signal d(t) present in the near-end analog speech signal y(t). In this embodiment, the adaptive filter 110 is, for example, a multi-delay block frequency-domain adaptive filter.

The addition module 112 is configured to obtain the error analog speech signal e(t) by subtracting the estimated echo analog speech signal $\hat{d}(t)$ from the near-end analog speech signal y(t), so as to cancel the echo analog speech signal d(t) present in the near-end analog speech signal y(t). In this embodiment, the addition module 112 may specifically be an adder. The more accurate the estimated echo analog speech signal $\hat{d}(t)$, that is, the closer it is to the actual echo analog speech signal d(t), the higher the speech clarity.
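The subtraction performed by the adder can be illustrated with a minimal sketch. It assumes a plain time-domain NLMS filter operating on digital samples, swapped in for the multi-delay block frequency-domain filter named above; the function name `nlms_echo_cancel` and the parameters `taps`, `mu` and `eps` are hypothetical and do not come from the application.

```python
from typing import List, Tuple


def nlms_echo_cancel(x: List[float], y: List[float], taps: int = 4,
                     mu: float = 0.5, eps: float = 1e-8
                     ) -> Tuple[List[float], List[float]]:
    """Return (e, w): the error signal e(n) = y(n) - d_hat(n) and the final weights.

    Assumption: time-domain NLMS stands in for the adaptive filter of the text.
    """
    w = [0.0] * taps      # filter coefficients (echo path estimate)
    buf = [0.0] * taps    # most recent far-end samples, newest first
    e: List[float] = []
    for xn, yn in zip(x, y):
        buf = [xn] + buf[:-1]
        d_hat = sum(wi * bi for wi, bi in zip(w, buf))  # estimated echo
        en = yn - d_hat                                  # adder output e(n)
        norm = sum(b * b for b in buf) + eps             # far-end energy in the buffer
        w = [wi + mu * en * bi / norm for wi, bi in zip(w, buf)]
        e.append(en)
    return e, w
```

With a short, linear echo path and no near-end speech, the error signal of this sketch decays towards zero; whatever remains in e(n) under real conditions is the residual echo that the detection scheme targets.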

Fig. 2 is a schematic structural diagram of an echo cancellation system to which the residual echo detection device of the present application is applied. As shown in Fig. 2, a residual echo detection device 114 and a residual echo cancellation device 118 may be added to the echo cancellation system of Fig. 1. The residual echo detection device includes a detection factor calculation unit and a residual echo detection unit. The detection factor calculation unit is configured to determine a residual echo detection factor according to the correlation power between the far-end speech signal and the near-end speech signal; the residual echo detection unit is configured to detect, according to the residual echo detection factor, whether residual echo is present; and the residual echo cancellation device may be used to cancel the detected residual echo.

The residual echo detection factor can be calculated in either of the following two cases:

Case 1: in one application scenario, the detection factor calculation unit is further configured to determine the residual echo detection factor according to the correlation power between the far-end speech signal and the error speech signal and the correlation power between the far-end speech signal and the near-end speech signal. Specifically, the detection factor calculation unit takes the ratio of the correlation power between the far-end speech signal and the error speech signal to the correlation power between the far-end speech signal and the near-end speech signal as the residual echo detection factor. In a specific implementation, the two correlation powers are calculated first, and their ratio is then calculated and used as the residual echo detection factor. When detecting whether residual echo is present, the residual echo detection unit determines that residual echo is present if the residual echo detection factor is greater than the residual echo detection factor threshold, and that residual echo is absent otherwise.

Case 2: in another application scenario, the detection factor calculation unit is further configured to determine the residual echo detection factor according to the correlation power between the far-end speech signal and the estimated echo speech signal and the correlation power between the far-end speech signal and the near-end speech signal. Specifically, the detection factor calculation unit takes the ratio of the correlation power between the far-end speech signal and the estimated echo speech signal to the correlation power between the far-end speech signal and the near-end speech signal as the residual echo detection factor. In a specific implementation, the two correlation powers are determined first, and their ratio is then calculated and used as the residual echo detection factor. When detecting whether residual echo is present, the residual echo detection unit determines that residual echo is present if the residual echo detection factor is less than the residual echo detection factor threshold, and that residual echo is absent otherwise.

It should be noted here that, whichever of the above ways is used to calculate the residual echo detection factor, a residual echo detection factor threshold is used when judging whether residual echo is present. In principle, therefore, if only one of the two ways of calculating the detection factor is used, the threshold can be set flexibly according to the required detection accuracy. If, however, both cases are used at the same time to detect residual echo, then, for ease of distinction, the residual echo detection factor determined from the correlation power between the far-end speech signal and the error speech signal and the correlation power between the far-end speech signal and the near-end speech signal is called the first residual echo detection factor, and its threshold is called the first residual echo detection factor threshold. Correspondingly, the residual echo detection factor determined from the correlation power between the far-end speech signal and the estimated echo speech signal and the correlation power between the far-end speech signal and the near-end speech signal is called the second residual echo detection factor, and its threshold is called the second residual echo detection factor threshold. Preferably, the second residual echo detection factor threshold is smaller than the first residual echo detection factor threshold.
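The opposite decision rules of the two cases can be summarized in a short sketch. The function name `residual_echo_present` and the two numeric thresholds are illustrative assumptions; the text only states that the second threshold is preferably smaller than the first.

```python
# Illustrative threshold values (assumptions, not taken from the application).
FIRST_THRESHOLD = 0.5   # case 1: far-end / error correlation power ratio
SECOND_THRESHOLD = 0.3  # case 2: far-end / estimated-echo correlation power ratio


def residual_echo_present(factor: float, threshold: float, case: int) -> bool:
    """Apply the decision rule of case 1 or case 2 to one detection factor."""
    if case == 1:
        # Case 1: a large ratio means the error signal still correlates with the far end.
        return factor > threshold
    if case == 2:
        # Case 2: a small ratio means the estimated echo explains little of the echo.
        return factor < threshold
    raise ValueError("case must be 1 or 2")
```

For example, `residual_echo_present(0.7, FIRST_THRESHOLD, 1)` and `residual_echo_present(0.1, SECOND_THRESHOLD, 2)` both report residual echo under these assumed thresholds.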

Corresponding to the two cases above, in case 1 the detection factor calculation unit is further configured to determine a first residual echo suppression factor according to the correlation power between the far-end speech signal and the error speech signal, the power of the estimated echo speech signal, and the residual echo detection factor, so that the residual echo is cancelled according to the first residual echo suppression factor.

Corresponding to the two cases above, in case 2 the detection factor calculation unit is further configured to determine a first residual echo suppression factor according to the correlation power between the far-end speech signal and the estimated echo speech signal, the power of the error speech signal, and the residual echo detection factor, so that the residual echo is cancelled according to the first residual echo suppression factor.

Further, to improve detection accuracy, a normalized correlation factor between the signals may also be introduced and combined with the residual echo detection factor to detect the residual echo.

Specifically, in case 1, the device may further include a correlation factor calculation unit configured to determine the product of the power of the far-end speech signal and the power of the error speech signal, and to calculate a normalized correlation factor according to this product and the correlation power between the far-end speech signal and the error speech signal. The normalized correlation factor is combined with the residual echo detection factor to detect the residual echo.

Specifically, in case 2, the device may further include a correlation factor calculation unit configured to determine the product of the power of the far-end speech signal and the power of the estimated echo speech signal, and to calculate a normalized correlation factor according to this product and the correlation power between the far-end speech signal and the estimated echo speech signal. The normalized correlation factor is combined with the residual echo detection factor to detect the residual echo.

Again, for ease of distinction, the correlation factor calculation unit in case 1 may be called the first correlation factor calculation unit, and the correlation factor calculation unit in case 2 may be called the second correlation factor calculation unit. Of course, the first and second correlation factor calculation units may also be implemented as one shared unit.

Further, when detecting residual echo according to the combination of the normalized correlation factor and the residual echo detection factor, the residual echo detection unit may specifically base the detection on the result of comparing the normalized correlation factor with a normalized correlation factor threshold and the result of comparing the residual echo detection factor with the residual echo detection factor threshold.
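A minimal sketch of this combined check follows. The text does not give the exact normalization or state how the two comparison results are combined, so both are assumptions here: the normalized correlation factor is taken as the squared magnitude of the correlation power divided by the product of the two signal powers, and residual echo is reported for case 1 only when both comparisons exceed their thresholds. All names are hypothetical.

```python
def normalized_correlation(s_cross: complex, p_a: float, p_b: float,
                           eps: float = 1e-12) -> float:
    """Assumed form: |S_ab|^2 / (P_a * P_b), close to 1 for fully correlated signals."""
    return abs(s_cross) ** 2 / (p_a * p_b + eps)


def detect_with_correlation(factor: float, rho: float,
                            factor_thr: float, rho_thr: float) -> bool:
    """Case 1 sketch: both comparison results must indicate residual echo (assumption)."""
    return factor > factor_thr and rho > rho_thr
```

As a single-frame illustration, with X = 1 + 2j and E = 0.5 X the correlation power X·E* equals 2.5 and the powers are 5 and 1.25, so the assumed normalized correlation factor is essentially 1.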

Further, corresponding to case 1, the residual echo detection unit may be further configured to: in the single-talk state, generate a second residual echo suppression factor according to the normalized correlation factor, and take the minimum of the first residual echo suppression factor and the second residual echo suppression factor as the effective residual echo suppression factor, which is multiplied with the error speech signal to cancel the residual echo.

Further, corresponding to case 2, the residual echo detection unit may be further configured to: in the single-talk state, generate a second residual echo suppression factor according to the normalized correlation factor, and take the maximum of the first residual echo suppression factor and the second residual echo suppression factor as the effective residual echo suppression factor, which is multiplied with the error speech signal to cancel the residual echo.

Further, in both cases, the second residual echo suppression factor is the difference between a fixed value and the normalized correlation factor of the respective case.
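The single-talk selection described above can be sketched as follows. The fixed value of 1.0 and the function name `effective_suppression_single_talk` are assumptions; the text only says that the second suppression factor is a fixed value minus the normalized correlation factor.

```python
def effective_suppression_single_talk(g1: float, rho: float, case: int,
                                      fixed_value: float = 1.0) -> float:
    """Pick the effective residual echo suppression factor in the single-talk state.

    g1  : first residual echo suppression factor
    rho : normalized correlation factor of the respective case
    fixed_value : the constant of the text; 1.0 is an assumed value
    """
    g2 = fixed_value - rho       # second residual echo suppression factor
    if case == 1:
        return min(g1, g2)       # case 1 takes the minimum
    if case == 2:
        return max(g1, g2)       # case 2 takes the maximum
    raise ValueError("case must be 1 or 2")
```

With g1 = 0.8 and a normalized correlation factor of 0.5, this sketch yields 0.5 in case 1 and 0.8 in case 2.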

Further, preferably, the residual echo detection unit may be further configured to: in the double-talk state, generate a third residual echo suppression factor according to the a priori estimated echo speech signal power and the near-end speech signal power, and take the maximum of the first residual echo suppression factor and the third residual echo suppression factor as the effective residual echo suppression factor to cancel the residual echo; or, in the double-talk state, determine the effective residual echo suppression factor according to the normalized correlation factor and the first residual echo suppression factor, and use it to adjust the filter coefficients to cancel the residual echo.

Further, in embodiments that use the effective residual echo suppression factor, in order to avoid damaging the speech, the residual echo detection device may further include a correction unit configured to: if the normalized correlation factor is greater than the upper limit of the normalized correlation factor threshold, reduce the effective residual echo suppression factor so as to correct it; or, if the normalized correlation factor is less than the lower limit of the normalized correlation factor threshold, increase the effective residual echo suppression factor so as to correct it.
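The correction rule can be sketched as below. The text specifies only the direction of the correction, so the additive step of 0.1 and the clamping to the range [0, 1] are assumptions, as are all names.

```python
def correct_suppression(g_eff: float, rho: float, rho_low: float,
                        rho_high: float, step: float = 0.1) -> float:
    """Correct the effective suppression factor using the normalized correlation factor.

    Above the upper limit the factor is reduced, below the lower limit it is
    increased; the additive `step` and the [0, 1] clamp are assumptions.
    """
    if rho > rho_high:
        g = g_eff - step
    elif rho < rho_low:
        g = g_eff + step
    else:
        g = g_eff
    return min(1.0, max(0.0, g))
```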

Specifically, in the double-talk state, the effective residual echo suppression factor is preferably corrected by the above correction unit. In the single-talk state, the effective residual echo suppression factor may be corrected by the correction unit or left uncorrected. Of course, if possible damage to the speech is not a concern in the double-talk state, the effective residual echo suppression factor may also be left uncorrected there.

Similarly, to avoid speech damage, especially in the double-talk state, the device may further include a detection factor correction unit, which classifies residual echo detection factors as valid or invalid according to a preset validity threshold of the residual echo detection factor, and corrects the invalid residual echo detection factors according to the mean of the valid residual echo detection factors.
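A minimal sketch of such a detection factor correction follows. Two points are assumptions because the text leaves them open: a factor counts as valid when it is at least the validity threshold, and an invalid factor is replaced by the mean of the valid ones. The function name is hypothetical.

```python
from typing import List


def correct_detection_factors(factors: List[float],
                              validity_threshold: float) -> List[float]:
    """Replace invalid detection factors with the mean of the valid ones."""
    valid = [f for f in factors if f >= validity_threshold]  # assumed validity test
    if not valid:
        return list(factors)  # nothing to average over; leave the factors unchanged
    mean_valid = sum(valid) / len(valid)
    return [f if f >= validity_threshold else mean_valid for f in factors]
```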

After residual echo has been detected by the above scheme, it can be cancelled according to the following schemes.

Scheme 1: the error speech signal after residual echo cancellation is obtained as the product of the residual echo suppression factor and the error speech signal. The executing entity may be a residual echo cancellation unit.
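Scheme 1 amounts to a multiplication, sketched here under the assumption that it is applied per frequency bin to the frequency-domain error signal; the function name `apply_suppression` is hypothetical.

```python
from typing import List


def apply_suppression(error_bins: List[complex], gains: List[float]) -> List[complex]:
    """Multiply each error-signal frequency bin by its residual echo suppression factor."""
    if len(error_bins) != len(gains):
        raise ValueError("one suppression factor per frequency bin is expected")
    return [g * e for g, e in zip(gains, error_bins)]
```

A factor of 1 leaves a bin untouched, while a factor close to 0 removes it almost entirely, which is why an overly small factor during double talk would also damage near-end speech.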

Scheme 2: the coefficient update step size of the adaptive filter is adjusted according to the residual echo detection factor, and the filter coefficients are adjusted according to that update step size, so as to cancel the residual echo. The executing entity that adjusts the update step size and then adjusts the filter coefficients accordingly may be the residual echo cancellation unit.

Scheme 2 has the following three possible implementations:

2.1 Determine the product of the mean of the valid residual echo detection factors and the maximum filter coefficient update step size, and determine the effective filter coefficient update step size according to the minimum filter coefficient update step size and this product.

2.2 Determine the product of the valid residual echo detection factor and the maximum filter coefficient update step size, and determine the effective filter coefficient update step size according to the minimum filter coefficient update step size and this product.

2.3 Determine the effective filter coefficient update step size according to the residual echo detection factor and a step size transformation function.
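The three implementations can be sketched as below. The text does not state how the minimum step size and the product are combined, nor which transformation function is used, so both are assumptions: the sketch takes the larger of the minimum step size and the product (capped at the maximum), and uses the hypothetical mapping η / (1 + η) as the transformation function. All names are illustrative.

```python
def effective_step(eta: float, mu_min: float, mu_max: float) -> float:
    """Implementations 2.1 and 2.2: eta is the mean of the valid factors (2.1) or a valid factor (2.2).

    Assumption: the step is the larger of mu_min and eta * mu_max, capped at mu_max.
    """
    return min(mu_max, max(mu_min, eta * mu_max))


def effective_step_transformed(eta: float, mu_min: float, mu_max: float) -> float:
    """Implementation 2.3 with a hypothetical transformation function eta / (1 + eta)."""
    return mu_min + (mu_max - mu_min) * eta / (1.0 + eta)
```

Under these assumptions, a larger case 1 detection factor gives a larger update step, so the filter adapts faster when more residual echo is present.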

Of course, scheme 1 or scheme 2 may each be used on its own, or the two may be combined to cancel the residual echo more thoroughly.

The following embodiments of the present application illustrate how residual echo is detected and how it is cancelled once it has been detected.

The following embodiments mainly explain how residual echo is detected based on the residual echo detection factor calculated in case 1 above. Because the detection logic based on the case 2 residual echo detection factor is the opposite of the logic in case 1, case 2 is only briefly described alongside in the following embodiments, so that those of ordinary skill in the art can clearly understand the technical solutions of the present application.

Fig. 3 is a schematic flowchart of a residual echo detection method according to an embodiment of the present application. As shown in Fig. 3, the method includes:

S301: determine the correlation power between the far-end speech signal and the error speech signal, and determine the correlation power between the far-end speech signal and the near-end speech signal;

In this embodiment, if the far-end speech signal, the error speech signal and the near-end speech signal are time-domain signals, step S301 further includes: first transforming the far-end speech time-domain signal, the error speech time-domain signal and the near-end speech time-domain signal to the frequency domain to obtain the far-end speech frequency-domain signal, the error speech frequency-domain signal and the near-end speech frequency-domain signal, and then determining, in the frequency domain, the correlation power between the far-end speech signal and the error speech signal and the correlation power between the far-end speech signal and the near-end speech signal.

Further, in this embodiment, when the correlation powers are determined in the frequency domain in step S301, they are determined frequency bin by frequency bin: the correlation power between the far-end speech frequency-domain signal and the error speech frequency-domain signal at each frequency bin, and the correlation power between the far-end speech frequency-domain signal and the near-end speech frequency-domain signal at each frequency bin.

Specifically, in one application scenario, as described above, the far-end analog speech signal is denoted x(t), the near-end analog speech signal is denoted y(t), the estimated echo analog speech signal is written here as $\hat{d}(t)$, and the error analog speech signal is denoted e(t). After analog-to-digital conversion, these analog speech signals yield the far-end digital speech signal x(n), the near-end digital speech signal y(n), the estimated echo digital speech signal $\hat{d}(n)$, and the error digital speech signal e(n).

Each of the above digital speech signals is transformed by the fast Fourier transform to obtain the frequency-domain signal of the i-th frame of the far-end digital speech signal, $X = [X(1), X(2), \ldots, X(N)]^{T}$; the frequency-domain signal of the i-th frame of the near-end digital speech signal, $Y = [Y(1), Y(2), \ldots, Y(N)]^{T}$; the frequency-domain signal of the i-th frame of the estimated echo digital speech signal, $\hat{D} = [\hat{D}(1), \hat{D}(2), \ldots, \hat{D}(N)]^{T}$; and the frequency-domain signal of the i-th frame of the error digital speech signal, $E = [E(1), E(2), \ldots, E(N)]^{T}$, where N is the number of frequency bins of the adaptive filter.
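The framing and transform step can be sketched as follows. For self-containment the sketch uses a naive O(N²) discrete Fourier transform in place of a fast Fourier transform, and it assumes non-overlapping frames without windowing; the function names and these framing choices are assumptions.

```python
import cmath
from typing import List


def split_frames(signal: List[float], size: int) -> List[List[float]]:
    """Cut a digital speech signal into consecutive, non-overlapping frames (assumption)."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, size)]


def dft(frame: List[float]) -> List[complex]:
    """Naive DFT of one frame; returns one complex value per frequency bin."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
```

Applying `dft` to each frame of x(n), y(n) and e(n) gives the per-bin values X(k), Y(k) and E(k) that the correlation powers are built from.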

The above correlation powers are calculated as shown in formula (1):
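As a sketch consistent with the symbol definitions that follow, and assuming first-order recursive smoothing in which λ weights the previous-frame value (that weighting is an assumption), formula (1) can be written as:

$$
\begin{aligned}
S_{xe}(k,n) &= \lambda\, S_{xe}(k,n-1) + (1-\lambda)\, X(k,n)\, E(k,n)^{*} \\
S_{xy}(k,n) &= \lambda\, S_{xy}(k,n-1) + (1-\lambda)\, X(k,n)\, Y(k,n)^{*}
\end{aligned}
\tag{1}
$$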

In formula (1) above, $S_{xe}(k,n)$ denotes the correlation power, at the k-th frequency bin, between the frequency-domain signals of the n-th frame of the far-end digital speech signal and the n-th frame of the error digital speech signal; $S_{xy}(k,n)$ denotes the correlation power, at the k-th frequency bin, between the frequency-domain signals of the n-th frame of the far-end digital speech signal and the n-th frame of the near-end digital speech signal; $S_{xe}(k,n-1)$ and $S_{xy}(k,n-1)$ denote the corresponding correlation powers for the (n-1)-th frames; $X(k,n)$, $Y(k,n)$ and $E(k,n)$ denote the frequency-domain signals at the k-th frequency bin of the n-th frame of the far-end, near-end and error digital speech signals, respectively; $Y(k,n)^{*}$ and $E(k,n)^{*}$ denote the conjugates of $Y(k,n)$ and $E(k,n)$; λ is the smoothing factor, with 0 < λ < 1; and k = 1, ..., N.
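The per-bin update of a correlation power can be sketched in code. The recursion below assumes that λ weights the previous-frame value and (1 − λ) weights the new product; this placement, the default λ = 0.9 and the function name `update_correlation_power` are assumptions.

```python
from typing import List


def update_correlation_power(s_prev: List[complex], a: List[complex],
                             b: List[complex], lam: float = 0.9) -> List[complex]:
    """One frame of recursive smoothing for every frequency bin k.

    s_prev : S_ab(k, n-1) for all bins
    a, b   : frequency-domain frames, e.g. X(k, n) and E(k, n) or Y(k, n)
    lam    : smoothing factor, 0 < lam < 1
    """
    if not 0.0 < lam < 1.0:
        raise ValueError("the smoothing factor must satisfy 0 < lam < 1")
    return [lam * s + (1.0 - lam) * ak * bk.conjugate()
            for s, ak, bk in zip(s_prev, a, b)]
```

Calling it once with the far-end and error frames and once with the far-end and near-end frames gives the two quantities the detection factor is built from.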

S302: determine the residual echo detection factor according to the correlation power between the far-end speech signal and the error speech signal and the correlation power between the far-end speech signal and the near-end speech signal, so as to detect whether residual echo is present.

In this embodiment, when the residual echo detection factor is determined in step S302, the ratio of the correlation power between the far-end speech signal and the error speech signal to the correlation power between the far-end speech signal and the near-end speech signal may specifically be taken as the residual echo detection factor.

Specifically, in one application scenario, the residual echo detection factor is calculated by the following formula (2).
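A form consistent with the description, in which the detection factor is the ratio of the two correlation powers with σ added to the denominator, is sketched below; taking the magnitudes of the correlation powers is an assumption:

$$
\eta_{xe}(k,n) = \frac{\left|S_{xe}(k,n)\right|}{\left|S_{xy}(k,n)\right| + \sigma}
\tag{2}
$$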

In formula (2) above, $\eta_{xe}(k,n)$ denotes the residual echo detection factor, and σ is a control factor that prevents the denominator of formula (2) from being zero; the value of σ is smaller than $S_{xy}(k,n)$.

Further, the residual echo detection factor is compared with the residual echo detection factor threshold. If the residual echo detection factor is greater than the threshold, a relatively large amount of residual echo is present; otherwise, little or no residual echo is present.
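Steps S301 and S302 end in a per-bin comparison, which can be sketched as follows for case 1. Using the magnitudes of the correlation powers, the default σ and the function names are assumptions.

```python
from typing import List


def detection_factor(s_num: complex, s_xy: complex, sigma: float = 1e-10) -> float:
    """Ratio of two correlation powers; sigma keeps the denominator away from zero."""
    return abs(s_num) / (abs(s_xy) + sigma)


def detect_bins(s_xe: List[complex], s_xy: List[complex],
                threshold: float) -> List[bool]:
    """Case 1: flag every frequency bin whose detection factor exceeds the threshold."""
    return [detection_factor(a, b) > threshold for a, b in zip(s_xe, s_xy)]
```

For instance, with an assumed threshold of 0.1, a bin where the far-end/error correlation power is half the far-end/near-end correlation power is flagged, while a bin where it is one hundredth is not.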

Fig. 4 is a schematic flowchart of another residual echo detection method according to an embodiment of the present application. As shown in Fig. 4, the method includes:

S401: determine the correlation power between the far-end speech signal and the estimated echo speech signal, and determine the correlation power between the far-end speech signal and the near-end speech signal;
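By analogy with formula (1), and under the same assumption about how the smoothing factor λ is applied, the correlation power between the far-end speech signal and the estimated echo speech signal can be sketched as follows, where the hat notation for the estimated echo is a notation chosen here:

$$
S_{x\hat{d}}(k,n) = \lambda\, S_{x\hat{d}}(k,n-1) + (1-\lambda)\, X(k,n)\, \hat{D}(k,n)^{*}
\tag{3}
$$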

In formula (3), $S_{x\hat{d}}(k,n)$ denotes the correlation power, at the k-th frequency bin, between the frequency-domain signals of the n-th frame of the far-end digital speech signal and the n-th frame of the estimated echo digital speech signal; $S_{x\hat{d}}(k,n-1)$ denotes the correlation power, at the k-th frequency bin, between the frequency-domain signals of the (n-1)-th frame of the far-end digital speech signal and the (n-1)-th frame of the estimated echo digital speech signal; $X(k,n)$ denotes the frequency-domain signal at the k-th frequency bin of the n-th frame of the far-end digital speech signal; $\hat{D}(k,n)$ denotes the frequency-domain signal at the k-th frequency bin of the n-th frame of the estimated echo digital speech signal; and * denotes conjugation.

For the correlation power between the far-end speech signal and the near-end speech signal, see formula (1) above.

S402, according to the related power between far-end speech signal and the echo voice signal of estimation and described remote The related power between voice signal and near-end voice signals is held, residual echo detecting factor is determined, to detect whether to deposit In residual echo.

In the present embodiment, in step S402 according between far-end speech signal and the echo voice signal of estimation The related power between related power and the far-end speech signal and near-end voice signals determines that residual echo detects Because of the period of the day from 11 p.m. to 1 a.m, specifically between far-end speech signal and the echo voice signal of estimation the related power and the far-end speech The ratio of the related power between signal and near-end voice signals is as residual echo detecting factor.

In a specific application scenarios, residual echo detecting factor is calculated referring to following formula (4).

In above-mentioned formula (4),Indicate residual echo detecting factor, σ is controlling elements, prevents point of formula (4) Mother is that zero, σ value is less than S_xy(k,n)。

Further, the residual echo detecting factor that will be calculated according to formula (4) and residual echo detection because Cervical orifice of uterus limit be compared, if the residual echo detecting factor be less than residual echo detecting factor thresholding, show exist compared with Otherwise more residual echos show that there are less or there is no residual echos.

It is carried out based on the residual echo detecting factor that formula (4) is calculated residual used in the detection of residual echo Remaining detection of echoes factor thresholding carries out the inspection of residual echo with the residual echo detecting factor being calculated based on formula (2) The size relation of residual echo detecting factor thresholding, refers to the related description in above-mentioned Fig. 1 embodiment used in surveying.

It should be noted here that, as described above, the error digital speech signal e(n) is obtained by subtracting the estimated echo digital speech signal ŷ(n) from the near-end digital speech signal y(n). Consequently, the sum of the two residual echo detection factors calculated by formula (2) and by formula (4) is, in theory, approximately 1, so the two factors can be converted into each other. Therefore, if formula (4) is used for residual echo detection, the detection logic is the opposite of the logic used with formula (2). Detection logic here refers to whether, when the residual echo detection factor is compared with its corresponding threshold, a factor above the threshold or a factor below the threshold is taken to indicate more residual echo, as opposed to little or no residual echo.
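The complementarity of the two factors can be checked in a few lines. The derivation below assumes that the three correlation powers are smoothed with the same factor and the same initial values, so that the smoothing is linear in its second argument, and that both factors share the denominator S_xy(k, n) + σ:

```latex
\begin{aligned}
E(k,n) &= Y(k,n) - \hat{Y}(k,n) \\
\Rightarrow\quad S_{xe}(k,n) &= S_{xy}(k,n) - S_{x\hat{y}}(k,n) \\
\Rightarrow\quad \eta_{xe}(k,n) + \eta_{x\hat{y}}(k,n)
  &= \frac{S_{xe}(k,n) + S_{x\hat{y}}(k,n)}{S_{xy}(k,n) + \sigma}
   = \frac{S_{xy}(k,n)}{S_{xy}(k,n) + \sigma} \approx 1
\end{aligned}
```

The approximation holds because σ is chosen to be much smaller than S_xy(k, n). A factor η_xe close to 1 therefore corresponds to a factor η_xŷ close to 0, which is why the comparison direction is reversed between the two cases.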

As described above, if more residual echo is determined to be present, residual echo is deemed to exist and the subsequent residual echo elimination processing needs to be performed. If little or no residual echo is present, residual echo is deemed not to exist and the subsequent residual echo elimination processing need not be performed.

Fig. 5 is a schematic flowchart of yet another residual echo detection method according to an embodiment of the present application. As shown in Fig. 5, on the basis of the embodiment of Fig. 3, in order to further improve the accuracy of residual echo detection and to perform a simple quantitative analysis of the residual echo, this embodiment adds steps relating to a normalized correlation factor. The method includes:

S501, determine the correlation power between the far-end speech signal and the error speech signal, and determine the correlation power between the far-end speech signal and the near-end speech signal;

S502, determine the residual echo detection factor from the correlation power between the far-end speech signal and the error speech signal and the correlation power between the far-end speech signal and the near-end speech signal;

In this embodiment, steps S501 and S502 are similar to the embodiment of Fig. 3 described above.

S503, determine the power of the far-end speech signal and the power of the error speech signal;

In this embodiment, as described above, the power of the far-end speech signal and the power of the error speech signal are computed in the frequency domain. That is, the far-end analog speech signal and the error analog speech signal undergo analog-to-digital conversion to obtain the far-end digital speech signal and the error digital speech signal, which are then transformed to the frequency domain.

Specifically, in one application scenario, the power of the far-end speech signal and the power of the error speech signal may be determined by the calculation in the following formula (5).

In formula (5), S_xx(k, n) denotes the power of the frequency-domain signal of the n-th frame of the far-end digital speech signal at the k-th frequency bin; S_xx(k, n-1) denotes the power of the frequency-domain signal of the (n-1)-th frame of the far-end digital speech signal at the k-th frequency bin; S_ee(k, n) denotes the power of the frequency-domain signal of the n-th frame of the error digital speech signal at the k-th frequency bin; S_ee(k, n-1) denotes the power of the frequency-domain signal of the (n-1)-th frame of the error digital speech signal at the k-th frequency bin; X(k, n) denotes the frequency-domain signal of the n-th frame of the far-end digital speech signal at the k-th frequency bin; E(k, n) denotes the frequency-domain signal of the n-th frame of the error digital speech signal at the k-th frequency bin; X*(k, n) and E*(k, n) denote the complex conjugates of X(k, n) and E(k, n), respectively; and λ is a smoothing factor, 0 < λ < 1.
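The symbols defined above suggest a first-order recursive estimate of the power at each frequency bin. The sketch below assumes the common form in which λ weights the previous estimate and (1 - λ) weights the current frame; the function name and the default λ are hypothetical.

```python
def update_power(prev_power, spectrum_bin, lam=0.9):
    """Recursively smoothed power of one frequency bin.

    Assumed form: S(k, n) = lam * S(k, n-1) + (1 - lam) * X(k, n) * conj(X(k, n)).
    """
    instantaneous = (spectrum_bin * spectrum_bin.conjugate()).real
    return lam * prev_power + (1.0 - lam) * instantaneous


# Track the power of one far-end bin over three frames, starting from zero.
s_xx = 0.0
for x_bin in (3 + 4j, 3 + 4j, 0j):
    s_xx = update_power(s_xx, x_bin)
    print(round(s_xx, 4))
```

The same function applies unchanged to the error signal by passing E(k, n) instead of X(k, n).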

In other embodiments, for the second case above, the power of the estimated echo speech signal is calculated according to the following formula (5)':

In formula (5)', Ŷ(k, n) denotes the frequency-domain signal of the n-th frame of the estimated echo digital speech signal at the k-th frequency bin; S_ŷŷ(k, n-1) denotes the power of the frequency-domain signal of the (n-1)-th frame of the estimated echo digital speech signal at the k-th frequency bin; Ŷ*(k, n) denotes the complex conjugate of Ŷ(k, n); and S_ŷŷ(k, n) denotes the power of the frequency-domain signal of the n-th frame of the estimated echo digital speech signal at the k-th frequency bin.

S504, determine the normalized correlation factor of the far-end speech signal and the error speech signal from the power of the far-end speech signal and the power of the error speech signal;

In this embodiment, in one application scenario, the normalized correlation factor is calculated as the ratio of the correlation power between the far-end speech signal and the error speech signal to the product of the power of the far-end digital speech signal and the power of the error digital speech signal. Specifically, the normalized correlation factor may be calculated in the frequency domain with reference to the following formula (6).

In formula (6), the ratio of the correlation power S_xe(k, n) between the frequency-domain signals, at the k-th frequency bin, of the n-th frame of the far-end digital speech signal and the n-th frame of the error digital speech signal to the product of the power S_xx(k, n) of the frequency-domain signal of the n-th frame of the far-end digital speech signal at the k-th frequency bin and the power S_ee(k, n) of the frequency-domain signal of the n-th frame of the error digital speech signal at the k-th frequency bin is calculated. This ratio is used as the normalized correlation factor, denoted C_xe(k, n), which represents the normalized correlation factor of the frequency-domain signals of the n-th frames of the far-end digital speech signal and the error digital speech signal at the k-th frequency bin.
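A sketch of the normalization for a single bin follows. The text specifies a ratio of the correlation power to the product of the two powers; the example assumes the magnitude-squared form, which keeps the result between 0 and 1 when all three quantities are smoothed identically. The function name and the guard term `eps` are hypothetical.

```python
def normalized_correlation(s_xe, s_xx, s_ee, eps=1e-12):
    """Normalized correlation factor of one bin.

    Assumed form: |S_xe|^2 / (S_xx * S_ee); eps guards against a zero denominator.
    """
    return abs(s_xe) ** 2 / (s_xx * s_ee + eps)


# Error fully correlated with the far end (E = 0.5 * X, |X|^2 = 2, lam = 0.9).
print(round(normalized_correlation(0.1 + 0j, 0.2, 0.05), 6))
# Error uncorrelated with the far end.
print(normalized_correlation(0j, 0.2, 0.05))
```

A value near 1 indicates that the error signal is still dominated by far-end content, that is, by residual echo; a value near 0 indicates that it is not.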

In other embodiments, for the second case above, the product of the power of the far-end speech signal and the power of the estimated echo speech signal is determined by analogy with formula (6), and the normalized correlation factor is calculated from this product and the correlation power between the far-end speech signal and the estimated echo speech signal. The normalized correlation factor is calculated specifically by the following formula (6)':

For the parameters in formula (6)', refer to the description of the other embodiments above.

S505, perform a quantitative analysis of the existing residual echo according to the normalized correlation factor and the residual echo detection factor.

In this embodiment, the normalized correlation factor is combined with the residual echo detection factor to quantitatively analyze the existing residual echo. Specifically, the analysis may be based on the result of comparing the normalized correlation factor with a normalized correlation factor threshold and the result of comparing the residual echo detection factor with the residual echo detection factor threshold, giving a rough estimate of the amount of residual echo. It should be noted that the amount of residual echo is only a relative concept and can be set flexibly according to the application scenario: when there is more residual echo, residual echo is deemed to exist; otherwise, it is deemed not to exist.

Further, in this embodiment, to further improve accuracy, when the existing residual echo is quantitatively analyzed in step S505, the call state detected by the double-talk detector, namely the single-talk state or the double-talk state, is obtained first. Then, depending on the call state, the quantitative analysis is performed using either the residual echo detection factor alone or the residual echo detection factor together with the normalized correlation factor. Here, the residual echo detection factor is calculated by formula (2) above.

In one application scenario, if the call state is the single-talk state, the residual echo detection factor is compared directly with the residual echo detection factor threshold. If it is greater than the threshold, residual echo is present; otherwise, residual echo is not present. Here, "not present" can be understood as little residual echo being present or, in the ideal case, none at all. It should be noted that, in the single-talk state, if the adaptive filter's echo cancellation ability is weak, S_xe(k, n) and S_xy(k, n) are relatively close, so the residual echo detection factor η_xe(k, n) calculated by formula (2) is large and exceeds the threshold, indicating more residual echo. Conversely, if the adaptive filter's echo cancellation ability is strong and the echo is cancelled fairly thoroughly, the error speech frequency-domain signal E(k) is small, essentially approaching 0, so the calculated η_xe(k, n) is also small and falls below the threshold, indicating little residual echo or, in theory, none.

In other embodiments, if the residual echo detection factor is calculated according to the second case above, a factor below the corresponding threshold indicates more residual echo, and a factor above the corresponding threshold indicates little residual echo or, in theory, none.

Alternatively, in the single-talk state, the quantitative analysis of the residual echo may of course also take the normalized correlation factor into account. In practice, however, if the accuracy requirement for residual echo detection is not high, the quantitative analysis is preferably performed with the residual echo detection factor alone.

In another application scenario, for the first case above, if the call state is the double-talk state, the near-end speaker's speech signal is present. To avoid damaging the near-end speaker's speech when eliminating residual echo, the quantitative analysis of the residual echo preferably considers the normalized correlation factor in addition to the residual echo detection factor.

To this end, in order to improve the accuracy of residual echo detection, an upper limit η_up and a lower limit η_low are set for the residual echo detection factor threshold, and an upper limit C_up and a lower limit C_low are set for the normalized correlation factor threshold. The quantitative analysis of the residual echo then proceeds as follows:

(1) If η_xe(k, n) is greater than or equal to the upper limit η_up and C_xe(k, n) is greater than or equal to the upper limit C_up, more residual echo is present;

(2) If η_xe(k, n) is greater than or equal to the upper limit η_up and C_xe(k, n) is less than the lower limit C_low, less residual echo is present; if C_xe(k, n) lies between the lower limit C_low and the upper limit C_up, a moderate amount of residual echo is present.

(3) If η_xe(k, n) lies between the thresholds η_low and η_up, part of the echo has been eliminated but residual echo remains, and the quantitative analysis is carried out further according to C_xe(k, n): if C_xe(k, n) is greater than or equal to the upper limit C_up, more residual echo is present; if C_xe(k, n) lies between the lower limit C_low and the upper limit C_up, a moderate amount of residual echo is present.

(4) If η_xe(k, n) is less than the threshold η_low, or C_xe(k, n) is less than C_low, less residual echo is present.
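Rules (1) to (4) can be sketched as a small classifier. The labels, the function name, the treatment of values exactly equal to a lower limit as "between", and the threshold values in the example are assumptions made for illustration.

```python
def classify_residual_echo(eta, c, eta_low, eta_up, c_low, c_up):
    """Classify the amount of residual echo in the double-talk state (first case)."""
    # Rule (4), which also covers the "less" branch of rule (2).
    if eta < eta_low or c < c_low:
        return "less"
    if eta >= eta_up:
        # Rules (1) and (2): the detection factor is at or above its upper limit.
        return "more" if c >= c_up else "moderate"
    # Rule (3): the detection factor lies between its lower and upper limits.
    return "more" if c >= c_up else "moderate"


limits = dict(eta_low=0.3, eta_up=0.7, c_low=0.3, c_up=0.7)
for eta, c in ((0.9, 0.9), (0.9, 0.1), (0.5, 0.5), (0.1, 0.9)):
    print(eta, c, classify_residual_echo(eta, c, **limits))
```

Written this way, the rules show that once both factors clear their lower limits, the normalized correlation factor alone decides between "moderate" and "more".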

In another application scenario, for the second case above, residual echo detection using the corresponding upper limit η_up and lower limit η_low of the residual echo detection factor threshold and the upper limit C_up and lower limit C_low of the normalized correlation factor threshold is the opposite of the situation above:

(1)' If η_xŷ(k, n) is less than the lower limit η_low and C_xŷ(k, n) is less than the lower limit C_low, more residual echo is present;

(2)' If η_xŷ(k, n) is less than the lower limit η_low and C_xŷ(k, n) is greater than or equal to the upper limit C_up, less residual echo is present; if C_xŷ(k, n) lies between the lower limit C_low and the upper limit C_up, a moderate amount of residual echo is present.

(3)' If η_xŷ(k, n) lies between the lower limit η_low and the upper limit η_up, part of the echo has been eliminated but residual echo remains, and the quantitative analysis is carried out further according to C_xŷ(k, n): if C_xŷ(k, n) is less than the lower limit C_low, more residual echo is present; if C_xŷ(k, n) lies between the lower limit C_low and the upper limit C_up, a moderate amount of residual echo is present.

(4)' If η_xŷ(k, n) is greater than or equal to the upper limit η_up, or C_xŷ(k, n) is greater than or equal to the upper limit C_up, less residual echo is present.

When the first case or the second case is used alone, the values of the upper limit η_up and lower limit η_low of the residual echo detection factor threshold and of the upper limit C_up and lower limit C_low of the normalized correlation factor threshold for that case are set flexibly according to the required detection accuracy. However, if the first case and the second case are used at the same time, the upper limit η_up and lower limit η_low of the residual echo detection factor threshold in the first case correspond respectively to the lower limit η_low and upper limit η_up of the residual echo detection factor threshold in the second case, and the upper limit C_up and lower limit C_low of the normalized correlation factor threshold in the first case correspond respectively to the lower limit C_low and upper limit C_up of the normalized correlation factor threshold in the second case.

Fig. 6 is a schematic flowchart of a residual echo elimination method according to an embodiment of the present application. After any of the embodiments of Fig. 3 to Fig. 5 above has determined that residual echo that should be eliminated is present, the residual echo is eliminated, as shown in Fig. 6, by the following steps:

S601, determine whether the call is in the single-talk state or the double-talk state;

S602, if in the single-talk state, eliminate the residual echo according to the residual echo elimination mechanism configured for the single-talk state;

In this embodiment, the call state result comes from the double-talk detector.

In this embodiment, in one application scenario, the residual echo elimination mechanism configured for step S602 may be: determine a first residual echo suppression factor. The first residual echo suppression factor may be determined from the correlation power between the far-end speech signal and the error speech signal, the power of the estimated echo speech signal, and the residual echo detection factor.

Further, if the required degree of residual echo elimination is low or not strict, the first residual echo suppression factor may be used directly as the effective residual echo suppression factor to eliminate the residual echo. If the residual echo is to be eliminated more thoroughly, a second residual echo suppression factor may additionally be generated from the normalized correlation factor, and the minimum of the first residual echo suppression factor and the second residual echo suppression factor taken as the effective residual echo suppression factor. The smaller the effective residual echo suppression factor, the stronger the residual echo elimination. In this way, the effective residual echo suppression factor is determined from the normalized correlation factor and the first residual echo suppression factor, so as to eliminate the residual echo.

Specifically, in the first case above, for the single-talk state, the first residual echo suppression factor and the second residual echo suppression factor used for residual echo elimination may be calculated according to the following formulas (7) and (8), and the effective residual echo suppression factor according to the following formula (9).

G₁(k, n)=1-C_xe(k, n) (8)

G ' (k, n)=min (G (k, n), G₁(k, n)) (9)

In formula (7), G(k, n) denotes the first residual echo suppression factor; in formula (8), G₁(k, n) denotes the second residual echo suppression factor; in formula (9), G'(k, n) denotes the effective residual echo suppression factor.
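Formulas (8) and (9) translate directly into code for one frequency bin. In the sketch below the first suppression factor G(k, n) is passed in as an input rather than computed; the function name and the example values are hypothetical.

```python
def effective_gain_single_talk(g_first, c_xe):
    """Effective suppression factor in the single-talk state (first case).

    g_first: first residual echo suppression factor G(k, n), supplied by the caller.
    c_xe:    normalized correlation factor C_xe(k, n).
    """
    g_second = 1.0 - c_xe          # formula (8)
    return min(g_first, g_second)  # formula (9): the smaller factor suppresses more


# Strongly correlated error signal: the second factor dominates.
print(effective_gain_single_talk(0.6, 0.75))
# Weakly correlated error signal: the first factor dominates.
print(effective_gain_single_talk(0.25, 0.125))
```

Taking the minimum means that either indicator of residual echo, a small G or a large C_xe, is enough to trigger strong suppression, which is acceptable in the single-talk state because there is no near-end speech to protect.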

Referring to formula (8) above, the second residual echo suppression factor is the difference between a fixed value and the normalized correlation factor. The fixed value is theoretically equal to 1, but in practice, owing to various other influences, it may be greater than 1.

Specifically, in the second case above, referring to formula (7)', the first residual echo suppression factor is determined from the correlation power between the far-end speech signal and the estimated echo speech signal, the power of the error speech signal, and the residual echo detection factor, so that the residual echo can be eliminated according to the first residual echo suppression factor. In the single-talk state, the second residual echo suppression factor is generated from the normalized correlation factor referring to formula (8)', and, referring to formula (9)', the maximum of the first residual echo suppression factor and the second residual echo suppression factor is taken as the effective residual echo suppression factor, which is multiplied with the error speech signal to eliminate the residual echo.

G ' (k, n)=max (G (k, n), G₁(k, n)) (9) '

After the effective residual echo suppression factor is obtained, the residual echo may be eliminated with reference to the following formula (10).

Ê(k, n) = G'(k, n) · E(k, n) (10)

In formula (10), E(k, n) denotes the frequency-domain signal of the n-th frame of the error digital speech signal at the k-th frequency bin, and Ê(k, n) denotes the frequency-domain signal of the n-th frame of the error digital speech signal at the k-th frequency bin after residual echo elimination. That is, the effective residual echo suppression factor is multiplied with the error speech signal to eliminate the residual echo.
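Applied to a whole frame, the multiplication described for formula (10) is a per-bin scaling of the error spectrum. The sketch below uses plain Python lists of complex numbers; the function name and the sample values are hypothetical.

```python
def suppress_residual_echo(error_spectrum, gains):
    """Scale each bin E(k, n) of the error spectrum by its effective gain G'(k, n)."""
    if len(error_spectrum) != len(gains):
        raise ValueError("one gain per frequency bin is required")
    return [g * e for g, e in zip(gains, error_spectrum)]


frame = [1 + 1j, 2 + 0j, 0 - 4j]
gains = [0.5, 0.0, 1.0]  # halve bin 0, mute bin 1, pass bin 2 unchanged
print(suppress_residual_echo(frame, gains))
```

A gain of 1 leaves a bin untouched and a gain of 0 removes it entirely, which matches the statement that a smaller suppression factor means stronger elimination.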

In this embodiment, in another application scenario, for the single-talk state, the filter coefficients of the adaptive filter are adjusted directly according to the residual echo detection factor to eliminate the residual echo. More specifically, the filter-coefficient update step size of the adaptive filter is adjusted according to the residual echo detection factor, and the filter coefficients are adjusted according to that update step size, so as to eliminate the residual echo.

Specifically, adjusting the filter-coefficient update step size of the adaptive filter according to the residual echo detection factor includes: determining the effective frequency bins and selecting the effective residual echo detection factors according to the effective frequency bins; calculating the product of the mean of the effective residual echo detection factors and the maximum filter-coefficient update step size; taking the maximum of this product and the minimum filter-coefficient update step size as the effective filter-coefficient update step size; and updating the filter coefficients according to the effective filter-coefficient update step size so as to eliminate the residual echo, this residual echo being in the time domain.

In this embodiment, the effective frequency bins are selected according to the effective frequency range of human speech, 300-3400 Hz.

Specifically, the effective filter-coefficient update step size may be determined according to the following formula (11).

In formula (11), μ denotes the product of the mean of the effective residual echo detection factors and the maximum filter-coefficient update step size; μ_max denotes the maximum filter-coefficient update step size; μ_min denotes the minimum filter-coefficient update step size; and μ' denotes the effective filter-coefficient update step size. Taking the larger update step size in formula (11) allows the filter coefficients to be updated more quickly.
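The step-size selection can be sketched as follows. The example assumes that formula (11) floors the product at μ_min in the same way formula (13) does per bin, and that bin k of an FFT of size `fft_size` corresponds to the frequency k · sample_rate / fft_size; the function name, the sample rate, the FFT size and the step-size limits are hypothetical.

```python
def effective_step_size(eta_per_bin, sample_rate, fft_size, mu_max, mu_min,
                        f_low=300.0, f_high=3400.0):
    """Effective update step size from the detection factors of the speech band."""
    bin_hz = sample_rate / fft_size
    # Keep only the detection factors of bins inside the 300-3400 Hz speech band.
    effective = [eta for k, eta in enumerate(eta_per_bin)
                 if f_low <= k * bin_hz <= f_high]
    if not effective:
        return mu_min
    mu = (sum(effective) / len(effective)) * mu_max
    return max(mu, mu_min)


# 8 kHz sampling with an 8-point FFT: bins 0..4 sit at 0, 1000, 2000, 3000, 4000 Hz.
etas = [9.0, 0.2, 0.4, 0.6, 9.0]
print(effective_step_size(etas, 8000, 8, mu_max=0.5, mu_min=0.01))
```

Only the bins at 1000, 2000 and 3000 Hz contribute here, so the outliers at 0 Hz and 4000 Hz do not affect the step size.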

Specifically, the filter coefficients may be updated in the time domain according to the following formula (12).

In formula (12), w(n) denotes the filter coefficients for the n-th frame of the far-end digital speech signal; x(n) denotes the n-th frame of the far-end digital speech signal; e(n) denotes the n-th frame of the error digital speech signal; and ‖x(n)‖² denotes the energy of the n-th frame of the far-end digital speech signal.
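The symbols above match the standard normalized least-mean-squares (NLMS) update, so a sketch under that assumption is given below. It treats e(n) as a scalar error sample and x(n) as the vector of the most recent far-end samples; the regularization term `delta` and the function name are hypothetical.

```python
def nlms_update(w, x, e, mu, delta=1e-8):
    """One NLMS coefficient update: w + mu * e * x / (||x||^2 + delta).

    w:  current filter coefficients (list of floats)
    x:  far-end reference vector aligned with w
    e:  error sample after subtracting the estimated echo
    mu: effective update step size
    """
    energy = sum(v * v for v in x)
    scale = mu * e / (energy + delta)
    return [wi + scale * xi for wi, xi in zip(w, x)]


# Identify a one-tap echo path y = 0.5 * x starting from w = 0.
w = [0.0]
for _ in range(20):
    x = [1.0]
    e = 0.5 * x[0] - w[0] * x[0]  # near-end sample minus estimated echo
    w = nlms_update(w, x, e, mu=0.5)
print(round(w[0], 4))
```

A larger step size makes the coefficient approach the true echo path in fewer iterations, which is why the step size is raised when the detection factor reports more residual echo.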

The filter-coefficient update step size above is determined in the time domain. Alternatively, the effective filter-coefficient update step size may be determined in the frequency domain, that is, the residual echo is eliminated bin by bin, specifically the residual echo present at each frequency bin of the far-end speech time-domain signal. In the frequency domain, the filter-coefficient update step size of the adaptive filter is adjusted per frequency bin according to the residual echo detection factor. For the n-th frame, the step-size update includes: determining the effective frequency bins and selecting the effective residual echo detection factors according to the effective frequency bins; for an effective frequency bin k, calculating the product of the effective residual echo detection factor η(k, n) and the maximum filter-coefficient update step size μ_max as the new update step size μ(k, n); taking the maximum of the new step size μ(k, n) and the minimum filter-coefficient update step size μ_min as the effective filter-coefficient update step size μ'(k, n) of that bin; and updating the filter coefficients according to the effective filter-coefficient update step size, so as to eliminate, bin by bin, the residual echo present in the frequency-domain signal of the far-end speech time-domain signal at each effective frequency bin, this residual echo likewise being in the frequency domain. For a specific implementation in one application scenario, see formula (13) below.

Alternatively, in the frequency domain, the effective filter-coefficient update step size of the k-th frequency bin may be determined from the residual echo detection factor η(k, n) of the k-th frequency bin and a selected step-size transformation function f(η(k, n)), so as to eliminate the residual echo present in the frequency-domain signal of the far-end speech time-domain signal at the k-th frequency bin. For a specific implementation in one application scenario, see formula (14) below. Compared with formula (13), calculating the effective filter-coefficient update step size by formula (14) is simpler and requires less computation.

Specifically, the effective filter-coefficient update step size, applicable generally to each frequency bin, may be determined with reference to the following formula (13).

μ (k, n)=η (k, n) * μ_max

μ ' (k, n)=max (μ (k, n), μ_min) (13)

The effective filter-coefficient update step size of each frequency bin is adjusted according to formula (14).

μ (k, n)=μ ' (k, n-1)+f (η (k, n)) μ_step

μ ' (k, n)=max (μ (k, n), μ_min) (14)

In formulas (13) and (14), η(k, n) denotes the residual echo detection factor of the n-th frame of the far-end digital speech signal at the k-th frequency bin; μ(k, n) denotes the new update step size of the n-th frame at the k-th frequency bin, which in formula (13) is the product of the effective residual echo detection factor and the maximum filter-coefficient update step size; μ'(k, n) denotes the effective filter-coefficient update step size of the n-th frame of the far-end digital speech signal at the k-th frequency bin; f(η(k, n)) denotes the step-size transformation function of the n-th frame of the far-end digital speech signal at the k-th frequency bin, which is essentially a function of the residual echo detection factor and is set flexibly according to the application scenario. The value of f(η(k, n)) may be positive or negative, depending on whether the application scenario requires the update step size to be increased or decreased. μ_step is the increment of each step-size adjustment, and μ'(k, n-1) denotes the effective filter-coefficient update step size of the (n-1)-th frame of the far-end digital speech signal at the k-th frequency bin. In formula (14), the maximum of the newly calculated step size μ(k, n) of the k-th frequency bin and the minimum filter-coefficient update step size μ_min is taken as the effective filter update step size.
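Formulas (13) and (14) can be compared side by side in a short sketch. The step-size transformation function `sign_transform` below (raise the step when the factor exceeds 0.5, lower it otherwise) and all numeric values are hypothetical choices for illustration.

```python
def step_size_formula_13(eta, mu_max, mu_min):
    """Formula (13): scale the maximum step by eta, then floor at mu_min."""
    mu = eta * mu_max
    return max(mu, mu_min)


def step_size_formula_14(prev_effective, eta, mu_step, mu_min, transform):
    """Formula (14): move the previous effective step by transform(eta) * mu_step."""
    mu = prev_effective + transform(eta) * mu_step
    return max(mu, mu_min)


def sign_transform(eta, pivot=0.5):
    """Hypothetical f(eta): +1 raises the step size, -1 lowers it."""
    return 1.0 if eta > pivot else -1.0


print(step_size_formula_13(0.8, mu_max=0.5, mu_min=0.05))
print(step_size_formula_14(0.25, 0.9, mu_step=0.125, mu_min=0.05, transform=sign_transform))
print(step_size_formula_14(0.0625, 0.1, mu_step=0.125, mu_min=0.05, transform=sign_transform))
```

Formula (14) needs only an addition and a comparison per bin once f is evaluated, and it changes the step size gradually from frame to frame instead of recomputing it from scratch.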

Specifically, the filter coefficients may be updated in the frequency domain according to the following formula (15).

In formula (15), w_n+1(k) denotes the filter coefficient for the frequency-domain signal of the (n+1)-th frame of the far-end digital speech signal at the k-th frequency bin; w_n(k) denotes the filter coefficient for the frequency-domain signal of the n-th frame of the far-end digital speech signal at the k-th frequency bin; ‖X(k, n)‖² denotes the energy of the frequency-domain signal of the n-th frame of the far-end digital speech signal at the k-th frequency bin; δ denotes a control factor; X*(k, n) denotes the complex conjugate of the frequency-domain signal of the n-th frame of the far-end digital speech signal at the k-th frequency bin; and E(k, n) denotes the frequency-domain signal of the n-th frame of the error speech time-domain signal at the k-th frequency bin.
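The symbols listed for formula (15) correspond to a per-bin normalized LMS update, so the sketch below assumes that form, with δ added to the energy in the denominator. The function name and the sample values are hypothetical.

```python
def fd_nlms_update(w_k, x_k, e_k, mu_k, delta=1e-8):
    """Per-bin update: w + mu * conj(X) * E / (|X|^2 + delta).

    w_k:  current coefficient of bin k (complex)
    x_k:  far-end spectrum X(k, n)
    e_k:  error spectrum E(k, n)
    mu_k: effective update step size of bin k
    """
    return w_k + mu_k * x_k.conjugate() * e_k / (abs(x_k) ** 2 + delta)


# One bin with true echo path H = 0.5j: the coefficient converges towards H.
H = 0.5j
w = 0j
X = 2 + 1j
for _ in range(30):
    E = H * X - w * X  # near-end bin minus estimated echo bin
    w = fd_nlms_update(w, X, E, mu_k=0.5)
print(round(w.real, 4), round(w.imag, 4))
```

With a step size of 1 and δ = 0 the update reaches the true coefficient in a single step, since w + X*·(H·X - w·X)/|X|² = H; smaller step sizes trade convergence speed for robustness.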

S603, if in the double-talk state, eliminate the residual echo according to the residual echo elimination mechanism configured for the double-talk state.

In this embodiment, in one application scenario, the first residual echo suppression factor has already been determined when the residual echo is eliminated in step S602. The residual echo elimination mechanism configured for the double-talk state is therefore as follows: a third residual echo suppression factor is generated from the power of the estimated echo speech signal and the power of the near-end speech signal, the maximum of the first residual echo suppression factor and the third residual echo suppression factor is taken as the effective residual echo suppression factor, and the residual echo is eliminated with reference to formula (10) above. The maximum is taken because, as described above, the smaller the effective residual echo suppression factor, the stronger the residual echo elimination. In the double-talk state, the near-end speaker's speech is present, so the larger factor is chosen in order to eliminate as much residual echo as possible without damaging that speech.
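The double-talk selection can be sketched in a few lines. Both suppression factors are passed in as inputs, since their own formulas are not reproduced here; the function names and the example values are hypothetical.

```python
def effective_gain_double_talk(g_first, g_third):
    """Effective suppression factor in the double-talk state: the larger of the two."""
    return max(g_first, g_third)


def apply_gain(e_bin, gain):
    """Scale one bin of the error spectrum by the effective suppression factor."""
    return gain * e_bin


g = effective_gain_double_talk(0.25, 0.75)
print(g, abs(apply_gain(2 + 0j, g)))
```

Compared with the single-talk rule, which takes the minimum, the maximum keeps more of the bin's energy (1.5 instead of 0.5 in this example), which is how near-end speech is protected during double talk.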

Herein, in a specific application scenarios, can be believed by testing the echo voice for the estimation for obtaining a priori Number, and the power of the echo voice signal of the estimation of priori is calculated, further third is generated together with near-end voice signals power Residual echo inhibiting factor, herein, as previously mentioned, be on frequency domain calculate priori estimation echo voice signal power with And near-end voice signals power.

Specifically, the power and proximal end language of the echo voice signal of the estimation of following formula (16) calculating priori be can refer to Sound signal power calculates third residual echo inhibiting factor referring to formula (17).

In formula (16), the quantities relating to the a priori estimated echo digital voice signal are: the power of the frequency-domain signal of its n-th frame at the k-th frequency point; the power of the frequency-domain signal of its (n-1)-th frame at the k-th frequency point; the frequency-domain signal of its n-th frame at the k-th frequency point; and the conjugate of that frequency-domain signal. S_yy(k, n) denotes the power of the frequency-domain signal of the n-th frame of the near-end digital voice signal at the k-th frequency point; S_yy(k, n-1) denotes the power of the frequency-domain signal of the (n-1)-th frame of the near-end digital voice signal at the k-th frequency point; Y(k, n) denotes the frequency-domain signal of the n-th frame of the near-end digital voice signal at the k-th frequency point; Y(k, n)^* denotes the conjugate of that frequency-domain signal. G₂(k, n) denotes the third residual echo inhibiting factor, and β denotes a control factor; the calculation of the third residual echo inhibiting factor is detailed in formula (17).
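The quantities listed above (power at frame n, power at frame n-1, the frequency-domain signal, its conjugate, and the control factor β) suggest a recursive power estimate. The sketch below assumes a first-order recursive average, `beta * S(k, n-1) + (1 - beta) * Z(k, n) * conj(Z(k, n))`; this form is an assumption and not a reproduction of formula (16). The name `smooth_power` is hypothetical, and formula (17) for G₂(k, n) is not sketched.

```python
def smooth_power(prev_power, spectrum, beta=0.9):
    """Per-frequency-point recursive power estimate (assumed smoothing form).

    prev_power: list of floats, the power at frame n-1 for each frequency point k.
    spectrum:   list of complex values, the frequency-domain signal at frame n.
    beta:       control factor, assumed here to act as a smoothing weight in [0, 1].
    Returns the list of smoothed powers at frame n.
    """
    return [
        beta * p + (1.0 - beta) * (z * z.conjugate()).real
        for p, z in zip(prev_power, spectrum)
    ]
```

The same helper could be applied both to the a priori estimated echo signal and to the near-end signal Y(k, n), giving the two powers from which the third residual echo inhibiting factor is generated.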

In addition, the residual echo can in practice also be detected from the corresponding quantities of the a posteriori estimated echo digital voice signal: the power of the frequency-domain signal of its n-th frame at the k-th frequency point, the power of the frequency-domain signal of its (n-1)-th frame at the k-th frequency point, the frequency-domain signal of its n-th frame at the k-th frequency point, and the conjugate of that frequency-domain signal. By comparison, however, the power of the frequency-domain signal of the n-th frame of the a priori estimated echo digital voice signal at the k-th frequency point is more accurate. It is therefore preferable to substitute the power of the frequency-domain signal of the (n-1)-th frame of the a priori estimated echo digital voice signal at the k-th frequency point into formula (16) above.

The effective residual echo inhibiting factor is taken as the maximum of the first residual echo inhibiting factor and the third residual echo inhibiting factor, as shown in formula (18).

G'(k, n) = max(G(k, n), G₂(k, n))    (18)

In formula (18), G(k, n) denotes the first residual echo inhibiting factor, G₂(k, n) denotes the third residual echo inhibiting factor, and G'(k, n) denotes the effective residual echo inhibiting factor.

Once the effective residual echo inhibiting factor for the dual end communication state has been obtained, the residual echo is eliminated with reference to formula (10) above.
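Formula (18) and the subsequent elimination step can be sketched per frequency point as below. The product with the error voice signal follows the wording of the claims ("product calculation with the error voice signal"); the function names and the list-based representation are hypothetical choices made for illustration.

```python
def effective_inhibiting_factor(g_first, g_third):
    """Formula (18): G'(k, n) = max(G(k, n), G2(k, n)) for each frequency point k."""
    return [max(g, g2) for g, g2 in zip(g_first, g_third)]


def eliminate_residual_echo(g_effective, error_spectrum):
    """Multiply the effective inhibiting factor with the error signal E(k, n)."""
    return [g * e for g, e in zip(g_effective, error_spectrum)]
```

For example, with `g_first = [0.2, 0.9]` and `g_third = [0.5, 0.3]`, the effective factor is `[0.5, 0.9]`, which is then applied to the error spectrum of the same frame.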

Preferably, if the normalized correlation factor is greater than the upper limit of the normalized correlation factor threshold, the effective residual echo inhibiting factor is reduced so as to correct it; alternatively, if the normalized correlation factor is less than the lower limit of the normalized correlation factor threshold, the effective residual echo inhibiting factor is increased so as to correct it. For example, in the dual end communication state, C_xe(k, n) may be greater than or equal to the upper limit C_up, or C_xe(k, n) may be less than the lower limit C_low. If C_xe(k, n) > C_up, the frequency-domain signal of the n-th frame of the near-end voice signal at the k-th frequency point contains more residual echo; in other words, it is less likely to be the near-end speaker's voice. The effective residual echo inhibiting factor G'(k, n) set for that frequency-domain signal can therefore be slightly reduced as a correction. If C_xe(k, n) < C_low, the frequency-domain signal of the n-th frame of the near-end voice signal at the k-th frequency point contains less residual echo; in other words, it is more likely to be the near-end speaker's voice. To eliminate the residual echo thoroughly while keeping speech loss as small as possible, the effective residual echo inhibiting factor G'(k, n) set for that frequency-domain signal is slightly increased as a correction.
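The correction rule above can be sketched as follows. The text only says that the factor is "slightly" reduced or increased, so the multiplicative adjustment and the parameter `step` are assumptions made for illustration, as is the function name.

```python
def correct_effective_factor(g_effective, c_xe, c_low, c_up, step=0.1):
    """Correct G'(k, n) using the normalized correlation factor C_xe(k, n).

    g_effective: list of effective inhibiting factors, one per frequency point.
    c_xe:        list of normalized correlation factors, one per frequency point.
    c_low, c_up: lower and upper limits of the normalized correlation factor threshold.
    step:        relative size of the "slight" adjustment (hypothetical parameter).
    """
    corrected = []
    for g, c in zip(g_effective, c_xe):
        if c > c_up:
            g = g * (1.0 - step)  # above the upper limit: reduce the factor slightly
        elif c < c_low:
            g = g * (1.0 + step)  # below the lower limit: increase the factor slightly
        corrected.append(g)
    return corrected
```

Frequency points whose normalized correlation factor lies between the two limits are left unchanged.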

In the above embodiments, valid and invalid residual echo detecting factors can also be determined according to a set validity threshold for the residual echo detecting factor, and the invalid residual echo detecting factors can be corrected according to the mean of the valid ones. In a specific implementation, if the calculated residual echo detecting factor is greater than the validity threshold, the residual echo detecting factor at the corresponding frequency point is invalid; otherwise it is valid. The invalid residual echo detecting factors are then preferably corrected, for example by replacing them with the mean of the valid residual echo detecting factors. Alternatively, and more directly, the invalid residual echo detecting factors are left unprocessed, so that the frequency points corresponding to them are simply ignored during residual echo elimination.
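The replacement of invalid detecting factors by the mean of the valid ones can be sketched as below. The function name is hypothetical, and returning the input unchanged when no factor is valid is an assumption, since the text does not cover that case.

```python
def repair_detecting_factors(factors, validity_threshold):
    """Replace invalid residual echo detecting factors with the mean of the valid ones.

    A factor is treated as invalid when it is greater than validity_threshold.
    """
    valid = [f for f in factors if f <= validity_threshold]
    if not valid:
        # Assumption: with no valid factor there is nothing to average, so keep the input.
        return list(factors)
    mean_valid = sum(valid) / len(valid)
    return [f if f <= validity_threshold else mean_valid for f in factors]
```

The simpler alternative mentioned above, ignoring the invalid frequency points, would skip this repair and exclude those points from later processing.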

In addition, when judging whether a residual echo detecting factor is valid or invalid, the judgment can, as noted earlier, be based on the sum of the residual echo detecting factors calculated according to formulas (2) and (4) above. In theory the sum of these two detecting factors is 1. On an actual product, however, owing to various other influences, the sum may exceed 1, although it remains essentially at a stable value. In practice, therefore, if the sum of the two residual echo detecting factors calculated according to formulas (2) and (4) is greater than this stable value, the residual echo detecting factor at the corresponding frequency point is invalid; otherwise it is valid. The valid residual echo detecting factors calculated according to formula (2) are averaged, and the mean replaces the invalid residual echo detecting factors.
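The sum-based validity check can be sketched as a per-frequency-point mask. The names `factor_2` and `factor_4` stand for the detecting factors from formulas (2) and (4), which are not reproduced in this passage, and `stable_value` is the empirically observed sum; all three names are hypothetical.

```python
def validity_mask_from_sum(factor_2, factor_4, stable_value):
    """Mark a frequency point as valid when the two detecting factors do not sum
    to more than the stable value."""
    return [(a + b) <= stable_value for a, b in zip(factor_2, factor_4)]
```

The mask then selects which formula (2) factors enter the mean that replaces the invalid ones.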

In a specific implementation, the above residual echo detection device can be integrated into a speech processing chip.

An embodiment of the present application provides an electronic device comprising the speech processing chip described in any embodiment of the present application.

The electronic device of the embodiments of the present application exists in various forms, including but not limited to:

(1) Mobile communication devices: devices of this kind have mobile communication functions, with voice and data communication as their main purpose. Such terminals include smartphones (such as the iPhone), multimedia phones, feature phones and low-end phones.

(2) Ultra-mobile personal computer devices: devices of this kind belong to the category of personal computers, have computing and processing functions, and generally also support mobile Internet access. Such terminals include PDA, MID and UMPC devices, such as the iPad.

(3) Portable entertainment devices: devices of this kind can display and play multimedia content. They include audio and video players (such as the iPod), handheld game consoles, e-book readers, smart toys and portable in-vehicle navigation devices.

(4) Servers: devices that provide computing services. A server comprises a processor 810, a hard disk, memory, a system bus and so on. Its architecture is similar to that of a general-purpose computer, but because it must provide highly reliable services, the requirements on processing capacity, stability, reliability, security, scalability and manageability are higher.

(5) Other electronic devices with data interaction functions.

Specific embodiments of the present subject matter have been described above. Other embodiments fall within the scope of the appended claims. In some cases, the actions recited in the claims can be performed in a different order and still achieve the desired result. In addition, the processes depicted in the drawings do not necessarily require the particular order shown, or a sequential order, to achieve the desired result. In some embodiments, multitasking and parallel processing may be advantageous.

For convenience of description, the above apparatus is described in terms of various units divided by function. Of course, when the present application is implemented, the functions of the units can be realized in one or more pieces of software and/or hardware.

Those skilled in the art should understand that the embodiments of the present application can be provided as a method, a system or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM and optical storage) that contain computer-usable program code.

The present application is described with reference to flowcharts and/or block diagrams of the method, the device (system) and the computer program product according to the embodiments of the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means, the instruction means implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

In a typical configuration, a computing device includes one or more processors (CPUs), an input/output interface, a network interface and memory.

The memory may include non-permanent memory in computer-readable media, random access memory (RAM) and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.

Computer-readable media include permanent and non-permanent, removable and non-removable media, and can store information by any method or technology. The information may be computer-readable instructions, data structures, program modules or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible to a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.

It should also be noted that the terms "include", "comprise" and any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, commodity or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, commodity or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, commodity or device that includes the element.

The present application may be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures and the like that perform particular tasks or implement particular abstract data types. The present application may also be practiced in distributed computing environments, in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including storage devices.

The embodiments in this specification are described in a progressive manner; for identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, since the system embodiments are substantially similar to the method embodiments, they are described relatively briefly, and the relevant parts may be found in the description of the method embodiments.

The above is merely an example of the present application and is not intended to limit it. Various modifications and variations of the present application are possible for those skilled in the art. Any modification, equivalent replacement or improvement made within the spirit and principles of the present application shall fall within the scope of the claims of the present application.

Claims

1. A residual echo detection method, characterized by comprising:

2. The method according to claim 1, characterized in that determining the residual echo detecting factor according to the related power between the far-end speech signal and the near-end voice signal comprises: determining the residual echo detecting factor according to the related power between the far-end speech signal and an estimated echo voice signal and the related power between the far-end speech signal and the near-end voice signal.

3. The method according to claim 1, characterized in that determining the residual echo detecting factor according to the related power between the far-end speech signal and the near-end voice signal comprises: determining the residual echo detecting factor according to the related power between the far-end speech signal and an error voice signal and the related power between the far-end speech signal and the near-end voice signal.

4. The method according to claim 2, characterized in that determining the residual echo detecting factor according to the related power between the far-end speech signal and the estimated echo voice signal and the related power between the far-end speech signal and the near-end voice signal comprises: determining the ratio of the related power between the far-end speech signal and the estimated echo voice signal to the related power between the far-end speech signal and the near-end voice signal as the residual echo detecting factor.

5. The method according to claim 3, characterized in that determining the residual echo detecting factor according to the related power between the far-end speech signal and the error voice signal and the related power between the far-end speech signal and the near-end voice signal comprises: determining the ratio of the related power between the far-end speech signal and the error voice signal to the related power between the far-end speech signal and the near-end voice signal as the residual echo detecting factor.

6. The method according to claim 2, characterized in that detecting whether a residual echo exists comprises: if the residual echo detecting factor is less than a residual echo detecting factor threshold, determining that a residual echo exists; otherwise, determining that no residual echo exists.

7. The method according to claim 3, characterized in that detecting whether a residual echo exists comprises: if the residual echo detecting factor is greater than a residual echo detecting factor threshold, determining that a residual echo exists; otherwise, determining that no residual echo exists.

8. The method according to claim 2, characterized by further comprising: determining the product of the power of the far-end speech signal and the power of the estimated echo voice signal; calculating a normalized correlation factor according to the product and the related power between the far-end speech signal and the estimated echo voice signal; and combining the normalized correlation factor with the residual echo detecting factor to detect the residual echo.

9. The method according to claim 3, characterized by further comprising: determining the product of the power of the far-end speech signal and the power of the error voice signal; calculating a normalized correlation factor according to the product and the related power between the far-end speech signal and the error voice signal; and combining the normalized correlation factor with the residual echo detecting factor to detect the residual echo.

10. The method according to claim 6 or 7, characterized in that combining the normalized correlation factor with the residual echo detecting factor to detect the residual echo comprises: detecting the residual echo according to the result of comparing the normalized correlation factor with a normalized correlation factor threshold and the result of comparing the residual echo detecting factor with the residual echo detecting factor threshold.

11. The method according to claim 10, characterized by further comprising: determining a first residual echo inhibiting factor according to the related power between the far-end speech signal and the error voice signal, the power of the estimated echo voice signal, and the residual echo detecting factor, so as to eliminate the residual echo according to the first residual echo inhibiting factor.

12. The method according to claim 10, characterized by further comprising: determining a first residual echo inhibiting factor according to the related power between the far-end speech signal and the estimated echo voice signal, the power of the error voice signal, and the residual echo detecting factor, so as to eliminate the residual echo according to the first residual echo inhibiting factor.

13. The method according to claim 12, characterized in that eliminating the residual echo according to the first residual echo inhibiting factor comprises: if the single-ended talking state applies, generating a second residual echo inhibiting factor according to the normalized correlation factor, and taking the maximum of the first residual echo inhibiting factor and the second residual echo inhibiting factor as an effective residual echo inhibiting factor, which is multiplied with the error voice signal to eliminate the residual echo.

14. The method according to claim 11, characterized in that eliminating the residual echo according to the first residual echo inhibiting factor comprises: if the single-ended talking state applies, generating a second residual echo inhibiting factor according to the normalized correlation factor, and taking the minimum of the first residual echo inhibiting factor and the second residual echo inhibiting factor as an effective residual echo inhibiting factor, which is multiplied with the error voice signal to eliminate the residual echo.

15. The method according to claim 13 or 14, characterized in that the second residual echo inhibiting factor is the difference between a stable value and the normalization factor.

16. The method according to claim 11, characterized in that eliminating the residual echo according to the first residual echo inhibiting factor comprises: if the dual end communication state applies, generating a third residual echo inhibiting factor according to the power of the a priori estimated echo voice signal and the power of the near-end voice signal, and taking the maximum of the first residual echo inhibiting factor and the third residual echo inhibiting factor as an effective residual echo inhibiting factor, so as to eliminate the residual echo.

17. The method according to claim 11 or 12, characterized in that eliminating the residual echo according to the first residual echo inhibiting factor comprises: determining an effective residual echo inhibiting factor according to the normalized correlation factor and the first residual echo inhibiting factor, so as to eliminate the residual echo.

18. The method according to any one of claims 13-17, characterized by further comprising: if the normalized correlation factor is greater than the upper limit of the normalized correlation factor threshold, reducing the effective residual echo inhibiting factor so as to correct the effective residual echo inhibiting factor; or, if the normalized correlation factor is less than the lower limit of the normalized correlation factor threshold, increasing the effective residual echo inhibiting factor so as to correct the effective residual echo inhibiting factor.

19. The method according to any one of claims 1-18, characterized by further comprising: determining valid and invalid residual echo detecting factors according to a set validity threshold for the residual echo detecting factor, and correcting the invalid residual echo detecting factors according to the mean of the valid residual echo detecting factors.

20. The method according to any one of claims 13-18, characterized by further comprising: multiplying the effective residual echo inhibiting factor with the error voice signal to eliminate the residual echo.

21. The method according to any one of claims 1-20, characterized by further comprising: adjusting the filter coefficients of an adaptive filter according to the residual echo detecting factor, so as to eliminate the residual echo.

22. The method according to claim 21, characterized in that adjusting the filter coefficients of the adaptive filter according to the residual echo detecting factor so as to eliminate the residual echo comprises: adjusting the filter coefficient update step size of the adaptive filter according to the residual echo detecting factor, and adjusting the filter coefficients according to the filter coefficient update step size, so as to eliminate the residual echo.

23. The method according to claim 22, characterized in that adjusting the filter coefficient update step size of the adaptive filter according to the residual echo detecting factor comprises: determining the product of the mean of the valid residual echo detecting factors and the maximum filter coefficient update step size, and determining an effective filter coefficient update step size according to the minimum filter coefficient update step size and the product.

24. The method according to claim 22, characterized in that adjusting the filter coefficient update step size of the adaptive filter according to the residual echo detecting factor comprises: determining the product of a valid residual echo detecting factor and the maximum filter coefficient update step size, and determining an effective filter coefficient update step size according to the minimum filter coefficient update step size and the product.

25. The method according to claim 22, characterized in that adjusting the filter coefficient update step size of the adaptive filter according to the residual echo detecting factor comprises: determining an effective filter coefficient update step size according to the residual echo detecting factor and a step size transform function.

26. A residual echo detection device, characterized by comprising:

a detecting factor computing unit, configured to determine a residual echo detecting factor according to the related power between a far-end speech signal and a near-end voice signal;

27. A speech processing chip, characterized by comprising a residual echo detection device, the residual echo detection device comprising: a detecting factor computing unit, configured to determine a residual echo detecting factor according to the related power between a far-end speech signal and a near-end voice signal; and a residual echo detection unit, configured to detect, according to the residual echo detecting factor, whether a residual echo exists.

28. An electronic device, characterized by comprising the speech processing chip according to claim 27.