CN113362842A - Audio signal processing method and device - Google Patents

Info

Publication number
CN113362842A
Authority
CN
China
Prior art keywords
echo path
path vector
echo
vector
signal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110739121.8A
Other languages
Chinese (zh)
Other versions
CN113362842B (en
Inventor
操陈斌
何梦楠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Xiaomi Mobile Software Co Ltd
Beijing Xiaomi Pinecone Electronic Co Ltd
Original Assignee
Beijing Xiaomi Mobile Software Co Ltd
Beijing Xiaomi Pinecone Electronic Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Xiaomi Mobile Software Co Ltd, Beijing Xiaomi Pinecone Electronic Co Ltd
Priority to CN202110739121.8A
Publication of CN113362842A
Application granted
Publication of CN113362842B
Legal status: Active

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/26 Pre-filtering or post-filtering
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L2021/02082 Noise filtering the noise being echo, reverberation of the speech

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Quality & Reliability (AREA)
  • Cable Transmission Systems, Equalization Of Radio And Reduction Of Echo (AREA)
  • Telephone Function (AREA)

Abstract

The present disclosure relates to the field of voice communication technologies, and in particular to an audio signal processing method and apparatus. The audio signal processing method comprises: performing first filtering processing based on a reference signal and a first audio signal picked up by a microphone to obtain a first echo path vector, wherein the first audio signal comprises an echo signal resulting from a speaker playing the reference signal; performing second filtering processing based on the reference signal and the first audio signal to obtain a second echo path vector, wherein a filter update rate of the first filtering processing is different from a filter update rate of the second filtering processing; and determining that a change in the echo path is detected in response to the correlation between the first echo path vector and the second echo path vector not being greater than a preset threshold. The method of the embodiments of the present disclosure can effectively detect changes in the echo path, offers greater generality and robustness in detection, and improves the echo cancellation effect.

Description

Audio signal processing method and device
Technical Field
The present disclosure relates to the field of voice communication technologies, and in particular, to an audio signal processing method and apparatus.
Background
In voice communication, after the near-end speaker plays sound transmitted from the far end, the near-end microphone picks that sound up again and sends it back to the far end, producing acoustic echo. Acoustic echo can severely degrade voice call quality, so echo cancellation is a necessary step in voice communication.
In the related art, an adaptive filter is often used to estimate the echo path for echo cancellation. However, in a complex double-talk acoustic scene, such as a multi-person online voice call, the echo path changes frequently and the echo cancellation effect is poor.
Disclosure of Invention
In order to improve an echo cancellation effect of a voice communication system, embodiments of the present disclosure provide an audio signal processing method and apparatus, an electronic device, and a storage medium.
In a first aspect, the disclosed embodiments provide an audio signal processing method, including:
performing first filtering processing on the basis of a reference signal and a first audio signal picked up by a microphone to obtain a first echo path vector; wherein the first audio signal comprises an echo signal resulting from the playing of the reference signal by a speaker;
performing second filtering processing on the basis of the reference signal and the first audio signal to obtain a second echo path vector; a filter update rate of the first filter process is different from a filter update rate of the second filter process;
determining that a change in echo path is detected in response to the correlation of the first echo path vector and the second echo path vector not being greater than a preset threshold.
In some embodiments, the first filtering process is Kalman filtering and the second filtering process is NLMS filtering.
In some embodiments, the Kalman filtering is time-domain Kalman filtering.
In some embodiments, the performing a first filtering process based on the reference signal and a first audio signal picked up by a microphone to obtain a first echo path vector includes:
determining a first residual signal at the current moment according to the reference signal and the echo path vector at the previous moment;
and updating the echo path vector at the previous moment according to the first residual signal and the Kalman gain vector at the current moment to obtain the first echo path vector at the current moment.
In some embodiments, the performing a second filtering process based on the reference signal and the first audio signal to obtain a second echo path vector includes:
determining an error signal at the current moment according to the reference signal and the echo path vector at the previous moment;
obtaining a second residual signal at the current moment according to the first audio signal and the error signal;
and updating the echo path vector at the previous moment according to the second residual signal at the current moment and a preset self-adaptive step size parameter to obtain the second echo path vector at the current moment.
In some embodiments, the determining that a change in the echo path is detected in response to the correlation between the first echo path vector and the second echo path vector not being greater than a preset threshold includes:
determining a correlation coefficient of the first echo path vector and the second echo path vector according to the first echo path vector and the second echo path vector;
and determining that the echo path is detected to be changed in response to the correlation coefficient not being greater than a preset correlation threshold.
In some embodiments, after the determining detects that the echo path has changed, the method further comprises:
initializing parameters of the first filtering process and the second filtering process.
In a second aspect, the present disclosure provides an audio signal processing apparatus, including:
a first filtering module configured to perform a first filtering process based on a reference signal and a first audio signal picked up by a microphone, so as to obtain a first echo path vector; wherein the first audio signal comprises an echo signal resulting from the playing of the reference signal by a speaker;
a second filtering module configured to perform second filtering processing based on the reference signal and the first audio signal to obtain a second echo path vector; a filter update rate of the first filter process is different from a filter update rate of the second filter process;
an echo path determination module configured to determine that an echo path change is detected in response to a correlation of the first echo path vector and the second echo path vector not being greater than a preset threshold.
In some embodiments, the first filtering process is Kalman filtering and the second filtering process is NLMS filtering.
In some embodiments, the Kalman filtering is time-domain Kalman filtering.
In some embodiments, the first filtering module is specifically configured to:
determining a first residual signal at the current moment according to the reference signal and the echo path vector at the previous moment;
and updating the echo path vector at the previous moment according to the first residual signal and the Kalman gain vector at the current moment to obtain the first echo path vector at the current moment.
In some embodiments, the second filtering module is specifically configured to:
determining an error signal at the current moment according to the reference signal and the echo path vector at the previous moment;
obtaining a second residual signal at the current moment according to the first audio signal and the error signal;
and updating the echo path vector at the previous moment according to the second residual signal at the current moment and a preset self-adaptive step size parameter to obtain the second echo path vector at the current moment.
In some embodiments, the echo path determination module is specifically configured to:
determining a correlation coefficient of the first echo path vector and the second echo path vector according to the first echo path vector and the second echo path vector;
and determining that the echo path is detected to be changed in response to the correlation coefficient not being greater than a preset correlation threshold.
In some embodiments, the audio signal processing apparatus of the present disclosure further includes:
an initialization module configured to initialize parameters of the first filtering process and the second filtering process.
In a third aspect, the disclosed embodiments provide an electronic device, including:
a microphone and a speaker;
a processor; and
a memory storing computer instructions for causing a processor to perform the method according to any of the embodiments of the first aspect.
In a fourth aspect, the embodiments of the present disclosure provide a storage medium storing computer instructions for causing a computer to execute the method according to any one of the embodiments of the first aspect.
The audio signal processing method of the embodiments of the present disclosure includes performing first filtering processing based on a reference signal and a first audio signal picked up by a microphone to obtain a first echo path vector, performing second filtering processing based on the reference signal and the first audio signal to obtain a second echo path vector, and determining that a change in the echo path is detected in response to the correlation between the first echo path vector and the second echo path vector not being greater than a preset threshold. The method adopts two filtering processes with different update rates and determines whether the echo path has changed based on the correlation between the two echo path vectors. It thereby detects changes in the echo path effectively, offers greater generality and robustness in detection, and improves the echo cancellation effect.
Drawings
In order to more clearly illustrate the embodiments of the present disclosure or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present disclosure, and other drawings can be obtained by those skilled in the art without creative efforts.
Fig. 1 is a flow chart of an audio signal processing method in some embodiments according to the present disclosure.
Fig. 2 is a schematic diagram of echo impulse responses before and after an echo path change in some embodiments according to the present disclosure.
Fig. 3 is a block diagram of a voice communication system in accordance with some embodiments of the present disclosure.
Fig. 4 is a flow chart of an audio signal processing method in some embodiments according to the present disclosure.
Fig. 5 is a flow chart of an audio signal processing method in some embodiments according to the present disclosure.
Fig. 6 is a flow chart of an audio signal processing method in some embodiments according to the present disclosure.
Fig. 7 is a block diagram of an audio signal processing apparatus according to some embodiments of the present disclosure.
Fig. 8 is a block diagram of an electronic device suitable for implementing the method of the present disclosure.
Detailed Description
The technical solutions of the present disclosure will be described clearly and completely with reference to the accompanying drawings, and it is to be understood that the described embodiments are only some embodiments of the present disclosure, but not all embodiments. All other embodiments, which can be derived by one of ordinary skill in the art from the embodiments disclosed herein without making any creative effort, shall fall within the protection scope of the present disclosure. In addition, technical features involved in different embodiments of the present disclosure described below may be combined with each other as long as they do not conflict with each other.
In a voice communication system, because the speaker and the microphone are acoustically coupled, a far-end signal played by the speaker is picked up by the microphone and transmitted to the far end again, forming an acoustic echo. Acoustic echo can seriously degrade the quality of voice communication and also reduces the performance of voice wake-up and speech recognition in human-machine interaction. Echo cancellation is therefore needed in a voice communication system to improve the quality of voice communication.
In the field of echo cancellation, adaptive filtering techniques based on variable step size control, such as NLMS (Normalized Least Mean Squares) filters, are generally used to estimate the echo path. However, in a double-talk scene with a complex acoustic environment, such as a multiplayer online game, the echo path may change frequently. If echo path estimation continues at the current filter update rate, the filter may diverge, the echo signal cannot be estimated accurately, and a large amount of residual echo is produced.
In the related art, one way of handling echo path changes is to use an adaptive filter with a smaller update rate, so that a relatively low steady-state residual echo is obtained even when the echo path changes. Another is to use a double-talk detector (DTD) and stop updating the filter, or reduce its update rate, when a double-talk scene is detected. Neither method solves the problem fundamentally: because the filter update rate is slow, the echo signal cannot be estimated quickly at the initial moment of an echo path change, which leaves more residual echo.
Based on the defects in the related art, the embodiments of the present disclosure provide an audio signal processing method, an audio signal processing apparatus, an electronic device, and a storage medium, which aim to accurately detect echo path changes in a complex acoustic scene, thereby improving an echo cancellation effect.
In a first aspect, the embodiments of the present disclosure provide an audio signal processing method, which may be applied to an electronic device with a voice communication system, such as a mobile phone, a tablet computer, a notebook computer, and the like, and the disclosure is not limited thereto.
As shown in fig. 1, in some embodiments, an audio signal processing method of an example of the present disclosure includes:
S110, performing first filtering processing based on the reference signal and a first audio signal picked up by the microphone to obtain a first echo path vector.
S120, performing second filtering processing based on the reference signal and the first audio signal to obtain a second echo path vector.
Specifically, the voice communication system of the embodiment of the present disclosure includes a speaker and a microphone, and the sound played by the speaker is picked up by the microphone and transmitted to the far end along with the near-end voice, so as to form an acoustic echo.
The reference signal is the far-end voice signal received by the system. Taking a mobile phone call as an example, the audio signal produced by the far-end talker and received by the near-end system is the reference signal. After the loudspeaker plays the reference signal, the sound propagates along the echo path between the loudspeaker and the microphone, and the microphone picks it up as the echo signal.
Meanwhile, for a double-talk scene, the microphone also collects a near-end voice signal generated when a near-end speaker speaks and a near-end background noise signal. That is, the first audio signal picked up by the microphone includes: a near-end speech signal, a background noise signal, and an echo signal.
In the embodiments of the present disclosure, the voice communication system is provided with two filters, a first filter and a second filter, whose update rates differ. The update rate of the first filter may be greater than that of the second filter, or the update rate of the second filter may be greater than that of the first filter; the present disclosure does not limit this.
The first filter performs the first filtering processing on the first audio signal, and the second filter performs the second filtering processing on the first audio signal. Because the two filtering processes have different update rates, the filter with the higher update rate can quickly track a sudden change at the initial moment when the echo path changes, and it estimates a transient echo path vector, namely the first echo path vector. The filter with the lower update rate tracks the change relatively slowly, and it estimates a steady-state echo path vector, namely the second echo path vector.
In some embodiments, the first filtering process may be iteratively updated based on the reference signal and the first audio signal picked up by the microphone to obtain the first echo path vector. The second filtering process may be iteratively updated based on the reference signal and the first audio signal picked up by the microphone to obtain a second echo path vector.
In some embodiments, the first filtering process may be Kalman filtering and the second filtering process may be NLMS filtering.
It will be appreciated that the echo path vector represents the echo path between the loudspeaker and the microphone. In the embodiments of the present disclosure, the update rates of the two filtering processes differ; in one example, the filter update rate of the first filtering process is greater than that of the second filtering process. At the initial moment when the echo path changes, the first filtering process can therefore quickly track the change, and the first echo path vector it produces represents the transient echo path at the current moment. The second filtering process updates more slowly, so its filter converges more easily, and the second echo path vector it produces represents the steady-state echo path at the current moment.
In a real scene, when the echo path changes abruptly by a large amount, the first echo path vector and the second echo path vector should differ substantially; when the echo path is unchanged or changes little, they should not differ significantly. Based on this principle, the embodiments of the present disclosure can determine whether an echo path change is currently occurring. This is explained in detail in the following embodiments.
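As a rough illustration of this principle, the following Python sketch runs two adaptive filters with different update rates on the same synthetic signals and compares their echo path estimates before, shortly after, and long after an abrupt path change. It is only a sketch under stated assumptions: both filters here are plain NLMS filters with different step sizes (the embodiments described below use a Kalman filter and an NLMS filter), there is no near-end signal, and the path coefficients, step sizes, signal length, and the names nlms_track and correlation are hypothetical.

```python
import math
import random


def nlms_track(x, d, taps, mu, eps=1e-8):
    """Run NLMS over the signal; return the path estimate after each sample."""
    h = [0.0] * taps
    buf = [0.0] * taps                      # buf[k] holds x(n - k)
    history = []
    for x_n, d_n in zip(x, d):
        buf = [x_n] + buf[:-1]
        err = d_n - sum(hk * bk for hk, bk in zip(h, buf))
        norm = sum(bk * bk for bk in buf) + eps
        h = [hk + mu * err * bk / norm for hk, bk in zip(h, buf)]
        history.append(h)
    return history


def correlation(a, b):
    """Normalized correlation of two coefficient vectors."""
    dot = sum(p * q for p, q in zip(a, b))
    return dot / math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))


rng = random.Random(0)
old_path = [1.0, 0.0, 0.0, 0.0]             # hypothetical path before the change
new_path = [0.0, 0.0, 1.0, 0.0]             # hypothetical path after the change
change_at, total = 1000, 2000
x = [rng.gauss(0.0, 1.0) for _ in range(total)]
d = []
for n in range(total):
    path = old_path if n < change_at else new_path
    d.append(sum(path[k] * x[n - k] for k in range(4) if n - k >= 0))

fast = nlms_track(x, d, taps=4, mu=1.0)     # high update rate (transient estimate)
slow = nlms_track(x, d, taps=4, mu=0.05)    # low update rate (steady-state estimate)

corr_before = correlation(fast[change_at - 1], slow[change_at - 1])
corr_after = correlation(fast[change_at + 20], slow[change_at + 20])
corr_settled = correlation(fast[-1], slow[-1])
```

In this setup the two estimates agree closely before the change and again once both filters have re-converged, while shortly after the change the fast estimate has already moved to the new path and the slow one has not, so their correlation drops.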
For the specific calculation process of the first echo path vector and the second echo path vector, the following embodiments of the present disclosure are explained, and will not be described in detail here.
S130, determining that a change in the echo path is detected in response to the correlation between the first echo path vector and the second echo path vector not being greater than a preset threshold.
Specifically, as described above, the first echo path vector and the second echo path vector represent the transient echo path and the steady-state echo path, respectively. When the real echo path changes suddenly, the two differ substantially; otherwise they do not. Therefore, in the embodiments of the present disclosure, a threshold may be set in advance, and the correlation between the first echo path vector and the second echo path vector is compared with this preset threshold to determine whether the echo path has changed.
The preset threshold is a threshold, set in advance, that characterizes a change of the echo path. When the correlation between the first echo path vector and the second echo path vector is greater than the preset threshold, the two vectors are highly correlated, that is, the transient echo path and the steady-state echo path are close, so the echo path can be determined to be unchanged. When the correlation is not greater than the preset threshold, the two vectors are weakly correlated, that is, the transient echo path and the steady-state echo path differ substantially, so the echo path can be determined to have changed.
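The decision rule described above can be sketched in Python as follows. This passage does not give the exact correlation formula, so the sketch assumes a normalized correlation coefficient (cosine similarity) between the two coefficient vectors; the function names, the handling of an all-zero vector, and the default threshold of 0.8 are hypothetical.

```python
import math
from typing import Sequence


def correlation_coefficient(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Normalized correlation (cosine similarity) of two echo path vectors."""
    dot = sum(a * b for a, b in zip(h1, h2))
    norm1 = math.sqrt(sum(a * a for a in h1))
    norm2 = math.sqrt(sum(b * b for b in h2))
    if norm1 == 0.0 or norm2 == 0.0:
        # Assumption: an all-zero estimate is treated as uncorrelated.
        return 0.0
    return dot / (norm1 * norm2)


def echo_path_changed(h1: Sequence[float], h2: Sequence[float],
                      threshold: float = 0.8) -> bool:
    """Report a change when the correlation is NOT greater than the threshold."""
    return correlation_coefficient(h1, h2) <= threshold
```

For example, echo_path_changed([1.0, 0.5, 0.2], [1.0, 0.5, 0.2]) reports no change because the vectors coincide, whereas two orthogonal vectors such as [1.0, 0.0, 0.0] and [0.0, 1.0, 0.0] are reported as a change.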
In some embodiments, after determining that the echo path changes, the filtering processing parameters may be initialized, so as to avoid filter divergence and improve the echo cancellation effect. The present disclosure is described in detail below, and will not be described in detail here.
The first echo path vector and the second echo path vector in the embodiments of the present disclosure represent a transient impulse response and a steady impulse response at a current time. When the echo path changes, the echo impulse response at the current moment and the echo impulse response at the moment before the change have obviously different shapes. For example, the shape change of the echo impulse response before and after the echo path change is shown in fig. 2, and it can be seen that the shape of the echo impulse response before and after the echo path change also changes greatly. Therefore, the change of the echo path can be accurately detected by using the difference between the transient impulse response and the steady impulse response.
In addition, when the echo path changes significantly, the energy difference between the output signals of the two filters with different update rates is also significant. However, a speech signal is highly non-stationary and its energy has a large dynamic range, so if the echo path change were judged from the energy difference between the two filter outputs, no suitable and general energy threshold could be designed, and robustness would be poor. In the embodiments of the present disclosure, the echo path change is judged from the correlation between the echo path vectors of the two filters with different update rates. This correlation does not depend on the energy of any specific signal, so the preset threshold can be set independently of the specific signal, which gives the method of the present disclosure greater generality and robustness.
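The following Python sketch illustrates this point with made-up numbers. Two fixed coefficient vectors stand in for the two filters, and the same reference waveform is played at two loudness levels: the energy difference between the filter outputs scales with the signal level, while the correlation between the coefficient vectors does not involve the signal at all. The coefficient values, the test signal, and the helper names are hypothetical.

```python
import math


def fir_output(h, x):
    """Filter x with FIR coefficients h, assuming zero initial state."""
    out = []
    for n in range(len(x)):
        out.append(sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0))
    return out


def energy(values):
    return sum(v * v for v in values)


def correlation(h1, h2):
    dot = sum(a * b for a, b in zip(h1, h2))
    return dot / math.sqrt(energy(h1) * energy(h2))


h_fast = [0.1, 0.7, 0.2]                    # hypothetical transient estimate
h_slow = [0.6, 0.3, 0.1]                    # hypothetical steady-state estimate
quiet = [0.01, -0.02, 0.015, 0.0, -0.01]    # low-level reference signal
loud = [100.0 * v for v in quiet]           # same waveform, 40 dB louder

diff_quiet = abs(energy(fir_output(h_fast, quiet)) - energy(fir_output(h_slow, quiet)))
diff_loud = abs(energy(fir_output(h_fast, loud)) - energy(fir_output(h_slow, loud)))
rho = correlation(h_fast, h_slow)           # computed from the coefficients only
```

Here diff_loud is 10^4 times diff_quiet, so a single energy threshold cannot serve both levels, whereas rho is the same number whichever signal is played.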
As can be seen from the above, in the embodiment of the present disclosure, based on the two filtering processes with different update rates, the echo path change is determined according to the correlation between the obtained first echo path vector and the obtained second echo path vector, the detection result is more accurate, the echo cancellation effect is improved, and the method of the embodiment of the present disclosure has stronger universality and robustness.
A voice communication system in some embodiments of the present disclosure is shown in fig. 3. As shown in fig. 3, the voice communication system includes a microphone 100 and a speaker 200. When the speaker 200 plays the reference signal x(n), the microphone 100 receives the echo signal y(n). In a double-talk scene, the first audio signal d(n) picked up by the microphone also includes a near-end audio signal s(n). That is, the first audio signal is d(n) = y(n) + s(n), where s(n) includes the near-end speech signal produced by the near-end talker and a background noise signal.
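The signal model d(n) = y(n) + s(n) can be simulated with a short Python sketch. It assumes, as is common in echo cancellation but not stated explicitly in this paragraph, that the echo y(n) is the reference x(n) filtered by a finite impulse response h that models the echo path; the function name and the sample values are hypothetical.

```python
def microphone_signal(x, h, s):
    """Return (d, y) where y(n) = sum_k h[k] * x(n - k) and d(n) = y(n) + s(n)."""
    y = []
    for n in range(len(x)):
        y.append(sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0))
    d = [y_n + s_n for y_n, s_n in zip(y, s)]
    return d, y


# A unit impulse played through a two-tap path, plus a constant near-end signal.
d, y = microphone_signal(x=[1.0, 0.0, 0.0, 0.0], h=[0.5, 0.25], s=[0.1, 0.1, 0.1, 0.1])
```

With these inputs the echo is y = [0.5, 0.25, 0.0, 0.0] and the microphone signal is that echo with 0.1 added to every sample.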
The system according to some embodiments of the present disclosure includes two adaptive filters, namely a first filter h1 and a second filter h2. In some embodiments, the first filter h1 is a Kalman filter configured to perform the first filtering process on the first audio signal d(n) to obtain the first echo path vector; the second filter h2 is an NLMS filter configured to perform the second filtering process on the first audio signal d(n) to obtain the second echo path vector.
Kalman filters are widely used in practice. Because a Kalman filter is robust to large interfering signals and converges quickly, the embodiments of the present disclosure use it to track changes in the echo path quickly. That is, in the disclosed embodiments, the update rate of the first filter h1 is greater than that of the second filter h2.
The procedure of the first filtering process in the audio signal processing method of the present disclosure is shown in fig. 4, and is specifically described below with reference to fig. 4.
As shown in fig. 4, in some embodiments, an audio signal processing method of an example of the present disclosure includes:
S410, determining a first residual signal at the current moment according to the reference signal and the echo path vector at the previous moment.
S420, updating the echo path vector at the previous moment according to the first residual signal and the Kalman gain vector at the current moment to obtain the first echo path vector at the current moment.
Specifically, because the first filter h1 needs to detect a sudden change of the echo path quickly, some embodiments of the present disclosure use a time-domain Kalman filter that is updated iteratively sample by sample.
The observation equation for the first filter h1 is expressed as:
d(n)=y(n)+s(n)
where d(n) represents the first audio signal, y(n) represents the echo signal, and s(n) represents the near-end audio signal.
First, a first residual signal e1(n) at the current moment may be determined according to the reference signal x(n) at the current moment and the echo path vector at the previous moment, expressed as:
Figure BDA0003142493190000101
Figure BDA0003142493190000102
where
Figure BDA0003142493190000103
represents the echo path vector estimated by the first filter h1 at the previous moment,
Figure BDA0003142493190000104
represents the echo error signal, and e1(n) represents the first residual signal at the current moment.
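The two formulas above survive here only as image placeholders, so the following Python sketch shows one common way to compute such a residual rather than the formulas of the disclosure. It assumes the usual FIR echo model: the echo estimate is the inner product of the most recent L reference samples with the previous echo path estimate, and the residual is the microphone sample minus that estimate. The function name and the argument layout are hypothetical.

```python
def first_residual(d_n, x_vec, h_prev):
    """Return (e1, y_hat) for one sample.

    x_vec  : [x(n), x(n-1), ..., x(n-L+1)], the latest L reference samples
    h_prev : echo path estimate from the previous moment (length L)
    Assumed form: y_hat(n) = x^T(n) h(n-1), e1(n) = d(n) - y_hat(n).
    """
    y_hat = sum(x_i * h_i for x_i, h_i in zip(x_vec, h_prev))
    return d_n - y_hat, y_hat


e1, y_hat = first_residual(d_n=1.0, x_vec=[1.0, 2.0], h_prev=[0.5, 0.25])
```

In this example the echo estimate equals the microphone sample (0.5 + 0.5 = 1.0), so the residual is zero, which is what a perfectly converged filter would give when no near-end signal is present.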
Next, after the first residual signal e1(n) is obtained, the echo path vector of the first filter h1 may be updated. Specifically, for the Kalman filter, the Kalman gain vector k(n) is first calculated, expressed as:
Figure BDA0003142493190000105
Figure BDA0003142493190000106
Figure BDA0003142493190000107
R_μ(n) = [I_L - k(n) x^T(n)] R_m(n)
where
Figure BDA0003142493190000108
is the a priori error signal variance,
Figure BDA0003142493190000109
is the variance of the noise, R_m(n) is the correlation matrix of the a priori misadjustment errors, k(n) is the Kalman gain vector, R_μ(n) is a correlation matrix of a priori error vectors, and I_L represents an identity matrix.
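A minimal Python sketch of this gain computation is given below. Only the last relation, R_μ(n) = [I_L - k(n) x^T(n)] R_m(n), is legible in the text; the prediction step, the error variance, and the gain formula used here are the textbook forms of a simplified time-domain Kalman filter and are assumptions, as are the function name and the symbols sigma_w2 (state uncertainty) and sigma_v2 (noise variance).

```python
def kalman_gain(R_mu_prev, x_vec, sigma_w2, sigma_v2):
    """Return (k, R_mu, sigma_e2) for one sample.

    Assumed textbook forms:
      R_m(n)       = R_mu(n-1) + sigma_w2 * I_L
      sigma_e2(n)  = x^T(n) R_m(n) x(n) + sigma_v2
      k(n)         = R_m(n) x(n) / sigma_e2(n)
    Form taken from the text:
      R_mu(n)      = [I_L - k(n) x^T(n)] R_m(n)
    """
    L = len(x_vec)
    R_m = [[R_mu_prev[i][j] + (sigma_w2 if i == j else 0.0) for j in range(L)]
           for i in range(L)]
    Rm_x = [sum(R_m[i][j] * x_vec[j] for j in range(L)) for i in range(L)]
    sigma_e2 = sum(x_vec[i] * Rm_x[i] for i in range(L)) + sigma_v2
    k = [v / sigma_e2 for v in Rm_x]
    # [I_L - k x^T] R_m expands to R_m - k (x^T R_m).
    xT_Rm = [sum(x_vec[i] * R_m[i][j] for i in range(L)) for j in range(L)]
    R_mu = [[R_m[i][j] - k[i] * xT_Rm[j] for j in range(L)] for i in range(L)]
    return k, R_mu, sigma_e2


identity = [[1.0, 0.0], [0.0, 1.0]]
k, R_mu, sigma_e2 = kalman_gain(identity, x_vec=[1.0, 0.0], sigma_w2=0.0, sigma_v2=1.0)
```

With an identity matrix as the previous correlation matrix and unit noise variance, the gain along the excited tap is 0.5 and the uncertainty of that tap is halved, while the unexcited tap is left untouched.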
Then, based on the first residual signal e1(n) and the Kalman gain vector, the first echo path vector at the current moment is obtained, expressed as:
Figure BDA00031424931900001010
where
Figure BDA00031424931900001011
represents the updated first echo path vector at the current moment.
Two parameters need to be estimated in the Kalman filter. The first parameter, $\sigma_w^2$, represents the uncertainty of the state vector h1 and can be obtained by computing the norm of the difference between two successive iterations:
$$\sigma_w^2 = \frac{\left\|\hat{\mathbf{h}}_1(n) - \hat{\mathbf{h}}_1(n-1)\right\|_2^2}{L}$$
The second parameter is the noise energy $\sigma_v^2$. It can be assumed that the first filter h1 has converged to a certain extent, so the noise energy is obtained by calculating the energy difference between the desired signal and the echo estimate:
$$\sigma_d^2(n) = \beta\,\sigma_d^2(n-1) + (1-\beta)\,d^2(n)$$
$$\sigma_{\hat{y}}^2(n) = \beta\,\sigma_{\hat{y}}^2(n-1) + (1-\beta)\,\hat{y}_1^2(n)$$
$$\sigma_v^2 = \sigma_d^2(n) - \sigma_{\hat{y}}^2(n)$$
where L is the filter length and β is a forgetting factor, 0 < β < 1.
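A minimal sketch of these two estimates is given below. It assumes the common formulation in which the norm is the squared Euclidean norm divided by the filter length, and in which the two energies are recursive averages governed by the forgetting factor β. The names `state_uncertainty` and `NoiseVarianceTracker`, the default β, and the flooring of the result at zero are assumptions made for the example and are not taken from the disclosure.

```python
from typing import List


def state_uncertainty(h_new: List[float], h_prev: List[float]) -> float:
    """Squared norm of the change between two iterations, divided by L."""
    L = len(h_new)
    return sum((a - b) ** 2 for a, b in zip(h_new, h_prev)) / L


class NoiseVarianceTracker:
    """Recursive estimate of the noise energy as desired-signal energy
    minus echo-estimate energy."""

    def __init__(self, beta: float = 0.99) -> None:
        if not 0.0 < beta < 1.0:
            raise ValueError("beta must satisfy 0 < beta < 1")
        self.beta = beta
        self.sigma_d2 = 0.0  # smoothed energy of the microphone signal d(n)
        self.sigma_y2 = 0.0  # smoothed energy of the echo estimate

    def update(self, d: float, y_hat: float) -> float:
        b = self.beta
        self.sigma_d2 = b * self.sigma_d2 + (1.0 - b) * d * d
        self.sigma_y2 = b * self.sigma_y2 + (1.0 - b) * y_hat * y_hat
        # Flooring at zero is an added safeguard (assumption): before the
        # filter has converged the difference can momentarily be negative.
        return max(self.sigma_d2 - self.sigma_y2, 0.0)
```

A β close to 1 gives a smooth but slowly reacting energy estimate, while a smaller β tracks changes faster at the cost of more fluctuation.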
Through the above process, the first echo path vector $\hat{\mathbf{h}}_1(n)$ at the current time can be obtained, which represents the transient echo path at the current time.
The second filtering process in the audio signal processing method of the present disclosure is shown in fig. 5 and is described in detail below with reference to fig. 5.
As shown in fig. 5, in some embodiments, an audio signal processing method of an example of the present disclosure includes:
S510, determining an error signal at the current moment according to the reference signal and the echo path vector at the previous moment.
S520, obtaining a second residual signal at the current moment according to the first audio signal and the error signal.
S530, updating the echo path vector at the previous moment according to the second residual signal at the current moment and a preset self-adaptive step size parameter to obtain the second echo path vector at the current moment.
Specifically, in some embodiments, the second filter h2 may be an NLMS adaptive filter. The second filter h2 first determines the error signal $\hat{y}_2(n)$ at the current time according to the reference signal x(n) and the echo path vector at the previous time, expressed as:
$$\hat{y}_2(n) = \mathbf{x}^T(n)\,\hat{\mathbf{h}}_2(n-1)$$
where $\hat{y}_2(n)$ represents the error signal at the current time, and $\hat{\mathbf{h}}_2(n-1)$ represents the echo path vector at the previous time. Then, a second residual signal e2(n) at the current time can be calculated from the first audio signal d(n) and the error signal, expressed as:
$$e_2(n) = d(n) - \hat{y}_2(n)$$
Then, the echo path vector at the previous time is updated according to the second residual signal, expressed as:
$$\hat{\mathbf{h}}_2(n) = \hat{\mathbf{h}}_2(n-1) + \mu\,\frac{\mathbf{x}(n)\,e_2(n)}{\mathbf{x}^T(n)\,\mathbf{x}(n)}$$
where $\hat{\mathbf{h}}_2(n)$ represents the updated second echo path vector at the current time, and μ is a preset adaptive step size parameter of the second filter h2. In some embodiments, because the second filter h2 is used to estimate the stationary echo path, μ can be a relatively small positive number.
Through the above process, the second echo path vector $\hat{\mathbf{h}}_2(n)$ at the current time can be obtained.
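Steps S510 to S530 can be sketched as a single NLMS update, shown here purely as an illustration. The function name `nlms_step`, the default step size, and the small regularization constant `eps` (added to avoid division by zero when the reference signal is silent) are assumptions of the example rather than values taken from the disclosure.

```python
from typing import List, Tuple


def nlms_step(h_prev: List[float], x: List[float], d: float,
              mu: float = 0.05, eps: float = 1e-8) -> Tuple[List[float], float]:
    """One NLMS update of the second echo path vector (illustrative).

    h_prev : echo path vector from the previous time (length L)
    x      : the L most recent reference samples, newest first
    d      : current microphone sample
    mu     : preset adaptive step size, a relatively small positive number
    eps    : regularization constant (assumption, not from the source)
    """
    # S510: error signal, i.e. the echo predicted by the previous estimate.
    y_hat = sum(xi * hi for xi, hi in zip(x, h_prev))
    # S520: second residual signal e2(n).
    e2 = d - y_hat
    # S530: update normalized by the energy of the reference vector.
    energy = sum(xi * xi for xi in x) + eps
    h_new = [hi + mu * e2 * xi / energy for hi, xi in zip(h_prev, x)]
    return h_new, e2
```

A smaller `mu` makes the estimate change more slowly, which matches the role of the second filter as a tracker of the stationary echo path.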
After the first echo path vector and the second echo path vector are obtained through the embodiments of fig. 4 and fig. 5, the correlation between the first echo path vector and the second echo path vector may be determined, so as to determine whether the echo path changes. The following describes the process of determining the echo path change with reference to fig. 6.
As shown in fig. 6, in some embodiments, an audio signal processing method of an example of the present disclosure includes:
S610, determining a correlation coefficient of the first echo path vector and the second echo path vector according to the first echo path vector and the second echo path vector.
S620, determining that the echo path is detected to be changed in response to the correlation coefficient being not greater than a preset correlation threshold.
Referring to fig. 2, the echo path detecting module 300 may calculate a correlation coefficient between the first echo path vector obtained by the first filter h1 and the second echo path vector obtained by the second filter h2.
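In particular, in some embodiments, the Pearson correlation coefficient of the first echo path vector $\hat{\mathbf{h}}_1(n)$ and the second echo path vector $\hat{\mathbf{h}}_2(n)$ may be calculated based on Pearson correlation analysis to represent the correlation between the two, expressed as:
$$\rho = \frac{\sum_{i=1}^{L}\left(\hat{h}_{1,i} - \bar{h}_1\right)\left(\hat{h}_{2,i} - \bar{h}_2\right)}{\sqrt{\sum_{i=1}^{L}\left(\hat{h}_{1,i} - \bar{h}_1\right)^2}\,\sqrt{\sum_{i=1}^{L}\left(\hat{h}_{2,i} - \bar{h}_2\right)^2}}$$
where ρ represents the correlation coefficient of the first echo path vector and the second echo path vector, L represents the filter length, $\hat{h}_{1,i}$ and $\hat{h}_{2,i}$ denote the i-th coefficients of the two vectors, and $\bar{h}_1$ and $\bar{h}_2$ denote the means of their coefficients.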
The value of the correlation coefficient ρ lies between -1 and +1, and it has the following properties:
1) When |ρ| = 1, the first echo path vector and the second echo path vector are completely linearly correlated, that is, the two are completely the same.
2) When ρ = 0, the first echo path vector and the second echo path vector have no linear correlation, that is, the two are not related at all.
3) When 0 < |ρ| < 1, there is a certain degree of linear correlation between the first echo path vector and the second echo path vector. The closer |ρ| is to 1, the stronger the linear relationship between the two; the closer |ρ| is to 0, the weaker the linear correlation between the two.
Based on the above properties, a suitable preset correlation threshold may be set between 0 and 1, serving as the threshold above which the first echo path vector and the second echo path vector are considered linearly correlated. When |ρ| is greater than the preset correlation threshold, the first echo path vector and the second echo path vector are linearly correlated, and it is determined that the echo path has not changed. When |ρ| is not greater than the preset correlation threshold, the first echo path vector and the second echo path vector are not linearly correlated, and it is determined that the echo path is detected to be changed.
It is understood that the preset correlation threshold may be obtained according to a priori knowledge or a limited number of experiments, and those skilled in the art may set the preset correlation threshold according to specific requirements of a scene, which is not limited by the present disclosure.
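The decision in steps S610 and S620 can be sketched as follows. This is an illustrative example only: the function names `pearson` and `echo_path_changed`, the default threshold of 0.6, and the choice to return 0 when one vector has no variation (for instance an all-zero, freshly initialized filter) are assumptions, not values given in the disclosure.

```python
import math
from typing import Sequence


def pearson(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Pearson correlation coefficient of two echo path vectors of length L."""
    L = len(h1)
    if L == 0 or L != len(h2):
        raise ValueError("vectors must be non-empty and of equal length")
    m1 = sum(h1) / L
    m2 = sum(h2) / L
    cov = sum((a - m1) * (b - m2) for a, b in zip(h1, h2))
    var1 = sum((a - m1) ** 2 for a in h1)
    var2 = sum((b - m2) ** 2 for b in h2)
    denom = math.sqrt(var1 * var2)
    if denom == 0.0:
        # Assumption: a constant vector is treated as uncorrelated.
        return 0.0
    return cov / denom


def echo_path_changed(h1: Sequence[float], h2: Sequence[float],
                      threshold: float = 0.6) -> bool:
    """True when |rho| is not greater than the preset correlation threshold."""
    return abs(pearson(h1, h2)) <= threshold
```

For example, `echo_path_changed([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])` is false because the two vectors are perfectly linearly correlated, whereas two vectors with zero covariance are reported as a change.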
As can be seen from the above, in the embodiments of the present disclosure, two filtering processes with different update rates are used, and the echo path change is determined according to the correlation between the resulting first echo path vector and second echo path vector. The detection result is therefore more accurate, the echo cancellation effect is improved, and the method has stronger universality and robustness. In addition, compared with methods that detect an echo path change by using the correlation between the residual signal after adaptive filtering and the reference signal, the method of the present disclosure avoids false detection of an echo path change caused by residual components in the residual signal increasing that correlation, and thus improves the detection accuracy.
Under the condition that the echo path changes, if the current filter parameters are continuously adopted for iterative updating, the filter diverges, and the changed echo path cannot be accurately estimated. Therefore, in an embodiment of the present disclosure, after determining that the echo path is detected to have changed, the audio signal processing method further includes:
initializing parameters of the first filtering process and the second filtering process.
Specifically, after detecting that the echo path changes, as shown in fig. 2, the parameters of the first filter h1 and the second filter h2 may be initialized, so that the first filter h1 and the second filter h2 restart iterative convergence based on the initialized parameters, thereby avoiding the problem of filter divergence or long-time incorrect operation caused by the echo path change, and improving the echo cancellation effect in a complex scene.
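The re-initialization can be pictured with the following sketch. The container `DualFilterState`, the helper `on_detection`, and the initial values (all-zero echo path vectors and an identity correlation matrix for the Kalman filter) are hypothetical choices for illustration; the disclosure does not specify the initial parameter values.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DualFilterState:
    """Adaptive parameters of the first (Kalman) and second (NLMS) filters."""
    length: int
    h1: List[float] = field(init=False)          # first echo path vector
    h2: List[float] = field(init=False)          # second echo path vector
    R_mu: List[List[float]] = field(init=False)  # Kalman correlation matrix

    def __post_init__(self) -> None:
        self.reset()

    def reset(self) -> None:
        """Return both filters to their (assumed) initial parameters."""
        L = self.length
        self.h1 = [0.0] * L
        self.h2 = [0.0] * L
        self.R_mu = [[1.0 if i == j else 0.0 for j in range(L)]
                     for i in range(L)]


def on_detection(state: DualFilterState, path_changed: bool) -> None:
    """Re-initialize both filters once an echo path change is detected."""
    if path_changed:
        state.reset()
```

After a reset, both filters start converging again from the same initial state, so estimates of the old echo path do not bias the new ones.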
In some embodiments, as shown in fig. 2, the voice communication system further includes a residual echo suppression module 400, which may suppress the residual echo remaining in the first audio signal after echo cancellation, so as to obtain a cleaner near-end audio signal. In one example, the residual echo suppression module 400 may be a RES (residual echo suppression) module.
The processes and principles of the residual echo suppression module 400 can be understood and fully implemented by those skilled in the art based on the relevant art, and the present disclosure is not limited thereto.
Therefore, in the embodiments of the present disclosure, two filters with different update rates are used to estimate the transient echo path and the steady echo path, respectively, and whether the echo path has changed is determined based on the correlation coefficient of the two. Echo path changes are thereby detected effectively, and the detection is more universal and robust, which improves the echo cancellation effect. Compared with methods that detect an echo path change by using the correlation between the residual signal after adaptive filtering and the reference signal, the method of the present disclosure avoids false detection of an echo path change caused by residual components in the residual signal increasing that correlation, and thus improves the detection accuracy.
In a second aspect, the embodiments of the present disclosure provide an audio signal processing apparatus, which may be applied to an electronic device with a voice communication system, such as a mobile phone, a tablet computer, a notebook computer, and the like, and the disclosure is not limited thereto.
As shown in fig. 7, in some embodiments, an audio signal processing apparatus of an example of the present disclosure includes:
a first filtering module 701 configured to perform a first filtering process based on a reference signal and a first audio signal picked up by a microphone, so as to obtain a first echo path vector; wherein the first audio signal comprises an echo signal generated by a loudspeaker playing a reference signal;
a second filtering module 702 configured to perform a second filtering process based on the reference signal and the first audio signal to obtain a second echo path vector; the filter update rate of the first filter processing is different from the filter update rate of the second filter processing;
an echo path determination module 703 configured to determine that the echo path is detected to be changed in response to a correlation of the first echo path vector and the second echo path vector not being greater than a preset threshold.
As can be seen from the above, in the embodiments of the present disclosure, two filtering processes with different update rates are used, and the echo path change is determined according to the correlation between the resulting first echo path vector and second echo path vector. The detection result is therefore more accurate, the echo cancellation effect is improved, and the method has stronger universality and robustness.
In some embodiments, the first filtering process is kalman filtering and the second filtering process is NLMS filtering.
In some embodiments, the first filtering module 701 is specifically configured to:
determining a first residual signal at the current moment according to the reference signal and the echo path vector at the previous moment;
and updating the echo path vector at the previous moment according to the first residual error signal and the Kalman gain vector at the current moment to obtain the first echo path vector at the current moment.
In some embodiments, the second filtering module 702 is specifically configured to:
determining an error signal at the current moment according to the reference signal and the echo path vector at the previous moment;
obtaining a second residual signal at the current moment according to the first audio signal and the error signal;
and updating the echo path vector at the previous moment according to the second residual signal at the current moment and a preset self-adaptive step size parameter to obtain the second echo path vector at the current moment.
In some embodiments, the echo path determination module 703 is specifically configured to:
determining a correlation coefficient of the first echo path vector and the second echo path vector according to the first echo path vector and the second echo path vector;
and determining that the echo path is detected to be changed in response to the correlation coefficient not being greater than the preset correlation threshold.
In some embodiments, the audio signal processing apparatus of the present disclosure further includes:
an initialization module configured to initialize parameters of the first filtering process and the second filtering process.
As can be seen from the above, in the embodiments of the present disclosure, two filtering processes with different update rates are used, and the echo path change is determined according to the correlation between the resulting first echo path vector and second echo path vector. The detection result is therefore more accurate, the echo cancellation effect is improved, and the method has stronger universality and robustness. In addition, compared with methods that detect an echo path change by using the correlation between the residual signal after adaptive filtering and the reference signal, the method of the present disclosure avoids false detection of an echo path change caused by residual components in the residual signal increasing that correlation, and thus improves the detection accuracy.
In a third aspect, the disclosed embodiments provide an electronic device, including:
a processor; and
a memory storing computer instructions for causing the processor to perform the method according to any of the embodiments of the first aspect.
In a fourth aspect, the disclosed embodiments provide a storage medium storing computer instructions for causing a computer to perform the method according to any one of the embodiments of the first aspect.
Fig. 8 is a block diagram of an electronic device according to some embodiments of the present disclosure, and the following describes principles related to the electronic device and a storage medium according to some embodiments of the present disclosure with reference to fig. 8.
Referring to fig. 8, the electronic device 1800 may include one or more of the following components: processing component 1802, memory 1804, power component 1806, multimedia component 1808, audio component 1810, input/output (I/O) interface 1812, sensor component 1816, and communications component 1818.
The processing component 1802 generally controls the overall operation of the electronic device 1800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 1802 may include one or more processors 1820 to execute instructions. Further, the processing component 1802 may include one or more modules that facilitate interaction between the processing component 1802 and other components. For example, the processing component 1802 can include a multimedia module to facilitate interaction between the multimedia component 1808 and the processing component 1802. As another example, the processing component 1802 can read executable instructions from a memory to implement electronic device related functions.
The memory 1804 is configured to store various types of data to support operation at the electronic device 1800. Examples of such data include instructions for any application or method operating on the electronic device 1800, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 1804 may be implemented by any type or combination of volatile or non-volatile memory devices, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
The power supply component 1806 provides power to various components of the electronic device 1800. The power components 1806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the electronic device 1800.
The multimedia component 1808 includes a display screen that provides an output interface between the electronic device 1800 and a user. In some embodiments, the multimedia component 1808 includes a front facing camera and/or a rear facing camera. The front-facing camera and/or the rear-facing camera can receive external multimedia data when the electronic device 1800 is in an operating mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
Audio component 1810 is configured to output and/or input audio signals. For example, the audio component 1810 can include a Microphone (MIC) that can be configured to receive external audio signals when the electronic device 1800 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may further be stored in the memory 1804 or transmitted via the communication component 1818. In some embodiments, audio component 1810 also includes a speaker for outputting audio signals.
I/O interface 1812 provides an interface between processing component 1802 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor component 1816 includes one or more sensors to provide status evaluations of various aspects for the electronic device 1800. For example, the sensor component 1816 can detect an open/closed state of the electronic device 1800, the relative positioning of components such as a display and keypad of the electronic device 1800, the sensor component 1816 can also detect a change in position of the electronic device 1800 or a component of the electronic device 1800, the presence or absence of user contact with the electronic device 1800, orientation or acceleration/deceleration of the electronic device 1800, and a change in temperature of the electronic device 1800. Sensor assembly 1816 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor assembly 1816 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 1816 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 1818 is configured to facilitate communications between the electronic device 1800 and other devices in a wired or wireless manner. The electronic device 1800 may access a wireless network based on a communication standard, such as Wi-Fi, 2G, 3G, 4G, 5G, or 6G, or a combination thereof. In an exemplary embodiment, the communication component 1818 receives a broadcast signal or broadcast associated information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 1818 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the electronic device 1800 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors, or other electronic components.
It should be understood that the above embodiments are only examples given to illustrate the present disclosure clearly and are not intended to limit it. Other variations and modifications will be apparent to persons skilled in the art in light of the above description. It is neither necessary nor possible to list all embodiments exhaustively here, and obvious variations or modifications derived therefrom remain within the scope of the present disclosure.

Claims (15)

1. An audio signal processing method, comprising:
performing first filtering processing on the basis of a reference signal and a first audio signal picked up by a microphone to obtain a first echo path vector; wherein the first audio signal comprises an echo signal resulting from the playing of the reference signal by a speaker;
performing second filtering processing on the basis of the reference signal and the first audio signal to obtain a second echo path vector; a filter update rate of the first filter process is different from a filter update rate of the second filter process;
determining that a change in echo path is detected in response to the correlation of the first echo path vector and the second echo path vector not being greater than a preset threshold.
2. The method of claim 1,
the first filtering process is Kalman filtering, and the second filtering process is NLMS filtering.
3. The method of claim 2,
the Kalman filtering is time domain Kalman filtering.
4. The method according to claim 2 or 3, wherein the performing a first filtering process based on the reference signal and a first audio signal picked up by a microphone to obtain a first echo path vector comprises:
determining a first residual signal at the current moment according to the reference signal and the echo path vector at the previous moment;
and updating the echo path vector at the previous moment according to the first residual error signal and the Kalman gain vector at the current moment to obtain the first echo path vector at the current moment.
5. The method of claim 2 or 3, wherein performing the second filtering process based on the reference signal and the first audio signal to obtain a second echo path vector comprises:
determining an error signal at the current moment according to the reference signal and the echo path vector at the previous moment;
obtaining a second residual signal at the current moment according to the first audio signal and the error signal;
and updating the echo path vector at the previous moment according to the second residual signal at the current moment and a preset self-adaptive step size parameter to obtain the second echo path vector at the current moment.
6. The method of claim 1, wherein the determining that a change in echo path is detected in response to the correlation of the first echo path vector and the second echo path vector not being greater than a preset threshold comprises:
determining a correlation coefficient of the first echo path vector and the second echo path vector according to the first echo path vector and the second echo path vector;
and determining that the echo path is detected to be changed in response to the correlation coefficient not being greater than a preset correlation threshold.
7. The method of claim 1, wherein after the determining that a change in echo path is detected, the method further comprises:
initializing parameters of the first filtering process and the second filtering process.
8. An audio signal processing apparatus, comprising:
a first filtering module configured to perform a first filtering process based on a reference signal and a first audio signal picked up by a microphone, so as to obtain a first echo path vector; wherein the first audio signal comprises an echo signal resulting from the playing of the reference signal by a speaker;
a second filtering module configured to perform second filtering processing based on the reference signal and the first audio signal to obtain a second echo path vector; a filter update rate of the first filter process is different from a filter update rate of the second filter process;
an echo path determination module configured to determine that an echo path change is detected in response to a correlation of the first echo path vector and the second echo path vector not being greater than a preset threshold.
9. The apparatus of claim 8,
the first filtering process is Kalman filtering, and the second filtering process is NLMS filtering.
10. The apparatus of claim 9, wherein the first filtering module is specifically configured to:
determining a first residual signal at the current moment according to the reference signal and the echo path vector at the previous moment;
and updating the echo path vector at the previous moment according to the first residual error signal and the Kalman gain vector at the current moment to obtain the first echo path vector at the current moment.
11. The apparatus of claim 9, wherein the second filtering module is specifically configured to:
determining an error signal at the current moment according to the reference signal and the echo path vector at the previous moment;
obtaining a second residual signal at the current moment according to the first audio signal and the error signal;
and updating the echo path vector at the previous moment according to the second residual signal at the current moment and a preset self-adaptive step size parameter to obtain the second echo path vector at the current moment.
12. The apparatus of claim 8, wherein the echo path determination module is specifically configured to:
determining a correlation coefficient of the first echo path vector and the second echo path vector according to the first echo path vector and the second echo path vector;
and determining that the echo path is detected to be changed in response to the correlation coefficient not being greater than a preset correlation threshold.
13. The apparatus of claim 8, further comprising:
an initialization module configured to initialize parameters of the first filtering process and the second filtering process.
14. An electronic device, comprising:
a speaker and a microphone;
a processor; and
a memory storing computer instructions for causing the processor to perform the method according to any one of claims 1 to 7.
15. A storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1 to 7.
CN202110739121.8A 2021-06-30 2021-06-30 Audio signal processing method and device Active CN113362842B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110739121.8A CN113362842B (en) 2021-06-30 2021-06-30 Audio signal processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110739121.8A CN113362842B (en) 2021-06-30 2021-06-30 Audio signal processing method and device

Publications (2)

Publication Number Publication Date
CN113362842A true CN113362842A (en) 2021-09-07
CN113362842B CN113362842B (en) 2022-11-11

Family

ID=77537528

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110739121.8A Active CN113362842B (en) 2021-06-30 2021-06-30 Audio signal processing method and device

Country Status (1)

Country Link
CN (1) CN113362842B (en)

Also Published As

Publication number Publication date
CN113362842B (en) 2022-11-11

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant