WO2018086444A1 - Noise suppression signal-to-noise ratio estimation method and user terminal - Google Patents


Info

Publication number
WO2018086444A1
Authority
WO
WIPO (PCT)
Prior art keywords
noise ratio
estimated
audio frame
current audio
signal
Application number
PCT/CN2017/106502
Other languages
English (en)
French (fr)
Inventor
谢单辉
Original Assignee
电信科学技术研究院 (China Academy of Telecommunications Technology)
Application filed by 电信科学技术研究院 (China Academy of Telecommunications Technology)
Publication of WO2018086444A1 (Chinese-language publication)

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise

Definitions

  • The present disclosure relates to the field of speech technology, and in particular to a noise suppression signal-to-noise ratio (SNR) estimation method and a user terminal.
  • User terminals generally use single-microphone noise reduction to suppress noise in the audio signal.
  • the method mainly includes the following steps:
  • the noisy speech is transformed into the frequency-domain signal Y by the Fast Fourier Transform (FFT);
  • the noise-reduced frequency-domain signal is transformed back into a time-domain signal by the Inverse Fast Fourier Transform (IFFT).
  • The a priori SNR is estimated with the decision-directed (DD) method, that is, by the following formula: $\hat{\xi}(m) = \alpha \frac{|\hat{X}(m-1)|^2}{\lambda_d(m-1)} + (1-\alpha)\max\{\gamma(m)-1,\,0\}$,
  • where $\hat{\xi}(m)$ is the estimate of the a priori SNR of the current frame m, $\gamma(m)$ is the a posteriori SNR of the current frame, $\lambda_d$ is the noise variance, $|\hat{X}(m-1)|^2$ is the noise-reduced power of the previous frame, and the smoothing number α usually has to be close to 1, typically 0.95 to 1.
  • Because α is close to 1, the estimate is heavily biased towards the noise-reduction result of the previous frame, and $|\hat{X}(m-1)|^2$ can be regarded as an instantaneous value of the previous frame's speech variance $\lambda_x(m-1)$. The a priori SNR $\hat{\xi}$ estimated by the above formula is therefore not really an estimate of the current frame's a priori SNR ξ(m); it can be regarded as an estimate of the previous frame's a priori SNR ξ(m-1). Consequently, the a priori SNR currently estimated for the current audio frame correlates poorly with the current audio frame, which is detrimental to noise suppression of that frame.
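  • The decision-directed recursion above can be sketched per frequency bin as follows. This is a minimal illustration of the textbook DD estimator, not code from the disclosure; the function and parameter names are hypothetical, and alpha defaults to 0.98 only because the text places it between 0.95 and 1.

```python
def decision_directed_prior_snr(prev_clean_power: float, prev_noise_var: float,
                                noisy_power: float, noise_var: float,
                                alpha: float = 0.98) -> float:
    """Textbook decision-directed a priori SNR estimate for one frequency bin.

    prev_clean_power: |X_hat(m-1)|^2, noise-reduced power of the previous frame
    prev_noise_var:   lambda_d(m-1), noise variance used for the previous frame
    noisy_power:      |Y(m)|^2, noisy power of the current frame
    noise_var:        lambda_d(m), noise variance of the current frame
    alpha:            smoothing number, typically between 0.95 and 1
    """
    gamma = noisy_power / noise_var                 # a posteriori SNR gamma(m)
    ml_term = max(gamma - 1.0, 0.0)                 # ML estimate from the current frame only
    prev_term = prev_clean_power / prev_noise_var   # previous frame's "instantaneous" value
    return alpha * prev_term + (1.0 - alpha) * ml_term


# The current frame suggests gamma - 1 = 4, but with alpha = 0.98 the
# estimate stays close to the previous frame's value of 2.
print(decision_directed_prior_snr(2.0, 1.0, 5.0, 1.0))
```

  • With alpha = 0.98 the current frame contributes only 2% of the update, which is the one-frame bias described in the paragraph above.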
  • The purpose of the present disclosure is to provide a noise suppression SNR estimation method and a user terminal that solve the problem that the a priori SNR estimated for the current audio frame correlates poorly with the current audio frame, which is detrimental to noise suppression of the current audio frame.
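  • An embodiment of the present disclosure provides a method for estimating an a priori SNR, including:
  • estimating an estimated a priori SNR of the current audio frame;
  • calculating, according to the estimated a priori SNR, a minimum mean square error (MMSE) estimate corresponding to the estimated a priori SNR of the current audio frame;
  • calculating a speech presence probability of the current audio frame; and
  • estimating a final a priori SNR of the current audio frame by combining the speech presence probability and the MMSE estimate.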
  • Estimating the estimated a priori SNR of the current audio frame includes estimating it based on the a posteriori SNR estimate of the current audio frame.
  • Estimating the estimated a priori SNR of the current audio frame based on its a posteriori SNR estimate includes:
  • estimating the estimated a priori SNR of the current audio frame by either of the following two formulas:
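  • The method further includes adjusting, by the following formula, the smoothing number required to estimate the estimated a priori SNR:
  • where a1 and a2 are two preset smoothing numbers with a1 > a2, and ⁇th and ⁇th are two empirical thresholds.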
  • The step of estimating the estimated a priori SNR of the current audio frame further includes, based on the speech presence probability:
  • the estimated a priori signal to noise ratio of the current audio frame is further estimated by the following formula:
  • where ξ̂ represents the estimated a priori SNR, ξ̂a2 and ξ̂a1 are the a priori SNR estimates of the current audio frame obtained with smoothing numbers a2 and a1 respectively, and p is the speech presence probability of the current audio frame.
  • Calculating, according to the estimated a priori SNR, the MMSE estimate corresponding to the estimated a priori SNR of the current audio frame includes computing it by the following formula:
  • where the left-hand side is the MMSE estimate corresponding to the estimated a priori SNR, and the right-hand side is expressed in terms of the estimated a priori SNR and the a posteriori SNR estimate of the current audio frame.
  • Calculating the speech presence probability of the current audio frame includes computing it by the following formula:
  • where p(H1|Y) represents the speech presence probability;
  • p(H1) and p(H0) represent the a priori speech presence probability and the a priori speech absence probability, respectively;
  • exp() is the exponential function;
  • ⁇min and ⁇max are two empirical values with ⁇min < ⁇max, and pmax and pmin are two further empirical values.
  • Estimating the final a priori SNR of the current audio frame by combining the speech presence probability and the MMSE estimate includes:
  • the final a priori signal to noise ratio of the current audio frame is estimated by the following formula:
  • An embodiment of the present disclosure further provides a user terminal, including:
  • a first estimating module, configured to estimate an estimated a priori SNR of the current audio frame;
  • a first calculating module, configured to calculate, according to the estimated a priori SNR, an MMSE estimate corresponding to the estimated a priori SNR of the current audio frame;
  • a second calculating module, configured to calculate a speech presence probability of the current audio frame; and
  • a second estimating module, configured to estimate a final a priori SNR of the current audio frame by combining the speech presence probability and the MMSE estimate.
  • The first estimating module is configured to estimate the estimated a priori SNR of the current audio frame based on the a posteriori SNR estimate of the current audio frame.
  • The first estimating module is configured to estimate the estimated a priori SNR of the current audio frame by either of the following two formulas:
  • The user terminal further includes:
  • an adjustment module, configured to adjust, by the following formula, the smoothing number required to estimate the estimated a priori SNR:
  • where a1 and a2 are two preset smoothing numbers with a1 > a2, and ⁇th and ⁇th are two empirical thresholds.
  • The first estimating module is further configured to estimate the estimated a priori SNR of the current audio frame by the following formula:
  • The first calculating module is configured to calculate, according to the estimated a priori SNR, the MMSE estimate corresponding to the estimated a priori SNR of the current audio frame by the following formula:
  • where the left-hand side is the MMSE estimate corresponding to the estimated a priori SNR, and the right-hand side is expressed in terms of the estimated a priori SNR and the a posteriori SNR estimate of the current audio frame.
  • The second calculating module is configured to calculate the speech presence probability of the current audio frame by the following formula:
  • where p(H1|Y) represents the speech presence probability;
  • p(H1) and p(H0) represent the a priori speech presence probability and the a priori speech absence probability, respectively;
  • exp() is the exponential function;
  • ⁇min and ⁇max are two empirical values with ⁇min < ⁇max, and pmax and pmin are two further empirical values.
  • the second estimation module is configured to estimate a final a priori signal to noise ratio of the current audio frame by using the following formula:
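  • An embodiment of the present disclosure further provides a user terminal, including a processor, a memory, and a transceiver, where:
  • the processor is configured to read a program in the memory and perform the following process: estimating an estimated a priori SNR of the current audio frame; calculating, according to the estimated a priori SNR, an MMSE estimate corresponding to the estimated a priori SNR of the current audio frame; calculating a speech presence probability of the current audio frame; and estimating a final a priori SNR of the current audio frame by combining the speech presence probability and the MMSE estimate;
  • the transceiver is configured to receive and transmit data; and
  • the memory is capable of storing data used by the processor when performing operations.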
  • In the embodiments of the present disclosure, the final a priori SNR is estimated by combining the speech presence probability of the current frame with the MMSE estimate derived from the estimated a priori SNR of the current audio frame. Compared with the prior art, which in effect estimates the a priori SNR of the previous frame, the a priori SNR estimated by the embodiments of the present disclosure is more strongly correlated with the current audio frame, which facilitates noise suppression of the current audio frame.
  • FIG. 1 is a schematic flowchart of a noise suppression SNR estimation method according to an embodiment of the present disclosure.
  • FIG. 2 is a schematic diagram of another noise suppression SNR estimation method according to an embodiment of the present disclosure.
  • FIG. 3 is a schematic diagram of experimental data for a noise suppression SNR estimation method according to an embodiment of the present disclosure.
  • FIG. 4 is a schematic diagram of further experimental data for a noise suppression SNR estimation method according to an embodiment of the present disclosure.
  • FIG. 5 is a schematic diagram of further experimental data for a noise suppression SNR estimation method according to an embodiment of the present disclosure.
  • FIG. 6 is a schematic structural diagram of a user terminal according to an embodiment of the present disclosure.
  • FIG. 7 is a schematic structural diagram of another user terminal according to an embodiment of the present disclosure.
  • FIG. 8 is a schematic structural diagram of another user terminal according to an embodiment of the present disclosure.
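  • An embodiment of the present disclosure provides a noise suppression SNR estimation method which, as shown in FIG. 1, includes the following steps:
  • Step 101: estimating an estimated a priori SNR of the current audio frame;
  • Step 102: calculating, according to the estimated a priori SNR, an MMSE estimate corresponding to the estimated a priori SNR of the current audio frame;
  • Step 103: calculating a speech presence probability of the current audio frame;
  • Step 104: estimating a final a priori SNR of the current audio frame by combining the speech presence probability and the MMSE estimate.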
  • The current audio frame may be the frame currently captured by the microphone of the user terminal, and it may be either a speech frame or a noise frame.
  • The estimated a priori SNR may be an a priori SNR estimated by the decision-directed method or by the maximum-likelihood method.
  • The MMSE estimate corresponding to the estimated a priori SNR may be obtained by applying the MMSE algorithm to the estimated a priori SNR described above.
  • The speech presence probability of the current audio frame may be calculated from the a posteriori SNR of the current audio frame alone, or from a value obtained by averaging or smoothing it together with the a posteriori SNRs of the same frequency bin in several previous frames.
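  • Both options for the input of the speech presence probability (averaging, or recursive smoothing of the same bin's a posteriori SNR over previous frames) can be sketched with a small per-bin helper. This is an illustration only; the window length and the smoothing factor beta are hypothetical parameters that the text does not specify.

```python
from collections import deque
from typing import Deque, Optional, Tuple


class PosteriorSnrSmoother:
    """Per-bin averaging and recursive smoothing of the a posteriori SNR.

    window and beta are hypothetical parameters, not values from the text.
    """

    def __init__(self, window: int = 3, beta: float = 0.7) -> None:
        self.history: Deque[float] = deque(maxlen=window)  # last `window` gamma values
        self.beta = beta                                    # recursive smoothing factor
        self.smoothed: Optional[float] = None               # recursive estimate so far

    def update(self, gamma: float) -> Tuple[float, float]:
        """Add the current frame's gamma; return (moving average, recursive smooth)."""
        self.history.append(gamma)
        average = sum(self.history) / len(self.history)
        if self.smoothed is None:
            self.smoothed = gamma  # initialise with the first observation
        else:
            self.smoothed = self.beta * self.smoothed + (1.0 - self.beta) * gamma
        return average, self.smoothed
```

  • Either returned value could then be used in place of the current frame's raw a posteriori SNR when evaluating the speech presence probability.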
  • Step 103 may be performed before step 101, or step 101 may be performed before step 103.
  • The final a priori SNR of the current audio frame may be understood as the a priori SNR used for gain calculation when performing noise reduction on the audio frame, or as the a priori SNR of the current audio frame output by the embodiments of the present disclosure.
  • Estimating the final a priori SNR of the current audio frame according to the speech presence probability and the MMSE estimate may proceed as follows: the speech presence probability indicates how likely the current audio frame is to be a speech frame. When the current audio frame is determined to be a pure-noise frame, the final a priori SNR is set to a stable minimum, such as ξmin, to ensure smooth processing of pure-noise segments and reduce musical noise; when the current audio frame is determined to be a frame within a speech segment, the final a priori SNR is computed so that it is biased towards the MMSE estimate of the estimated a priori SNR, which makes the final a priori SNR estimate more accurate.
  • In this way, the final a priori SNR is derived from the speech presence probability of the current frame and the MMSE estimate corresponding to the estimated a priori SNR of the current audio frame, so the estimated a priori SNR correlates more strongly with the current audio frame, which benefits noise suppression of the current audio frame and improves the noise-suppression effect.
  • Estimating the estimated a priori SNR of the current audio frame includes estimating it based on the a posteriori SNR estimate of the current audio frame.
  • The a posteriori SNR of the current audio frame is well known in the art and is not described in detail here.
  • For example, the estimated a priori SNR of the current audio frame may be obtained from its a posteriori SNR estimate using the decision-directed method; of course, the embodiments of the present disclosure are not limited to this.
  • Estimating the estimated a priori SNR of the current audio frame based on its a posteriori SNR estimate includes estimating it by either of the following two formulas:
  • The estimated a priori SNR can be computed with either of the two formulas above. Experiments show that one of them is better suited to computing the estimated a priori SNR, mainly because it produces less musical noise; therefore, in the embodiments of the present disclosure, that formula is optionally used to compute the estimated a priori SNR.
  • The smoothing number may be a preset value, for example a value from 0.95 to 1, or 0.98 or 0.3, without being limited thereto. The noise variance is well known in the art and is not described in detail here.
  • The foregoing method further includes adjusting, by the following formula, the smoothing number required to estimate the estimated a priori SNR:
  • where a1 and a2 are two preset smoothing numbers with a1 > a2, and ⁇th and ⁇th are two empirical thresholds.
  • The smoothing factor α needs to be as large as possible in pure noise, so that the estimate is as stable as possible, and as small as possible in speech segments, to ensure fast tracking of speech.
  • For example, a1 and a2 may be 0.98 and 0.3, respectively.
  • The embodiments of the present disclosure are not limited to these values; they may be, for example, 0.95 and 0.28, and may be adjusted according to actual conditions.
  • Using the two smoothing numbers a1 and a2 in this way improves the accuracy of the estimated a priori SNR.
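  • The role of a1 and a2 can be illustrated with a small selector. The switching formula itself is not reproduced on this page, so the rule below (use a1 when both the a posteriori SNR and the previous a priori SNR estimate are under their thresholds, otherwise a2) is an assumption, and gamma_th and xi_th are hypothetical names for the two empirical thresholds.

```python
def select_smoothing_number(gamma: float, xi_prev: float,
                            gamma_th: float, xi_th: float,
                            a1: float = 0.98, a2: float = 0.3) -> float:
    """Pick the DD smoothing number for one bin (hypothetical switching rule).

    gamma:           a posteriori SNR of the current frame
    xi_prev:         a priori SNR estimate of the previous frame
    gamma_th, xi_th: hypothetical names for the two empirical thresholds
    a1 > a2:         large value for noise-like bins (stability),
                     small value for speech-like bins (fast tracking)
    """
    if not a1 > a2:
        raise ValueError("the smoothing numbers must satisfy a1 > a2")
    noise_like = gamma < gamma_th and xi_prev < xi_th
    return a1 if noise_like else a2
```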
  • The step of estimating the estimated a priori SNR of the current audio frame further includes, based on the speech presence probability:
  • the estimated a priori signal to noise ratio of the current audio frame is further estimated by the following formula:
  • where ξ̂ represents the estimated a priori SNR, ξ̂a2 and ξ̂a1 are the a priori SNR estimates of the current audio frame obtained with smoothing numbers a2 and a1 respectively, and p is the speech presence probability of the current audio frame.
  • In this way, the estimated a priori SNR can be switched according to the speech presence probability of the current audio frame, which improves its accuracy.
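  • A sketch of the soft switch, assuming the formula is a linear blend in which the speech presence probability p weights the fast-tracking estimate (smoothing number a2) and 1 - p weights the stable estimate (smoothing number a1). The blend form and all names are assumptions; only the values a1 = 0.98 and a2 = 0.3 come from the text.

```python
def dd_estimate(alpha: float, prev_ratio: float, gamma: float) -> float:
    """Decision-directed estimate with smoothing number alpha.

    prev_ratio: |X_hat(m-1)|^2 / lambda_d, the previous frame's normalised output power
    gamma:      a posteriori SNR of the current frame
    """
    return alpha * prev_ratio + (1.0 - alpha) * max(gamma - 1.0, 0.0)


def soft_switched_prior_snr(p: float, prev_ratio: float, gamma: float,
                            a1: float = 0.98, a2: float = 0.3) -> float:
    """Blend the slow (a1) and fast (a2) DD estimates with speech presence probability p.

    The linear blend p * xi_a2 + (1 - p) * xi_a1 is an assumed form of the soft switch.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    xi_a1 = dd_estimate(a1, prev_ratio, gamma)  # stable estimate, suited to pure noise
    xi_a2 = dd_estimate(a2, prev_ratio, gamma)  # fast-tracking estimate, suited to speech
    return p * xi_a2 + (1.0 - p) * xi_a1
```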
  • Calculating, according to the estimated a priori SNR, the MMSE estimate corresponding to the estimated a priori SNR of the current audio frame includes computing it by the following formula:
  • where the left-hand side is the MMSE estimate corresponding to the estimated a priori SNR, and the right-hand side is expressed in terms of the estimated a priori SNR and the a posteriori SNR estimate of the current audio frame.
  • The estimated a priori SNR calculated in step 101 is not limited to one computed by the formulas mentioned above.
  • A super-Gaussian model of speech can also be used to calculate E(X²|Y).
  • Estimating the a priori SNR is essentially a matter of estimating the variance of the speech signal. By definition, ξ = E(X²)/λd, whose numerator depends only on the speech signal X. However, X is not available, so most algorithms for estimating ξ have to work from the noisy signal Y. This can also be seen in the decision-directed method: in the second half of its formula, γ-1 is the maximum-likelihood estimate of the normalised speech variance when γ (that is, Y) is known, and in the first half the instantaneous value |X̂(m-1)|² is used in place of E(X²).
  • The present disclosure instead uses the conditional expectation E(X²|Y), or its normalised form E(X²|Y)/λd, to estimate the speech variance. From the definition of conditional expectation, this is in fact the MMSE estimate of the squared speech amplitude X². Considering the probability p(H1
  • The formula above for this estimate can be derived from either the complex Gaussian model or the super-Gaussian model of speech.
  • In implementation, the MMSE estimate corresponding to the estimated a priori SNR may be calculated directly with the formula above, without carrying out the conditional-expectation derivation; the conditional-expectation discussion merely explains the principle behind the embodiments of the present disclosure.
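  • For the complex Gaussian model, the MMSE estimate has a closed form: the posterior of X given Y is Gaussian with mean G·Y and variance G·λd, where G = ξ/(1 + ξ), so E(X²|Y)/λd = G²γ + G. The sketch below evaluates that expression with the estimated a priori SNR plugged in for ξ. It covers only the complex Gaussian case (the super-Gaussian variant is not shown), and it is not confirmed to match the disclosure's own formula term by term.

```python
def mmse_prior_snr(xi_est: float, gamma: float) -> float:
    """E[|X|^2 | Y] / lambda_d under the complex Gaussian model.

    xi_est: estimated a priori SNR, used as the model's speech-to-noise variance ratio
    gamma:  a posteriori SNR |Y|^2 / lambda_d of the current frame

    The posterior of X given Y is Gaussian with mean G*Y and variance G*lambda_d,
    where G = xi / (1 + xi), so E[|X|^2 | Y] = G^2 |Y|^2 + G lambda_d.
    """
    g = xi_est / (1.0 + xi_est)  # Wiener gain G
    return g * g * gamma + g
```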
  • Calculating the speech presence probability of the current audio frame includes computing it by the following formula:
  • where p(H1|Y) represents the speech presence probability;
  • p(H1) and p(H0) represent the a priori speech presence probability and the a priori speech absence probability, respectively;
  • exp() is the exponential function;
  • ⁇min and ⁇max are two empirical values with ⁇min < ⁇max, and pmax and pmin are two further empirical values.
  • The formula above distinguishes speech from noise.
  • When the formula above is used, the a posteriori SNR may first be averaged or smoothed together with those of the same frequency bin in previous frames, and the resulting value used to calculate the speech presence probability of the current audio frame. In addition, the formula above can be derived directly from the complex Gaussian model described above.
  • The speech presence probability allows the estimated a priori SNR of the current frame to be soft-switched between pure-noise and speech segments, which reduces the tracking delay inherent in the decision-directed method while retaining the advantages of that method.
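  • Under the complex Gaussian model, the posterior speech presence probability with a fixed a priori SNR ξ has the standard form p(H1|Y) = 1 / (1 + (p(H0)/p(H1))·(1 + ξ)·exp(-γξ/(1 + ξ))). The sketch below evaluates it for one bin. Treating pmin and pmax as clamping bounds is an assumption, the default p(H1) = 0.5 is illustrative, and the two empirical values ⁇min and ⁇max are not modelled.

```python
import math


def speech_presence_probability(gamma: float, xi_fixed: float, p_h1: float = 0.5,
                                p_min: float = 0.0, p_max: float = 1.0) -> float:
    """p(H1 | Y) for one bin under the complex Gaussian model with a fixed a priori SNR.

    gamma:        a posteriori SNR (optionally averaged/smoothed over previous frames)
    xi_fixed:     fixed a priori SNR assumed for the speech-present hypothesis H1
    p_h1:         a priori speech presence probability p(H1); p(H0) = 1 - p(H1)
    p_min, p_max: empirical bounds; clamping to them is an assumption
    """
    p_h0 = 1.0 - p_h1
    v = gamma * xi_fixed / (1.0 + xi_fixed)
    # Likelihood ratio p(Y|H0) / p(Y|H1) = (1 + xi) * exp(-v)
    ratio = (p_h0 / p_h1) * (1.0 + xi_fixed) * math.exp(-v)
    p = 1.0 / (1.0 + ratio)
    return min(max(p, p_min), p_max)
```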
  • Estimating the final a priori SNR of the current audio frame by combining the speech presence probability and the MMSE estimate includes:
  • the final a priori signal to noise ratio of the current audio frame is estimated by the following formula:
  • With the formula above, the final a priori SNR is kept at a stable small value, such as ξmin, in pure noise, while in speech segments it is biased towards the MMSE estimate corresponding to the estimated a priori SNR.
  • In this way, the speech-present and speech-absent states can be distinguished, and in the speech-present state the optimal a priori SNR estimate is derived according to the MMSE criterion.
  • The speech-present and speech-absent states are determined from the speech presence probability.
  • This probability is calculated with a fixed a priori SNR value, which makes the a priori SNR estimate more accurate and addresses the tracking delay of the decision-directed method.
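  • The intended behaviour of the final formula (a stable floor such as ξmin in pure noise, a value close to the MMSE estimate in speech) can be sketched as a probability-weighted blend. The blend form is an assumption rather than the disclosure's formula; the -20 dB default for ξmin is the value given for the experiments.

```python
def db_to_linear(db: float) -> float:
    """Convert a power ratio from dB to linear scale."""
    return 10.0 ** (db / 10.0)


def final_prior_snr(p: float, xi_mmse: float,
                    xi_min: float = db_to_linear(-20.0)) -> float:
    """Assumed soft combination: xi_min in pure noise, MMSE estimate in speech.

    p:       speech presence probability of the current bin
    xi_mmse: MMSE estimate corresponding to the estimated a priori SNR
    xi_min:  stable floor used in pure noise (-20 dB -> 0.01 by default)
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    return p * xi_mmse + (1.0 - p) * xi_min
```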
  • The a priori SNR estimated in this way may be used for gain calculation when reducing noise in an audio signal; optionally, it is used in the gain calculation of single-microphone noise reduction.
  • For example, the a posteriori SNR and the power spectrum of the previous frame's processing result are obtained; the estimated a priori SNR of the current audio frame is calculated with the decision-directed method from the a posteriori SNR and that power spectrum; the speech presence probability of the current audio frame is calculated from the a posteriori SNR; the MMSE estimate corresponding to the estimated a priori SNR is calculated; and the final a priori SNR of the current audio frame, which is used for gain calculation, is estimated by combining the speech presence probability and the MMSE estimate.
  • These steps eliminate the effect of the inherent one-frame delay, which otherwise attenuates speech onsets and degrades the tails of speech segments, thereby improving noise-reduction performance.
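  • The per-bin chain described above can be strung together over consecutive frames to show the tracking behaviour. This is an illustrative sketch, not the disclosure's implementation: it assumes a constant noise variance, the complex-Gaussian forms for the speech presence probability and the MMSE estimate, and a linear blend with ξmin, and it uses a simple Wiener gain in place of the MMSE-STSA gain so that only the standard library is needed. The function name and defaults are hypothetical.

```python
import math
from typing import List


def estimate_final_prior_snrs(noisy_powers: List[float], noise_var: float,
                              alpha: float = 0.98, xi_fixed: float = 10 ** 1.5,
                              xi_min: float = 10 ** -2.0, p_h1: float = 0.5) -> List[float]:
    """Run the described chain over consecutive frames of ONE frequency bin.

    Assumptions (not taken from the source): constant noise variance,
    complex-Gaussian SPP and MMSE formulas, a linear blend with xi_min,
    and a Wiener gain (instead of MMSE-STSA) to form the fed-back output power.
    """
    prev_clean_power = 0.0  # |X_hat(m-1)|^2; zero before the first frame
    finals = []
    for noisy_power in noisy_powers:
        gamma = noisy_power / noise_var  # a posteriori SNR
        # 1) estimated a priori SNR by the decision-directed method
        xi_est = alpha * prev_clean_power / noise_var + (1.0 - alpha) * max(gamma - 1.0, 0.0)
        # 2) speech presence probability with a fixed a priori SNR
        v = gamma * xi_fixed / (1.0 + xi_fixed)
        p = 1.0 / (1.0 + ((1.0 - p_h1) / p_h1) * (1.0 + xi_fixed) * math.exp(-v))
        # 3) MMSE estimate corresponding to the estimated a priori SNR
        g = xi_est / (1.0 + xi_est)
        xi_mmse = g * g * gamma + g
        # 4) final a priori SNR: near xi_min in noise, near xi_mmse in speech
        xi_final = p * xi_mmse + (1.0 - p) * xi_min
        finals.append(xi_final)
        # Gain calculation (Wiener stand-in) and feedback for the next frame
        gain = xi_final / (1.0 + xi_final)
        prev_clean_power = gain * gain * noisy_power
    return finals


# Two noise-only frames followed by a sudden speech onset (gamma jumps to 50).
out = estimate_final_prior_snrs([1.0, 1.0, 50.0, 50.0, 50.0], noise_var=1.0)
```

  • In this sketch, on the onset (third) frame the plain DD value is only about 1, roughly (1 - 0.98) × 49, whereas the MMSE step lifts the final value to roughly 12.7; this is the kind of reduced tracking delay the text attributes to the method.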
  • The results are illustrated below with experimental data:
  • The experiments use the NOIZEUS database with a sampling rate of 8 kHz; white noise is generated with Cool Edit (an audio processing program), and the other noise types are taken from the NOIZEUS database.
  • The frame length is 20 ms with 50% overlap, and a square-root Hanning window is applied both before and after processing (analysis and synthesis). ⁇ is set to 15 dB and ξmin to -20 dB; the suppression rule is the MMSE-STSA (minimum mean square error short-time spectral amplitude) algorithm, and noise estimation uses the unbiased MMSE algorithm.
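  • The framing parameters above translate into concrete sizes, and the square-root Hanning window choice can be checked numerically. A short sketch, assuming a periodic Hann window (the text does not say whether the window is periodic or symmetric) and power-ratio dB conversion for the SNR values:

```python
import math
from typing import List


def sqrt_hann(n: int) -> List[float]:
    """Square root of a periodic Hann window of length n."""
    return [math.sqrt(0.5 - 0.5 * math.cos(2.0 * math.pi * k / n)) for k in range(n)]


def db_to_linear(db: float) -> float:
    """Power ratio in dB -> linear."""
    return 10.0 ** (db / 10.0)


fs = 8000                        # sampling rate from the text (Hz)
frame_len = int(fs * 20 / 1000)  # 20 ms frame -> 160 samples
hop = frame_len // 2             # 50% overlap -> 80 samples
window = sqrt_hann(frame_len)

# Analysis and synthesis both use the sqrt-Hann window, so each sample is weighted
# by w^2 in two overlapping frames; for a periodic Hann these weights sum to 1.
overlap_add_gain = [window[k] ** 2 + window[k + hop] ** 2 for k in range(hop)]

xi_min = db_to_linear(-20.0)     # -20 dB from the text -> 0.01
fifteen_db = db_to_linear(15.0)  # 15 dB from the text -> about 31.6
```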
  • FIG. 3 and FIG. 4 compare the decision-directed method with the method of the present disclosure at SNRs of 0 dB and 5 dB, respectively.
  • In FIG. 3 the speech is sp01 and the noise is white noise; in FIG. 4 the speech is sp04 and the noise is car noise.
  • sp01 and sp04 are utterance identifiers in the dataset.
  • FIG. 5 shows the average segmental SNR improvement over 30 sets of NOIZEUS data with car noise and white noise at 0/5/10/15 dB. The figure shows clearly that the method of the present disclosure outperforms the decision-directed method.
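  • Segmental SNR improvement is commonly computed as the segmental SNR of the enhanced signal minus that of the noisy input, both measured against the clean reference. A minimal sketch; the per-frame clamp to [-10, 35] dB and the skipping of silent frames are common conventions assumed here, not details stated in the text.

```python
import math
from typing import Sequence


def segmental_snr(clean: Sequence[float], processed: Sequence[float],
                  frame_len: int, hop: int,
                  lo: float = -10.0, hi: float = 35.0) -> float:
    """Mean per-frame SNR (dB) of `processed` against the `clean` reference.

    Each frame's SNR is clamped to [lo, hi]; all-zero (silent) reference frames are skipped.
    """
    values = []
    for start in range(0, len(clean) - frame_len + 1, hop):
        ref = clean[start:start + frame_len]
        out = processed[start:start + frame_len]
        signal_energy = sum(x * x for x in ref)
        error_energy = sum((x - y) ** 2 for x, y in zip(ref, out))
        if signal_energy == 0.0:
            continue
        snr = hi if error_energy == 0.0 else 10.0 * math.log10(signal_energy / error_energy)
        values.append(min(max(snr, lo), hi))
    return sum(values) / len(values) if values else float("nan")


def segsnr_improvement(clean: Sequence[float], noisy: Sequence[float],
                       enhanced: Sequence[float], frame_len: int, hop: int) -> float:
    """Segmental SNR of the enhanced signal minus that of the noisy input (dB)."""
    return (segmental_snr(clean, enhanced, frame_len, hop)
            - segmental_snr(clean, noisy, frame_len, hop))
```

  • For the setup described above, frame_len would be 160 samples (20 ms at 8 kHz), and the improvement would be averaged over the utterances at each input SNR.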
  • The user terminal may be any user terminal with a microphone, such as a mobile phone, a tablet personal computer, a laptop computer, a personal digital assistant (PDA), a Mobile Internet Device (MID), an in-vehicle device, or a wearable device. It should be noted that the specific type of the user terminal is not limited in the embodiments of the present disclosure.
  • The user terminal estimates an estimated a priori SNR of the current audio frame; calculates, according to the estimated a priori SNR, the MMSE estimate corresponding to it; calculates the speech presence probability of the current audio frame; and estimates the final a priori SNR of the current audio frame by combining the speech presence probability and the MMSE estimate.
  • the user terminal 600 includes the following modules:
  • the first estimating module 601 is configured to estimate an estimated a priori SNR of the current audio frame;
  • the first calculating module 602 is configured to calculate, according to the estimated a priori SNR, an MMSE estimate corresponding to the estimated a priori SNR of the current audio frame;
  • the second calculating module 603 is configured to calculate a speech presence probability of the current audio frame; and
  • the second estimating module 604 is configured to estimate a final a priori SNR of the current audio frame by combining the speech presence probability and the MMSE estimate.
  • The first estimating module 601 is configured to estimate the estimated a priori SNR of the current audio frame based on the a posteriori SNR estimate of the current audio frame.
  • The first estimating module 601 is configured to estimate the estimated a priori SNR of the current audio frame by either of the following two formulas:
  • the user terminal 600 further includes:
  • the adjusting module 605 is configured to adjust, by using the following formula, a smoothing number required to estimate the estimated a priori signal to noise ratio:
  • where a1 and a2 are two preset smoothing numbers with a1 > a2, and ⁇th and ⁇th are two empirical thresholds.
  • The first estimating module 601 is further configured to estimate the estimated a priori SNR of the current audio frame by the following formula:
  • where ξ̂ represents the estimated a priori SNR, ξ̂a2 and ξ̂a1 are the a priori SNR estimates of the current audio frame obtained with smoothing numbers a2 and a1 respectively, and p is the speech presence probability of the current audio frame.
  • The first calculating module 602 is configured to calculate, according to the estimated a priori SNR, the MMSE estimate corresponding to the estimated a priori SNR of the current audio frame by the following formula:
  • where the left-hand side is the MMSE estimate corresponding to the estimated a priori SNR, and the right-hand side is expressed in terms of the estimated a priori SNR and the a posteriori SNR estimate of the current audio frame.
  • The second calculating module 603 is configured to calculate the speech presence probability of the current audio frame by the following formula:
  • where p(H1|Y) represents the speech presence probability;
  • p(H1) and p(H0) represent the a priori speech presence probability and the a priori speech absence probability, respectively;
  • exp() is the exponential function;
  • ⁇min and ⁇max are two empirical values with ⁇min < ⁇max, and pmax and pmin are two further empirical values.
  • the second estimation module 604 is configured to estimate a final a priori signal to noise ratio of the current audio frame by using the following formula:
  • The user terminal 600 in this embodiment may be the user terminal corresponding to the method embodiments of the present disclosure; any implementation in those method embodiments can be implemented by the user terminal 600 with the same beneficial effects, so details are not repeated here.
  • an embodiment of the present disclosure provides a structure of another user terminal, including: a processor 800, a transceiver 810, a memory 820, a user interface 830, and a bus interface, where:
  • the processor 800 is configured to read a program in the memory 820 and perform the following process:
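  • estimating an estimated a priori SNR of the current audio frame; calculating, according to the estimated a priori SNR, an MMSE estimate corresponding to the estimated a priori SNR of the current audio frame; calculating a speech presence probability of the current audio frame; and estimating a final a priori SNR of the current audio frame by combining the speech presence probability and the MMSE estimate.
  • The user interface 830 includes a microphone, and the transceiver 810 is configured to receive and transmit data under the control of the processor 800.
  • The bus architecture may include any number of interconnected buses and bridges, which link together various circuits, specifically one or more processors represented by the processor 800 and memory represented by the memory 820.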
  • the bus architecture can also link various other circuits such as peripherals, voltage regulators, and power management circuits.
  • the bus interface provides an interface.
  • Transceiver 810 can be a plurality of components, including a transmitter and a receiver, providing means for communicating with various other devices on a transmission medium.
  • The user interface 830 may also be an interface for connecting required external devices, including but not limited to a keypad, a display, a speaker, a microphone, and a joystick.
  • the processor 800 is responsible for managing the bus architecture and general processing, and the memory 820 can store data used by the processor 800 in performing operations.
  • Estimating the estimated a priori SNR of the current audio frame includes estimating it based on the a posteriori SNR estimate of the current audio frame.
  • Estimating the estimated a priori SNR of the current audio frame based on its a posteriori SNR estimate includes:
  • estimating the estimated a priori SNR of the current audio frame by either of the following two formulas:
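  • The processor 800 is further configured to adjust, by the following formula, the smoothing number required to estimate the estimated a priori SNR:
  • where a1 and a2 are two preset smoothing numbers with a1 > a2, and ⁇th and ⁇th are two empirical thresholds.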
  • The step of estimating the estimated a priori SNR of the current audio frame further includes, based on the speech presence probability:
  • the estimated a priori SNR of the current audio frame is further estimated by the following formula:
  • where ξ̂ represents the estimated a priori SNR, ξ̂a2 and ξ̂a1 are the a priori SNR estimates of the current audio frame obtained with smoothing numbers a2 and a1 respectively, and p is the speech presence probability of the current audio frame.
  • the calculating, according to the estimated a priori signal to noise ratio, an estimated value of a minimum mean square error corresponding to the estimated a priori signal to noise ratio of the current audio frame including:
  • the three symbols denote, respectively, the minimum mean square error estimate corresponding to the estimated a priori signal to noise ratio, the estimated a priori signal to noise ratio, and the a posteriori signal to noise ratio estimate of the current audio frame.
  • calculating the speech presence probability of the current audio frame includes:
  • p(H1|Y) represents the speech presence probability
  • p(H1) and p(H0) respectively represent the a priori speech presence probability and the a priori speech absence probability
  • exp() is the exponential function
  • γmin and γmax are two empirical values
  • γmin < γmax, and pmax and pmin are two empirical values with pmin < pmax
  • estimating the final a priori signal to noise ratio of the current audio frame by combining the speech presence probability and the estimate includes:
  • the final a priori signal to noise ratio of the current audio frame is estimated by the following formula:
  • the user terminal may correspond to the speech signal noise reduction method provided by the method embodiments of the present disclosure; any implementation of those method embodiments can be implemented by the above user terminal with the same beneficial effects, which are not repeated here.
  • the disclosed method and apparatus may be implemented in other manners.
  • the device embodiments described above are merely illustrative.
  • the division of the unit is only a logical function division.
  • other divisions are possible in an actual implementation: for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, device or unit, and may be in an electrical, mechanical or other form.
  • each functional unit in each embodiment of the present disclosure may be integrated into one processing unit, or each unit may be physically included separately, or two or more units may be integrated into one unit.
  • the above integrated unit can be implemented in the form of hardware or in the form of hardware plus software functional units.
  • the above-described integrated unit implemented in the form of a software functional unit can be stored in a computer readable storage medium.
  • the software functional unit described above is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, server, or network device, etc.) to perform part of the steps of the method of transmitting and receiving described in various embodiments of the present disclosure.
  • the foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Telephone Function (AREA)
  • Soundproofing, Sound Blocking, And Sound Damping (AREA)

Abstract

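A signal-to-noise ratio (SNR) estimation method for noise suppression and a user terminal are provided. The method may include: computing an estimated a priori SNR of a current audio frame (101); calculating, according to the estimated a priori SNR, the MMSE estimate corresponding to the estimated a priori SNR of the current audio frame (102); calculating a speech presence probability of the current audio frame (103); and estimating a final a priori SNR of the current audio frame by combining the speech presence probability and the MMSE estimate (104).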

Description

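Signal-to-Noise Ratio Estimation Method for Noise Suppression, and User Terminal
Cross-Reference to Related Applications
This application claims priority to Chinese Patent Application No. 201611039463.4, filed in China on November 10, 2016, the entire contents of which are incorporated herein by reference.
Technical Field
The present disclosure relates to the field of speech technology, and in particular to a signal-to-noise ratio (SNR) estimation method for noise suppression and a user terminal.
Background
User terminals currently denoise audio signals with a single-microphone noise reduction method, which mainly includes the following steps:
transforming the noisy speech into a frequency-domain signal Y using the Fast Fourier Transform (FFT) or another transform;
estimating the noise variance of the frequency-domain signal Y;
deriving the a priori SNR and the a posteriori SNR from the noise variance;
computing a suitable gain from the a priori SNR and the a posteriori SNR;
multiplying each frequency bin of Y by the gain to obtain the denoised frequency-domain signal;
converting the denoised frequency-domain signal back into a time-domain signal with the Inverse Fast Fourier Transform (IFFT).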
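The following sketch applies the background pipeline above to a single frame. It is a minimal illustration under several assumptions and is not the patent's implementation:

- a naive DFT stands in for the FFT;
- the per-bin noise variance is given rather than estimated;
- the a priori SNR uses the simple maximum-likelihood estimate max(γ - 1, 0);
- the Wiener gain ξ/(1+ξ) is assumed to be the "suitable gain".

The helper names `dft`, `idft` and `denoise_frame` are hypothetical.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) DFT of a real frame (a stand-in for the FFT)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spec):
    """Naive inverse DFT; keeps the real part because the input frame is real."""
    n = len(spec)
    return [(sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]


def denoise_frame(frame, noise_var):
    """Apply the six background steps to one frame (hypothetical helper).

    noise_var: assumed per-bin noise variance, same length as frame.
    """
    spec = dft(frame)                       # step 1: frequency-domain signal Y
    out = []
    for y, lam in zip(spec, noise_var):     # step 2 assumed done: lam is given
        gamma = abs(y) ** 2 / lam           # step 3: a posteriori SNR
        xi = max(gamma - 1.0, 0.0)          # step 3: ML a priori SNR (assumption)
        gain = xi / (1.0 + xi)              # step 4: Wiener gain (assumption)
        out.append(gain * y)                # step 5: scale each bin
    return idft(out)                        # step 6: back to the time domain
```

With a tiny noise variance the gain is close to 1 and the frame passes through almost unchanged. With a very large noise variance every bin is suppressed to zero.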
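However, in the above technique the a priori SNR is estimated with the decision-directed method, i.e., by the following formula:
Figure PCTCN2017106502-appb-000001
where
Figure PCTCN2017106502-appb-000002
denotes the estimate of the a priori SNR of the current frame, and α is a smoothing factor that usually has to be close to 1, specifically a value from 0.95 to 1;
Figure PCTCN2017106502-appb-000003
denotes the noise-reduction output of the previous frame;
Figure PCTCN2017106502-appb-000004
denotes the noise variance; and
Figure PCTCN2017106502-appb-000005
denotes the a posteriori SNR estimate of the current frame.
As the formula shows, the a priori SNR estimate is dominated by the noise-reduction output of the previous frame
Figure PCTCN2017106502-appb-000006
Figure PCTCN2017106502-appb-000007
can be regarded as an instantaneous value of the previous frame's speech variance
Figure PCTCN2017106502-appb-000008
Therefore, the a priori SNR ξ obtained from the above formula is not really an estimate of the current frame's SNR ξ(m); it can instead be regarded as an estimate of the previous frame's a priori SNR ξ(m-1). In other words, the a priori SNR currently estimated for an audio frame correlates poorly with that frame, which hampers noise suppression of the current audio frame.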
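The decision-directed estimate described above, and the one-frame lag the text criticizes, can be illustrated with a minimal sketch. The formula itself appears only as a figure, so the sketch assumes the classic Ephraim-Malah form ξ̂(m) = α·X̂²(m-1)/λd + (1-α)·max(γ(m)-1, 0), where γ(m) = |Y(m)|²/λd. The function names are hypothetical.

```python
def posterior_snr(y_power, noise_var):
    """a posteriori SNR gamma = |Y|^2 / lambda_d for one frequency bin."""
    return y_power / noise_var


def decision_directed(prev_out_power, noise_var, gamma, alpha=0.98):
    """Decision-directed a priori SNR (assumed classic form).

    prev_out_power: |X_hat(m-1)|^2, the previous frame's denoised power in this bin.
    alpha: smoothing factor, close to 1 as the text recommends.
    """
    return alpha * prev_out_power / noise_var + (1.0 - alpha) * max(gamma - 1.0, 0.0)


# Speech onset: the previous frame was suppressed noise, the current frame is loud.
gamma_now = posterior_snr(100.0, 1.0)            # 100, i.e. 20 dB
xi_dd = decision_directed(0.0, 1.0, gamma_now)   # about 1.98
xi_ml = max(gamma_now - 1.0, 0.0)                # 99.0
```

At the onset, the decision-directed value (about 1.98) is fifty times smaller than the instantaneous maximum-likelihood value (99). The term weighted by α still reflects the previous, noise-only frame. This is the lag that the method of the disclosure aims to reduce.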
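Summary
An object of the present disclosure is to provide a signal-to-noise ratio estimation method for noise suppression and a user terminal that solve the problem that the a priori SNR estimated for the current audio frame correlates poorly with that frame, which hampers noise suppression of the current audio frame.
To achieve the above object, an embodiment of the present disclosure provides an a priori SNR estimation method, including:
computing an estimated a priori SNR of a current audio frame;
calculating, according to the estimated a priori SNR, a minimum mean square error (MMSE) estimate corresponding to the estimated a priori SNR of the current audio frame;
calculating a speech presence probability of the current audio frame;
estimating a final a priori SNR of the current audio frame by combining the speech presence probability and the MMSE estimate.
Optionally, computing the estimated a priori SNR of the current audio frame includes:
computing the estimated a priori SNR of the current audio frame based on the a posteriori SNR estimate of the current audio frame.
Optionally, computing the estimated a priori SNR of the current audio frame based on the a posteriori SNR estimate of the current audio frame includes:
computing the estimated a priori SNR of the current audio frame by the following formula:
Figure PCTCN2017106502-appb-000009
where … denotes the estimated a priori SNR, α is a smoothing factor,
Figure PCTCN2017106502-appb-000011
denotes the noise-reduction output of the previous frame,
Figure PCTCN2017106502-appb-000012
denotes the noise variance, and
Figure PCTCN2017106502-appb-000013
denotes the a posteriori SNR estimate of the current audio frame;
or,
computing the estimated a priori SNR of the current audio frame by the following formula:
Figure PCTCN2017106502-appb-000014
where
Figure PCTCN2017106502-appb-000015
denotes the estimated a priori SNR, α is a smoothing factor,
Figure PCTCN2017106502-appb-000016
is the a priori SNR of the previous frame, and
Figure PCTCN2017106502-appb-000017
denotes the a posteriori SNR estimate of the current frame.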
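Optionally, the method further includes:
adjusting, by the following formula, the smoothing factor used in computing the estimated a priori SNR:
Figure PCTCN2017106502-appb-000018
where a1 and a2 are two preset smoothing factors with a1 > a2, and γth and ξth are two empirical thresholds.
Optionally, the step of computing the estimated a priori SNR of the current audio frame based on the speech presence probability estimate further includes:
refining the estimated a priori SNR of the current audio frame by the following formula:
Figure PCTCN2017106502-appb-000019
or
Figure PCTCN2017106502-appb-000020
where
Figure PCTCN2017106502-appb-000021
denotes the estimated a priori SNR,
Figure PCTCN2017106502-appb-000022
Figure PCTCN2017106502-appb-000023
denote, respectively, the estimated a priori SNR of the current audio frame with smoothing factor a1 and with smoothing factor a2, p(H1|Y) denotes the speech presence probability, and pth is a preset threshold.
Optionally, calculating, according to the estimated a priori SNR, the MMSE estimate corresponding to the estimated a priori SNR of the current audio frame includes:
calculating, according to the estimated a priori SNR, the MMSE estimate corresponding to the estimated a priori SNR of the current audio frame by the following formula:
Figure PCTCN2017106502-appb-000024
where
Figure PCTCN2017106502-appb-000025
denotes the MMSE estimate corresponding to the estimated a priori SNR,
Figure PCTCN2017106502-appb-000026
denotes the estimated a priori SNR, and
Figure PCTCN2017106502-appb-000027
denotes the a posteriori SNR estimate of the current audio frame.
Optionally, calculating the speech presence probability of the current audio frame includes:
calculating the speech presence probability of the current audio frame by the following formula:
Figure PCTCN2017106502-appb-000028
Figure PCTCN2017106502-appb-000029
or
Figure PCTCN2017106502-appb-000030
where p(H1|Y) denotes the speech presence probability, p(H1) and p(H0) denote the a priori speech presence probability and the a priori speech absence probability, respectively,
Figure PCTCN2017106502-appb-000031
is a fixed value,
Figure PCTCN2017106502-appb-000032
denotes the a posteriori SNR estimate of the current audio frame, exp() is the exponential function, γmin and γmax are two empirical values with γmin < γmax, and pmax and pmin are two empirical values with pmin < pmax.
Optionally, estimating the final a priori SNR of the current audio frame by combining the speech presence probability and the MMSE estimate includes:
estimating the final a priori SNR of the current audio frame by the following formula:
Figure PCTCN2017106502-appb-000033
where
Figure PCTCN2017106502-appb-000034
denotes the final a priori SNR of the current audio frame,
Figure PCTCN2017106502-appb-000035
denotes the MMSE estimate of the estimated a priori SNR, p(H1|Y) denotes the speech presence probability, and ξmin is a small value.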
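An embodiment of the present disclosure further provides a user terminal, including:
a first estimation module, configured to compute an estimated a priori SNR of a current audio frame;
a first calculation module, configured to calculate, according to the estimated a priori SNR, the MMSE estimate corresponding to the estimated a priori SNR of the current audio frame;
a second calculation module, configured to calculate a speech presence probability of the current audio frame;
a second estimation module, configured to estimate a final a priori SNR of the current audio frame by combining the speech presence probability and the MMSE estimate.
Optionally, the first estimation module is configured to compute the estimated a priori SNR of the current audio frame based on the a posteriori SNR estimate of the current audio frame.
Optionally, the first estimation module is configured to compute the estimated a priori SNR of the current audio frame by the following formula:
Figure PCTCN2017106502-appb-000036
where
Figure PCTCN2017106502-appb-000037
denotes the estimated a priori SNR, α is a smoothing factor,
Figure PCTCN2017106502-appb-000038
denotes the noise-reduction output of the previous frame,
Figure PCTCN2017106502-appb-000039
denotes the noise variance, and
Figure PCTCN2017106502-appb-000040
denotes the a posteriori SNR estimate of the current audio frame;
or,
the first estimation module is configured to compute the estimated a priori SNR of the current audio frame by the following formula:
Figure PCTCN2017106502-appb-000041
where
Figure PCTCN2017106502-appb-000042
denotes the estimated a priori SNR, α is a smoothing factor,
Figure PCTCN2017106502-appb-000043
is the a priori SNR of the previous frame, and
Figure PCTCN2017106502-appb-000044
denotes the a posteriori SNR estimate of the current frame.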
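Optionally, the user terminal further includes:
an adjustment module, configured to adjust, by the following formula, the smoothing factor used in computing the estimated a priori SNR:
Figure PCTCN2017106502-appb-000045
where a1 and a2 are two preset smoothing factors with a1 > a2, and γth and ξth are two empirical thresholds.
Optionally, the first estimation module is further configured to refine the estimated a priori SNR of the current audio frame by the following formula:
Figure PCTCN2017106502-appb-000046
or
Figure PCTCN2017106502-appb-000047
where
Figure PCTCN2017106502-appb-000048
denotes the estimated a priori SNR,
Figure PCTCN2017106502-appb-000049
Figure PCTCN2017106502-appb-000050
Figure PCTCN2017106502-appb-000051
denote, respectively, the estimated a priori SNR of the current audio frame with smoothing factor a1 and with smoothing factor a2, p(H1|Y) denotes the speech presence probability, and pth is a preset threshold.
Optionally, the first calculation module is configured to calculate, according to the estimated a priori SNR, the MMSE estimate corresponding to the estimated a priori SNR of the current audio frame by the following formula:
Figure PCTCN2017106502-appb-000052
where
Figure PCTCN2017106502-appb-000053
denotes the MMSE estimate corresponding to the estimated a priori SNR,
Figure PCTCN2017106502-appb-000054
denotes the estimated a priori SNR, and
Figure PCTCN2017106502-appb-000055
denotes the a posteriori SNR estimate of the current audio frame.
Optionally, the second calculation module is configured to calculate the speech presence probability of the current audio frame by the following formula:
Figure PCTCN2017106502-appb-000056
Figure PCTCN2017106502-appb-000057
or
Figure PCTCN2017106502-appb-000058
where p(H1|Y) denotes the speech presence probability, p(H1) and p(H0) denote the a priori speech presence probability and the a priori speech absence probability, respectively,
Figure PCTCN2017106502-appb-000059
is a fixed value,
Figure PCTCN2017106502-appb-000060
denotes the a posteriori SNR estimate of the current audio frame, exp() is the exponential function, γmin and γmax are two empirical values with γmin < γmax, and pmax and pmin are two empirical values with pmin < pmax.
Optionally, the second estimation module is configured to estimate the final a priori SNR of the current audio frame by the following formula:
Figure PCTCN2017106502-appb-000061
where
Figure PCTCN2017106502-appb-000062
denotes the final a priori SNR of the current audio frame,
Figure PCTCN2017106502-appb-000063
denotes the MMSE estimate of the estimated a priori SNR, p(H1|Y) denotes the speech presence probability, and ξmin is a small value.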
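An embodiment of the present disclosure further provides a user terminal, including a processor, a memory and a transceiver, where:
the processor is configured to read a program in the memory and perform the following process:
computing an estimated a priori SNR of a current audio frame;
calculating, according to the estimated a priori SNR, the MMSE estimate corresponding to the estimated a priori SNR of the current audio frame;
calculating a speech presence probability of the current audio frame;
estimating a final a priori SNR of the current audio frame by combining the speech presence probability and the MMSE estimate,
where the transceiver is configured to receive and transmit data, and the memory can store data used by the processor when performing operations.
The above technical solutions of the present disclosure have at least the following beneficial effects:
In the embodiments of the present disclosure, an estimated a priori SNR of the current audio frame is computed. The MMSE estimate corresponding to that estimated a priori SNR is then calculated, along with the speech presence probability of the current audio frame. Finally, the final a priori SNR of the current audio frame is estimated by combining the speech presence probability and the MMSE estimate. The related art estimates the a priori SNR from the previous frame's a priori SNR. Here, by contrast, the final a priori SNR is derived from the current frame's speech presence probability and from the MMSE estimate corresponding to the current frame's estimated a priori SNR. The a priori SNR estimated in the embodiments therefore correlates more strongly with the current audio frame, which benefits noise suppression of that frame.
Brief Description of the Drawings
To describe the technical solutions of the embodiments of the present application more clearly, the drawings used in describing the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present application; a person of ordinary skill in the art may derive other drawings from them without creative effort. The drawings are not intentionally drawn to scale; their purpose is to illustrate the gist of the present application.
FIG. 1 is a schematic flowchart of a signal-to-noise ratio estimation method for noise suppression according to an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of another signal-to-noise ratio estimation method for noise suppression according to an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of experimental data for a signal-to-noise ratio estimation method for noise suppression according to an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of further experimental data for a signal-to-noise ratio estimation method for noise suppression according to an embodiment of the present disclosure;
FIG. 5 is a schematic diagram of further experimental data for a signal-to-noise ratio estimation method for noise suppression according to an embodiment of the present disclosure;
FIG. 6 is a schematic structural diagram of a user terminal according to an embodiment of the present disclosure;
FIG. 7 is a schematic structural diagram of another user terminal according to an embodiment of the present disclosure;
FIG. 8 is a schematic structural diagram of another user terminal according to an embodiment of the present disclosure.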
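Detailed Description
The technical solutions in the embodiments of the present disclosure are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present disclosure. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present disclosure.
Referring to FIG. 1, an embodiment of the present disclosure provides a signal-to-noise ratio estimation method for noise suppression which, as shown in FIG. 1, includes the following steps:
101. Compute an estimated a priori SNR of a current audio frame;
102. Calculate, according to the estimated a priori SNR, the MMSE estimate corresponding to the estimated a priori SNR of the current audio frame;
103. Calculate a speech presence probability of the current audio frame;
104. Estimate a final a priori SNR of the current audio frame by combining the speech presence probability and the MMSE estimate.
In the embodiments of the present disclosure, the current audio frame may be the current frame captured by the microphone of the user terminal. It may be either a speech frame or a noise frame.
The estimated a priori SNR may be obtained with the decision-directed method, the maximum-likelihood method, or another method. The MMSE estimate of the estimated a priori SNR may be obtained with an MMSE algorithm. The speech presence probability of the current audio frame may be calculated from the a posteriori SNR of that frame. Alternatively, it may be calculated from a value obtained by averaging or smoothing the a posteriori SNRs of the same frequency bin over the preceding few frames.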
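The option of computing the speech presence probability from an averaged or smoothed a posteriori SNR can be sketched per frequency bin as follows. The text gives no window length or smoothing constant, so `n_frames` and `beta` are hypothetical parameters, as are the function names.

```python
def smooth_posterior_snr(gamma_history, beta=0.7):
    """Recursively smooth one bin's a posteriori SNR over frames (oldest first).

    beta: hypothetical smoothing constant; larger values react more slowly.
    """
    smoothed = gamma_history[0]
    for g in gamma_history[1:]:
        smoothed = beta * smoothed + (1.0 - beta) * g
    return smoothed


def mean_posterior_snr(gamma_history, n_frames=3):
    """Plain average of the last n_frames values of one bin (the 'average' option)."""
    recent = gamma_history[-n_frames:]
    return sum(recent) / len(recent)
```

Recursive smoothing needs only one stored value per bin, whereas averaging needs the last `n_frames` values. Either result can be passed to the speech presence probability in place of the current frame's γ.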
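The embodiments of the present disclosure do not limit the order in which step 103 is performed relative to steps 101 and 102. For example, step 103 may be performed before step 101, or step 101 may be performed before step 103.
The final a priori SNR of the current audio frame can be understood as the a priori SNR used for gain calculation when denoising the audio frame. It can also be understood as the a priori SNR output for the current audio frame in the embodiments of the present disclosure. It is estimated by combining the speech presence probability and the MMSE estimate as follows, with the speech presence probability indicating how likely the current audio frame is to be a speech frame:
- If the current audio frame is determined to be a pure-noise frame, the final a priori SNR is set to a stable minimum value, such as ξmin. This keeps processing of pure-noise segments steady and reduces musical noise.
- If the current audio frame is determined to belong to a speech segment, the final a priori SNR is biased toward the MMSE estimate corresponding to the estimated a priori SNR. This makes the final a priori SNR estimate more accurate.
Through the above steps, the final a priori SNR is estimated from the current frame's speech presence probability and from the MMSE estimate of the current frame's estimated a priori SNR. The estimated a priori SNR therefore correlates more strongly with the current audio frame, which benefits noise suppression of that frame and improves the noise suppression effect.
Optionally, computing the estimated a priori SNR of the current audio frame includes:
computing the estimated a priori SNR of the current audio frame based on the a posteriori SNR estimate of the current audio frame.
The a posteriori SNR of the current audio frame is common knowledge and is not described in detail here. The estimated a priori SNR may, for example, be computed from the a posteriori SNR estimate of the current audio frame with the decision-directed method, although the embodiments of the present disclosure are not limited thereto.
Optionally, computing the estimated a priori SNR of the current audio frame based on the a posteriori SNR estimate of the current audio frame includes:
computing the estimated a priori SNR of the current audio frame by the following formula:
Figure PCTCN2017106502-appb-000064
where
Figure PCTCN2017106502-appb-000065
denotes the estimated a priori SNR, α is a smoothing factor,
Figure PCTCN2017106502-appb-000066
denotes the noise-reduction output of the previous frame,
Figure PCTCN2017106502-appb-000067
denotes the noise variance, and
Figure PCTCN2017106502-appb-000068
denotes the a posteriori SNR estimate of the current audio frame;
or,
computing the estimated a priori SNR of the current audio frame by the following formula:
Figure PCTCN2017106502-appb-000069
where
Figure PCTCN2017106502-appb-000070
denotes the estimated a priori SNR, α is a smoothing factor,
Figure PCTCN2017106502-appb-000071
is the a priori SNR of the previous frame, and
Figure PCTCN2017106502-appb-000072
denotes the a posteriori SNR estimate of the current frame.
In this implementation, the estimated a priori SNR may be computed with either of the two formulas above. Experiments show that computing it with the formula corresponding to
Figure PCTCN2017106502-appb-000073
works better, mainly because that formula produces less musical noise (musical tones). The embodiments of the present disclosure therefore optionally use the formula corresponding to
Figure PCTCN2017106502-appb-000074
to compute the estimated a priori SNR.
The smoothing factor may be a preset value without limitation, for example a value from 0.95 to 1, or a value such as 0.98 or 0.3. The noise variance is common knowledge and is not described in detail here.
Optionally, the method further includes:
adjusting, by the following formula, the smoothing factor used in computing the estimated a priori SNR:
Figure PCTCN2017106502-appb-000075
where a1 and a2 are two preset smoothing factors with a1 > a2, and γth and ξth are two empirical thresholds.
In this implementation, the α factor should be as large as possible during pure noise, so that the estimate is as stable as possible. During speech segments it should be as small as possible, so that speech is tracked quickly. For example, a1 and a2 may be 0.98 and 0.3, respectively. The embodiments of the present disclosure are not limited thereto; other values, such as 0.95 and 0.28, may be used and adjusted to the actual situation.
In this implementation, using the two smoothing factors a1 and a2 can improve the accuracy of the estimated a priori SNR.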
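The adjustment rule for α is given only as a figure. The sketch below is one hypothetical reading that is consistent with the surrounding text. When both the a posteriori SNR γ and an a priori SNR ξ exceed their empirical thresholds γth and ξth, the frame is treated as speech and the small factor a2 is used. Otherwise the large factor a1 keeps the pure-noise estimate stable. Two points are assumptions:

- which ξ is compared (here, the previous frame's);
- that the two conditions are combined with "and".

```python
def adapt_alpha(gamma, xi_prev, gamma_th, xi_th, a1=0.98, a2=0.3):
    """Hypothetical switching rule for the smoothing factor alpha.

    Assumption: gamma above gamma_th AND the previous a priori SNR above xi_th
    signal speech, so the small factor a2 (fast tracking) is returned;
    otherwise the large factor a1 (stable in pure noise) is returned.
    """
    if a1 <= a2:
        raise ValueError("the text requires a1 > a2")
    return a2 if (gamma > gamma_th and xi_prev > xi_th) else a1
```

The defaults 0.98 and 0.3 are the example values given in the text.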
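Optionally, in this implementation, the step of computing the estimated a priori SNR of the current audio frame based on the speech presence probability estimate further includes:
refining the estimated a priori SNR of the current audio frame by the following formula:
Figure PCTCN2017106502-appb-000076
or
Figure PCTCN2017106502-appb-000077
where
Figure PCTCN2017106502-appb-000078
denotes the estimated a priori SNR,
Figure PCTCN2017106502-appb-000079
Figure PCTCN2017106502-appb-000080
denote, respectively, the estimated a priori SNR of the current audio frame with smoothing factor a1 and with smoothing factor a2, p(H1|Y) denotes the speech presence probability, and pth is a preset threshold.
In this implementation, the estimated a priori SNR can be switched according to the speech presence probability of the current audio frame, which improves its accuracy.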
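The switch between the two estimates obtained with a1 and with a2 can be sketched as below. The exact formulas are figures. This sketch assumes a hard switch: the fast-tracking estimate (a2) is used when p(H1|Y) exceeds pth, and the smooth estimate (a1) otherwise. The decision-directed form and the default pth = 0.5 are also assumptions. The defaults a1 = 0.98 and a2 = 0.3 come from the example values in the text.

```python
def decision_directed(prev_out_power, noise_var, gamma, alpha):
    """Decision-directed a priori SNR (assumed classic form)."""
    return alpha * prev_out_power / noise_var + (1.0 - alpha) * max(gamma - 1.0, 0.0)


def switched_prior_snr(prev_out_power, noise_var, gamma, p_speech,
                       p_th=0.5, a1=0.98, a2=0.3):
    """Use the a2 estimate when speech is likely, the a1 estimate otherwise (hypothetical)."""
    xi_a1 = decision_directed(prev_out_power, noise_var, gamma, a1)  # stable in noise
    xi_a2 = decision_directed(prev_out_power, noise_var, gamma, a2)  # tracks speech quickly
    return xi_a2 if p_speech > p_th else xi_a1
```

Take a loud frame (γ = 101) after silence. It yields about 70 when speech is judged present, but only about 2 otherwise.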
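Optionally, calculating, according to the estimated a priori SNR, the MMSE estimate corresponding to the estimated a priori SNR of the current audio frame includes:
calculating, according to the estimated a priori SNR, the MMSE estimate corresponding to the estimated a priori SNR of the current audio frame by the following formula:
Figure PCTCN2017106502-appb-000081
where
Figure PCTCN2017106502-appb-000082
denotes the MMSE estimate corresponding to the estimated a priori SNR,
Figure PCTCN2017106502-appb-000083
denotes the estimated a priori SNR, and
Figure PCTCN2017106502-appb-000084
denotes the a posteriori SNR estimate of the current audio frame.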
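For the complex Gaussian model used later in this section, the conditional mean of X² given Y under H1 has a well-known closed form: E(X²|Y,H1) = G·λd + G²·|Y|², with G = ξ/(1+ξ). Dividing by λd gives a normalized estimate of the kind described above. The sketch assumes this is what the figure expresses. It is a standard result, not a transcription of the patent's formula, and the function name is hypothetical.

```python
def mmse_prior_snr(xi_est, gamma):
    """Normalized MMSE estimate E(X^2 | Y, H1) / lambda_d under a complex Gaussian model.

    xi_est: estimated a priori SNR from step 101 (linear scale).
    gamma:  a posteriori SNR |Y|^2 / lambda_d of the current frame (linear scale).
    """
    g = xi_est / (1.0 + xi_est)   # Wiener gain for this bin
    return g + g * g * gamma      # posterior-variance term + squared posterior-mean term
```

The current frame's γ enters directly, so a loud onset raises the estimate at once even while ξ_est is still small. With ξ_est = 1.98 and γ = 100, the result is about 44.8.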
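It should be noted that the above
Figure PCTCN2017106502-appb-000085
denotes the estimated a priori SNR calculated in step 101 and is not restricted to the estimated a priori SNR calculated by the formula for
Figure PCTCN2017106502-appb-000086
mentioned above.
The above can be obtained from the complex Gaussian model as
Figure PCTCN2017106502-appb-000087
In addition, a super-Gaussian speech model may also be used to calculate E(X²|Y). Here,
Figure PCTCN2017106502-appb-000088
can be regarded as equivalent to E(X²|Y), because in practice a priori SNR estimation mainly amounts to estimating the speech signal variance
Figure PCTCN2017106502-appb-000089
By definition,
Figure PCTCN2017106502-appb-000090
which depends only on the speech signal X. Since X is not observable, most algorithms for estimating
Figure PCTCN2017106502-appb-000091
have to work from the noisy signal Y. This can also be seen in the decision-directed method. In the second half of its formula, γ-1 is the maximum-likelihood estimate of the speech variance
Figure PCTCN2017106502-appb-000092
given γ (i.e., given Y), while the first half uses the instantaneous value
Figure PCTCN2017106502-appb-000093
in place of E(X²).
Therefore, most SNR estimation algorithms are built on the condition that the noisy signal Y is known. In other words, the speech variance
Figure PCTCN2017106502-appb-000094
cannot actually be estimated directly; instead, what is estimated, given Y, is
Figure PCTCN2017106502-appb-000095
Therefore, the embodiments of the present disclosure use the conditional expectation
Figure PCTCN2017106502-appb-000096
Figure PCTCN2017106502-appb-000097
to estimate the speech variance
Figure PCTCN2017106502-appb-000098
On this basis, the definition of conditional expectation
Figure PCTCN2017106502-appb-000099
shows that this is in fact the MMSE estimate of the speech power spectrum X². Taking into account the probability p(H1|Y) that Y contains speech, the final expression of the conditional expectation is:
Figure PCTCN2017106502-appb-000100
According to the complex Gaussian model:
Figure PCTCN2017106502-appb-000101
where p(H0|Y) denotes the conditional probability of speech absence (H0) given Y, under the binary hypotheses:
H0: Y = N, speech absent;
H1: Y = X + N, speech present.
Under these binary hypotheses, E(X²|Y,H0) = 0.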
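The conditional-expectation argument above can be written compactly. The first two lines restate the text: the law of total expectation, plus E(X²|Y,H0) = 0. The closed form in the last line is the standard complex Gaussian result, assumed here to be what the figure shows. In it, λd is the noise variance and ξ the a priori SNR.

```latex
\begin{aligned}
E(X^2 \mid Y) &= p(H_1 \mid Y)\,E(X^2 \mid Y, H_1) + p(H_0 \mid Y)\,E(X^2 \mid Y, H_0) \\
              &= p(H_1 \mid Y)\,E(X^2 \mid Y, H_1), \\
E(X^2 \mid Y, H_1) &= \frac{\xi}{1+\xi}\,\lambda_d + \left(\frac{\xi}{1+\xi}\right)^{2} |Y|^2 .
\end{aligned}
```

Dividing the last line by λd, with γ = |Y|²/λd, gives ξ/(1+ξ) + (ξ/(1+ξ))²·γ.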
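In the above formula,
Figure PCTCN2017106502-appb-000102
is the true speech variance, which in practice must itself be estimated, for example by maximum likelihood or by the decision-directed method. Alternatively, speech may be assumed to follow another model, such as a super-Gaussian model, for example the chi distribution:
Figure PCTCN2017106502-appb-000103
from which one derives
Figure PCTCN2017106502-appb-000104
Figure PCTCN2017106502-appb-000105
Here,
Figure PCTCN2017106502-appb-000106
is the confluent hypergeometric function. Because it is a transcendental function, the overall computation is fairly complex and is generally implemented with lookup tables or similar means.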
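The super-Gaussian (chi) model needs the confluent hypergeometric function, and the text notes that this is usually handled with lookup tables. Here is a minimal sketch of that idea: Kummer's function 1F1(a; b; x) is evaluated by its power series, tabulated on a grid, and read back by linear interpolation. The grid range, step and parameter values are hypothetical. Only x ≥ 0 is covered, because the plain series loses accuracy for large negative x.

```python
import bisect
import math


def hyp1f1(a, b, x, tol=1e-14, max_terms=500):
    """Kummer's confluent hypergeometric function 1F1(a; b; x) via its power series."""
    term, total = 1.0, 1.0
    for n in range(max_terms):
        term *= (a + n) / (b + n) * x / (n + 1)   # ratio of consecutive series terms
        total += term
        if abs(term) < tol * abs(total):
            break
    return total


def build_table(a, b, x_max=20.0, step=0.05):
    """Precompute 1F1(a; b; x) on a uniform grid over [0, x_max]."""
    xs = [i * step for i in range(int(round(x_max / step)) + 1)]
    return xs, [hyp1f1(a, b, x) for x in xs]


def lookup(xs, ys, x):
    """Linear interpolation in the precomputed table, clamped to its range."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect.bisect_right(xs, x)
    x0, x1 = xs[i - 1], xs[i]
    return ys[i - 1] + (ys[i] - ys[i - 1]) * (x - x0) / (x1 - x0)
```

As a sanity check, 1F1(a; a; x) = e^x. In a real implementation, the table would be built once offline for the (a, b) pair required by the chosen chi-model parameters.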
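The above analysis shows that the formula for
Figure PCTCN2017106502-appb-000107
can be derived both from the complex Gaussian model
Figure PCTCN2017106502-appb-000108
and from the super-Gaussian model
Figure PCTCN2017106502-appb-000109
through the derivations above.
It should be noted that, in the embodiments of the present disclosure, the MMSE estimate of the estimated a priori SNR can be calculated directly with the above formula. Only the corresponding steps need to be performed; the conditional-expectation derivation does not need to be carried out. The conditional expectation above merely explains the principle behind the embodiments.
Optionally, calculating the speech presence probability of the current audio frame includes:
calculating the speech presence probability of the current audio frame by the following formula:
Figure PCTCN2017106502-appb-000110
Figure PCTCN2017106502-appb-000111
or
Figure PCTCN2017106502-appb-000112
where p(H1|Y) denotes the speech presence probability, p(H1) and p(H0) denote the a priori speech presence probability and the a priori speech absence probability, respectively,
Figure PCTCN2017106502-appb-000113
is a fixed value,
Figure PCTCN2017106502-appb-000114
denotes the a posteriori SNR estimate of the current audio frame, exp() is the exponential function, γmin and γmax are two empirical values with γmin < γmax, and pmax and pmin are two empirical values with pmin < pmax.
In this implementation, the above formulas distinguish speech from noise. The speech presence probability may also be calculated with these formulas from a value obtained by averaging or smoothing the a posteriori SNRs of the same frequency bin over the preceding few frames. The formulas can be derived directly from the complex Gaussian model given above.
In the embodiments of the present disclosure, the speech presence probability gives the probability that speech is present. This lets the current a priori SNR estimate switch softly between pure-noise and speech segments, which reduces the tracking delay of the decision-directed method while retaining its advantages.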
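Both speech presence probability variants can be sketched per frequency bin:

- The first is the standard likelihood-ratio form derived from the complex Gaussian model with a fixed a priori SNR ξ0 (15 dB in the experiments described in this document): p(H1|Y) = 1 / (1 + (p(H0)/p(H1))·(1+ξ0)·exp(-γξ0/(1+ξ0))).
- The second is a hypothetical clamped linear mapping of γ between (γmin, pmin) and (γmax, pmax). It is only one plausible reading of the figure.

The default p(H1) = 0.5 is an assumption, and the function names are hypothetical.

```python
import math


def spp_gaussian(gamma, xi_fixed=10 ** (15 / 10), p_h1=0.5):
    """p(H1|Y) from the complex Gaussian likelihood ratio with a fixed a priori SNR."""
    p_h0 = 1.0 - p_h1
    v = gamma * xi_fixed / (1.0 + xi_fixed)
    # exp(-v) underflows to 0.0 for very large gamma, which correctly drives p toward 1.
    ratio = (p_h0 / p_h1) * (1.0 + xi_fixed) * math.exp(-v)
    return 1.0 / (1.0 + ratio)


def spp_linear(gamma, gamma_min, gamma_max, p_min, p_max):
    """Hypothetical clamped linear map from gamma to a speech presence probability."""
    if gamma <= gamma_min:
        return p_min
    if gamma >= gamma_max:
        return p_max
    return p_min + (p_max - p_min) * (gamma - gamma_min) / (gamma_max - gamma_min)
```

With these defaults, a bin whose a posteriori SNR equals 1 (noise power only) gets p(H1|Y) of about 0.07, while γ = 20 gives a value above 0.999.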
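Optionally, estimating the final a priori SNR of the current audio frame by combining the speech presence probability and the MMSE estimate includes:
estimating the final a priori SNR of the current audio frame by the following formula:
Figure PCTCN2017106502-appb-000115
where
Figure PCTCN2017106502-appb-000116
denotes the final a priori SNR of the current audio frame,
Figure PCTCN2017106502-appb-000117
denotes the MMSE estimate of the estimated a priori SNR, p(H1|Y) denotes the speech presence probability, and ξmin is a small value.
In this implementation, the above formula keeps the final a priori SNR at a stable small value, such as ξmin, as far as possible during pure noise. During speech segments, the estimated a priori SNR is biased toward
Figure PCTCN2017106502-appb-000118
or, equivalently, toward
Figure PCTCN2017106502-appb-000119
In this implementation, speech-present and speech-absent states can be distinguished:
- In the speech-present state, the optimal a priori SNR estimate is derived from the MMSE criterion.
- In the speech-absent state, a minimum value limits the maximum suppression strength, which keeps processing of pure-noise segments steady and reduces musical noise.
The two states are weighted by the speech presence probability, which is calculated with a fixed a priori SNR value. This makes the a priori SNR estimate more accurate and addresses the tracking delay of the decision-directed method.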
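One soft combination fits this description and the conditional expectation E(X²|Y) = p(H1|Y)·E(X²|Y,H1) derived earlier: ξ = p(H1|Y)·ξ_MMSE + (1 - p(H1|Y))·ξmin. The sketch assumes this form, since the actual formula is a figure. It uses ξmin = -20 dB from the experiments, and the function name is hypothetical.

```python
def final_prior_snr(xi_mmse, p_speech, xi_min=10 ** (-20 / 10)):
    """Soft mix of the MMSE estimate and the floor xi_min (assumed form).

    p_speech: speech presence probability p(H1|Y) in [0, 1].
    xi_min:   -20 dB expressed as a linear power ratio (0.01).
    """
    if not 0.0 <= p_speech <= 1.0:
        raise ValueError("p_speech must be a probability")
    return p_speech * xi_mmse + (1.0 - p_speech) * xi_min
```

In pure noise (p close to 0) the output settles at ξmin. In speech (p close to 1) it follows the MMSE estimate. This matches the two regimes described above.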
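In the embodiments of the present disclosure, the implementations described above may be combined with one another or implemented separately, without limitation. The a priori SNR estimated in the embodiments may be used for gain calculation in the noise reduction of audio signals, optionally in single-microphone noise reduction. For example, as shown in FIG. 2:
1. The a posteriori SNR and the power spectrum of the previous frame's processing result are obtained.
2. The estimated a priori SNR of the current audio frame is computed from them with the decision-directed method.
3. The speech presence probability of the current audio frame is computed from the a posteriori SNR.
4. The MMSE estimate of the estimated a priori SNR is computed.
5. The final a priori SNR of the current audio frame is estimated by combining the speech presence probability and the MMSE estimate. This final a priori SNR is then used for gain calculation.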
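The FIG. 2 flow can be strung together as a rough end-to-end sketch for a single frequency bin over successive frames. It reuses the assumed forms from the earlier steps: the decision-directed estimate, the complex Gaussian MMSE estimate, the likelihood-ratio speech presence probability, and the soft floor at ξmin. As a further assumption, a Wiener gain produces the previous-frame output power that feeds the next frame; the experiments in the text used MMSE-STSA instead. The constant noise variance, the zero initial state, and the names `process_bin`, `XI_FIXED` and `XI_MIN` are hypothetical.

```python
import math

XI_FIXED = 10 ** (15 / 10)   # fixed a priori SNR for the speech presence probability (15 dB)
XI_MIN = 10 ** (-20 / 10)    # floor xi_min (-20 dB)


def process_bin(y_powers, noise_var, alpha=0.98, p_h1=0.5):
    """Run the FIG. 2 flow for one frequency bin over successive frames (sketch).

    y_powers:  |Y|^2 of this bin in each frame.
    noise_var: lambda_d, assumed constant here.
    Returns the final a priori SNR of every frame.
    """
    prev_out_power = 0.0   # |X_hat(m-1)|^2; assumed zero before the first frame
    finals = []
    for y_pow in y_powers:
        gamma = y_pow / noise_var                                   # a posteriori SNR
        xi_dd = (alpha * prev_out_power / noise_var
                 + (1 - alpha) * max(gamma - 1, 0))                 # step 101
        ratio = (((1 - p_h1) / p_h1) * (1 + XI_FIXED)
                 * math.exp(-gamma * XI_FIXED / (1 + XI_FIXED)))
        p = 1.0 / (1.0 + ratio)                                     # step 103
        g = xi_dd / (1 + xi_dd)
        xi_mmse = g + g * g * gamma                                 # step 102
        xi = p * xi_mmse + (1 - p) * XI_MIN                         # step 104
        finals.append(xi)
        gain = xi / (1 + xi)                                        # Wiener gain (assumption)
        prev_out_power = gain * gain * y_pow                        # feeds the next frame
    return finals
```

In a toy run of five noise-only frames followed by a jump to |Y|² = 100, the first speech frame gets a final a priori SNR of roughly 45. The decision-directed estimate alone would give about 2. This illustrates the reduced onset lag claimed in the text.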
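In the embodiments of the present disclosure, the above steps remove the effect of the inherent one-frame delay. This reduces both the attenuation of speech onsets and the trailing (smearing) at speech offsets, and so improves noise reduction performance. The effect is illustrated below with experimental data:
The experiments use the NOIZEUS database at a sampling rate of 8 kHz. White noise was generated with Cool Edit (audio editing software); the other noise types come with the NOIZEUS database. The frame length is 20 ms with 50% overlap, and a square-root Hanning window is applied at both analysis and synthesis;
Figure PCTCN2017106502-appb-000120
is set to 15 dB and ξmin to -20 dB. The suppression rule is the MMSE-STSA (Short-Time Spectral Amplitude) estimator, and noise estimation uses the unbiased MMSE algorithm.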
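The analysis settings quoted above translate into concrete sizes. At 8 kHz, a 20 ms frame is 160 samples, and 50% overlap means a hop of 80 samples. A square-root Hann window used for both analysis and synthesis gives an overall Hann weighting. With the periodic form of the window (an assumption), that weighting sums to one at 50% overlap. The dB values for the fixed a priori SNR and ξmin are power ratios. The helper names are hypothetical.

```python
import math

FS = 8000                    # sampling rate in Hz
FRAME = FS * 20 // 1000      # 20 ms frame -> 160 samples
HOP = FRAME // 2             # 50% overlap -> 80 samples


def sqrt_hann(n):
    """Square root of a periodic Hann window of length n."""
    return [math.sqrt(0.5 - 0.5 * math.cos(2 * math.pi * i / n)) for i in range(n)]


def db_to_linear(db):
    """Convert a power ratio in dB to a linear value."""
    return 10 ** (db / 10)


w = sqrt_hann(FRAME)
# Analysis window times synthesis window is a Hann window; at 50% overlap the
# overlapping halves add up to exactly 1, so unmodified frames reconstruct perfectly.
ola = [w[i] ** 2 + w[i + HOP] ** 2 for i in range(HOP)]
```

In linear terms, the fixed a priori SNR of 15 dB is about 31.62, and ξmin = -20 dB is 0.01.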
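FIG. 3 and FIG. 4 compare the decision-directed method with the method of the present disclosure at input SNRs of 0 dB and 5 dB, respectively. In FIG. 3, the utterance is sp01 and the noise is white noise; in FIG. 4, the utterance is sp04 and the noise is car noise, sp01 and sp04 being utterance IDs in the dataset. As indicated by the arrows, the method of the present disclosure clearly outperforms the comparison algorithm. In subjective listening, musical noise is not noticeable in either processed result. FIG. 5 shows the average segmental SNR improvement over 30 sets of car noise and white noise from the NOIZEUS database at 0/5/10/15 dB. The figure shows that the method of the present disclosure outperforms the decision-directed method.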
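FIG. 5 reports the average segmental SNR improvement. A minimal sketch of that metric follows. The text does not state how frames were limited, so the per-frame clamping to [-10, 35] dB is a common convention assumed here. The frame and hop sizes reuse the 160/80-sample settings, and the function names are hypothetical.

```python
import math


def segmental_snr(clean, processed, frame=160, hop=80, lo=-10.0, hi=35.0):
    """Mean per-frame SNR in dB between clean and processed signals.

    Each frame's SNR is clamped to [lo, hi] (a common convention, assumed here).
    """
    vals = []
    for start in range(0, len(clean) - frame + 1, hop):
        s = clean[start:start + frame]
        err = [a - b for a, b in zip(s, processed[start:start + frame])]
        num = sum(v * v for v in s)
        den = sum(v * v for v in err)
        if den == 0.0:
            snr = hi
        elif num == 0.0:
            snr = lo
        else:
            snr = 10 * math.log10(num / den)
        vals.append(min(max(snr, lo), hi))
    if not vals:
        raise ValueError("signal shorter than one frame")
    return sum(vals) / len(vals)


def segsnr_improvement(clean, noisy, enhanced, **kw):
    """Segmental SNR of the enhanced signal minus that of the noisy input."""
    return segmental_snr(clean, enhanced, **kw) - segmental_snr(clean, noisy, **kw)
```

Averaging this improvement over all test files at each input SNR gives curves of the kind shown in FIG. 5.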
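The above method can be applied to any user terminal equipped with a microphone, such as a mobile phone, a tablet personal computer, a laptop computer, a personal digital assistant (PDA), a mobile Internet device (MID), an in-vehicle device, or a wearable device. The embodiments of the present disclosure do not limit the specific type of user terminal.
In this embodiment, an estimated a priori SNR of the current audio frame is computed. The MMSE estimate corresponding to it is then calculated, along with the speech presence probability of the current audio frame. Finally, the final a priori SNR of the current audio frame is estimated by combining the speech presence probability and the MMSE estimate. The related art estimates the a priori SNR from the previous frame's a priori SNR. Here, by contrast, the final a priori SNR is derived from the current frame's speech presence probability and from the MMSE estimate corresponding to the current frame's estimated a priori SNR. The a priori SNR estimated in the embodiments therefore correlates more strongly with the current audio frame, which benefits noise suppression of that frame.
Referring to FIG. 6, an embodiment of the present disclosure provides a user terminal. As shown in FIG. 6, the user terminal 600 includes the following modules:
a first estimation module 601, configured to compute an estimated a priori SNR of a current audio frame;
a first calculation module 602, configured to calculate, according to the estimated a priori SNR, the MMSE estimate corresponding to the estimated a priori SNR of the current audio frame;
a second calculation module 603, configured to calculate a speech presence probability of the current audio frame;
a second estimation module 604, configured to estimate a final a priori SNR of the current audio frame by combining the speech presence probability and the MMSE estimate.
Optionally, the first estimation module 601 is configured to compute the estimated a priori SNR of the current audio frame based on the a posteriori SNR estimate of the current audio frame.
Optionally, the first estimation module 601 is configured to compute the estimated a priori SNR of the current audio frame by the following formula:
Figure PCTCN2017106502-appb-000121
where
Figure PCTCN2017106502-appb-000122
denotes the estimated a priori SNR, α is a smoothing factor,
Figure PCTCN2017106502-appb-000123
denotes the noise-reduction output of the previous frame,
Figure PCTCN2017106502-appb-000124
denotes the noise variance, and
Figure PCTCN2017106502-appb-000125
denotes the a posteriori SNR estimate of the current audio frame;
or,
the first estimation module 601 is configured to compute the estimated a priori SNR of the current audio frame by the following formula:
Figure PCTCN2017106502-appb-000126
where
Figure PCTCN2017106502-appb-000127
denotes the estimated a priori SNR, α is a smoothing factor,
Figure PCTCN2017106502-appb-000128
is the a priori SNR of the previous frame, and
Figure PCTCN2017106502-appb-000129
denotes the a posteriori SNR estimate of the current frame.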
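Optionally, as shown in FIG. 7, the user terminal 600 further includes:
an adjustment module 605, configured to adjust, by the following formula, the smoothing factor used in computing the estimated a priori SNR:
Figure PCTCN2017106502-appb-000130
where a1 and a2 are two preset smoothing factors with a1 > a2, and γth and ξth are two empirical thresholds.
Optionally, the first estimation module 601 is further configured to refine the estimated a priori SNR of the current audio frame by the following formula:
Figure PCTCN2017106502-appb-000131
or
Figure PCTCN2017106502-appb-000132
where
Figure PCTCN2017106502-appb-000133
denotes the estimated a priori SNR,
Figure PCTCN2017106502-appb-000134
Figure PCTCN2017106502-appb-000135
denote, respectively, the estimated a priori SNR of the current audio frame with smoothing factor a1 and with smoothing factor a2, p(H1|Y) denotes the speech presence probability, and pth is a preset threshold.
Optionally, the first calculation module 602 is configured to calculate, according to the estimated a priori SNR, the MMSE estimate corresponding to the estimated a priori SNR of the current audio frame by the following formula:
Figure PCTCN2017106502-appb-000136
where
Figure PCTCN2017106502-appb-000137
denotes the MMSE estimate corresponding to the estimated a priori SNR,
Figure PCTCN2017106502-appb-000138
denotes the estimated a priori SNR, and
Figure PCTCN2017106502-appb-000139
denotes the a posteriori SNR estimate of the current audio frame.
Optionally, the second calculation module 603 is configured to calculate the speech presence probability of the current audio frame by the following formula:
Figure PCTCN2017106502-appb-000140
Figure PCTCN2017106502-appb-000141
or
Figure PCTCN2017106502-appb-000142
where p(H1|Y) denotes the speech presence probability, p(H1) and p(H0) denote the a priori speech presence probability and the a priori speech absence probability, respectively,
Figure PCTCN2017106502-appb-000143
is a fixed value,
Figure PCTCN2017106502-appb-000144
denotes the a posteriori SNR estimate of the current audio frame, exp() is the exponential function, γmin and γmax are two empirical values with γmin < γmax, and pmax and pmin are two empirical values with pmin < pmax.
Optionally, the second estimation module 604 is configured to estimate the final a priori SNR of the current audio frame by the following formula:
Figure PCTCN2017106502-appb-000145
where
Figure PCTCN2017106502-appb-000146
denotes the final a priori SNR of the current audio frame,
Figure PCTCN2017106502-appb-000147
denotes the MMSE estimate of the estimated a priori SNR, p(H1|Y) denotes the speech presence probability, and ξmin is a small value.
The user terminal 600 in this embodiment may be the user terminal corresponding to the speech signal noise reduction method provided by the method embodiments of the present disclosure. Any implementation of those method embodiments can be implemented by the user terminal 600 with the same beneficial effects, which are not repeated here.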
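Referring to FIG. 8, an embodiment of the present disclosure provides another user terminal structure. The user terminal includes a processor 800, a transceiver 810, a memory 820, a user interface 830 and a bus interface, where:
the processor 800 is configured to read the program in the memory 820 and perform the following process:
computing an estimated a priori SNR of a current audio frame;
calculating, according to the estimated a priori SNR, the MMSE estimate corresponding to the estimated a priori SNR of the current audio frame;
calculating a speech presence probability of the current audio frame;
estimating a final a priori SNR of the current audio frame by combining the speech presence probability and the MMSE estimate.
The user interface 830 includes a microphone, and the transceiver 810 is configured to receive and transmit data under the control of the processor 800.
In FIG. 8, the bus architecture may include any number of interconnected buses and bridges. These link together the various circuits of one or more processors, represented by the processor 800, and of the memory, represented by the memory 820. The bus architecture may also link together various other circuits, such as peripheral devices, voltage regulators and power management circuits. The bus interface provides an interface. The transceiver 810 may comprise a plurality of elements, i.e., a transmitter and a receiver, providing units for communicating with various other devices over a transmission medium. For different user equipment, the user interface 830 may also be an interface for connecting required external or internal devices. These devices include, but are not limited to, a keypad, a display, a speaker, a microphone and a joystick.
The processor 800 is responsible for managing the bus architecture and general processing, and the memory 820 can store data used by the processor 800 when performing operations.
Optionally, computing the estimated a priori SNR of the current audio frame includes:
computing the estimated a priori SNR of the current audio frame based on the a posteriori SNR estimate of the current audio frame.
Optionally, computing the estimated a priori SNR of the current audio frame based on the a posteriori SNR estimate of the current audio frame includes:
computing the estimated a priori SNR of the current audio frame by the following formula:
Figure PCTCN2017106502-appb-000148
where
Figure PCTCN2017106502-appb-000149
denotes the estimated a priori SNR, α is a smoothing factor,
Figure PCTCN2017106502-appb-000150
denotes the noise-reduction output of the previous frame,
Figure PCTCN2017106502-appb-000151
denotes the noise variance, and
Figure PCTCN2017106502-appb-000152
denotes the a posteriori SNR estimate of the current audio frame;
or,
computing the estimated a priori SNR of the current audio frame by the following formula:
Figure PCTCN2017106502-appb-000153
where
Figure PCTCN2017106502-appb-000154
denotes the estimated a priori SNR, α is a smoothing factor,
Figure PCTCN2017106502-appb-000155
is the a priori SNR of the previous frame, and
Figure PCTCN2017106502-appb-000156
denotes the a posteriori SNR estimate of the current frame.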
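Optionally, the processor 800 is further configured to:
adjust, by the following formula, the smoothing factor used in computing the estimated a priori SNR:
Figure PCTCN2017106502-appb-000157
where a1 and a2 are two preset smoothing factors with a1 > a2, and γth and ξth are two empirical thresholds.
Optionally, the step of computing the estimated a priori SNR of the current audio frame based on the speech presence probability estimate further includes:
refining the estimated a priori SNR of the current audio frame by the following formula:
Figure PCTCN2017106502-appb-000158
or
Figure PCTCN2017106502-appb-000159
where
Figure PCTCN2017106502-appb-000160
denotes the estimated a priori SNR,
Figure PCTCN2017106502-appb-000161
Figure PCTCN2017106502-appb-000162
denote, respectively, the estimated a priori SNR of the current audio frame with smoothing factor a1 and with smoothing factor a2, p(H1|Y) denotes the speech presence probability, and pth is a preset threshold.
Optionally, calculating, according to the estimated a priori SNR, the MMSE estimate corresponding to the estimated a priori SNR of the current audio frame includes:
calculating, according to the estimated a priori SNR, the MMSE estimate corresponding to the estimated a priori SNR of the current audio frame by the following formula:
Figure PCTCN2017106502-appb-000163
where
Figure PCTCN2017106502-appb-000164
denotes the MMSE estimate corresponding to the estimated a priori SNR,
Figure PCTCN2017106502-appb-000165
denotes the estimated a priori SNR, and
Figure PCTCN2017106502-appb-000166
denotes the a posteriori SNR estimate of the current audio frame.
Optionally, calculating the speech presence probability of the current audio frame includes:
calculating the speech presence probability of the current audio frame by the following formula:
Figure PCTCN2017106502-appb-000167
Figure PCTCN2017106502-appb-000168
or
Figure PCTCN2017106502-appb-000169
where p(H1|Y) denotes the speech presence probability, p(H1) and p(H0) denote the a priori speech presence probability and the a priori speech absence probability, respectively,
Figure PCTCN2017106502-appb-000170
is a fixed value,
Figure PCTCN2017106502-appb-000171
denotes the a posteriori SNR estimate of the current audio frame, exp() is the exponential function, γmin and γmax are two empirical values with γmin < γmax, and pmax and pmin are two empirical values with pmin < pmax.
Optionally, estimating the final a priori SNR of the current audio frame by combining the speech presence probability and the MMSE estimate includes:
estimating the final a priori SNR of the current audio frame by the following formula:
Figure PCTCN2017106502-appb-000172
where
Figure PCTCN2017106502-appb-000173
denotes the final a priori SNR of the current audio frame,
Figure PCTCN2017106502-appb-000174
denotes the MMSE estimate of the estimated a priori SNR, p(H1|Y) denotes the speech presence probability, and ξmin is a small value.
The user terminal in this embodiment may be the user terminal corresponding to the speech signal noise reduction method provided by the method embodiments of the present disclosure. Any implementation of those method embodiments can be implemented by this user terminal with the same beneficial effects, which are not repeated here.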
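In the several embodiments provided in this application, it should be understood that the disclosed method and apparatus may be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the division into units is only a logical functional division, and other divisions are possible in an actual implementation. Multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the mutual couplings, direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses or units, and may be electrical, mechanical or in other forms.
In addition, the functional units in the embodiments of the present disclosure may be integrated into one processing unit. Alternatively, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units.
An integrated unit implemented in the form of a software functional unit may be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes several instructions that cause a computer device (such as a personal computer, a server or a network device) to perform some of the steps of the transmitting and receiving method described in the embodiments of the present disclosure. The aforementioned storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The above are preferred embodiments of the present disclosure. A person of ordinary skill in the art may make several improvements and refinements without departing from the principles described in the present disclosure, and such improvements and refinements shall also fall within the protection scope of the present disclosure.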

Claims (17)

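  1. A signal-to-noise ratio estimation method for noise suppression, comprising:
    computing an estimated a priori signal-to-noise ratio (SNR) of a current audio frame;
    calculating, according to the estimated a priori SNR, a minimum mean square error estimate corresponding to the estimated a priori SNR of the current audio frame;
    calculating a speech presence probability of the current audio frame; and
    estimating a final a priori SNR of the current audio frame by combining the speech presence probability and the estimate.
  2. The method according to claim 1, wherein computing the estimated a priori SNR of the current audio frame comprises:
    computing the estimated a priori SNR of the current audio frame based on an a posteriori SNR estimate of the current audio frame.
  3. The method according to claim 2, wherein computing the estimated a priori SNR of the current audio frame based on the a posteriori SNR estimate of the current audio frame comprises:
    computing the estimated a priori SNR of the current audio frame by the following formula:
    Figure PCTCN2017106502-appb-100001
    wherein
    Figure PCTCN2017106502-appb-100002
    denotes the estimated a priori SNR, α is a smoothing factor,
    Figure PCTCN2017106502-appb-100003
    denotes the noise-reduction output of the previous frame,
    Figure PCTCN2017106502-appb-100004
    denotes the noise variance, and
    Figure PCTCN2017106502-appb-100005
    denotes the a posteriori SNR estimate of the current audio frame;
    or,
    computing the estimated a priori SNR of the current audio frame by the following formula:
    Figure PCTCN2017106502-appb-100006
    wherein
    Figure PCTCN2017106502-appb-100007
    denotes the estimated a priori SNR, α is a smoothing factor,
    Figure PCTCN2017106502-appb-100008
    is the a priori SNR of the previous frame, and
    Figure PCTCN2017106502-appb-100009
    denotes the a posteriori SNR estimate of the current frame.
  4. The method according to claim 3, further comprising:
    adjusting, by the following formula, the smoothing factor used in computing the estimated a priori SNR:
    Figure PCTCN2017106502-appb-100010
    wherein a1 and a2 are two preset smoothing factors with a1 > a2, and γth and ξth are two empirical thresholds.
  5. The method according to claim 4, wherein the step of computing the estimated a priori SNR of the current audio frame based on the speech presence probability estimate further comprises:
    refining the estimated a priori SNR of the current audio frame by the following formula:
    Figure PCTCN2017106502-appb-100011
    or
    Figure PCTCN2017106502-appb-100012
    wherein
    Figure PCTCN2017106502-appb-100013
    denotes the estimated a priori SNR,
    Figure PCTCN2017106502-appb-100014
    Figure PCTCN2017106502-appb-100015
    denote, respectively, the estimated a priori SNR of the current audio frame with smoothing factor a1 and with smoothing factor a2, p(H1|Y) denotes the speech presence probability, and pth is a preset threshold.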
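  6. The method according to any one of claims 1-5, wherein calculating, according to the estimated a priori SNR, the minimum mean square error estimate corresponding to the estimated a priori SNR of the current audio frame comprises:
    calculating, according to the estimated a priori SNR, the minimum mean square error estimate corresponding to the estimated a priori SNR of the current audio frame by the following formula:
    Figure PCTCN2017106502-appb-100016
    wherein
    Figure PCTCN2017106502-appb-100017
    denotes the minimum mean square error estimate corresponding to the estimated a priori SNR,
    Figure PCTCN2017106502-appb-100018
    denotes the estimated a priori SNR, and
    Figure PCTCN2017106502-appb-100019
    denotes the a posteriori SNR estimate of the current audio frame.
  7. The method according to any one of claims 1-5, wherein calculating the speech presence probability of the current audio frame comprises:
    calculating the speech presence probability of the current audio frame by the following formula:
    Figure PCTCN2017106502-appb-100020
    Figure PCTCN2017106502-appb-100021
    or
    Figure PCTCN2017106502-appb-100022
    wherein p(H1|Y) denotes the speech presence probability, p(H1) and p(H0) denote the a priori speech presence probability and the a priori speech absence probability, respectively,
    Figure PCTCN2017106502-appb-100023
    is a fixed value,
    Figure PCTCN2017106502-appb-100024
    denotes the a posteriori SNR estimate of the current audio frame, exp() is the exponential function, γmin and γmax are two empirical values with γmin < γmax, and pmax and pmin are two empirical values with pmin < pmax.
  8. The method according to any one of claims 1-5, wherein estimating the final a priori SNR of the current audio frame by combining the speech presence probability and the estimate comprises:
    estimating the final a priori SNR of the current audio frame by the following formula:
    Figure PCTCN2017106502-appb-100025
    wherein
    Figure PCTCN2017106502-appb-100026
    denotes the final a priori SNR of the current audio frame,
    Figure PCTCN2017106502-appb-100027
    denotes the minimum mean square error estimate of the estimated a priori SNR, p(H1|Y) denotes the speech presence probability, and ξmin is a small value.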
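  9. A user terminal, comprising:
    a first estimation module, configured to compute an estimated a priori signal-to-noise ratio (SNR) of a current audio frame;
    a first calculation module, configured to calculate, according to the estimated a priori SNR, a minimum mean square error estimate corresponding to the estimated a priori SNR of the current audio frame;
    a second calculation module, configured to calculate a speech presence probability of the current audio frame; and
    a second estimation module, configured to estimate a final a priori SNR of the current audio frame by combining the speech presence probability and the estimate.
  10. The user terminal according to claim 9, wherein the first estimation module is configured to compute the estimated a priori SNR of the current audio frame based on an a posteriori SNR estimate of the current audio frame.
  11. The user terminal according to claim 10, wherein the first estimation module is configured to compute the estimated a priori SNR of the current audio frame by the following formula:
    Figure PCTCN2017106502-appb-100028
    wherein
    Figure PCTCN2017106502-appb-100029
    denotes the estimated a priori SNR, α is a smoothing factor,
    Figure PCTCN2017106502-appb-100030
    denotes the noise-reduction output of the previous frame,
    Figure PCTCN2017106502-appb-100031
    denotes the noise variance, and
    Figure PCTCN2017106502-appb-100032
    denotes the a posteriori SNR estimate of the current audio frame;
    or,
    the first estimation module is configured to compute the estimated a priori SNR of the current audio frame by the following formula:
    Figure PCTCN2017106502-appb-100033
    wherein
    Figure PCTCN2017106502-appb-100034
    denotes the estimated a priori SNR, α is a smoothing factor,
    Figure PCTCN2017106502-appb-100035
    is the a priori SNR of the previous frame, and
    Figure PCTCN2017106502-appb-100036
    denotes the a posteriori SNR estimate of the current frame.
  12. The user terminal according to claim 11, further comprising:
    an adjustment module, configured to adjust, by the following formula, the smoothing factor used in computing the estimated a priori SNR:
    Figure PCTCN2017106502-appb-100037
    wherein a1 and a2 are two preset smoothing factors with a1 > a2, and γth and ξth are two empirical thresholds.
  13. The user terminal according to claim 12, wherein the first estimation module is further configured to refine the estimated a priori SNR of the current audio frame by the following formula:
    Figure PCTCN2017106502-appb-100038
    or
    Figure PCTCN2017106502-appb-100039
    wherein
    Figure PCTCN2017106502-appb-100040
    denotes the estimated a priori SNR,
    Figure PCTCN2017106502-appb-100041
    Figure PCTCN2017106502-appb-100042
    denote, respectively, the estimated a priori SNR of the current audio frame with smoothing factor a1 and with smoothing factor a2, p(H1|Y) denotes the speech presence probability, and pth is a preset threshold.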
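  14. The user terminal according to any one of claims 9-13, wherein the first calculation module is configured to calculate, according to the estimated a priori SNR, the minimum mean square error estimate corresponding to the estimated a priori SNR of the current audio frame by the following formula:
    Figure PCTCN2017106502-appb-100043
    wherein
    Figure PCTCN2017106502-appb-100044
    denotes the minimum mean square error estimate corresponding to the estimated a priori SNR,
    Figure PCTCN2017106502-appb-100045
    denotes the estimated a priori SNR, and
    Figure PCTCN2017106502-appb-100046
    denotes the a posteriori SNR estimate of the current audio frame.
  15. The user terminal according to any one of claims 9-13, wherein the second calculation module is configured to calculate the speech presence probability of the current audio frame by the following formula:
    Figure PCTCN2017106502-appb-100047
    Figure PCTCN2017106502-appb-100048
    or
    Figure PCTCN2017106502-appb-100049
    wherein p(H1|Y) denotes the speech presence probability, p(H1) and p(H0) denote the a priori speech presence probability and the a priori speech absence probability, respectively,
    Figure PCTCN2017106502-appb-100050
    is a fixed value,
    Figure PCTCN2017106502-appb-100051
    denotes the a posteriori SNR estimate of the current audio frame, exp() is the exponential function, γmin and γmax are two empirical values with γmin < γmax, and pmax and pmin are two empirical values with pmin < pmax.
  16. The user terminal according to any one of claims 9-13, wherein the second estimation module is configured to estimate the final a priori SNR of the current audio frame by the following formula:
    Figure PCTCN2017106502-appb-100052
    wherein
    Figure PCTCN2017106502-appb-100053
    denotes the final a priori SNR of the current audio frame,
    Figure PCTCN2017106502-appb-100054
    denotes the minimum mean square error estimate of the estimated a priori SNR, p(H1|Y) denotes the speech presence probability, and ξmin is a small value.
  17. A user terminal, comprising a processor, a memory and a transceiver, wherein:
    the processor is configured to read a program in the memory and perform the following process:
    computing an estimated a priori signal-to-noise ratio (SNR) of a current audio frame;
    calculating, according to the estimated a priori SNR, a minimum mean square error estimate corresponding to the estimated a priori SNR of the current audio frame;
    calculating a speech presence probability of the current audio frame; and
    estimating a final a priori SNR of the current audio frame by combining the speech presence probability and the estimate,
    wherein the transceiver is configured to receive and transmit data, and the memory can store data used by the processor when performing operations.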
PCT/CN2017/106502 2016-11-10 2017-10-17 Signal-to-noise ratio estimation method for noise suppression, and user terminal WO2018086444A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201611039463.4A CN108074582B (zh) 2016-11-10 Signal-to-noise ratio estimation method for noise suppression, and user terminal
CN201611039463.4 2016-11-10

Publications (1)

Publication Number Publication Date
WO2018086444A1 true WO2018086444A1 (zh) 2018-05-17

Family

ID=62109133

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/106502 WO2018086444A1 (zh) 2016-11-10 2017-10-17 Signal-to-noise ratio estimation method for noise suppression, and user terminal

Country Status (2)

Country Link
CN (1) CN108074582B (zh)
WO (1) WO2018086444A1 (zh)


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2006136900A1 (en) * 2005-06-15 2006-12-28 Nortel Networks Limited Method and apparatus for non-intrusive single-ended voice quality assessment in voip
CN1763846A (zh) * 2005-11-23 2006-04-26 北京中星微电子有限公司 一种语音增益因子估计装置和方法
CN103187068A (zh) * 2011-12-30 2013-07-03 联芯科技有限公司 基于Kalman的先验信噪比估计方法、装置及噪声抑制方法
CN105702262A (zh) * 2014-11-28 2016-06-22 上海航空电器有限公司 一种头戴式双麦克风语音增强方法
CN105280193A (zh) * 2015-07-20 2016-01-27 广东顺德中山大学卡内基梅隆大学国际联合研究院 基于mmse误差准则的先验信噪比估计方法

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210327448A1 (en) * 2018-12-18 2021-10-21 Tencent Technology (Shenzhen) Company Limited Speech noise reduction method and apparatus, computing device, and computer-readable storage medium
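CN111986693A (zh) * 2020-08-10 2020-11-24 北京小米松果电子有限公司 Audio signal processing method and apparatus, terminal device, and storage medium
CN113838474A (zh) * 2021-11-25 2021-12-24 全时云商务服务股份有限公司 Howling suppression method and apparatus for a communication system
CN113838474B (zh) * 2021-11-25 2022-02-18 全时云商务服务股份有限公司 Howling suppression method and apparatus for a communication system
CN114724571A (zh) * 2022-03-29 2022-07-08 大连理工大学 Robust distributed speaker noise cancellation system
CN114724571B (zh) * 2022-03-29 2024-05-03 大连理工大学 Robust distributed speaker noise cancellation system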

Also Published As

Publication number Publication date
CN108074582B (zh) 2021-08-06
CN108074582A (zh) 2018-05-25

Similar Documents

Publication Publication Date Title
WO2018086444A1 (zh) Signal-to-noise ratio estimation method for noise suppression, and user terminal
US20210327448A1 (en) Speech noise reduction method and apparatus, computing device, and computer-readable storage medium
US20230298610A1 (en) Noise suppression method and apparatus for quickly calculating speech presence probability, and storage medium and terminal
US8239196B1 (en) System and method for multi-channel multi-feature speech/noise classification for noise suppression
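CN110634497B (zh) Noise reduction method and apparatus, terminal device, and storage medium
WO2021179424A1 (zh) Speech enhancement method, system, electronic device, and medium incorporating an AI model
JP6361156B2 (ja) Noise estimation device, method, and program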
US9721580B2 (en) Situation dependent transient suppression
US20100278351A1 (en) Methods and systems for reducing acoustic echoes in multichannel communication systems by reducing the dimensionality of the space of impulse resopnses
WO2021128670A1 (zh) Noise reduction method and apparatus, electronic device, and readable storage medium
WO2012158156A1 (en) Noise supression method and apparatus using multiple feature modeling for speech/noise likelihood
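WO2022161277A1 (zh) Speech enhancement method, model training method, and related device
CN109817234A (zh) Target speech signal enhancement method, system, and storage medium based on continuous noise tracking
CN109727607B (zh) Time delay estimation method and apparatus, and electronic device
WO2021007841A1 (zh) Noise estimation method, noise estimation apparatus, speech processing chip, and electronic device
WO2020124325A1 (zh) Adaptive filtering method, apparatus, device, and storage medium for echo cancellation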
US20140357326A1 (en) Echo suppression
WO2022218254A1 (zh) Speech signal enhancement method and apparatus, and electronic device
WO2012166092A1 (en) Control of adaptation step size and suppression gain in acoustic echo control
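WO2024041512A1 (zh) Audio noise reduction method and apparatus, electronic device, and readable storage medium
WO2021143249A1 (zh) Audio processing method, apparatus, device, and medium based on transient noise suppression
CN112289337B (zh) Method and device for filtering residual noise after machine-learning speech enhancement
WO2019119593A1 (zh) Speech enhancement method and device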
US11922933B2 (en) Voice processing device and voice processing method
CN113611319A (zh) Wind noise suppression method, apparatus, device, and system based on speech components

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 17869048

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 17869048

Country of ref document: EP

Kind code of ref document: A1