CN115527550A - Single-microphone subband domain noise reduction method and system - Google Patents
- Publication number
- CN115527550A (application CN202211013301.9A)
- Authority
- CN
- China
- Prior art keywords
- signal
- noise
- subband
- sub
- spectrum
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L21/0232—Processing in the frequency domain
- G10L2021/02161—Number of inputs available containing the signal or the noise to be suppressed
- G10L2021/02163—Only one microphone
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Quality & Reliability (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
Abstract
The invention discloses a single-microphone subband-domain noise reduction method and system. The method comprises: transforming a noisy speech sequence to generate a subband spectrum; performing envelope estimation on the subband spectrum; obtaining a subband-domain suppression gain function from the estimated envelope; revising the subband spectrum according to the subband-domain suppression gain function to obtain a revised subband spectrum; and synthesizing the revised subband spectrum to output the enhanced speech sequence. The system comprises modules that implement the corresponding functions. The invention transforms the input time-domain signal into a subband spectrum signal, estimates the envelope from the subband spectrum signal, calculates the subband-domain suppression gain, applies that gain to revise the subband spectrum signal, and finally inverse-transforms the revised subband-domain signal back to the time domain, thereby obtaining the enhanced speech signal. Compared with the prior art, the method reduces the required millions of instructions per second (MIPS) several-fold, has lower computational complexity, is easy to implement in real time, and lends itself to wide adoption.
Description
Technical Field
The invention belongs to the technical field of communication noise reduction, and particularly relates to a single-microphone sub-band domain noise reduction method and system.
Background
Conference communication systems operate in large and complex acoustic environments. The speech signal picked up by a client microphone often contains environmental noise or interference, which seriously degrades conference call quality.
When ambient noise is mixed into the speech signal picked up by a microphone, the subjective quality of voice communication can be degraded even if the noise level is low or moderate. Listening tests have shown that when the signal-to-noise ratio is low, listeners find the noisy speech hard to tolerate or stop paying attention to it, a phenomenon known as listener fatigue; in particular, speech intelligibility suffers when the signal-to-noise ratio is below 10 dB. Even low levels of noise can cause problems, especially when multiple voice channels are combined in a conference or bridge. In multi-party or multi-point conference communication, the background noise present at the microphone of each conference point is additively combined at the bridge with the noise processes from all other points. The loudspeaker at each location therefore reproduces the sum of the noise processes from all other locations, and the problem grows more severe as the number of conference points increases. It is therefore desirable to apply noise reduction to the speech signal picked up by the microphone, both to improve the subjective quality of the speech and to limit the degradation in perceived quality caused by listener fatigue.
At present, single-microphone noise suppression methods mainly comprise classical Wiener filtering, dynamic comb filtering, dynamic linear all-pole and pole-zero modeling of speech, short-time spectrum correction, and hidden Markov modeling. However, each of these techniques has drawbacks.
For example, the linear filters in dynamic comb filtering are adapted to pass only the harmonic components of voiced sounds derived from the pitch period; in dynamic linear all-pole and pole-zero modeling of speech, the coefficients of the noise-free model are estimated from noisy speech; in short-time spectrum correction, the amplitude of the short-time Fourier transform is attenuated at frequencies where no speech is present; hidden Markov modeling employs time-varying models of speech, but the evolution of the model coefficients is controlled by transition probabilities associated with the model states. All of these noise reduction methods operate on single-channel noisy speech and are blind techniques: the only information available is the noisy input speech signal.
To improve the signal-to-noise ratio of a noisy speech signal, the prior art must perform bootstrap estimation of the noise and of the speech signal separately, estimate the signal-to-noise ratio of the noisy speech by assuming that speech is intermittent and noise is stationary, and suppress noise accordingly. In practice this approach can produce musical noise and speech distortion, especially in scenes with non-stationary noise or strong interference and under far-field conditions.
Disclosure of Invention
The invention aims to provide a single-microphone sub-band domain noise reduction method and a single-microphone sub-band domain noise reduction system which reduce algorithm complexity and are easy to realize in real time.
The invention provides a single-microphone subband domain noise reduction method, which comprises the following steps:
transforming the noisy speech sequence to generate a subband spectrum;
carrying out envelope estimation on the subband spectrums;
acquiring a sub-band domain suppression gain function according to the estimated envelope;
revising the subband spectrum according to the subband domain suppression gain function;
and transforming the revised sub-band spectrum into a time domain signal to obtain a noise-reduced voice sequence.
The estimated envelope comprises a speech sub-band spectral envelope signal and a noise sub-band spectral envelope signal;
the envelope estimation of the subband spectrum comprises:
performing a first nonlinear single-pole recursion on the subband spectrum to obtain the speech subband spectral envelope;
performing a second nonlinear single-pole recursion on the subband spectrum to obtain the basic noise subband spectral envelope;
performing voice activity detection using the speech subband spectral envelope and the basic noise subband spectral envelope; if a speech signal is detected, taking the basic noise subband spectral envelope as the noise subband spectral envelope; otherwise, performing a third nonlinear single-pole recursion on the basic noise subband spectral envelope to obtain the noise subband spectral envelope;
wherein the rising parameter of the first nonlinear single-pole recursion is less than its falling parameter;
the rising parameters of the second and third nonlinear single-pole recursions are greater than their falling parameters.
The falling parameter of the first nonlinear single-pole recursion is a preset time constant;
the rising parameter of the first nonlinear single-pole recursion is determined according to the signal-to-noise ratio state of the subband spectrum:
The obtaining of the subband-domain suppression gain function according to the estimated envelope comprises:
determining an original value of a suppression gain according to the signal-to-noise ratio state of the subband spectrum, the speech subband spectrum envelope and the noise subband spectrum envelope;
performing frequency domain adjacent band smoothing on the original value of the suppression gain to obtain a sub-band suppression gain;
determining a suppression gain lower bound according to the signal-to-noise ratio state of the subband spectrum;
and calculating the sub-band domain suppression gain function according to the sub-band suppression gain and the suppression gain lower bound.
wherein the constants in the expression are preset values, in decibels;
wherein γ is the threshold parameter of voice activity detection, and the two envelope terms are the speech subband spectral envelope and the noise subband spectral envelope, respectively.
The signal-to-noise ratio state of each frame of the subband spectrum is determined in the following manner:
initially, if the signal-to-noise ratio is greater than or equal to T_1, the state is the high signal-to-noise ratio state; if it is less than T_1 and not less than T_2, the state is the medium signal-to-noise ratio state; if it is less than T_2, the state is the low signal-to-noise ratio state; T_1 and T_2 are preset signal-to-noise ratio thresholds, in decibels;
when the state shifts from the high signal-to-noise ratio state in the previous frame to the medium signal-to-noise ratio state in the current frame, the decision threshold is:
when the state shifts from the medium signal-to-noise ratio state in the previous frame to the low signal-to-noise ratio state in the current frame, the decision threshold is:
when the state shifts from the low signal-to-noise ratio state in the previous frame to the medium signal-to-noise ratio state in the current frame, the decision threshold is:
when the state shifts from the medium signal-to-noise ratio state in the previous frame to the high signal-to-noise ratio state in the current frame, the decision threshold is:
In these formulas, Δ_1 and Δ_2 are redundancy thresholds, in decibels.
The envelope estimating of the subband spectrum comprises:
calculating a pseudo-modulus value of the subband spectrum;
and carrying out envelope estimation by using the pseudo modulus value.
The invention also provides a single-microphone sub-band domain noise reduction system, which is characterized in that: the system comprises an analysis filter bank, a sub-band domain single-microphone noise reduction core subsystem and a synthesis filter bank;
the analysis filter bank is used for converting the time domain signal into a sub-band spectrum signal;
the sub-band domain single microphone noise reduction core subsystem divides three operation environment states according to the signal-to-noise ratio, judges the operation environment state of the sub-band spectrum signal and revises the sub-band spectrum signal;
the synthesis filter bank is used to transform the revised subband spectral signals back to time-domain signals.
The system also comprises a pseudo-modulus calculator, which receives the subband spectrum signals output by the analysis filter bank, generates pseudo-modulus signals from them, and inputs the pseudo-modulus signals to the subband-domain single-microphone noise reduction core subsystem.
The subband-domain single-microphone noise reduction core subsystem comprises a speech spectrum envelope estimator, a basic noise spectrum envelope estimator, a final noise spectrum envelope estimator, a level calculator, an operating environment state machine, an original suppression gain calculator and a final gain calculator;
the speech spectrum envelope estimator estimates the speech subband spectral envelope signal from the pseudo-modulus signal and the rising parameter of the previous frame;
the basic noise spectrum envelope estimator estimates the noise spectrum envelope level from the pseudo-modulus signal and the speech subband spectral envelope signal;
the final noise spectrum envelope estimator estimates the final noise spectrum envelope level from the pseudo-modulus signal and the noise spectrum envelope level;
the level calculator calculates the root mean square value of the rough estimate of the speech signal and the root mean square value of the noise from the final noise spectrum envelope level;
the operating environment state machine determines the decision threshold used when the operating environment state changes and, from the root mean square value of the rough estimate of the speech signal and the root mean square value of the noise, determines for each frame of the subband spectrum the signal-to-noise ratio state of the operating environment, the suppression gain lower bound and the rising parameter of the current frame;
the original suppression gain calculator calculates the original noise suppression gain from the speech subband spectral envelope signal, the final noise spectrum envelope level and the signal-to-noise ratio state;
the final gain calculator calculates the final noise suppression gain from the result of frequency-domain adjacent-band smoothing of the original noise suppression gain and the suppression gain lower bound.
The invention transforms the input time-domain signal into a subband spectrum signal, estimates the envelope from the subband spectrum signal, calculates the subband-domain suppression gain from the envelope, revises the subband spectrum signal by applying the subband-domain suppression gain, and finally inverse-transforms the revised subband-domain signal back to the time domain, thereby obtaining the enhanced speech signal. Compared with the prior art, the invention reduces the required Millions of Instructions Per Second (MIPS) several-fold, has lower computational complexity, is easy to implement in real time, and lends itself to wide adoption.
Drawings
Fig. 1 is a flow chart of a first preferred embodiment of the present invention.
FIG. 2 is a flow chart illustrating the implementation of the AFB algorithm in the first preferred embodiment.
Fig. 3 is a schematic flow chart of the implementation of the SFB algorithm in the first preferred embodiment.
Fig. 4 is a block diagram of the second preferred embodiment of the present invention.
FIG. 5 is a block diagram of the algorithm system of the second preferred embodiment.
Fig. 6 is a flow chart of the operation of the sub-band domain single-microphone noise reduction core subsystem in the second preferred embodiment.
Detailed Description
Step one, transforming the voice sequence containing noise to generate a subband spectrum.
In this embodiment, an Analysis Filter Bank (AFB) processes the noisy speech sequence to generate the noisy speech subband spectrum. Let W be the prototype low-pass filter window function, with a length of N samples; let K be the number of FFT and IFFT points in the filter bank (K is even); and let M be the length of the input time-series data block (frame), i.e. the decimation rate.
As shown in fig. 2, the AFB converts the time domain signal into a subband spectrum by the following steps:
first, initializing the content of the analysis shift register (register_a) in the AFB to zero;
second, shifting a block of M samples into register_a;
third, weighting the content of register_a with the analysis window function W;
fourth, dividing the content of register_a into r segments of K samples each, where r = N/K is an integer;
fifth, adding the r K-sample segments together and performing a Fast Fourier Transform (FFT) on the resulting K samples to obtain K subband components;
sixth, if the data input is finished, the AFB stops running; otherwise, returning to the second step and repeating until the AFB stops running.
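The AFB steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the function names `dft` and `afb_frames` are hypothetical, a naive DFT stands in for the FFT so that the block needs no third-party library, and the loop returns to the shift-in step rather than re-zeroing the register on every frame.

```python
import cmath


def dft(x):
    """Naive K-point DFT; gives the same result as an FFT, only slower."""
    K = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / K) for n in range(K))
            for k in range(K)]


def afb_frames(samples, window, K, M):
    """Yield one K-point subband spectrum for each block of M input samples."""
    N = len(window)
    if N % K != 0:
        raise ValueError("window length N must be an integer multiple of K")
    r = N // K
    register_a = [0.0] * N                                  # step 1: zero the register
    for start in range(0, len(samples) - M + 1, M):
        block = list(samples[start:start + M])
        register_a = register_a[M:] + block                 # step 2: shift in M samples
        weighted = [w * x for w, x in zip(window, register_a)]  # step 3: apply window
        # steps 4-5: split into r segments of K samples and add them together
        folded = [sum(weighted[j * K + i] for j in range(r)) for i in range(K)]
        yield dft(folded)                                   # step 5: K subband components
```

For example, `list(afb_frames(x, [1.0] * 8, 4, 2))` produces one 4-point spectrum for every 2 input samples of `x`; the rectangular window is only a placeholder for a real prototype low-pass window.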
Correspondingly, a Synthesis Filter Bank (SFB) performs the inverse transformation from the subband spectrum to the time-domain speech sequence. Compared with an ordinary Fourier transform, this processing reduces the MIPS load several-fold and lowers the complexity of the algorithm.
And step two, carrying out envelope estimation on the subband spectrums.
In this step, the final gain function for suppressing noise is obtained from the subband spectrum signal X(k, t) of the noisy speech.
Let the noise-free speech sequence be {s(n), n = 0, 1, 2, …}, the noise sequence be {v(n), n = 0, 1, 2, …}, and the noisy speech sequence be {x(n), n = 0, 1, 2, …}. Then x(n) can be expressed as:
$$x(n) = s(n) + v(n) \tag{1}$$
By applying the AFB, the corresponding subband spectrum is obtained as:
$$X(k,t) = S(k,t) + V(k,t) \tag{2}$$
where S(k, t) and V(k, t) denote the subband spectra of s(n) and v(n), k = 0, 1, 2, …, K-1 is the subband index, K is the number of FFT points in the AFB, and t = 0, 1, 2, … is the signal frame index. In actual calculation, to achieve a better trade-off between Mean Opinion Score (MOS) and musical noise, the operating environment is divided into states according to the Signal-to-Noise Ratio (SNR) before the subband spectral envelope is estimated. For the operating-environment SNR(t) of the t-th frame, the state is defined as follows:
$$\text{state}(t) = \begin{cases} \text{high SNR state}, & \mathrm{SNR}(t) \ge T_1 \\ \text{medium SNR state}, & T_2 \le \mathrm{SNR}(t) < T_1 \\ \text{low SNR state}, & \mathrm{SNR}(t) < T_2 \end{cases} \tag{3}$$
where T_1 and T_2 are preset SNR thresholds, in decibels.
The SNR of the operating environment in this embodiment is calculated as the ratio of the bias-corrected root mean square (r.m.s.) value of the speech signal to the r.m.s. value of the noise. The bias-corrected speech r.m.s. value is calculated as follows. The sum of the noisy-speech subband spectral amplitudes is recursively averaged, which yields a rough estimate of the r.m.s. value of the speech signal, i.e.:
where the smoothing parameter is a preset time constant. Since this rough estimate is biased by the background noise level, bias correction is then performed as follows:
Here the subtracted term is the r.m.s. value of the noise, which is obtained by bias-correcting the sum of the final subband noise spectral envelope level estimates, i.e.:
where BCF > 0 is a preset bias correction factor.
From the bias-corrected speech r.m.s. value and the noise r.m.s. value above, the operating environment signal-to-noise ratio SNR(t) at frame t is determined by the following equation:
To prevent the operating environment state from switching frequently when the estimate of the operating environment SNR(t) fluctuates near the decision thresholds T_1 and T_2, the decision thresholds used in the state division are set as follows:
when the state switches from the high SNR state in the previous frame to the medium SNR state in the current frame, the SNR decision threshold is:
when the state switches from the medium SNR state in the previous frame to the low SNR state in the current frame, the SNR decision threshold is:
when the state switches from the low SNR state in the previous frame to the medium SNR state in the current frame, the SNR decision threshold is:
when the state switches from the medium SNR state in the previous frame to the high SNR state in the current frame, the SNR decision threshold is:
In these formulas, Δ_1 and Δ_2 are redundancy thresholds, in decibels.
The operating environment state machine estimates the operating environment SNR(t) according to equation (7) and compares it with the corresponding SNR thresholds in the manner of formula (3), with T_1 and T_2 in formula (3) replaced by the adjusted thresholds above. This yields the operating state value of the environment: "high SNR state", "medium SNR state" or "low SNR state". This state value is applied to the next frame of noise suppression processing.
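The state machine with redundancy thresholds can be sketched as below. The threshold expressions are not reproduced in this text, so the sketch assumes a conventional hysteresis: leaving a state requires crossing the nominal threshold by the margin, with Δ_1 applied to T_1 and Δ_2 applied to T_2. The function names and all numeric values in the usage note are hypothetical.

```python
HIGH, MEDIUM, LOW = "high SNR state", "medium SNR state", "low SNR state"


def classify(snr_db, upper, lower):
    """Three-way split of an SNR value against an upper and a lower threshold."""
    if snr_db >= upper:
        return HIGH
    if snr_db >= lower:
        return MEDIUM
    return LOW


def next_state(prev_state, snr_db, t1, t2, delta1, delta2):
    """Assumed hysteresis: each threshold is moved away from the current state.

    From HIGH the upper threshold is lowered by delta1; from LOW the lower
    threshold is raised by delta2; from MEDIUM both thresholds are widened.
    """
    if prev_state == HIGH:
        upper, lower = t1 - delta1, t2 - delta2
    elif prev_state == MEDIUM:
        upper, lower = t1 + delta1, t2 - delta2
    else:
        upper, lower = t1 + delta1, t2 + delta2
    return classify(snr_db, upper, lower)
```

With, say, T_1 = 20 dB, T_2 = 10 dB and 2 dB margins, an SNR of 19 dB keeps a frame sequence that is already in the high SNR state in that state, while a sequence in the medium SNR state stays medium, which is the anti-chatter behaviour the text describes.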
When envelope estimation is carried out, the amplitude of the subband spectrum X(k, t) is replaced by its pseudo-modulus, which reduces computational complexity and eases engineering implementation. The pseudo-modulus is computed from Re{X(k, t)} and Im{X(k, t)}, the real part and the imaginary part of X(k, t), respectively.
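The exact pseudo-modulus expression is not reproduced in this text. One common square-root-free approximation of the magnitude, used here purely as an assumption, is the sum of the absolute real and imaginary parts; the name `pseudo_modulus` is hypothetical.

```python
def pseudo_modulus(x):
    """Assumed pseudo-modulus |Re{x}| + |Im{x}|: no multiply, no square root."""
    return abs(x.real) + abs(x.imag)


def true_modulus(x):
    """Exact magnitude, shown only for comparison."""
    return abs(x)
```

Under this assumption the pseudo-modulus always lies between the true magnitude and √2 times the true magnitude, so it overestimates by at most about 3 dB while avoiding the squaring and square-root operations.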
For the subband spectrum X(k, t) of a noisy speech signal, the speech subband spectral envelope signal is first estimated using the following "fast-rise, slow-fall" nonlinear single-pole recursive model:
where α is the recursion coefficient, which is determined by:
Here the slow-fall parameter is usually a preset time constant, while the fast-rise parameter is selected adaptively by the operating state decision module according to the state of the operating environment, namely:
Because the fast-rise time constant is assigned adaptively by the operating state module, the accuracy of the estimated speech subband spectral envelope is improved.
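A "fast-rise, slow-fall" single-pole recursion of this kind can be sketched as below. The recursion form E(t) = α·E(t-1) + (1-α)·|X(t)| and the rule for switching α are assumptions in the usual style of such envelope trackers; the function name and the numeric coefficients in the usage note are hypothetical.

```python
def track_envelope(magnitudes, alpha_rise, alpha_fall, start=0.0):
    """Track the envelope of one subband over successive frames.

    alpha_rise is used while the input exceeds the current envelope and
    alpha_fall otherwise. A smaller alpha reacts faster, so a fast-rise,
    slow-fall tracker has alpha_rise < alpha_fall.
    """
    envelope = start
    history = []
    for magnitude in magnitudes:
        alpha = alpha_rise if magnitude > envelope else alpha_fall
        envelope = alpha * envelope + (1.0 - alpha) * magnitude
        history.append(envelope)
    return history
```

Calling `track_envelope([1.0, 1.0, 0.0], alpha_rise=0.5, alpha_fall=0.9)` climbs quickly towards 1 and then decays slowly once the input drops. In the scheme described above, `alpha_rise` would be looked up per frame from the current operating environment state.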
Next, the noise subband spectral envelope signal is estimated. The noise subband spectral envelope estimator consists of two parts connected in series, called the basic noise estimator and the final noise estimator. The basic noise estimator uses a "slow-rise, fast-fall" nonlinear single-pole recursive model to estimate its spectral envelope level, namely:
Here the two recursion parameters are the preset fast-fall and slow-rise time constants, respectively. The basic noise envelope level can always track the level changes caused by speech disturbance, so it can be used together with the speech subband spectral envelope to make the VAD decision in the final noise estimator. The final noise estimator also uses a "slow-rise, fast-fall" nonlinear single-pole recursive model to estimate the final noise spectral envelope level, but its iterative update is performed only when the VAD decision is false (i.e. no speech signal is present), namely:
Here the two recursion parameters are the preset fast-fall and slow-rise time constants, respectively, and γ is the preset VAD threshold parameter. In this way, the lower bound of the estimate can be as small as the smallest input amplitude without the update stalling, and the ambient noise can change by any amount without affecting the convergence time.
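The two-stage noise estimator can be sketched per subband and per frame as below. The recursion form, the VAD rule (speech is declared when the speech envelope reaches γ times the basic noise level), the choice of the pseudo-modulus as the input of the final stage, and all default coefficients are assumptions made for illustration; `one_pole` and `update_noise` are hypothetical names.

```python
def one_pole(previous, value, alpha_rise, alpha_fall):
    """Single-pole recursion with separate rising and falling coefficients."""
    alpha = alpha_rise if value > previous else alpha_fall
    return alpha * previous + (1.0 - alpha) * value


def update_noise(magnitude, speech_env, base_prev, final_prev, gamma,
                 alpha_rise=0.999, alpha_fall=0.9):
    """One frame of the basic and final noise envelope estimators.

    Slow-rise, fast-fall means alpha_rise > alpha_fall. The final estimate
    is frozen whenever the assumed VAD rule reports speech.
    """
    base = one_pole(base_prev, magnitude, alpha_rise, alpha_fall)
    speech_present = speech_env >= gamma * base
    if speech_present:
        final = final_prev                      # hold during speech
    else:
        final = one_pole(final_prev, magnitude, alpha_rise, alpha_fall)
    return base, final, speech_present
```

Holding the final estimate during speech keeps loud speech frames from inflating the noise floor, while the slow-rise coefficient limits how quickly any single frame can raise it.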
And step three, acquiring a sub-band domain suppression gain function according to the sub-band spectrum envelope.
From the analysis in step two, the suppression gain function in each subband should be a function of the a posteriori SNR in that subband and of the operating environment state. Because the likelihood function of an energy-detection VAD is the a posteriori SNR, the a posteriori SNR is normalized by the VAD threshold, and the normalized a posteriori SNR is used to construct the original value of the suppression gain, that is:
where γ is the preset VAD threshold parameter. When the a posteriori SNR (A-SNR) is greater than or equal to the threshold γ, the VAD indicates the presence of speech. The parameter p is a gain expansion factor that controls the decay rate of the gain function for normalized a posteriori SNRs less than 1; when p = 1, the gain decays linearly with the normalized a posteriori SNR.
Since the operating environment has been divided by SNR into three operating states (high, medium and low SNR), and each operating state has its own lower bound on the suppression gain, the suppression gain lower bound is determined by the operating environment decision module as follows:
Define an "Aggressive" state mode. In the operating state corresponding to the Aggressive mode, the parameter p in expression (15) takes a suitable value greater than 1; otherwise p = 1.0. Normally the "low SNR" state is taken as the Aggressive state, and expression (15) then becomes:
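The original suppression gain can be sketched as below. The expression itself is not reproduced in this text; the form min(1, (A-SNR / γ)^p) is an assumption chosen to match the stated properties (speech present at A-SNR ≥ γ, linear decay for p = 1, faster decay for p > 1). The function name, the `eps` guard and the value p = 2 in the usage note are hypothetical.

```python
def raw_suppression_gain(speech_env, noise_env, gamma, p=1.0, eps=1e-12):
    """Assumed raw gain: normalized a posteriori SNR raised to p, capped at 1."""
    posterior_snr = speech_env / max(noise_env, eps)   # eps avoids division by zero
    normalized = posterior_snr / gamma                 # equals 1 at the VAD threshold
    return min(1.0, normalized ** p)
```

With γ = 2, a subband whose speech envelope equals its noise envelope gets a gain of 0.5 when p = 1 and 0.25 when p = 2, which illustrates how an Aggressive mode with p > 1 attenuates noise-dominated subbands more strongly.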
The human ear cannot distinguish fluctuations that fall within the same critical band, but it can distinguish fluctuations that fall in different critical bands. This provides the basis for critical-band smoothing of the subband suppression gain. Critical bands are typically characterized on the Bark frequency scale; 1 to 24 Bark correspond to the first 24 critical bands of hearing, whose boundary frequencies and bandwidths are listed in table 1.
Table 1: Frequency points and bandwidths of the first 24 auditory critical bands
Accordingly, the subband indexes k in formula (17) are grouped so that the frequencies corresponding to the subband indexes in each group all lie in the same critical band, and the subband suppression gains in each group are all replaced by their arithmetic mean. This completes the critical-band smoothing of the subband suppression gain. With the critical-band-smoothed subband suppression gain so obtained, the final gain function is determined by the following equation:
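Critical-band smoothing followed by the lower bound can be sketched as below. Table 1 is not reproduced in this text, so the band edges are the commonly tabulated Bark boundary frequencies, and taking the final gain as the larger of the smoothed gain and the lower bound is an assumption. The function name and parameters are hypothetical; `gains` holds the bins from 0 Hz upward, spaced `bin_hz` apart.

```python
import bisect

# Commonly tabulated boundary frequencies (Hz) of the first 24 critical bands.
BARK_EDGES_HZ = [0, 100, 200, 300, 400, 510, 630, 770, 920, 1080, 1270, 1480,
                 1720, 2000, 2320, 2700, 3150, 3700, 4400, 5300, 6400, 7700,
                 9500, 12000, 15500]


def smooth_and_floor(gains, bin_hz, floor_db):
    """Average the gains inside each critical band, then apply a lower bound."""
    groups = {}
    for k in range(len(gains)):
        band = bisect.bisect_right(BARK_EDGES_HZ, k * bin_hz) - 1
        groups.setdefault(band, []).append(k)
    floor = 10.0 ** (floor_db / 20.0)           # lower bound given in decibels
    result = [0.0] * len(gains)
    for members in groups.values():
        mean = sum(gains[k] for k in members) / len(members)
        for k in members:
            result[k] = max(mean, floor)        # assumed form of the final gain
    return result
```

Bins above 15.5 kHz all land in one extra group in this sketch. In the scheme described above, `floor_db` would be chosen per frame from the operating environment state.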
and step four, revising the subband spectrums according to the final gain function to obtain revised subband spectrums.
And step five, synthesizing the revised sub-band spectrum and outputting the enhanced voice sequence.
As shown in fig. 3, the SFB inverse-transforms the revised subband spectrum into the time-domain speech sequence as follows:
first, initializing the content of the synthesis shift register (register_s) in the SFB to zero;
second, obtaining K time domain samples by an Inverse Fast Fourier Transform (IFFT) of the K subband spectrum components;
third, periodically extending the K time domain samples to form r segments, and storing them in a temporary register;
fourth, weighting the temporary register with a synthesis window function W1 (the synthesis window W1 can be equal to the analysis window W);
fifth, adding the content of the temporary register to register_s;
sixth, shifting M samples out of register_s to the left to obtain a new output data block;
seventh, filling M zeros at the rightmost end of register_s;
eighth, if the data input is finished, the SFB stops running; otherwise, returning to the second step and repeating the above operations.
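The SFB steps above can be sketched as below, as a minimal illustration rather than the patent's implementation. The names `idft` and `sfb_blocks` are hypothetical, a naive inverse DFT stands in for the IFFT, only the real part is kept on the assumption that the revised spectra are conjugate-symmetric, and the loop returns to the IFFT step rather than re-zeroing the register.

```python
import cmath


def idft(spectrum):
    """Naive inverse DFT returning the real part of each time sample."""
    K = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * n / K)
                for k in range(K)).real / K
            for n in range(K)]


def sfb_blocks(spectra, window, M):
    """Yield one block of M output samples for each K-point subband spectrum."""
    N = len(window)
    register_s = [0.0] * N                                   # step 1: zero the register
    for spectrum in spectra:
        K = len(spectrum)
        if N % K != 0:
            raise ValueError("window length N must be an integer multiple of K")
        frame = idft(spectrum)                               # step 2: K time samples
        extended = [frame[n % K] for n in range(N)]          # step 3: periodic extension
        weighted = [w * x for w, x in zip(window, extended)]     # step 4: apply window
        register_s = [a + b for a, b in zip(register_s, weighted)]  # step 5: overlap-add
        yield register_s[:M]                                 # step 6: shift out M samples
        register_s = register_s[M:] + [0.0] * M              # step 7: zero-fill the right
```

Whether the output reconstructs the input exactly depends on the analysis and synthesis windows satisfying the usual filter-bank reconstruction conditions, which this sketch does not check.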
Compared with the prior art, this embodiment adopts a subband-domain structure that reduces the MIPS load several-fold, so the computational complexity of the algorithm is lower and real-time implementation on existing commercial DSP chips is easier. In addition, the operating environment is divided into three SNR states (high, medium and low); in each state a dedicated suppression level is applied and a dedicated time smoothing coefficient is selected for envelope estimation. This gives a better trade-off between MOS and musical noise: a higher MOS score, with speech distortion and musical noise kept low enough to be imperceptible.
Preferred embodiment two
As shown in fig. 4, fig. 5, and fig. 6, the present embodiment discloses a single-microphone subband domain noise reduction system, which includes an analysis filter bank 1, a pseudo-modulus calculator 2, a subband domain single-microphone noise reduction core subsystem 3, and a synthesis filter bank 4.
The analysis filter bank 1 is used to convert the time domain signal into a subband spectral signal.
The pseudo-modulus calculator 2 is used to generate a pseudo-modulus signal from the subband spectrum signal; the pseudo-modulus replaces the magnitude of the subband spectrum signal, which reduces computational complexity and eases engineering implementation.
The subband-domain single-microphone noise reduction core subsystem 3 distinguishes three operating environment states according to the signal-to-noise ratio, determines the operating environment state of the subband spectrum signal, and revises the subband spectrum signal.
The subband-domain single-microphone noise reduction core subsystem 3 includes a speech spectrum envelope estimator 31, a basic noise spectrum envelope estimator 32, a final noise spectrum envelope estimator 33, a level calculator 34, an operating environment state machine 35, an original suppression gain calculator 36, and a final gain calculator 37.
The speech spectrum envelope estimator estimates the speech subband spectrum envelope signal according to the pseudo-modulus signal and the rising parameter of the previous frame.
The basic noise spectrum envelope estimator estimates the noise spectrum envelope level according to the pseudo-modulus signal and the speech subband spectrum envelope signal.
The final noise spectrum envelope estimator estimates the final noise spectrum envelope level according to the pseudo-modulus signal and the noise subband spectrum envelope level.
The level calculator calculates the root mean square value of the rough estimate of the speech signal and the root mean square value of the noise according to the final noise spectrum envelope level.
The operating environment state machine determines the decision thresholds that apply when the operating environment state changes and, from the root mean square value of the rough estimate of the speech signal and the root mean square value of the noise, determines for each frame of the subband spectrum the signal-to-noise ratio state of the operating environment, the suppression gain lower bound, and the rising parameter of the current frame.
The original suppression gain calculator calculates the original noise suppression gain according to the speech subband spectrum envelope signal, the final noise spectrum envelope level, and the signal-to-noise ratio state.
The final gain calculator calculates the final noise suppression gain according to the suppression gain lower bound and the result of frequency-domain adjacent-band smoothing of the original noise suppression gain.
The working flow of the sub-band domain single microphone noise reduction core subsystem 3 is shown in fig. 6.
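The behaviour of the operating environment state machine can be sketched as a three-state classifier with hysteresis. This is a hedged illustration only: the names `snr_db`, `initial_state`, `next_state`, `track_states`, `t_high`, `t_low`, `delta1`, and `delta2` are hypothetical, the SNR is assumed to be the decibel ratio of the two root mean square values, and the transition rule (a state change requires crossing a threshold by a redundancy margin) is an assumption about how the margins are applied.

```python
import math

HIGH, MEDIUM, LOW = "high", "medium", "low"

def snr_db(rms_speech, rms_noise, eps=1e-12):
    """SNR in decibels from the two RMS values (eps avoids log of zero)."""
    return 20.0 * math.log10((rms_speech + eps) / (rms_noise + eps))

def initial_state(snr, t_high, t_low):
    """Classify the first frame using the two preset thresholds (in dB)."""
    if snr >= t_high:
        return HIGH
    if snr >= t_low:
        return MEDIUM
    return LOW

def next_state(state, snr, t_high, t_low, delta1, delta2):
    """ASSUMED hysteresis rule: move only between adjacent states, and only
    after the SNR crosses the relevant threshold by its redundancy margin."""
    if state == HIGH:
        return MEDIUM if snr < t_high - delta1 else HIGH
    if state == LOW:
        return MEDIUM if snr >= t_low + delta2 else LOW
    if snr >= t_high + delta1:          # currently MEDIUM
        return HIGH
    if snr < t_low - delta2:
        return LOW
    return MEDIUM

def track_states(snrs, t_high, t_low, delta1, delta2):
    """Run the state machine over a sequence of per-frame SNR values."""
    states = [initial_state(snrs[0], t_high, t_low)]
    for snr in snrs[1:]:
        states.append(next_state(states[-1], snr, t_high, t_low, delta1, delta2))
    return states

# Hypothetical thresholds: 15 dB and 5 dB, with 2 dB margins.
states = track_states([16, 14, 12, 16, 18, 4, 2, 6, 8], 15, 5, 2, 2)
```

With these illustrative values, an SNR of 14 dB does not leave the high state and an SNR of 16 dB does not re-enter it, which is the chattering that the margins are meant to suppress.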
The synthesis filter bank 4 is used to transform the modified subband spectral signal back to a time-domain signal.
The invention transforms the input time-domain signal into a subband spectrum signal, estimates envelopes from the subband spectrum signal, calculates a subband-domain suppression gain from the envelopes, applies this gain to revise the subband spectrum signal, and finally transforms the revised subband spectrum signal back into a time-domain signal, thereby obtaining the enhanced speech signal. Compared with the prior art, the method reduces the required MIPS several-fold, has lower computational complexity, is convenient to implement in real time, and lends itself to wide adoption.
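The envelope estimation at the heart of this pipeline can be sketched as a nonlinear single-pole recursion whose coefficient depends on whether the input is rising or falling. This is a minimal sketch under stated assumptions: the name `track_envelope` and the parameters `a_rise` and `a_fall` are hypothetical, and the rising and falling parameters are assumed to be smoothing coefficients in [0, 1) for which a smaller value tracks the input faster.

```python
import numpy as np

def track_envelope(pseudo_modulus, a_rise, a_fall, initial=0.0):
    """Single-pole recursion with separate rising and falling coefficients.

    ASSUMPTION: env[n] = a * env[n-1] + (1 - a) * p[n], where a = a_rise when
    the input exceeds the previous envelope value and a = a_fall otherwise.
    """
    env = np.empty(len(pseudo_modulus))
    prev = initial
    for n, p in enumerate(pseudo_modulus):
        a = a_rise if p > prev else a_fall
        prev = a * prev + (1.0 - a) * p
        env[n] = prev
    return env

# Hypothetical single-subband input: a short burst surrounded by silence.
burst = [0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0]

# Speech-like tracker: rising parameter smaller than falling parameter,
# so it attacks quickly and releases slowly.
speech_env = track_envelope(burst, a_rise=0.2, a_fall=0.9)

# Noise-like tracker: rising parameter larger than falling parameter,
# so it rises slowly and drops back quickly.
noise_env = track_envelope(burst, a_rise=0.95, a_fall=0.5)
```

On the burst, the speech-like tracker reaches most of the input level within one frame while the noise-like tracker barely moves, so a short speech onset is not absorbed into the noise estimate.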
The present invention is not limited to the above preferred embodiments, and any modification, equivalent replacement or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Claims (10)
1. A single-microphone subband-domain noise reduction method, comprising:
transforming the noisy speech sequence to generate a subband spectrum;
performing envelope estimation on the subband spectrum;
acquiring a sub-band domain suppression gain function according to the estimated envelope;
revising the subband spectrum according to the subband domain suppression gain function;
and transforming the revised sub-band spectrum into a time domain signal to obtain a noise-reduced voice sequence.
2. The single-microphone subband-domain noise reduction method as claimed in claim 1, wherein the estimated envelope comprises a speech subband spectrum envelope signal and a noise subband spectrum envelope signal;
the envelope estimation of the subband spectrum comprises:
performing a first nonlinear single-pole recursion on the subband spectrum to obtain the speech subband spectrum envelope;
performing a second nonlinear single-pole recursion on the subband spectrum to obtain a noise subband spectrum basic envelope;
performing voice activity detection using the speech subband spectrum envelope signal and the noise subband spectrum basic envelope; if a speech signal is detected, taking the noise subband spectrum basic envelope as the noise subband spectrum envelope; otherwise, performing a third nonlinear single-pole recursion on the noise subband spectrum basic envelope to obtain the noise subband spectrum envelope;
wherein the rising parameter of the first nonlinear single-pole recursion is less than its falling parameter;
the rising parameter of each of the second and third nonlinear single-pole recursions is greater than its falling parameter.
3. The single-microphone subband-domain noise reduction method as claimed in claim 2, wherein the falling parameter of the first nonlinear single-pole recursion is a predetermined time constant;
the rising parameter of the first nonlinear single-pole recursion is determined according to the signal-to-noise ratio state of the subband spectrum:
4. The single-microphone subband-domain noise reduction method as claimed in claim 3, wherein acquiring the subband-domain suppression gain function according to the estimated envelope comprises:
determining an original value of a suppression gain according to the signal-to-noise ratio state of the subband spectrum, the speech subband spectrum envelope and the noise subband spectrum envelope;
performing frequency domain critical band smoothing on the suppression gain original value to obtain sub-band suppression gain;
determining a suppression gain lower bound according to the signal-to-noise ratio state of the subband spectrum;
and calculating the sub-band domain suppression gain function according to the sub-band suppression gain and the suppression gain lower bound.
6. The single-microphone subband-domain noise reduction method as claimed in claim 5, wherein the signal-to-noise ratio state of each frame of the subband spectrum is determined as follows:
initially, if the signal-to-noise ratio is greater than or equal to a first preset threshold, the state is the high signal-to-noise ratio state; if the signal-to-noise ratio is less than the first preset threshold and not less than a second preset threshold, the state is the medium signal-to-noise ratio state; and if the signal-to-noise ratio is less than the second preset threshold, the state is the low signal-to-noise ratio state, the preset signal-to-noise ratio thresholds each being expressed in decibels;
when the state shifts from the high signal-to-noise ratio state in the previous frame to the medium signal-to-noise ratio state in the current frame,
when the state shifts from the medium signal-to-noise ratio state in the previous frame to the low signal-to-noise ratio state in the current frame,
when the state shifts from the low signal-to-noise ratio state in the previous frame to the medium signal-to-noise ratio state in the current frame,
when the state shifts from the medium signal-to-noise ratio state in the previous frame to the high signal-to-noise ratio state in the current frame,
where Δ1 and Δ2 are redundancy thresholds, each expressed in decibels.
7. A single-microphone subband-domain noise reduction method as claimed in any of claims 1 to 6, wherein the envelope estimation of the subband spectra comprises:
calculating a pseudo-modulus value of the subband spectrum;
and carrying out envelope estimation by using the pseudo modulus value.
8. A single-microphone subband-domain noise reduction system, comprising: the system comprises an analysis filter bank, a sub-band domain single-microphone noise reduction core subsystem and a synthesis filter bank;
the analysis filter bank is used for converting the time domain signal into a sub-band spectrum signal;
the sub-band domain single microphone noise reduction core subsystem divides three operation environment states according to the signal-to-noise ratio, judges the operation environment state of the sub-band spectrum signal and revises the sub-band spectrum signal;
the synthesis filter bank is used to transform the revised subband spectral signals back to time-domain signals.
9. The single-microphone subband-domain noise reduction system as claimed in claim 8, further comprising a pseudo-modulus calculator for receiving the subband spectrum signal output from the analysis filter bank, generating a pseudo-modulus signal from the subband spectrum signal, and inputting the pseudo-modulus signal to the subband-domain single-microphone noise reduction core subsystem.
10. The single-microphone subband-domain noise reduction system as claimed in claim 9, wherein the subband-domain single-microphone noise reduction core subsystem comprises a speech spectrum envelope estimator, a basic noise spectrum envelope estimator, a final noise spectrum envelope estimator, a level calculator, an operating environment state machine, an original suppression gain calculator, and a final suppression gain calculator;
the speech spectrum envelope estimator estimates a speech subband spectrum envelope signal according to the pseudo-modulus signal and the rising parameter of the previous frame;
the basic noise spectrum envelope estimator estimates a noise spectrum envelope level according to the pseudo-modulus signal and the speech subband spectrum envelope signal;
the final noise spectrum envelope estimator estimates a final noise spectrum envelope level according to the pseudo-modulus signal and the noise subband spectrum envelope level;
the level calculator calculates the root mean square value of the rough estimate of the speech signal and the root mean square value of the noise according to the final noise spectrum envelope level;
the operating environment state machine determines the decision thresholds that apply when the operating environment state changes and, from the root mean square value of the rough estimate of the speech signal and the root mean square value of the noise, determines for each frame of the subband spectrum the signal-to-noise ratio state of the operating environment, the suppression gain lower bound, and the rising parameter of the current frame;
the original suppression gain calculator calculates an original noise suppression gain according to the speech subband spectrum envelope signal, the final noise spectrum envelope level, and the signal-to-noise ratio state;
and the final suppression gain calculator calculates the final noise suppression gain according to the suppression gain lower bound and the result of frequency-domain adjacent-band smoothing of the original noise suppression gain.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211013301.9A CN115527550A (en) | 2022-08-23 | 2022-08-23 | Single-microphone subband domain noise reduction method and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115527550A (en) | 2022-12-27 |
Family
ID=84696850
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||