CN115527550A

CN115527550A - Single-microphone subband domain noise reduction method and system

Info

Publication number: CN115527550A
Application number: CN202211013301.9A
Authority: CN
Inventors: Liang Min (梁民)
Original assignee: G Net Cloud Service Co Ltd
Current assignee: G Net Cloud Service Co Ltd
Priority date: 2022-08-23
Filing date: 2022-08-23
Publication date: 2022-12-27

Abstract

The invention discloses a single-microphone subband-domain noise reduction method and system. The method comprises: transforming a noisy speech sequence to generate a subband spectrum; performing envelope estimation on the subband spectrum; obtaining a subband-domain suppression gain function from the estimated envelope; revising the subband spectrum with the subband-domain suppression gain function to obtain a revised subband spectrum; and synthesizing the revised subband spectrum to output an enhanced speech sequence. The system comprises modules implementing the corresponding functions. The invention transforms the input time-domain signal into a subband spectrum signal, estimates the envelope from the subband spectrum signal, calculates the subband-domain suppression gain, applies this gain to revise the subband spectrum signal, and finally inverse-transforms the revised subband-domain signal back to the time domain, thereby obtaining the enhanced speech signal. Compared with the prior art, the method reduces the processing load, measured in millions of instructions per second (MIPS), several-fold, has lower computational complexity, is convenient to implement in real time, and is easy to deploy widely.

Description

Single-microphone subband domain noise reduction method and system

Technical Field

The invention belongs to the technical field of communication noise reduction, and particularly relates to a single-microphone sub-band domain noise reduction method and system.

Background

Conference communication systems operate in an extremely wide and complex range of environments, and the speech signal picked up by a client microphone often contains ambient noise or interference, which seriously degrades conference call quality.

When ambient noise is mixed into the speech signal picked up by a microphone, the subjective quality of voice communication can degrade even if the noise level is low or moderate. Listening tests have shown that at low signal-to-noise ratios listeners cannot tolerate the noisy speech they hear, or even stop paying attention to it, a phenomenon known as listener fatigue; in particular, speech intelligibility is affected when the signal-to-noise ratio is below 10 dB. Even low noise levels can cause problems, especially when multiple voice channels are combined in a conference bridge. In a multi-party or multi-point conference, the background noise at the microphone of each site is added at the bridge to the noise from all other sites, so the loudspeaker at each site reproduces the sum of the noise from all other sites. This problem becomes more severe as the number of sites increases. It is therefore desirable to apply noise reduction to the speech signal picked up by the microphone, in order to improve subjective speech quality and reduce the degradation in perceived quality caused by listener fatigue.

At present, single-microphone noise suppression methods mainly include classical Wiener filtering, dynamic comb filtering, dynamic linear all-pole and pole-zero modeling of speech, short-time spectral modification, and hidden Markov modeling. However, each of these techniques has drawbacks.

For example, in dynamic comb filtering, the linear filter is adapted to pass only the harmonic components of voiced speech derived from the pitch period; in dynamic linear all-pole and pole-zero modeling of speech, the coefficients of the noise-free model are estimated from the noisy speech; short-time spectral modification attenuates the short-time Fourier transform magnitude at frequencies where speech is absent; and hidden Markov modeling uses a time-varying model of speech whose coefficient evolution is governed by transition probabilities associated with the model states. All of these methods operate on single-channel noisy speech and are blind techniques: only the noisy input speech signal is known.

To enhance the signal-to-noise ratio of a noisy speech signal, the prior art must bootstrap separate estimates of the noise and of the speech signal, estimate the signal-to-noise ratio of the noisy speech from the assumptions that speech is intermittent and noise is stationary, and suppress the noise accordingly. In practice this approach can produce musical noise and speech distortion, especially in scenes with non-stationary, strong interference and under far-field conditions.

Disclosure of Invention

The invention aims to provide a single-microphone subband-domain noise reduction method and system that reduce algorithmic complexity and are easy to implement in real time.

The invention provides a single-microphone subband domain noise reduction method, which comprises the following steps:

transforming the noisy speech sequence to generate a subband spectrum;

carrying out envelope estimation on the subband spectrums;

acquiring a sub-band domain suppression gain function according to the estimated envelope;

revising the subband spectrum according to the subband domain suppression gain function;

and transforming the revised sub-band spectrum into a time domain signal to obtain a noise-reduced voice sequence.

The estimated envelope comprises a speech sub-band spectral envelope signal and a noise sub-band spectral envelope signal;

the envelope estimation of the subband spectrum comprises:

carrying out a first nonlinear single-pole recursion on the subband spectrum to obtain the speech subband spectral envelope;

carrying out a second nonlinear single-pole recursion on the subband spectrum to obtain the basic noise subband spectral envelope;

performing voice activity detection using the speech subband spectral envelope and the basic noise subband spectral envelope: if speech is detected, the basic noise subband spectral envelope is taken as the noise subband spectral envelope; otherwise, a third nonlinear single-pole recursion is performed on the basic noise subband spectral envelope to obtain the noise subband spectral envelope;

wherein the rise parameter of the first nonlinear single-pole recursion is smaller than its fall parameter;

and the rise parameters of the second and third nonlinear single-pole recursions are greater than their fall parameters.

The fall parameter of the first nonlinear single-pole recursion is a preset time constant;

the rise parameter of the first nonlinear single-pole recursion is determined according to the signal-to-noise ratio state of the subband spectrum: it takes one of three preset time constants, one for each of the high, medium and low signal-to-noise ratio states.

The obtaining of the subband-domain suppression gain function according to the estimated envelope comprises:

determining an original value of a suppression gain according to the signal-to-noise ratio state of the subband spectrum, the speech subband spectrum envelope and the noise subband spectrum envelope;

performing frequency domain adjacent band smoothing on the original value of the suppression gain to obtain a sub-band suppression gain;

determining a suppression gain lower bound according to the signal-to-noise ratio state of the subband spectrum;

and calculating the sub-band domain suppression gain function according to the sub-band suppression gain and the suppression gain lower bound.

The suppression gain lower bound is set, according to the signal-to-noise ratio state, to one of three preset constants expressed in decibels, one for each of the high, medium and low signal-to-noise ratio states;

the original value of the suppression gain is determined from the speech subband spectral envelope and the noise subband spectral envelope, using the a posteriori signal-to-noise ratio normalized by γ, where γ is the voice activity detection threshold parameter.

The signal-to-noise ratio state of each frame of the subband spectrum is determined as follows.

Initially, if the SNR is greater than or equal to the upper threshold, the state is the high signal-to-noise ratio state; if it is below the upper threshold but not below the lower threshold, the state is the medium signal-to-noise ratio state; and if it is below the lower threshold, the state is the low signal-to-noise ratio state. The two thresholds are preset signal-to-noise ratio values in decibels.

When the state changes from the high signal-to-noise ratio state in the previous frame to the medium signal-to-noise ratio state in the current frame,

the signal-to-noise ratio decision threshold is

；

When the state changes from the medium signal-to-noise ratio state in the previous frame to the low signal-to-noise ratio state in the current frame,

the signal-to-noise ratio decision threshold is

；

When the state changes from the low signal-to-noise ratio state in the previous frame to the medium signal-to-noise ratio state in the current frame,

the signal-to-noise ratio decision threshold is

；

When the state changes from the medium signal-to-noise ratio state in the previous frame to the high signal-to-noise ratio state in the current frame,

the signal-to-noise ratio decision threshold is

；

In the formulas, Δ₁ and Δ₂ are hysteresis (redundancy) margins, in decibels.

The envelope estimating of the subband spectrum comprises:

calculating a pseudo-modulus value of the subband spectrum;

and carrying out envelope estimation by using the pseudo modulus value.

The invention also provides a single-microphone sub-band domain noise reduction system, which is characterized in that: the system comprises an analysis filter bank, a sub-band domain single-microphone noise reduction core subsystem and a synthesis filter bank;

the analysis filter bank is used for converting the time domain signal into a sub-band spectrum signal;

the subband-domain single-microphone noise reduction core subsystem divides the operating environment into three states according to the signal-to-noise ratio, determines the operating environment state of the subband spectrum signal, and revises the subband spectrum signal;

the synthesis filter bank is used to transform the revised subband spectral signals back to time-domain signals.

The system also comprises a pseudo-modulus calculator, which receives the subband spectrum signal output by the analysis filter bank, generates a pseudo-modulus signal from it, and inputs the pseudo-modulus signal to the subband-domain single-microphone noise reduction core subsystem.

The sub-band domain single microphone noise reduction core subsystem comprises a voice spectrum envelope estimator, a basic noise spectrum envelope estimator, a final noise spectrum envelope estimator, a level calculator, a running environment state machine, an original suppression gain calculator and a final gain calculator;

the speech spectrum envelope estimator estimates the speech subband spectral envelope signal from the pseudo-modulus signal and the rise parameter determined in the previous frame;

the basic noise spectrum envelope estimator estimates the basic noise spectral envelope level from the pseudo-modulus signal and the speech subband spectral envelope signal;

the final noise spectrum envelope estimator estimates the final noise spectral envelope level from the pseudo-modulus signal and the basic noise spectral envelope level;

the level calculator calculates the r.m.s. value of the rough speech estimate and the r.m.s. value of the noise according to the final noise spectral envelope level;

the operating environment state machine determines the decision thresholds used when the operating environment state changes and, from the r.m.s. value of the rough speech estimate and the r.m.s. value of the noise, determines for each frame of the subband spectrum the signal-to-noise ratio state of the operating environment, the suppression gain lower bound, and the rise parameter for the current frame;

the original suppression gain calculator calculates the original noise suppression gain from the speech subband spectral envelope signal, the final noise spectral envelope level and the signal-to-noise ratio state;

the final gain calculator calculates the final noise suppression gain from the frequency-domain adjacent-band smoothing result of the original noise suppression gain and the suppression gain lower bound.

The invention transforms the input time-domain signal into a subband spectrum signal, estimates the envelope from the subband spectrum signal, calculates the subband-domain suppression gain from the envelope, applies this gain to revise the subband spectrum signal, and finally inverse-transforms the revised subband-domain signal back to the time domain, thereby obtaining the enhanced speech signal. Compared with the prior art, the invention reduces the processing load in millions of instructions per second (MIPS), has lower computational complexity, is convenient to implement in real time, and is easy to deploy widely.

Drawings

Fig. 1 is a flow chart of a first preferred embodiment of the present invention.

FIG. 2 is a flow chart illustrating the implementation of the AFB algorithm in the first preferred embodiment.

Fig. 3 is a schematic flow chart of the implementation of the SFB algorithm in the first preferred embodiment.

Fig. 4 is a block diagram of the second preferred embodiment of the present invention.

FIG. 5 is a block diagram of the algorithm system of the second preferred embodiment.

Fig. 6 is a flow chart of the operation of the sub-band domain single-microphone noise reduction core subsystem in the second preferred embodiment.

Detailed Description

Step one, transforming the voice sequence containing noise to generate a subband spectrum.

In this embodiment, an Analysis Filter Bank (AFB) processes the noisy speech sequence to generate the noisy speech subband spectrum. Let w be the prototype low-pass filter window function, of length N samples; let K be the number of points of the FFT and IFFT in the filter bank (K even); and let M be the length of each input time-series data block (frame), i.e. the decimation rate.

As shown in Fig. 2, the AFB converts the time-domain signal into a subband spectrum as follows:

first, initialize the contents of the analysis shift register (register_a) in the AFB to zero;

second, shift a block of M samples into register_a;

third, weight the contents of register_a with the analysis window function w;

fourth, divide the contents of register_a into r sections of K samples each, where r = N/K is an integer;

fifth, sum the r sections of K samples sample by sample, then apply a K-point Fast Fourier Transform (FFT) to obtain K subband components;

sixth, if data input has ended, the AFB stops; otherwise, return to the second step and repeat until the AFB stops.
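The AFB steps can be illustrated with a minimal weighted overlap-add (WOLA) analysis sketch in Python. It is not the patented implementation: a naive O(K²) DFT stands in for the FFT, the window is supplied by the caller, no phase-alignment term is applied, and the names `AnalysisFilterBank`, `process` and `dft` are hypothetical. The register keeps its history between frames, so each new block re-enters at the shift-in step.

```python
import cmath
import math
from typing import List, Sequence


def dft(x: Sequence[float]) -> List[complex]:
    """Naive O(K^2) DFT; stands in for the K-point FFT of the fifth step."""
    K = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * math.pi * k * n / K) for n in range(K))
        for k in range(K)
    ]


class AnalysisFilterBank:
    """Hypothetical WOLA analysis stage: N-tap window w, K subbands, decimation M."""

    def __init__(self, window: Sequence[float], K: int, M: int) -> None:
        N = len(window)
        if K % 2 != 0 or N % K != 0:
            raise ValueError("K must be even and N must be a multiple of K")
        self.window = list(window)
        self.K, self.M, self.r = K, M, N // K
        self.register_a = [0.0] * N  # first step: clear the analysis register

    def process(self, block: Sequence[float]) -> List[complex]:
        if len(block) != self.M:
            raise ValueError("each input block must hold exactly M samples")
        # Second step: shift M new samples in; the oldest M samples drop off the left.
        self.register_a = self.register_a[self.M:] + list(block)
        # Third step: weight the register with the analysis window.
        weighted = [w * s for w, s in zip(self.window, self.register_a)]
        # Fourth and fifth steps: cut into r sections of K samples and sum them.
        folded = [
            sum(weighted[i * self.K + n] for i in range(self.r))
            for n in range(self.K)
        ]
        # Fifth step: the K-point transform yields the K subband components X(k, t).
        return dft(folded)
```

With a rectangular 8-tap window, K = 4 and M = 2, feeding four blocks of ones fills the register, so the folded frame is [2, 2, 2, 2] and all energy lands in subband 0 (X[0] = 8). A real design uses a prototype low-pass window and an FFT instead.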

Correspondingly, a Synthesis Filter Bank (SFB) performs the inverse transformation from the subband spectrum back to the time-domain speech sequence. Compared with a conventional Fourier transform, this processing reduces the MIPS requirement several-fold and lowers the algorithmic complexity.

Step two, carrying out envelope estimation on the subband spectrum.

In this step, the envelopes needed to obtain the final noise-suppression gain function are estimated from the subband spectrum X(k,t) of the noisy speech.

Assume that the noise-free speech sequence is {s(n), n = 0, 1, 2, …}, the noise sequence is {v(n), n = 0, 1, 2, …}, and the noisy speech sequence is {x(n), n = 0, 1, 2, …}. Then x(n) can be expressed as:

$$x(n) = s(n) + v(n) \qquad (1)$$

Applying the AFB, which is linear, to s(n), v(n) and x(n) gives subband spectra S(k,t), V(k,t) and X(k,t) related by:

$$X(k,t) = S(k,t) + V(k,t) \qquad (2)$$

In equation (2), k = 0, 1, 2, …, K−1 is the subband index, K being the number of FFT points in the AFB, and t = 0, 1, 2, … is the signal frame index. In practice, to strike a better trade-off between Mean Opinion Score (MOS) and musical noise, the operating environment is classified by its Signal-to-Noise Ratio (SNR) before the subband spectral envelope is estimated. For the operating-environment SNR(t) of the t-th frame, the operating state is defined as follows:

（3）

where the two thresholds, referred to below as the decision thresholds T₁ and T₂, are preset SNR values in dB.

In this embodiment, the operating-environment SNR is computed as the ratio of the bias-corrected r.m.s. value of the speech signal to the root mean square (r.m.s.) value of the noise. The bias-corrected speech r.m.s. value is obtained as follows: the sum of the noisy speech subband spectral amplitudes is recursively averaged, which yields a rough estimate of the speech signal r.m.s. value, i.e.:

（4）

where the recursive averaging uses a preset time constant. Because this rough estimate is biased by the background noise level, it is bias-corrected as follows:

（5）

Here, the noise r.m.s. value is obtained by bias-correcting the sum of the final subband noise spectral envelope level estimates, i.e.:

（6）

where BCF > 0 is a preset bias correction factor.

From (5) and (6), the operating-environment signal-to-noise ratio SNR(t) of frame t is determined by:

（7）
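One plausible reading of the level calculator in equations (4) to (7) can be sketched in Python. The exact forms here are assumptions, not the patented formulas: the rough speech level is a first-order recursive average of the summed subband amplitudes, the bias correction is a power subtraction, the noise level is BCF times the summed final noise envelopes, and SNR(t) is expressed as 20·log10 of the level ratio. The class name `LevelCalculator` and the smoothing factor `beta` are hypothetical.

```python
import math
from typing import Sequence


class LevelCalculator:
    """Hypothetical sketch of equations (4)-(7); every functional form is assumed."""

    def __init__(self, beta: float, bcf: float, floor: float = 1e-12) -> None:
        self.beta = beta    # smoothing factor derived from the preset time constant
        self.bcf = bcf      # preset bias correction factor, BCF > 0
        self.floor = floor  # keeps the square root and logarithm finite
        self.rough = 0.0    # state of the recursive average (rough speech level)

    def snr_db(self, amplitudes: Sequence[float], final_noise: Sequence[float]) -> float:
        # (4), assumed: recursively average the sum of the noisy subband amplitudes.
        self.rough = self.beta * self.rough + (1.0 - self.beta) * sum(amplitudes)
        # (6), assumed: noise level is BCF times the summed final noise envelopes.
        noise = self.bcf * sum(final_noise)
        # (5), assumed: remove the noise bias by power subtraction, floored at zero.
        speech = math.sqrt(max(self.rough ** 2 - noise ** 2, self.floor))
        # (7), assumed: operating-environment SNR in dB.
        return 20.0 * math.log10(speech / max(noise, self.floor))
```

With `beta = 0` and `bcf = 1`, amplitudes summing to 5 and noise envelopes summing to 3 give a corrected speech level of 4, so the SNR is 20·log10(4/3), about 2.5 dB. The power-subtraction choice keeps the corrected level non-negative when noise dominates.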

To prevent the operating environment state from switching frequently near the decision thresholds T₁ and T₂ because of fluctuations in the estimated SNR(t), the decision thresholds for SNR(t) used in the state classification are set as follows:

When the state changes from the high SNR state in the previous frame to the medium SNR state in the current frame, the SNR decision threshold is

；

When the state changes from the medium SNR state in the previous frame to the low SNR state in the current frame, the SNR decision threshold is

；

When the state changes from the low SNR state in the previous frame to the medium SNR state in the current frame, the SNR decision threshold is

；

When the state changes from the medium SNR state in the previous frame to the high SNR state in the current frame, the SNR decision threshold is

；

The operating environment state machine estimates the operating-environment SNR(t) according to equation (7) and compares it with the corresponding SNR thresholds in equation (3), the thresholds given above for each state transition taking the place of the two thresholds of equation (3). This yields the operating environment state value state = "high SNR state", "medium SNR state" or "low SNR state", which is applied to the next noise-suppression processing frame.
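A hysteresis state machine of this kind can be sketched in Python. The offset rule is an assumption: leaving the current state requires SNR(t) to cross the relevant threshold by its margin, so downward thresholds are lowered and upward thresholds raised. The names `t_low`, `t_high`, `d_low` and `d_high` are hypothetical, and which of T₁, T₂ and Δ₁, Δ₂ they correspond to is likewise assumed.

```python
from enum import Enum
from typing import Optional


class State(Enum):
    LOW = "low SNR state"
    MEDIUM = "medium SNR state"
    HIGH = "high SNR state"


def classify(snr_db: float, lower: float, upper: float) -> State:
    """Three-way split of equation (3) with thresholds lower < upper (dB)."""
    if snr_db >= upper:
        return State.HIGH
    if snr_db >= lower:
        return State.MEDIUM
    return State.LOW


class OperatingStateMachine:
    """Hypothetical hysteresis state machine; the offset rule is assumed."""

    def __init__(self, t_low: float, t_high: float, d_low: float, d_high: float) -> None:
        self.t_low, self.t_high = t_low, t_high  # preset SNR thresholds (dB)
        self.d_low, self.d_high = d_low, d_high  # hysteresis margins (dB)
        self.state: Optional[State] = None       # no previous frame yet

    def update(self, snr_db: float) -> State:
        lower, upper = self.t_low, self.t_high   # first frame: plain thresholds
        if self.state is State.HIGH:
            # Leaving HIGH needs SNR to fall a margin below the upper threshold.
            lower, upper = self.t_low - self.d_low, self.t_high - self.d_high
        elif self.state is State.MEDIUM:
            # Leaving MEDIUM needs a margin past either threshold.
            lower, upper = self.t_low - self.d_low, self.t_high + self.d_high
        elif self.state is State.LOW:
            # Leaving LOW needs SNR to rise a margin above the lower threshold.
            lower, upper = self.t_low + self.d_low, self.t_high + self.d_high
        self.state = classify(snr_db, lower, upper)
        return self.state  # applied to the next noise-suppression frame
```

With thresholds 10 dB and 25 dB and 2 dB margins, an SNR of 24 dB keeps a HIGH state, while 22 dB moves it to MEDIUM; the dead zones around each threshold are what suppress rapid toggling.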

During envelope estimation, the amplitude of the subband spectrum X(k,t) is replaced by its pseudo-modulus, computed from Re{X(k,t)} and Im{X(k,t)}, the real and imaginary parts of X(k,t); this reduces computational complexity and eases engineering implementation.
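The pseudo-modulus formula itself is not given here. A minimal Python sketch, assuming the common approximation |Re{X}| + |Im{X}|, shows why it is cheap: it needs no multiplications and no square root. The function names are hypothetical.

```python
from typing import List, Sequence


def pseudo_modulus(x: complex) -> float:
    """Assumed pseudo-modulus |Re{x}| + |Im{x}|, a square-root-free stand-in for |x|.

    It never underestimates |x| and overestimates it by at most a factor of
    sqrt(2), the worst case occurring when |Re{x}| == |Im{x}|.
    """
    return abs(x.real) + abs(x.imag)


def pseudo_modulus_spectrum(X: Sequence[complex]) -> List[float]:
    """Apply the pseudo-modulus to every subband component X(k, t) of one frame."""
    return [pseudo_modulus(x) for x in X]
```

For example, 3 + 4j has modulus 5 but pseudo-modulus 7; since every envelope in the method is built from the same measure, a consistent bias of this kind largely cancels in the envelope ratios.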

For the subband spectrum X(k,t) of the noisy speech signal, the speech subband spectral envelope signal is first estimated with the following "fast-rise, slow-fall" nonlinear single-pole recursive model:

（8）

where α is the recursion coefficient, determined by:

（9）

Here, the slow-fall parameter is usually a preset time constant, while the fast-rise parameter is selected adaptively by the operating-state decision module according to the operating environment state, i.e.:

（10）

where the three values are preset fast-rise time constants, one for each operating state.

Because the fast-rise time constant is assigned adaptively according to the operating state, the speech subband spectral envelope is estimated more accurately.
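A "fast-rise, slow-fall" single-pole tracker with a state-dependent rise time constant can be sketched in Python. Equations (8) to (10) are not reproduced here, so two details are assumptions: the recursion has the form E(t) = α·E(t−1) + (1−α)·A(t), with α switching on the direction of change, and a time constant τ maps to α = exp(−1/(τ·frame rate)), where the frame rate is the sampling rate divided by M. All function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def coeff_from_time_constant(tau_s: float, frame_rate_hz: float) -> float:
    """Assumed mapping from a time constant (seconds) to a per-frame pole."""
    return math.exp(-1.0 / (tau_s * frame_rate_hz))


def track_envelope(prev: float, x: float, a_rise: float, a_fall: float) -> float:
    """One step of an asymmetric single-pole recursion (assumed form of (8)-(9))."""
    a = a_rise if x > prev else a_fall  # the pole switches with the direction of change
    return a * prev + (1.0 - a) * x


def speech_envelope(prev_env: Sequence[float], amplitudes: Sequence[float],
                    state: str, rise_tau_by_state: Dict[str, float],
                    fall_tau_s: float, frame_rate_hz: float) -> List[float]:
    """Fast-rise, slow-fall tracking; the rise time constant depends on the state."""
    a_rise = coeff_from_time_constant(rise_tau_by_state[state], frame_rate_hz)
    a_fall = coeff_from_time_constant(fall_tau_s, frame_rate_hz)
    return [track_envelope(p, x, a_rise, a_fall) for p, x in zip(prev_env, amplitudes)]
```

A short rise time constant (small α) lets the envelope jump onto speech onsets, while a longer fall time constant holds it through brief gaps; choosing the rise constant per state is the adaptive element described above.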

Next, the noise subband spectral envelope signal is estimated. The noise subband spectral envelope estimator consists of two cascaded parts, a basic noise estimator and a final noise estimator. The basic noise estimator uses a "slow-rise, fast-fall" nonlinear single-pole recursive model to estimate its spectral envelope level, namely:

（11）

wherein

（12）

Here, the two time constants in (12) are the preset fast-fall and slow-rise time constants, respectively. The basic noise envelope always tracks the level changes caused by speech disturbances, so it can be used together with the speech subband spectral envelope to make the VAD decision in the final noise estimator. The final noise estimator also uses a "slow-rise, fast-fall" nonlinear single-pole recursive model to estimate the final noise spectral envelope level, but updates it only when the VAD decision is false (i.e. no speech is present), namely:

（13）

wherein

（14）

Here, the two time constants in (14) are the preset fast-fall and slow-rise time constants, respectively, and γ is the preset VAD threshold parameter. In this way, the estimate can follow input amplitudes down to arbitrarily small levels without the update stalling, and the ambient noise level can change by any amount without affecting the convergence time.
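One frame of the cascaded basic and final noise estimators can be sketched in Python. Several details are assumptions, since equations (11) to (14) are not reproduced: the recursion form, a per-subband VAD that declares speech when the speech envelope is at least γ times the basic noise envelope, and feeding the final estimator with the pseudo-modulus amplitude (following the system description; the claims instead phrase the third recursion as acting on the basic envelope). The function names and the `(a_rise, a_fall)` pole pairs are hypothetical.

```python
from typing import List, Sequence, Tuple


def track_envelope(prev: float, x: float, a_rise: float, a_fall: float) -> float:
    """Asymmetric single-pole step; a_rise near 1 with a small a_fall gives slow rise, fast fall."""
    a = a_rise if x > prev else a_fall
    return a * prev + (1.0 - a) * x


def noise_envelopes(base_prev: Sequence[float], final_prev: Sequence[float],
                    amplitudes: Sequence[float], speech_env: Sequence[float],
                    base_poles: Tuple[float, float], final_poles: Tuple[float, float],
                    gamma: float) -> Tuple[List[float], List[float]]:
    """One frame of the cascaded noise estimators (assumed forms of (11)-(14))."""
    base, final = [], []
    for b_prev, f_prev, x, s in zip(base_prev, final_prev, amplitudes, speech_env):
        # Basic noise estimator: always updated, slow rise / fast fall.
        b = track_envelope(b_prev, x, *base_poles)
        # Assumed energy-style VAD per subband: speech if s >= gamma * b.
        speech_present = s >= gamma * b
        # Final noise estimator: held while speech is present, updated otherwise.
        f = f_prev if speech_present else track_envelope(f_prev, x, *final_poles)
        base.append(b)
        final.append(f)
    return base, final
```

In a subband where a loud speech burst arrives, the basic envelope creeps upward but the final envelope is frozen; in a quiet subband both follow the falling input quickly, which matches the "no stalling" property described above.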

Step three, obtaining the subband-domain suppression gain function from the subband spectral envelopes.

From the analysis in step two, the suppression gain in each subband should be a function of the a posteriori SNR in that subband and of the operating environment state. Since the likelihood function of an energy-detection VAD is the a posteriori SNR, the a posteriori SNR is normalized by the VAD threshold, and the normalized value is used to construct the original value of the suppression gain, i.e.:

（15）

where γ is the preset VAD threshold parameter: when the a posteriori SNR (A-SNR) is greater than or equal to γ, the VAD indicates that speech is present. The parameter p is a gain expansion factor that controls how quickly the gain decays for normalized a posteriori SNRs below 1; when p = 1, the gain decays linearly with the normalized a posteriori SNR.
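The behaviour described for expression (15) can be sketched in Python, assuming the form G = min(1, (A-SNR/γ)^p) with the A-SNR taken as the ratio of the speech to the noise subband spectral envelope. Both the exact form and the name `original_gain` are assumptions. Setting p > 1 reproduces the faster decay used in the Aggressive state of expression (17).

```python
def original_gain(speech_env: float, noise_env: float, gamma: float,
                  p: float = 1.0, eps: float = 1e-12) -> float:
    """Assumed form of (15): G = min(1, (A-SNR / gamma) ** p)."""
    a_snr = speech_env / max(noise_env, eps)  # a posteriori SNR as an envelope ratio
    normalized = a_snr / gamma                # >= 1 means the VAD flags speech
    return min(1.0, normalized ** p)          # p = 1: linear decay below 1
```

With γ = 2, a subband whose speech envelope is four times its noise envelope passes unattenuated (gain 1.0), while equal envelopes give a gain of 0.5 for p = 1 and 0.25 for p = 2.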

Since the operating environment has been divided by SNR into three operating states (high, medium and low SNR), each with its own suppression gain lower bound, the operating-environment decision module determines the lower bound as follows:

（16）

where the three values are preset constants in dB, one for each operating state.

An "Aggressive" state mode is defined: in the operating state designated as Aggressive, the parameter p in expression (15) takes a suitable value greater than 1; in the other states p = 1.0. Normally the "low SNR" state is designated as Aggressive, and expression (15) then becomes:

(17)

The human ear cannot distinguish fluctuations within the same critical band but can distinguish fluctuations in different critical bands; this is the basis for smoothing the subband suppression gain components within critical bands. Critical bands are usually expressed on the Bark frequency scale, where 1 to 24 Bark correspond to the first 24 auditory critical bands; the frequencies of their boundary points and their bandwidths are listed in Table 1.

Table 1: frequency point and bandwidth corresponding to first 24 auditory critical bands

Accordingly, the subband indices k in expression (17) are grouped so that the frequencies of all subband indices in a group lie within the same critical band, and the subband suppression gains in each group are all replaced by their arithmetic mean, which completes the critical-band smoothing of the subband suppression gains. With the critical-band-smoothed subband suppression gain denoted accordingly, the final gain function is determined by the following equation:

(18)
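The critical-band smoothing and the lower-bound step can be sketched in Python. Three points are assumptions: the band edges are Zwicker's commonly cited values, taken to match Table 1; subband k is centred at k·fs/K; and expression (18) is read as flooring the smoothed gain at the state's lower bound converted from dB to a linear amplitude gain. The names `smooth_gains`, `final_gain` and `critical_band` are hypothetical.

```python
from typing import Dict, List, Sequence

# Upper edges (Hz) of the first 24 critical bands, Zwicker's commonly cited values.
BAND_EDGES_HZ = [100, 200, 300, 400, 510, 630, 770, 920, 1080, 1270, 1480, 1720,
                 2000, 2320, 2700, 3150, 3700, 4400, 5300, 6400, 7700, 9500,
                 12000, 15500]


def critical_band(freq_hz: float) -> int:
    """Index of the critical band containing freq_hz (above 15.5 kHz shares one index)."""
    for i, edge in enumerate(BAND_EDGES_HZ):
        if freq_hz < edge:
            return i
    return len(BAND_EDGES_HZ)


def smooth_gains(gains: Sequence[float], fs_hz: float, K: int) -> List[float]:
    """Replace each subband gain by the arithmetic mean gain of its critical band."""
    groups: Dict[int, List[int]] = {}
    for k in range(len(gains)):
        groups.setdefault(critical_band(k * fs_hz / K), []).append(k)
    smoothed = [0.0] * len(gains)
    for members in groups.values():
        mean = sum(gains[k] for k in members) / len(members)
        for k in members:
            smoothed[k] = mean
    return smoothed


def final_gain(smoothed: Sequence[float], lower_bound_db: float) -> List[float]:
    """Assumed form of (18): floor the smoothed gain at the state's lower bound."""
    floor = 10.0 ** (lower_bound_db / 20.0)  # dB -> linear amplitude gain
    return [max(g, floor) for g in smoothed]
```

At fs = 8 kHz with K = 256, subbands 0 to 3 (0 Hz to about 94 Hz) share the first critical band, so alternating gains 1, 0, 1, 0 there are all replaced by 0.5, while subband 4 (125 Hz) falls in the next band. A lower bound of −20 dB floors any gain at 0.1.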

Step four, revising the subband spectrum according to the final gain function to obtain the revised subband spectrum.

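The revised subband spectrum is obtained by multiplying each subband component by its final gain:

$$Y(k,t) = G(k,t)\,X(k,t) \qquad (19)$$

where G(k,t) denotes the final gain function of equation (18).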

Step five, synthesizing the revised subband spectrum and outputting the enhanced speech sequence.

As shown in Fig. 3, the SFB inverse-transforms the revised subband spectrum into the time-domain speech sequence as follows:

first, initialize the contents of the synthesis shift register (register_s) in the SFB to zero;

second, apply a K-point Inverse Fast Fourier Transform (IFFT) to the K subband components to obtain K time-domain samples;

third, periodically extend the K time-domain samples into r sections and store them in a temporary register;

fourth, weight the temporary register with the synthesis window function W1 (W1 may equal the analysis window w);

fifth, add the contents of the temporary register to register_s (overlap-add);

sixth, shift M samples out of register_s to the left to form a new output data block;

seventh, fill the rightmost M positions of register_s with zeros;

eighth, if data input has ended, the SFB stops; otherwise, return to the second step and repeat.
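The SFB steps can be illustrated with a matching WOLA synthesis sketch in Python. As with any sketch here, it is not the patented implementation: a naive O(K²) inverse DFT stands in for the IFFT, only the real part is kept (assuming a conjugate-symmetric spectrum from a real input), and the names `SynthesisFilterBank`, `process` and `idft` are hypothetical.

```python
import cmath
import math
from typing import List, Sequence


def idft(X: Sequence[complex]) -> List[complex]:
    """Naive O(K^2) inverse DFT; stands in for the K-point IFFT of the second step."""
    K = len(X)
    return [
        sum(X[k] * cmath.exp(2j * math.pi * k * n / K) for k in range(K)) / K
        for n in range(K)
    ]


class SynthesisFilterBank:
    """Hypothetical WOLA synthesis stage: N-tap window W1, K subbands, decimation M."""

    def __init__(self, window: Sequence[float], K: int, M: int) -> None:
        N = len(window)
        if N % K != 0:
            raise ValueError("N must be a multiple of K")
        self.window = list(window)    # synthesis window W1 (may equal w)
        self.K, self.M, self.r = K, M, N // K
        self.register_s = [0.0] * N   # first step: clear the synthesis register

    def process(self, Y: Sequence[complex]) -> List[float]:
        if len(Y) != self.K:
            raise ValueError("expected K subband components")
        # Second step: K time-domain samples (real part kept, see lead-in).
        y = [v.real for v in idft(Y)]
        # Third step: periodically extend the K samples into r sections (N samples).
        temp = y * self.r
        # Fourth and fifth steps: weight with W1 and overlap-add into register_s.
        self.register_s = [
            acc + w * s for acc, w, s in zip(self.register_s, self.window, temp)
        ]
        # Sixth step: the leftmost M samples form the new output block.
        out = self.register_s[: self.M]
        # Sixth and seventh steps: shift left by M, fill M zeros at the right end.
        self.register_s = self.register_s[self.M:] + [0.0] * self.M
        return out
```

Feeding a constant spectrum [4, 0, 0, 0] (an all-ones frame after the IDFT) through a rectangular 8-tap window with K = 4 and M = 2 yields output blocks [1, 1], [2, 2], [3, 3], then a steady [4, 4]: the overlap-add builds up over N/M frames. A practical design chooses W1 so that this steady-state gain, combined with the analysis window, is flat.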

Compared with the prior art, this embodiment uses a subband-domain structure that reduces the computational load (MIPS) several-fold, so the algorithm has lower computational complexity and is easier to implement in real time on existing commercial DSP chips. In addition, the operating environment is divided into high, medium and low SNR states, and in each state a state-specific suppression level is applied and state-specific time-smoothing coefficients are selected for envelope estimation. This achieves a better trade-off between MOS and musical noise, giving a higher MOS with less perceptible speech distortion and musical noise.

Preferred embodiment two

As shown in fig. 4, fig. 5, and fig. 6, the present embodiment discloses a single-microphone subband domain noise reduction system, which includes an analysis filter bank 1, a pseudo-modulus calculator 2, a subband domain single-microphone noise reduction core subsystem 3, and a synthesis filter bank 4.

The analysis filter bank 1 is used to convert the time domain signal into a subband spectral signal.

The pseudo-modulus calculator 2 generates a pseudo-modulus signal from the subband spectrum signal to replace the amplitude of the subband spectrum signal, which reduces computational complexity and eases engineering implementation.

The subband-domain single-microphone noise reduction core subsystem 3 divides the operating environment into three states according to the signal-to-noise ratio, determines the operating environment state of the subband spectrum signal, and revises the subband spectrum signal.

The sub-band domain single-microphone noise reduction core subsystem 3 includes a speech spectrum envelope estimator 31, a base noise spectrum envelope estimator 32, a final noise spectrum envelope estimator 33, a level calculator 34, a running environment state machine 35, a raw suppression gain calculator 36, and a final gain calculator 37;

The speech spectrum envelope estimator 31 estimates the speech subband spectral envelope signal from the pseudo-modulus signal and the rise parameter determined in the previous frame.

The basic noise spectrum envelope estimator 32 estimates the basic noise spectral envelope level from the pseudo-modulus signal and the speech subband spectral envelope signal.

The final noise spectrum envelope estimator 33 estimates the final noise spectral envelope level from the pseudo-modulus signal and the basic noise spectral envelope level.

The level calculator 34 calculates the r.m.s. value of the rough speech estimate and the r.m.s. value of the noise according to the final noise spectral envelope level.

The running environment state machine 35 determines the decision thresholds used when the operating environment state changes and, from the r.m.s. value of the rough speech estimate and the r.m.s. value of the noise, determines for each frame of the subband spectrum the signal-to-noise ratio state of the operating environment, the suppression gain lower bound, and the rise parameter for the current frame.

The raw suppression gain calculator 36 calculates the raw suppression gain from the speech subband spectral envelope signal, the final noise spectral envelope level and the signal-to-noise ratio state, and the final gain calculator 37 calculates the final noise suppression gain from the frequency-domain adjacent-band smoothing result of the raw suppression gain and the suppression gain lower bound.

The working flow of the sub-band domain single microphone noise reduction core subsystem 3 is shown in fig. 6.

The synthesis filter bank 4 is used to transform the modified subband spectral signal back to a time-domain signal.

The invention transforms the input time-domain signal into a subband spectrum signal, estimates the envelope from the subband spectrum signal, calculates the subband-domain suppression gain from the envelope, applies this gain to revise the subband spectrum signal, and finally inverse-transforms the revised subband-domain signal back to the time domain, thereby obtaining the enhanced speech signal. Compared with the prior art, the method reduces the MIPS requirement several-fold, has lower computational complexity, is convenient to implement in real time, and is easy to deploy widely.

The present invention is not limited to the above preferred embodiments, and any modification, equivalent replacement or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims

1. A single-microphone subband-domain noise reduction method, comprising:

transforming the noisy speech sequence to generate a subband spectrum;

carrying out envelope estimation on the subband spectrums;

2. A single microphone subband-domain noise reduction method as claimed in claim 1, wherein the estimated envelope comprises a speech subband spectral envelope signal and a noise subband spectral envelope signal;

the envelope estimation of the subband spectrum comprises:

carrying out second nonlinear single-pole recursion on the subband spectrums to obtain noise subband spectrum basic envelopes;

the rising parameter of the second and third non-linear unipolar recursions is greater than the falling parameter.

3. The single-microphone subband-domain denoising method of claim 2, wherein the fall parameter of the first nonlinear unipolar recursion is a predetermined time constant;

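the rise parameter of the first nonlinear single-pole recursion is determined according to the signal-to-noise ratio state of the subband spectrum, taking one of three preset time constants, one for each of the high, medium and low signal-to-noise ratio states.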

4. A single-microphone subband-domain noise reduction method as claimed in claim 3, wherein the deriving the subband-domain suppression gain function based on the estimated envelope comprises:

performing frequency domain critical band smoothing on the suppression gain original value to obtain sub-band suppression gain;

5. The method of claim 4, wherein the suppression gain lower bound is set, according to the signal-to-noise ratio state, to one of three preset constants in decibels, one for each of the high, medium and low signal-to-noise ratio states;

the original value of the suppression gain is determined from the speech subband spectral envelope and the noise subband spectral envelope, using the a posteriori signal-to-noise ratio normalized by γ, where γ is the voice activity detection threshold parameter.

6. A single-microphone subband-domain noise reduction method as claimed in claim 5, wherein the signal-to-noise ratio state of each frame of the subband spectrum is determined as follows:

initially, if the SNR is greater than or equal to the upper threshold, the state is the high signal-to-noise ratio state; if it is below the upper threshold but not below the lower threshold, the state is the medium signal-to-noise ratio state; and if it is below the lower threshold, the state is the low signal-to-noise ratio state, the two thresholds being preset signal-to-noise ratio values in decibels;

when the state changes from the high signal-to-noise ratio state in the previous frame to the medium signal-to-noise ratio state in the current frame,

the signal-to-noise ratio decision threshold is

；

when the state changes from the medium signal-to-noise ratio state in the previous frame to the low signal-to-noise ratio state in the current frame,

the signal-to-noise ratio decision threshold is

；

the signal-to-noise ratio decision threshold is

；

the signal-to-noise ratio decision threshold is

；

7. A single-microphone subband-domain noise reduction method as claimed in any of claims 1 to 6, wherein the envelope estimation of the subband spectra comprises:

calculating a pseudo-modulus value of the subband spectrum;

and carrying out envelope estimation by using the pseudo modulus value.

8. A single-microphone subband-domain noise reduction system, comprising an analysis filter bank, a subband-domain single-microphone noise reduction core subsystem and a synthesis filter bank;

9. A single-microphone subband-domain noise reduction system as claimed in claim 8, further comprising a pseudo-modulus calculator for receiving the subband spectrum signal output from the analysis filter bank, generating a pseudo-modulus signal from the subband spectrum signal, and inputting the pseudo-modulus signal to the subband-domain single-microphone noise reduction core subsystem.

10. A single-microphone subband-domain noise reduction system as claimed in claim 9, wherein: the sub-band domain single microphone noise reduction core subsystem comprises a voice spectrum envelope estimator, a basic noise spectrum envelope estimator, a final noise spectrum envelope estimator, a level calculator, an operating environment state machine, an original suppression gain calculator and a final suppression gain calculator;

the speech spectrum envelope estimator estimates a speech subband spectral envelope signal;


An original suppression gain calculator calculates an original noise suppression gain according to the speech sub-band spectrum envelope signal, the final noise spectrum envelope level and the signal-to-noise ratio state;

and the final suppression gain calculator calculates the final noise suppression gain according to the frequency domain adjacent band smoothing processing result and the suppression gain lower bound of the original noise suppression gain.