CN115527550A - Single-microphone subband domain noise reduction method and system - Google Patents


Info

Publication number
CN115527550A
Authority
CN
China
Legal status
Pending
Application number
CN202211013301.9A
Other languages
Chinese (zh)
Inventor
Liang Min (梁民)
Current Assignee
G Net Cloud Service Co Ltd
Original Assignee
G Net Cloud Service Co Ltd

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L21/0232 Processing in the frequency domain
    • G10L2021/02161 Number of inputs available containing the signal or the noise to be suppressed
    • G10L2021/02163 Only one microphone

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention discloses a single-microphone subband-domain noise reduction method and system. The method comprises: transforming a noisy speech sequence to generate a subband spectrum; performing envelope estimation on the subband spectrum; obtaining a subband-domain suppression gain function from the estimated envelopes; revising the subband spectrum with the suppression gain function to obtain a revised subband spectrum; and synthesizing the revised subband spectrum to output the enhanced speech sequence. The system comprises modules that implement each of these functions. In short, the input time-domain signal is transformed into subband spectral signals, envelopes are estimated from them, a subband-domain suppression gain is computed and applied to revise the subband spectral signals, and the revised subband-domain signals are inversely transformed back to the time domain to obtain the enhanced speech signal. Compared with the prior art, the method reduces the processing load, measured in millions of instructions per second (MIPS), several-fold; its lower computational complexity makes it convenient to implement in real time and easy to deploy widely.

Description

Single-microphone subband domain noise reduction method and system
Technical Field
The invention belongs to the technical field of noise reduction in communications, and in particular relates to a single-microphone subband-domain noise reduction method and system.
Background
Conference communication systems operate in highly varied and complex acoustic environments, and the speech signal picked up by a client microphone often contains ambient noise or interference that seriously degrades conference call quality.
When ambient noise is mixed into the speech signal picked up by a microphone, the subjective quality of voice communication degrades even if the noise level is low or moderate. Listening tests have shown that at low signal-to-noise ratios, listeners cannot tolerate the noisy speech they hear, or even stop paying attention to it, a phenomenon known as listener fatigue; in particular, speech intelligibility suffers when the signal-to-noise ratio is below 10 dB. Even low noise levels cause problems, especially when multiple voice channels are combined in a conference bridge. In multi-party or multi-point conferencing, the background noise at the microphone of each site is additively combined at the bridge with the noise from all other sites, so the loudspeaker at each site reproduces the sum of the noise from all other sites. The problem grows more severe as the number of sites increases. It is therefore desirable to apply noise reduction to the speech signal picked up by the microphone, improving the subjective speech quality and reducing the perceived degradation of voice communication caused by listener fatigue.
Current single-microphone noise suppression methods mainly include classical Wiener filtering, dynamic comb filtering, dynamic linear all-pole and pole-zero modeling of speech, short-time spectral modification, and hidden Markov modeling. However, each of these techniques has drawbacks.
For example, in dynamic comb filtering, the linear filter is adapted to pass only the harmonic components of voiced speech derived from the pitch period; in dynamic linear all-pole and pole-zero modeling of speech, the coefficients of the noise-free model are estimated from the noisy speech; in short-time spectral modification, the short-time Fourier transform magnitude is attenuated at frequencies where no speech is present; and hidden Markov modeling uses a time-varying model of speech whose coefficient evolution is governed by transition probabilities associated with the model states. All of these methods operate on single-channel noisy speech and are blind techniques: the only information available is the noisy input speech signal.
To enhance the signal-to-noise ratio of a noisy speech signal, the prior art must bootstrap separate estimates of the noise and the speech signal, estimate the signal-to-noise ratio of the noisy speech by assuming that speech is intermittent and noise is stationary, and suppress the noise accordingly. In practice this approach can produce musical noise and speech distortion, especially in scenes with non-stationary, strong interference and under far-field conditions.
Disclosure of Invention
The invention aims to provide a single-microphone subband-domain noise reduction method and system that reduce algorithmic complexity and are easy to implement in real time.
The invention provides a single-microphone subband domain noise reduction method, which comprises the following steps:
transforming the noisy speech sequence to generate a subband spectrum;
performing envelope estimation on the subband spectrum;
obtaining a subband-domain suppression gain function from the estimated envelopes;
revising the subband spectrum according to the subband-domain suppression gain function;
and transforming the revised subband spectrum into a time-domain signal to obtain a noise-reduced speech sequence.
The estimated envelopes comprise a speech subband spectral envelope signal and a noise subband spectral envelope signal;
performing envelope estimation on the subband spectrum comprises:
applying a first nonlinear single-pole recursion to the subband spectrum to obtain the speech subband spectral envelope;
applying a second nonlinear single-pole recursion to the subband spectrum to obtain a basic noise subband spectral envelope;
performing voice activity detection using the speech subband spectral envelope signal and the basic noise subband spectral envelope; if speech is detected, the basic noise subband spectral envelope is taken as the noise subband spectral envelope; otherwise, a third nonlinear single-pole recursion is applied to the basic noise subband spectral envelope to obtain the noise subband spectral envelope;
wherein the rise parameter of the first nonlinear single-pole recursion is smaller than its fall parameter;
and the rise parameters of the second and third nonlinear single-pole recursions are greater than their fall parameters.
The fall parameter of the first nonlinear single-pole recursion is a preset time constant;
the rise parameter $\alpha_A(t)$ of the first nonlinear single-pole recursion is determined by the signal-to-noise ratio state of the subband spectrum:
$$\alpha_A(t)=\begin{cases}\alpha_{A,H}, & \text{high SNR state}\\ \alpha_{A,M}, & \text{medium SNR state}\\ \alpha_{A,L}, & \text{low SNR state}\end{cases}$$
wherein $\alpha_{A,H}$, $\alpha_{A,M}$ and $\alpha_{A,L}$ are preset time constants.
Obtaining the subband-domain suppression gain function from the estimated envelopes comprises:
determining a raw value of the suppression gain from the signal-to-noise ratio state of the subband spectrum, the speech subband spectral envelope and the noise subband spectral envelope;
smoothing the raw suppression gain across adjacent frequency bands to obtain the subband suppression gain;
determining a lower bound on the suppression gain from the signal-to-noise ratio state of the subband spectrum;
and calculating the subband-domain suppression gain function from the subband suppression gain and the suppression gain lower bound.
The lower bound on the suppression gain is determined as follows:
$$G_{\min}(t)=\begin{cases}G_H, & \text{high SNR state}\\ G_M, & \text{medium SNR state}\\ G_L, & \text{low SNR state}\end{cases}$$
wherein $G_H$, $G_M$ and $G_L$ are preset constants in decibels;
the raw value of the suppression gain is determined as follows:
$$G_0(k,t)=\min\left\{1,\left(\frac{\hat{S}(k,t)}{\gamma\,\hat{N}(k,t)}\right)^{p}\right\}$$
wherein $\gamma$ is the threshold parameter of voice activity detection, $\hat{S}(k,t)$ is the speech subband spectral envelope, $\hat{N}(k,t)$ is the noise subband spectral envelope, and the gain expansion factor $p\ge 1$ depends on the signal-to-noise ratio state.
The signal-to-noise ratio state of each frame of the subband spectrum is determined as follows.
Initially, if the SNR is greater than or equal to $T_1$, the state is the high SNR state; if the SNR is less than $T_1$ but not less than $T_2$, the state is the medium SNR state; and if the SNR is less than $T_2$, the state is the low SNR state, where $T_1$ and $T_2$ are preset SNR thresholds in decibels.
When the previous frame is in the high SNR state and the current frame moves to the medium SNR state, the SNR decision threshold is $T_1-\Delta_1$;
when the previous frame is in the medium SNR state and the current frame moves to the low SNR state, the SNR decision threshold is $T_2-\Delta_2$;
when the previous frame is in the low SNR state and the current frame moves to the medium SNR state, the SNR decision threshold is $T_2+\Delta_2$;
when the previous frame is in the medium SNR state and the current frame moves to the high SNR state, the SNR decision threshold is $T_1+\Delta_1$;
where $\Delta_1$ and $\Delta_2$ are hysteresis (redundancy) margins in decibels.
Performing envelope estimation on the subband spectrum comprises:
calculating the pseudo-modulus of the subband spectrum;
and performing the envelope estimation using the pseudo-modulus.
The invention also provides a single-microphone subband-domain noise reduction system, characterized in that the system comprises an analysis filter bank, a subband-domain single-microphone noise reduction core subsystem and a synthesis filter bank;
the analysis filter bank transforms the time-domain signal into subband spectral signals;
the subband-domain single-microphone noise reduction core subsystem divides the operating environment into three states according to the signal-to-noise ratio, determines the operating environment state of the subband spectral signals, and revises the subband spectral signals;
the synthesis filter bank transforms the revised subband spectral signals back into a time-domain signal.
The system further comprises a pseudo-modulus calculator, which receives the subband spectral signals output by the analysis filter bank, generates pseudo-modulus signals from them, and feeds the pseudo-modulus signals to the subband-domain single-microphone noise reduction core subsystem.
The subband-domain single-microphone noise reduction core subsystem comprises a speech spectral envelope estimator, a basic noise spectral envelope estimator, a final noise spectral envelope estimator, a level calculator, an operating environment state machine, a raw suppression gain calculator and a final gain calculator;
the speech spectral envelope estimator estimates the speech subband spectral envelope signal from the pseudo-modulus signal and the rise parameter determined in the previous frame;
the basic noise spectral envelope estimator estimates the basic noise spectral envelope level from the pseudo-modulus signal and the speech subband spectral envelope signal;
the final noise spectral envelope estimator estimates the final noise spectral envelope level from the pseudo-modulus signal and the basic noise spectral envelope level;
the level calculator calculates the r.m.s. value of the rough speech estimate and the r.m.s. value of the noise from the final noise spectral envelope level;
the operating environment state machine determines the decision thresholds used when the operating environment state changes and, from the r.m.s. value of the rough speech estimate and the r.m.s. value of the noise, determines for each frame of the subband spectrum the SNR state of the operating environment, the suppression gain lower bound, and the rise parameter for the current frame;
the raw suppression gain calculator calculates the raw noise suppression gain from the speech subband spectral envelope signal, the final noise spectral envelope level and the SNR state;
the final gain calculator calculates the final noise suppression gain from the result of smoothing the raw noise suppression gain across adjacent frequency bands and from the suppression gain lower bound.
The invention transforms the input time-domain signal into subband spectral signals, estimates envelopes from them, computes the subband-domain suppression gain from the envelopes, applies it to revise the subband spectral signals, and finally inversely transforms the revised subband-domain signals back to the time domain to obtain the enhanced speech signal. Compared with the prior art, the invention reduces the required MIPS (millions of instructions per second), has lower computational complexity, is convenient to implement in real time, and is easy to deploy widely.
Drawings
Fig. 1 is a flow chart of a first preferred embodiment of the present invention.
FIG. 2 is a flow chart illustrating the implementation of the AFB algorithm in the first preferred embodiment.
Fig. 3 is a schematic flow chart of the implementation of the SFB algorithm in the first preferred embodiment.
Fig. 4 is a block diagram of the second preferred embodiment of the present invention.
FIG. 5 is a block diagram of the algorithm system of the second preferred embodiment.
Fig. 6 is a flow chart of the operation of the sub-band domain single-microphone noise reduction core subsystem in the second preferred embodiment.
Detailed Description
Step one: transform the noisy speech sequence to generate a subband spectrum.
In this embodiment, an analysis filter bank (AFB) processes the noisy speech sequence to generate the noisy speech subband spectrum. Let W be the prototype low-pass filter window function, of length N samples; let K be the number of points of the FFT and IFFT operations in the filter bank (K even); and let M be the length of each input time-series data block (frame), which is also the decimation rate.
As shown in Fig. 2, the AFB converts the time-domain signal into a subband spectrum through the following steps:
first, the contents of the analysis shift register (register_a) in the AFB are initialized to zero;
second, a block of M new samples is shifted into register_a;
third, the contents of register_a are weighted by the analysis window function W;
fourth, the weighted contents of register_a are divided into r segments of K samples each, where r = N/K is an integer;
fifth, the r segments of K samples are added together (overlapped), and a K-point fast Fourier transform (FFT) is applied to the result to obtain K subband components;
sixth, if the data input has ended, the AFB stops; otherwise, it returns to the second step and repeats until the input ends.
Correspondingly, a synthesis filter bank (SFB) performs the inverse transformation from the subband spectrum back to the time-domain speech sequence. Compared with an ordinary Fourier transform, this processing reduces the required MIPS several-fold and lowers the complexity of the algorithm.
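The AFB steps above can be sketched in Python as a weighted overlap-add analysis. This is a minimal illustration under stated assumptions, not the patented implementation: the sine window, the parameter values N = 32, K = 16, M = 8, and the naive O(K²) DFT standing in for the FFT are all assumptions, and the function names are hypothetical.

```python
import cmath
import math


def dft(u):
    """Naive K-point DFT; a practical AFB would use an FFT instead."""
    K = len(u)
    return [sum(u[n] * cmath.exp(-2j * math.pi * k * n / K) for n in range(K))
            for k in range(K)]


def analysis_filter_bank(x, w, K, M):
    """Hypothetical sketch of the AFB steps; returns one list of K subband values per frame."""
    N = len(w)
    if K % 2 != 0 or N % K != 0:
        raise ValueError("K must be even and N must be an integer multiple of K")
    r = N // K
    register_a = [0.0] * N                                      # step 1: clear the register
    frames = []
    for start in range(0, len(x) - M + 1, M):
        register_a = register_a[M:] + list(x[start:start + M])  # step 2: shift in M samples
        weighted = [w[n] * register_a[n] for n in range(N)]     # step 3: analysis window W
        # Steps 4-5: split into r segments of K samples and add them together.
        folded = [sum(weighted[n + m * K] for m in range(r)) for n in range(K)]
        frames.append(dft(folded))                              # step 5: K subband components
    return frames                                               # step 6: input exhausted


# Hypothetical parameters: N = 32, K = 16 (so r = 2), M = 8, sine window.
N, K, M = 32, 16, 8
w = [math.sin(math.pi * n / N) for n in range(N)]
x = [math.cos(2 * math.pi * 0.1 * n) for n in range(64)]
X = analysis_filter_bank(x, w, K, M)
print(len(X), len(X[0]))  # 8 frames of 16 subband values
```

Because the folded block is real, each frame is conjugate-symmetric (X[k] = conj(X[K − k])), so only bins 0 to K/2 carry independent information.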
Step two: perform envelope estimation on the subband spectrum.
In this step and the next, the final gain function for noise suppression is derived from the subband spectrum X(k, t) of the noisy speech.
Let the noise-free speech sequence be {s(n), n = 0, 1, 2, …}, the noise sequence {v(n), n = 0, 1, 2, …} and the noisy speech sequence {x(n), n = 0, 1, 2, …}. Then x(n) can be expressed as:
$$x(n)=s(n)+v(n) \tag{1}$$
Applying the AFB gives the corresponding subband spectrum:
$$X(k,t)=S(k,t)+V(k,t) \tag{2}$$
where S(k, t) and V(k, t) are the subband spectra of s(n) and v(n), since the AFB is a linear transform.
Here k = 0, 1, 2, …, K − 1 is the subband index, K is the number of FFT points in the AFB, and t = 0, 1, 2, … is the signal frame index. In practice, to strike a better compromise between mean opinion score (MOS) and musical noise, the operating environment is classified by its signal-to-noise ratio (SNR) before the subband spectral envelopes are estimated. For the operating-environment SNR(t) of the t-th frame, the state is defined as follows:
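$$\text{state}(t)=\begin{cases}\text{high SNR state}, & \mathrm{SNR}(t)\ge T_1\\ \text{medium SNR state}, & T_2\le \mathrm{SNR}(t)<T_1\\ \text{low SNR state}, & \mathrm{SNR}(t)<T_2\end{cases} \tag{3}$$
where $T_1$ and $T_2$ are SNR thresholds in dB.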
In this embodiment, the operating-environment SNR is computed as the ratio of the bias-corrected r.m.s. (root mean square) value of the speech signal to the r.m.s. value of the noise. The bias-corrected speech r.m.s. is obtained as follows. First, the sum over subbands of the noisy speech subband spectral amplitudes is recursively averaged over time, giving the r.m.s. value of a rough estimate of the speech signal:
[Formula (4): image not reproduced]
where the smoothing coefficient is a preset time constant. Because this rough estimate is biased by the background noise level, it is bias-corrected as follows:
[Formula (5): image not reproduced]
Here the r.m.s. value of the noise is obtained by bias-correcting the sum of the final subband noise spectral envelope level estimates:
[Formula (6): image not reproduced]
where BCF > 0 is a preset bias correction factor.
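From (5) and (6), the operating-environment SNR of frame t is the ratio of the two r.m.s. values expressed in decibels:
$$\mathrm{SNR}(t)=20\log_{10}\frac{\sigma_s(t)}{\sigma_v(t)} \tag{7}$$
where $\sigma_s(t)$ is the bias-corrected speech r.m.s. value of (5) and $\sigma_v(t)$ is the noise r.m.s. value of (6).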
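The SNR computation can be sketched as below. Because equations (4) to (6) survive only as formula images, their forms here are assumptions: (4) is taken as a first-order recursive average of the summed subband amplitudes, (5) as subtraction of the noise r.m.s. from the rough estimate, (6) as BCF times the summed final noise envelopes, and (7) as 20·log10 of the r.m.s. ratio. The function name and the small numeric floors are hypothetical.

```python
import math


def operating_snr_db(prev_rough_rms, amplitudes, noise_envelope, beta, bcf):
    """Hypothetical sketch of (4)-(7); the forms of (4)-(6) are assumptions."""
    # (4), assumed: first-order recursive average of the summed subband amplitudes.
    rough_rms = beta * prev_rough_rms + (1.0 - beta) * sum(amplitudes)
    # (6), assumed: noise r.m.s. = BCF times the summed final noise envelope levels.
    noise_rms = bcf * sum(noise_envelope)
    # (5), assumed: remove the noise bias from the rough estimate, floored above zero.
    speech_rms = max(rough_rms - noise_rms, 1e-12)
    # (7): ratio of the two r.m.s. values in dB (floor avoids division by zero).
    snr_db = 20.0 * math.log10(speech_rms / max(noise_rms, 1e-12))
    return rough_rms, snr_db


# Four subbands, amplitudes 2.0, noise envelopes 0.5, no smoothing, BCF = 1.
rough, snr = operating_snr_db(0.0, [2.0] * 4, [0.5] * 4, beta=0.0, bcf=1.0)
print(rough, round(snr, 2))  # 8.0 9.54
```

The caller would carry `rough_rms` over to the next frame as `prev_rough_rms`, which is what makes (4) recursive.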
To prevent the operating-environment state from switching frequently when the estimated SNR(t) fluctuates near the decision thresholds $T_1$ and $T_2$, the SNR decision thresholds used for state classification are set with hysteresis, as follows:
when the previous frame is in the high SNR state and the current frame moves to the medium SNR state, the SNR decision threshold is $T_1-\Delta_1$;
when the previous frame is in the medium SNR state and the current frame moves to the low SNR state, the SNR decision threshold is $T_2-\Delta_2$;
when the previous frame is in the low SNR state and the current frame moves to the medium SNR state, the SNR decision threshold is $T_2+\Delta_2$;
when the previous frame is in the medium SNR state and the current frame moves to the high SNR state, the SNR decision threshold is $T_1+\Delta_1$;
where $\Delta_1$ and $\Delta_2$ are hysteresis (redundancy) margins in decibels.
The operating environment state machine estimates SNR(t) according to equation (7) and compares it with the thresholds of equation (3), with $T_1$ replaced by the hysteresis-adjusted threshold $T_1-\Delta_1$ or $T_1+\Delta_1$, and $T_2$ replaced by $T_2-\Delta_2$ or $T_2+\Delta_2$, as appropriate. This yields the operating-state value of the environment: "high SNR state", "medium SNR state" or "low SNR state". This state value is applied to the next frame processed for noise suppression.
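A three-state classifier with hysteresis, as described above, might look like the sketch below. It assumes that Δ1 widens the decision band around T1 and Δ2 the band around T2, and that T1 − Δ1 > T2 + Δ2; the function name, the state strings, and the threshold values are hypothetical.

```python
def next_state(prev_state, snr_db, T1, T2, delta1, delta2):
    """Three-state SNR classifier with hysteresis (assumed pairing of margins)."""
    # Upper threshold: staying in "high" is easier than entering it.
    upper = T1 - delta1 if prev_state == "high" else T1 + delta1
    # Lower threshold: staying in "low" is easier than entering it.
    lower = T2 + delta2 if prev_state == "low" else T2 - delta2
    if snr_db >= upper:
        return "high"
    if snr_db < lower:
        return "low"
    return "medium"


state = "medium"
trace = []
for snr in [21, 23, 19, 17, 9, 7, 11, 12]:
    state = next_state(state, snr, T1=20.0, T2=10.0, delta1=2.0, delta2=2.0)
    trace.append(state)
print(trace)
# ['medium', 'high', 'high', 'medium', 'medium', 'low', 'low', 'medium']
```

An SNR of 19 dB keeps the state "high" because leaving it requires falling below 18 dB, while entering "high" from "medium" requires at least 22 dB; this gap is what suppresses rapid toggling near T1.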
For envelope estimation, the magnitude of the subband spectrum X(k, t) is replaced by its pseudo-modulus, which is computed from Re{X(k, t)} and Im{X(k, t)}, the real and imaginary parts of X(k, t) [formula image not reproduced]. This lowers the computational cost and simplifies engineering implementation.
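The pseudo-modulus formula itself is not reproduced above. A common low-cost magnitude approximation, assumed here purely for illustration, is |Re{X}| + |Im{X}|: it avoids the square root of the true modulus and never differs from it by more than a factor of √2. The function name is hypothetical.

```python
def pseudo_modulus(z: complex) -> float:
    """Assumed pseudo-modulus |Re| + |Im|: absolute values and one add, no sqrt."""
    return abs(z.real) + abs(z.imag)


for z in [3 + 4j, -1 + 0j, 2 - 2j, 0.5j]:
    # Compare with the true modulus |z| = sqrt(Re^2 + Im^2).
    print(z, abs(z), pseudo_modulus(z))
```

For 3 + 4j this gives 7.0 against a true modulus of 5.0; the overestimate is largest when the real and imaginary parts are equal in magnitude.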
For the subband spectrum X(k, t) of the noisy speech signal, the speech subband spectral envelope $\hat{S}(k,t)$ is first estimated with the following "fast-rise, slow-fall" nonlinear single-pole recursive model:
$$\hat{S}(k,t)=\alpha\,\hat{S}(k,t-1)+(1-\alpha)\,\tilde{X}(k,t) \tag{8}$$
where $\tilde{X}(k,t)$ is the pseudo-modulus of X(k, t) and α is the recursion coefficient, determined by:
$$\alpha=\begin{cases}\alpha_A(t), & \tilde{X}(k,t)>\hat{S}(k,t-1)\\ \alpha_D, & \text{otherwise}\end{cases} \tag{9}$$
Here the slow-fall parameter $\alpha_D$ is usually a preset time constant, while the fast-rise parameter $\alpha_A(t)$ is selected adaptively by the operating-state decision module according to the operating-environment state:
$$\alpha_A(t)=\begin{cases}\alpha_{A,H}, & \text{high SNR state}\\ \alpha_{A,M}, & \text{medium SNR state}\\ \alpha_{A,L}, & \text{low SNR state}\end{cases} \tag{10}$$
where $\alpha_{A,H}$, $\alpha_{A,M}$ and $\alpha_{A,L}$ are three preset fast-rise time constants.
Because the fast-rise time constant is assigned adaptively according to the operating state, the speech subband spectral envelope is estimated more accurately.
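One update of the "fast-rise, slow-fall" speech envelope recursion can be sketched per subband as follows, assuming the single-pole form env(t) = α·env(t−1) + (1 − α)·input. With that convention a smaller coefficient tracks faster, which is why the rise coefficients here are smaller than the fall coefficient. The coefficient values, dictionary keys, and function name are hypothetical.

```python
def update_speech_envelope(prev_env, amplitude, state, alpha_rise_by_state, alpha_fall):
    """One 'fast-rise, slow-fall' single-pole update for one subband (sketch)."""
    # The rise coefficient depends on the operating-environment state.
    alpha_rise = alpha_rise_by_state[state]
    # Use the rise coefficient when the input exceeds the current envelope.
    alpha = alpha_rise if amplitude > prev_env else alpha_fall
    # env(t) = alpha * env(t-1) + (1 - alpha) * input
    return alpha * prev_env + (1.0 - alpha) * amplitude


# Hypothetical coefficients: a smaller alpha means faster tracking.
ALPHA_RISE = {"high": 0.3, "medium": 0.5, "low": 0.7}
ALPHA_FALL = 0.95

env = 0.0
trace = []
for amp in [1.0, 1.0, 1.0, 0.0, 0.0]:
    env = update_speech_envelope(env, amp, "high", ALPHA_RISE, ALPHA_FALL)
    trace.append(env)
print([round(e, 4) for e in trace])
```

The envelope climbs to about 0.97 within three frames of a unit step but is still near 0.88 two frames after the input drops to zero, which is the asymmetry the model is named for.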
Next, the noise subband spectral envelope is estimated. The noise subband spectral envelope estimator consists of two parts connected in series, called the basic noise estimator and the final noise estimator. The basic noise estimator uses a "slow-rise, fast-fall" nonlinear single-pole recursive model to estimate the basic noise spectral envelope level:
[Formula (11): image not reproduced]
where the recursion coefficient is selected by:
[Formula (12): image not reproduced]
and the two coefficients in (12) are preset fast-fall and slow-rise time constants, respectively. The basic noise envelope always tracks the level changes caused by speech disturbances, so it can be used together with the speech subband spectral envelope to make the voice activity detection (VAD) decision in the final noise estimator. The final noise estimator also uses a "slow-rise, fast-fall" nonlinear single-pole recursive model to estimate the final noise spectral envelope level, but it is updated only when the VAD decision is false (that is, when no speech is present):
[Formula (13): image not reproduced]
where
[Formula (14): image not reproduced]
Here the two coefficients are a preset fast-fall time constant and a preset slow-rise time constant, respectively, and the VAD uses a preset threshold parameter. In this way, the estimate can fall as low as small input amplitudes without the update stalling, and the ambient noise level can change by any amount without affecting the convergence time.
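Because formulas (11) to (14) are not reproduced, the sketch below is only one plausible reading of the text, with every detail an assumption: both estimators use the same "slow-rise, fast-fall" single-pole form; both are driven by the subband pseudo-modulus; the VAD declares speech when the speech envelope is at least γ times the basic noise envelope; and the final estimate is held while speech is present. The parameter names and values are hypothetical.

```python
def single_pole(prev, x, alpha_rise, alpha_fall):
    """Nonlinear single-pole recursion: the coefficient depends on the direction of change."""
    alpha = alpha_rise if x > prev else alpha_fall
    return alpha * prev + (1.0 - alpha) * x


def update_noise(basic_prev, final_prev, amplitude, speech_env, params):
    """Hypothetical reading of (11)-(14): basic and final 'slow-rise, fast-fall' estimators."""
    basic = single_pole(basic_prev, amplitude, params["b_rise"], params["b_fall"])
    # Assumed VAD: speech is present when the speech envelope reaches gamma * basic noise.
    speech_present = speech_env >= params["gamma"] * basic
    if speech_present:
        final = final_prev  # hold the final estimate while speech is present
    else:
        final = single_pole(final_prev, amplitude, params["f_rise"], params["f_fall"])
    return basic, final, speech_present


# Hypothetical parameters: rise coefficients close to 1 (slow), fall coefficients smaller (fast).
P = {"b_rise": 0.99, "b_fall": 0.5, "f_rise": 0.995, "f_fall": 0.6, "gamma": 4.0}
print(update_noise(1.0, 1.0, 10.0, 9.0, P))  # speech burst: final estimate frozen
print(update_noise(1.0, 1.0, 0.2, 0.2, P))   # quieter noise: both estimates fall quickly
```

Freezing the final estimate during speech keeps speech energy out of the noise floor, while the fast-fall path lets the floor follow a drop in ambient noise within a few frames.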
Step three: obtain the subband-domain suppression gain function from the subband spectral envelopes.
From the analysis in step two, the suppression gain in each subband should be a function of the a posteriori SNR in that subband and of the operating-environment state. Since the likelihood function of an energy-detection VAD is the a posteriori SNR, the a posteriori SNR is normalized by the VAD threshold, and the normalized value is used to construct the raw suppression gain:
$$G_0(k,t)=\min\left\{1,\left(\frac{\hat{S}(k,t)}{\gamma\,\hat{N}(k,t)}\right)^{p}\right\} \tag{15}$$
where $\hat{S}(k,t)$ is the speech subband spectral envelope, $\hat{N}(k,t)$ is the final noise spectral envelope, and γ is the preset VAD threshold parameter: when the a posteriori SNR (A-SNR) $\hat{S}(k,t)/\hat{N}(k,t)$ is greater than or equal to γ, the VAD indicates that speech is present. The parameter $p\ge 1$ is a gain expansion factor that controls how fast the gain decays when the normalized a posteriori SNR is less than 1; with p = 1, the gain decays linearly with the normalized a posteriori SNR.
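Equation (15), read as the raw gain G0 = min{1, (A-SNR/γ)^p} with the A-SNR taken as the ratio of the speech envelope to the final noise envelope, can be sketched as follows. This reading and the function name are assumptions; `eps` only guards against division by zero.

```python
def raw_gain(speech_env, noise_env, gamma, p=1.0, eps=1e-12):
    """Sketch of (15): G0 = min(1, (A-SNR / gamma) ** p)."""
    a_snr = speech_env / max(noise_env, eps)  # a posteriori SNR (envelope ratio)
    normalized = a_snr / gamma                # equals 1.0 exactly at the VAD threshold
    return 1.0 if normalized >= 1.0 else normalized ** p


print(raw_gain(2.0, 1.0, 4.0))         # 0.5: halfway to the VAD threshold, linear decay
print(raw_gain(2.0, 1.0, 4.0, p=2.0))  # 0.25: p > 1 attenuates sub-threshold bins harder
print(raw_gain(8.0, 1.0, 4.0))         # 1.0: speech detected, no attenuation
```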
Since the operating environment has been divided by SNR into three operating states (high, medium and low SNR), each with its own suppression gain lower bound, the operating-environment decision module determines the lower bound as follows:
$$G_{\min}(t)=\begin{cases}G_H, & \text{high SNR state}\\ G_M, & \text{medium SNR state}\\ G_L, & \text{low SNR state}\end{cases} \tag{16}$$
where $G_H$, $G_M$ and $G_L$ are preset constants in dB.
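An "Aggressive" state mode is defined: in the operating state that corresponds to Aggressive mode, the parameter p in expression (15) takes a suitable value greater than 1; otherwise p = 1.0. Normally the "low SNR state" is designated as Aggressive, and expression (15) then becomes:
$$G_0(k,t)=\begin{cases}\min\left\{1,\left(\dfrac{\hat{S}(k,t)}{\gamma\,\hat{N}(k,t)}\right)^{p}\right\}, & \text{low SNR state}\\ \min\left\{1,\dfrac{\hat{S}(k,t)}{\gamma\,\hat{N}(k,t)}\right\}, & \text{otherwise}\end{cases} \tag{17}$$
with p > 1, where $\hat{S}(k,t)$ and $\hat{N}(k,t)$ are the speech subband spectral envelope and the final noise spectral envelope.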
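The state-dependent parts, (16) and (17), can be sketched as below. The lower bound is converted from dB to a linear amplitude gain, assuming the dB constants refer to amplitude (20·log10), and the expansion factor p exceeds 1 only in the Aggressive (low SNR) state. The floor values, p = 2, and the function names are hypothetical.

```python
def gain_floor(state, floor_db):
    """(16): look up the state's lower bound in dB and convert it to a linear gain."""
    return 10.0 ** (floor_db[state] / 20.0)


def raw_gain_for_state(a_snr, gamma, state, p_aggressive):
    """(17): p > 1 only in the Aggressive (low SNR) state; otherwise p = 1."""
    p = p_aggressive if state == "low" else 1.0
    return min(1.0, (a_snr / gamma) ** p)


FLOOR_DB = {"high": -6.0, "medium": -12.0, "low": -20.0}  # hypothetical values
print(gain_floor("low", FLOOR_DB))                 # about 0.1
print(raw_gain_for_state(2.0, 4.0, "low", 2.0))    # 0.25
print(raw_gain_for_state(2.0, 4.0, "high", 2.0))   # 0.5
```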
The human ear cannot distinguish fluctuations that lie within the same critical band, but it can distinguish fluctuations located in different critical bands. This is the basis for smoothing the subband suppression gain over critical bands. Critical bands are usually described on the Bark frequency scale, where 1 to 24 Bark correspond to the first 24 auditory critical bands; the frequencies of their boundary points and their bandwidths are listed in Table 1.
Table 1: Frequencies and bandwidths of the first 24 auditory critical bands
[Table image not reproduced]
Accordingly, the subband indexes k in formula (17) are grouped so that the frequencies of all subband indexes in a group lie within the same critical band, and the subband suppression gains in each group are all replaced by their arithmetic mean. This completes the critical-band smoothing of the subband suppression gain. Denoting the smoothed subband suppression gain by $\bar{G}_0(k,t)$, the final gain function is determined by:
$$G(k,t)=\max\left\{\bar{G}_0(k,t),\,G_{\min}(t)\right\} \tag{18}$$
where $G_{\min}(t)$ is the suppression gain lower bound of (16), converted from dB to a linear gain.
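The critical-band smoothing and the lower bound of (18), read as taking the larger of the smoothed gain and the floor, can be sketched as below. Table 1 is not reproduced, so the band edges are an assumption: the commonly cited Zwicker critical-band edges. The sketch also assumes bin k is centered at k·fs/K, smooths bins 0 to K/2 and mirrors them for a real-valued signal, and treats bins above 15.5 kHz as one extra group. The function and constant names are hypothetical.

```python
import bisect

# Assumption: commonly cited Zwicker critical-band edges in Hz (bands 1 to 24).
BARK_EDGES_HZ = [0, 100, 200, 300, 400, 510, 630, 770, 920, 1080, 1270, 1480,
                 1720, 2000, 2320, 2700, 3150, 3700, 4400, 5300, 6400, 7700,
                 9500, 12000, 15500]


def smooth_and_floor(gains, fs, floor):
    """Average the gains within each critical band, then apply the lower bound (18)."""
    K = len(gains)
    half = K // 2
    # Critical-band index of each non-negative-frequency bin k = 0..K/2.
    band = [bisect.bisect_right(BARK_EDGES_HZ, k * fs / K) - 1 for k in range(half + 1)]
    groups = {}
    for k in range(half + 1):
        groups.setdefault(band[k], []).append(gains[k])
    means = {b: sum(g) / len(g) for b, g in groups.items()}
    out = [0.0] * K
    for k in range(half + 1):
        out[k] = max(means[band[k]], floor)  # (18): larger of smoothed gain and floor
    for k in range(1, half):
        out[K - k] = out[k]                  # mirror the upper half for a real signal
    return out


# K = 64 bins at fs = 8 kHz: bins 9 and 10 (1125 Hz, 1250 Hz) share one band.
g = [0.5] * 64
g[9], g[10] = 0.2, 0.4
g[1] = 0.05
out = smooth_and_floor(g, fs=8000, floor=0.1)
print(round(out[9], 3), round(out[10], 3), out[1])  # 0.3 0.3 0.1
```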
Step four: revise the subband spectrum according to the final gain function to obtain the revised subband spectrum:
$$Y(k,t)=G(k,t)\,X(k,t) \tag{19}$$
where G(k, t) is the final gain function of (18).
Step five: synthesize the revised subband spectrum and output the enhanced speech sequence.
As shown in Fig. 3, the synthesis filter bank (SFB) transforms the revised subband spectra back into a time-domain speech sequence as follows:
First, initialize the contents of the synthesis shift register (reglist_s) in the SFB to zero;
second, apply an inverse fast Fourier transform (IFFT) to the K subband spectral values to obtain K time-domain samples;
third, periodically extend the K time-domain samples into r segments and store them in a temporary register;
fourth, weight the temporary register with the synthesis window function W1 (W1 may be identical to the analysis window W);
fifth, overlap-add the contents of the temporary register into reglist_s;
sixth, shift reglist_s left by M samples; the M samples shifted out form the new output data block;
seventh, fill the rightmost M positions of reglist_s with zeros;
eighth, if the input data is exhausted, the SFB stops; otherwise, return to the second step and repeat the above operations for the next frame.
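The synthesis steps above can be sketched as a weighted overlap-add loop. This is a minimal sketch assuming each frame supplies all K complex subband values (Hermitian-symmetric, so the IFFT output is real), the synthesis window W1 has length r·K, and M output samples are produced per frame; the function name `sfb_synthesize` is hypothetical. In the sketch, reglist_s is zeroed only once, before the first frame, because re-zeroing it on every frame would discard the overlap-add history.

```python
import numpy as np


def sfb_synthesize(subband_frames, w1, M):
    """Weighted overlap-add synthesis filter bank (sketch of the steps above).

    subband_frames: sequence of length-K complex arrays, one per frame.
    w1:             synthesis window of length r*K.
    M:              number of output samples produced per frame.
    """
    w1 = np.asarray(w1, dtype=float)
    reglist_s = None
    blocks = []
    for X in subband_frames:
        X = np.asarray(X)
        K = X.shape[0]
        if w1.shape[0] % K:
            raise ValueError("window length must be a multiple of K")
        r = w1.shape[0] // K
        if reglist_s is None:
            reglist_s = np.zeros(r * K)               # step 1 (first frame only)
        x = np.fft.ifft(X).real                       # step 2: K time samples
        temp = np.tile(x, r)                          # step 3: periodic extension
        temp *= w1                                    # step 4: synthesis window W1
        reglist_s += temp                             # step 5: overlap-add
        blocks.append(reglist_s[:M].copy())           # step 6: M samples shifted out
        reglist_s = np.concatenate([reglist_s[M:], np.zeros(M)])  # steps 6-7
    return np.concatenate(blocks) if blocks else np.zeros(0)
```

With r = 1, M = K/2, and a square-root periodic Hann window of length K used both here and in a matching analysis stage that windows the most recent K input samples, this loop returns the analysis input delayed by M samples when no gain is applied.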
Compared with the prior art, this embodiment uses a subband-domain structure that reduces the computational load (in MIPS) many-fold, which lowers the algorithm's complexity and makes real-time implementation on existing commercial DSP chips more convenient. In addition, the operating environment is divided into three SNR states (high, medium, and low); in each state a state-specific suppression level is applied and a state-specific time-smoothing coefficient is selected for envelope estimation. This achieves a better trade-off between MOS and musical noise, giving a higher MOS, less perceptible speech distortion, and less perceptible musical noise.
Preferred embodiment two
As shown in Figs. 4, 5, and 6, this embodiment discloses a single-microphone subband-domain noise reduction system comprising an analysis filter bank 1, a pseudo-modulus calculator 2, a subband-domain single-microphone noise reduction core subsystem 3, and a synthesis filter bank 4.
The analysis filter bank 1 is used to convert the time domain signal into a subband spectral signal.
The pseudo-modulus calculator 2 generates a pseudo-modulus signal from the subband spectral signal and uses it in place of the magnitude of the subband spectral signal, which reduces computational complexity and simplifies engineering implementation.
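The text does not define the pseudo-modulus at this point. One widely used low-cost surrogate for the magnitude sqrt(Re² + Im²) is |Re| + |Im|, which needs no multiplications or square root; the sketch below assumes that form purely for illustration, and `pseudo_modulus` is a hypothetical name. This surrogate never underestimates the true magnitude and overestimates it by at most a factor of √2.

```python
import numpy as np


def pseudo_modulus(X):
    """Assumed pseudo-modulus |Re(X)| + |Im(X)|, computed elementwise.

    Avoids the squares and square root of the exact magnitude, which is
    attractive on fixed-point DSPs. Bounds: |X| <= result <= sqrt(2) * |X|.
    """
    X = np.asarray(X)
    return np.abs(X.real) + np.abs(X.imag)
```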
The subband-domain single-microphone noise reduction core subsystem 3 defines three operating environment states by signal-to-noise ratio, determines which state the subband spectral signal is in, and revises the subband spectral signal accordingly.
The subband-domain single-microphone noise reduction core subsystem 3 includes a speech spectrum envelope estimator 31, a base noise spectrum envelope estimator 32, a final noise spectrum envelope estimator 33, a level calculator 34, an operating environment state machine 35, a raw suppression gain calculator 36, and a final gain calculator 37, where:
the speech spectrum envelope estimator estimates the speech subband spectral envelope signal from the pseudo-modulus signal and the previous frame's rise parameter [formula image];
the base noise spectrum envelope estimator estimates the noise spectral envelope level from the pseudo-modulus signal and the speech subband spectral envelope signal;
the final noise spectrum envelope estimator estimates the final noise spectral envelope level from the pseudo-modulus signal and the noise spectral envelope level;
the level calculator computes the root-mean-square (RMS) value of the rough speech-signal estimate and the RMS value of the noise from the final noise spectral envelope level;
the operating environment state machine determines the decision thresholds that apply when the operating environment state changes and, from the RMS value of the rough speech estimate and the RMS value of the noise, determines for each frame of the subband spectrum the SNR state, the suppression gain lower bound, and the current frame's rise parameter [formula image];
the raw suppression gain calculator computes the raw suppression gain from the speech subband spectral envelope signal, the final noise spectral envelope level, and the SNR state; and
the final gain calculator computes the final noise suppression gain from the suppression gain lower bound and the result of frequency-domain adjacent-band smoothing of the raw noise suppression gain.
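The envelope estimators above, together with the rise/fall relationships stated in claim 2, can be sketched as direction-dependent first-order smoothers. This sketch assumes the recursion E(t) = a·E(t-1) + (1 - a)·A(t), where A is the pseudo-modulus and a is the rise parameter when A(t) exceeds E(t-1) and the fall parameter otherwise; under that form, "rise < fall" means fast attack and slow decay. The per-subband VAD rule (speech envelope above γ times the base noise envelope), all coefficient values, and the function names are hypothetical. In the patent the speech estimator's rise parameter is selected per SNR state; here it is simply an argument.

```python
import numpy as np


def one_pole(prev, x, a_rise, a_fall):
    """Nonlinear single-pole recursion: the smoothing coefficient depends on
    whether the input rises above or falls below the previous output."""
    a = np.where(x > prev, a_rise, a_fall)
    return a * prev + (1.0 - a) * x


def update_envelopes(A, env, speech_coef=(0.2, 0.9), base_coef=(0.995, 0.6),
                     final_coef=(0.98, 0.9), gamma=4.0):
    """One frame of envelope tracking for K subbands (hypothetical values).

    A:   pseudo-modulus of each subband for the current frame.
    env: dict with previous 'speech', 'base_noise' and 'noise' envelopes.
    """
    speech = one_pole(env["speech"], A, *speech_coef)      # 1st: rise < fall
    base = one_pole(env["base_noise"], A, *base_coef)      # 2nd: rise > fall
    is_speech = speech > gamma * base                      # hypothetical VAD
    smoothed = one_pole(env["noise"], base, *final_coef)   # 3rd: rise > fall
    # Claim 2: keep the base envelope during speech, else use the 3rd recursion.
    noise = np.where(is_speech, base, smoothed)
    return {"speech": speech, "base_noise": base, "noise": noise}, is_speech
```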
The workflow of the subband-domain single-microphone noise reduction core subsystem 3 is shown in Fig. 6.
The synthesis filter bank 4 is used to transform the modified subband spectral signal back to a time-domain signal.
The invention transforms the input time-domain signal into a subband spectral signal, estimates envelopes from the subband spectral signal, computes the subband-domain suppression gain from those envelopes, applies that gain to revise the subband spectral signal, and finally transforms the revised subband signal back into the time domain to obtain the enhanced speech signal. Compared with the prior art, the method reduces the computational load (in MIPS) many-fold, has lower computational complexity, is convenient for real-time implementation, and facilitates widespread adoption.
The present invention is not limited to the above preferred embodiments, and any modification, equivalent replacement or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A single-microphone subband-domain noise reduction method, comprising:
transforming a noisy speech sequence to generate a subband spectrum;
performing envelope estimation on the subband spectrum;
obtaining a subband-domain suppression gain function from the estimated envelope;
revising the subband spectrum according to the subband-domain suppression gain function; and
transforming the revised subband spectrum into a time-domain signal to obtain a noise-reduced speech sequence.
2. The single-microphone subband-domain noise reduction method of claim 1, wherein the estimated envelope comprises a speech subband spectral envelope signal and a noise subband spectral envelope signal;
performing envelope estimation on the subband spectrum comprises:
applying a first nonlinear single-pole recursion to the subband spectrum to obtain the speech subband spectral envelope;
applying a second nonlinear single-pole recursion to the subband spectrum to obtain a base noise subband spectral envelope;
performing voice activity detection using the speech subband spectral envelope signal and the base noise subband spectral envelope; if speech is detected, taking the base noise subband spectral envelope as the noise subband spectral envelope; otherwise, applying a third nonlinear single-pole recursion to the base noise subband spectral envelope to obtain the noise subband spectral envelope;
wherein the rise parameter of the first nonlinear single-pole recursion is smaller than its fall parameter; and
the rise parameters of the second and third nonlinear single-pole recursions are greater than their respective fall parameters.
3. The single-microphone subband-domain noise reduction method of claim 2, wherein the fall parameter of the first nonlinear single-pole recursion is a preset time constant;
the rise parameter [formula image] of the first nonlinear single-pole recursion is determined according to the SNR state of the subband spectrum:
[formula image]
where [formula image], [formula image] are preset time constants.
4. The single-microphone subband-domain noise reduction method of claim 3, wherein obtaining the subband-domain suppression gain function from the estimated envelope comprises:
determining a raw suppression gain value according to the SNR state of the subband spectrum, the speech subband spectral envelope, and the noise subband spectral envelope;
performing frequency-domain critical-band smoothing on the raw suppression gain value to obtain a subband suppression gain;
determining a suppression gain lower bound according to the SNR state of the subband spectrum; and
calculating the subband-domain suppression gain function from the subband suppression gain and the suppression gain lower bound.
5. The single-microphone subband-domain noise reduction method of claim 4, wherein the suppression gain lower bound is determined as follows:
[formula image]
[formula image]
where [formula image] are preset constants, each in decibels;
the raw suppression gain value is determined as follows:
[formula image]
[formula image]
where γ is the threshold parameter of voice activity detection, [formula image] is the speech subband spectral envelope, and [formula image] is the noise subband spectral envelope.
6. The single-microphone subband-domain noise reduction method of claim 5, wherein the SNR state of each frame of the subband spectrum is determined as follows:
initially, if the SNR is greater than or equal to [formula image], the state is the high-SNR state; if the SNR is less than [formula image] and not less than [formula image], the state is the medium-SNR state; if the SNR is less than [formula image], the state is the low-SNR state; where [formula image] are preset SNR thresholds, each in decibels;
when the state changes from the high-SNR state in the previous frame to the medium-SNR state in the current frame, the SNR decision threshold is [formula image];
when the state changes from the medium-SNR state in the previous frame to the low-SNR state in the current frame, the SNR decision threshold is [formula image];
when the state changes from the low-SNR state in the previous frame to the medium-SNR state in the current frame, the SNR decision threshold is [formula image];
when the state changes from the medium-SNR state in the previous frame to the high-SNR state in the current frame, the SNR decision threshold is [formula image];
where Δ1 and Δ2 are redundancy (hysteresis) margins, each in decibels.
7. The single-microphone subband-domain noise reduction method of any one of claims 1 to 6, wherein performing envelope estimation on the subband spectrum comprises:
calculating a pseudo-modulus value of the subband spectrum; and
performing envelope estimation using the pseudo-modulus value.
8. A single-microphone subband-domain noise reduction system, comprising an analysis filter bank, a subband-domain single-microphone noise reduction core subsystem, and a synthesis filter bank, wherein:
the analysis filter bank converts a time-domain signal into a subband spectral signal;
the subband-domain single-microphone noise reduction core subsystem defines three operating environment states by signal-to-noise ratio, determines the operating environment state of the subband spectral signal, and revises the subband spectral signal; and
the synthesis filter bank transforms the revised subband spectral signal back into a time-domain signal.
9. The single-microphone subband-domain noise reduction system of claim 8, further comprising a pseudo-modulus calculator that receives the subband spectral signal output by the analysis filter bank, generates a pseudo-modulus signal from it, and feeds the pseudo-modulus signal to the subband-domain single-microphone noise reduction core subsystem.
10. The single-microphone subband-domain noise reduction system of claim 9, wherein the subband-domain single-microphone noise reduction core subsystem comprises a speech spectrum envelope estimator, a base noise spectrum envelope estimator, a final noise spectrum envelope estimator, a level calculator, an operating environment state machine, a raw suppression gain calculator, and a final suppression gain calculator;
the speech spectrum envelope estimator estimates a speech subband spectral envelope signal from the pseudo-modulus signal and the previous frame's rise parameter [formula image];
the base noise spectrum envelope estimator estimates a noise spectral envelope level from the pseudo-modulus signal and the speech subband spectral envelope signal;
the final noise spectrum envelope estimator estimates a final noise spectral envelope level from the pseudo-modulus signal and the noise spectral envelope level;
the level calculator computes the RMS value of the rough speech-signal estimate and the RMS value of the noise from the final noise spectral envelope level;
the operating environment state machine determines the decision thresholds that apply when the operating environment state changes and, from the RMS value of the rough speech estimate and the RMS value of the noise, determines for each frame of the subband spectrum the SNR state, the suppression gain lower bound, and the current frame's rise parameter [formula image];
the raw suppression gain calculator computes a raw noise suppression gain from the speech subband spectral envelope signal, the final noise spectral envelope level, and the SNR state; and
the final suppression gain calculator computes the final noise suppression gain from the suppression gain lower bound and the result of frequency-domain adjacent-band smoothing of the raw noise suppression gain.
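The three-state SNR decision with hysteresis in claim 6 (the operating environment state machine of claim 10) can be sketched as below. The threshold expressions are images in this text, so this sketch assumes the usual hysteresis arrangement: leaving a state requires crossing the boundary threshold by the margin Δ1 (high/medium boundary) or Δ2 (medium/low boundary), and at most one state step is taken per frame. The frame SNR is assumed to be 20·log10 of the ratio of the rough speech RMS to the noise RMS. All threshold values and function names are hypothetical.

```python
import math

# Hypothetical thresholds and redundancy margins, all in dB.
T_HIGH, T_LOW, DELTA1, DELTA2 = 20.0, 8.0, 2.0, 2.0


def snr_db_from_rms(speech_rms, noise_rms, eps=1e-12):
    """Frame SNR in dB from the level calculator's two RMS values (assumed form)."""
    return 20.0 * math.log10(max(speech_rms, eps) / max(noise_rms, eps))


def next_state(prev_state, snr_db, t_high=T_HIGH, t_low=T_LOW,
               d1=DELTA1, d2=DELTA2):
    """Return 'high', 'medium' or 'low' for the current frame."""
    if prev_state is None:                  # initial decision: plain thresholds
        if snr_db >= t_high:
            return "high"
        return "medium" if snr_db >= t_low else "low"
    if prev_state == "high":
        return "medium" if snr_db < t_high - d1 else "high"
    if prev_state == "medium":
        if snr_db >= t_high + d1:
            return "high"
        if snr_db < t_low - d2:
            return "low"
        return "medium"
    return "medium" if snr_db >= t_low + d2 else "low"   # prev_state == "low"


# Example: run the state machine over a short SNR trajectory in dB.
state = None
for snr in [25.0, 19.0, 17.0, 7.0, 5.0, 9.0, 10.5]:
    state = next_state(state, snr)
```

The margins keep the state from toggling on every frame when the SNR hovers near a boundary, which in turn keeps the state-dependent suppression lower bound and smoothing coefficients stable.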
CN202211013301.9A 2022-08-23 2022-08-23 Single-microphone subband domain noise reduction method and system Pending CN115527550A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211013301.9A CN115527550A (en) 2022-08-23 2022-08-23 Single-microphone subband domain noise reduction method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211013301.9A CN115527550A (en) 2022-08-23 2022-08-23 Single-microphone subband domain noise reduction method and system

Publications (1)

Publication Number Publication Date
CN115527550A true CN115527550A (en) 2022-12-27

Family

ID=84696850

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211013301.9A Pending CN115527550A (en) 2022-08-23 2022-08-23 Single-microphone subband domain noise reduction method and system

Country Status (1)

Country Link
CN (1) CN115527550A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination