CN103021420B - Speech enhancement method of multi-sub-band spectral subtraction based on phase adjustment and amplitude compensation - Google Patents


Info

Publication number: CN103021420B
Application number: CN201210513075.0A
Authority: CN (China)
Other languages: Chinese (zh)
Other versions: CN103021420A
Inventors: 刘文举 (Liu Wenju), 李超 (Li Chao)
Assignee: Institute of Automation, Chinese Academy of Sciences
Legal status: Active (granted)

Landscapes

  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention discloses a speech enhancement method using multi-sub-band spectral subtraction based on phase adjustment and amplitude compensation. The method mainly includes: truncating the signal acquired by a microphone and performing a fast Fourier transform (FFT); performing a small-shift maximum search on the amplitude spectrum with a phase adjustment algorithm to obtain an adjusted amplitude spectrum of the noisy speech; estimating the amplitude spectrum of the noise; dividing the full band into several sub-bands and computing the signal-to-noise ratio of each sub-band; performing amplitude spectral subtraction with an over-subtraction rule on each sub-band; performing amplitude compensation on the speech spectrum after subtraction; and recovering the time-domain waveform of the signal by an inverse fast Fourier transform and overlap-add of the frames.

Description

Speech enhancement method of multi-sub-band spectral subtraction based on phase adjustment and amplitude compensation
Technical Field
The invention relates to the field of speech signal processing, and in particular to a speech enhancement method using multi-sub-band spectral subtraction based on phase adjustment and amplitude compensation.
Background
Spectral subtraction is one of the most widely used speech enhancement algorithms; it was originally proposed for noise cancellation (reference 1: S. F. Boll, "Suppression of acoustic noise in speech using spectral subtraction," IEEE Trans. Acoust., Speech, Signal Process., 27(2), 113-120, 1979). It rests on a basic premise: for additive noise, the noise spectrum can be subtracted from the Discrete Fourier Transform (DFT) spectrum of the noisy speech to obtain an estimate of the speech spectrum. The noise spectrum can be estimated and updated during silence segments. The enhanced speech time-domain waveform is then obtained by applying the Inverse Discrete Fourier Transform (IDFT) to the estimated speech spectrum. Spectral subtraction needs only a DFT and an IDFT, so its computational complexity is low and its implementation is simple.
Spectral subtraction is broadly divided into first-order spectral subtraction (i.e., magnitude spectral subtraction) and second-order spectral subtraction (i.e., power spectral subtraction). Whatever its form, great care must be taken in the design to avoid introducing speech distortion. If the subtracted portion is greater than the noise, speech information is lost; conversely, if it is less than the noise, excessive residual noise remains. Researchers have since proposed many improved algorithms to attenuate (or even eliminate) the speech distortion introduced by spectral subtraction.
The simplicity of spectral subtraction comes with several drawbacks, the most prominent of which is musical noise. Because of noise-estimation errors and spectral fluctuations, the amplitude of the noisy signal in some frequency bins falls below the estimated noise amplitude, so the estimated speech spectrum after subtraction takes negative values. The simplest remedy is to set these negative values to zero so that the spectral magnitudes across the full band are non-negative. However, this nonlinear operation on the negative values produces many isolated peaks in the spectrum. These isolated peaks are strongly random in both the time and frequency domains; although not large in magnitude, their effect is severe. In the time domain they sound like brief pure tones whose pitch (frequency) varies randomly from frame to frame, creating a new type of noise commonly called musical noise. In many cases, musical noise is more objectionable than the original noise. Another important cause of musical noise is the large variance of the estimated spectra of the noisy speech and the noise, together with the large differences in the subtraction rule across frequency bins.
Musical noise is difficult to overcome in conventional spectral subtraction because the method relies on the rule that the cross terms between the speech spectrum and the noise spectrum can be ignored. This rule agrees with long-term statistics: since speech and noise are mutually independent random processes, the expected value of their cross terms is zero. In practical speech enhancement implementations, however, each frame is only about 20-30 ms long, and over such a short interval the rule rarely holds (reference 2: N. Evans, J. Mason, W. Liu, and B. Fauve, "An assessment on the fundamental limitations of spectral subtraction," Proc. IEEE Internat. Conf. on Acoustics, Speech, Signal Processing (ICASSP), 2006). The spectral subtraction formula is therefore only an approximation, not an exact identity. Researchers have made many efforts to study the effects of the cross terms, but those studies mainly target the performance of Automatic Speech Recognition (ASR), not speech quality.
Disclosure of Invention
Technical problem to be solved
The fundamental source of musical noise in spectral subtraction is cross-term error. The invention aims to overcome the adverse effect of this error and provides an amplitude spectral subtraction method under zero cross-term error, so as to eliminate musical noise thoroughly.
(II) technical scheme
The invention solves the above technical problem with a speech enhancement method using multi-sub-band spectral subtraction based on phase adjustment and amplitude compensation, comprising the following steps:
step a: collect a noisy speech signal y(n) and obtain its amplitude spectrum α_y(ω), where n denotes the discrete time index and ω the discrete frequency bin;
step b: perform a small-shift maximum search on the noisy-speech amplitude spectrum with a phase adjustment algorithm to obtain the adjusted noisy-speech amplitude spectrum α̂_y(ω), corresponding to a phase difference of 0 between the clean speech signal and the additive noise signal;
step c: update the additive-noise amplitude spectrum α̂_v(ω) with a noise estimation algorithm;
step d: apply amplitude spectral subtraction to α̂_y(ω) using the over-subtraction rule coefficients and the additive-noise amplitude spectrum α̂_v(ω) to obtain the clean-speech amplitude spectrum α̂_x(ω);
step e: compensate α̂_x(ω) with a second-order amplitude compensation factor and a preset first-order amplitude compensation factor to obtain the enhanced clean-speech amplitude spectrum α̃_x(ω), and from it the enhanced clean speech signal x̃(n).
(III) advantageous effects
For the cross terms between the speech spectrum and the noise spectrum in conventional spectral subtraction, the unreasonable assumption of neglecting the cross-term error is abandoned. The invention derives a multi-sub-band spectral subtraction method based on phase adjustment and amplitude compensation from a spatial-geometry principle, making full use of the differences between speech and noise signals. Experiments show that the method overcomes the influence of the cross terms well, effectively eliminates the isolated peaks that appear randomly in the spectrum, and thus effectively suppresses musical noise.
Drawings
FIG. 1 is a flow chart of a speech enhancement method of the present invention based on phase adjustment and amplitude compensation for multi-subband spectral subtraction;
FIG. 2 is a flow chart of a phase adjustment algorithm in the speech enhancement method of the present invention;
FIG. 3 is a flow chart of dividing a plurality of sub-bands and calculating corresponding SNRs in the speech enhancement method of the present invention;
FIG. 4 is a flow chart of sub-band spectral subtraction in the speech enhancement method of the present invention;
FIG. 5 is a flow chart of an amplitude compensation algorithm in the speech enhancement method of the present invention.
Detailed Description
In order that the objects, technical solutions and advantages of the present invention will become more apparent, the present invention will be further described in detail with reference to the accompanying drawings in conjunction with the following specific embodiments.
As shown in FIG. 1, the present invention discloses a speech enhancement method based on phase adjustment and amplitude compensation for multi-subband spectral subtraction, which comprises the following steps:
step 101: noisy speech signal y (n) is collected by a microphone. The following is a detailed description of step 101.
Assume that the noisy speech signal y(n) collected by the microphone is the sum of x(n) and v(n), i.e.
y(n)=g*s(n)+v(n)=x(n)+v(n) (1)
Where g is the impulse response from source s (n) to the microphone, x (n) is the clean speech picked up by the microphone, and v (n) is additive noise.
Step 102: truncate the noisy speech and perform a Fast Fourier Transform (FFT) to obtain the amplitude and phase spectra of the noisy speech signal. Step 102 is described in detail below.
In this example a 32 ms Hanning window truncates the signal, and an FFT yields the frequency-domain representation
Y(ω)=X(ω)+V(ω) (2)
Wherein, X represents the pure voice frequency spectrum, Y represents the frequency spectrum of the voice with noise, V represents the frequency spectrum of the additive noise, and omega represents the discrete frequency point.
Writing equation (2) in polar form,
$$\alpha_y(\omega)e^{j\theta_y(\omega)}=\alpha_x(\omega)e^{j\theta_x(\omega)}+\alpha_v(\omega)e^{j\theta_v(\omega)} \qquad (3)$$

where α_y(ω), α_x(ω) and α_v(ω) are the amplitude spectra of the noisy speech, the clean speech and the additive noise, and θ_y(ω), θ_x(ω) and θ_v(ω) are the corresponding phase spectra. The noisy-speech amplitude spectrum α_y(ω) can be obtained by averaging several successive frames, the additive-noise amplitude spectrum α_v(ω) can be estimated with a noise estimation algorithm, and the clean-speech amplitude spectrum α_x(ω) is the quantity to be estimated.
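The windowing-and-FFT step can be sketched in Python with NumPy. The 256-point frame (32 ms at 8 kHz) matches the parameters used later in the text; the function name `frame_fft` is illustrative, not part of the patent:

```python
import numpy as np

def frame_fft(y, fs=8000, frame_ms=32, n_fft=256):
    """Truncate a noisy signal with a Hanning window and take the FFT of one
    frame, returning its amplitude and phase spectra (the polar form of eq. (3))."""
    L = int(fs * frame_ms / 1000)          # frame length: 256 samples at 8 kHz
    w = np.hanning(L)                      # Hanning analysis window
    frame = y[:L] * w
    Y = np.fft.fft(frame, n_fft)           # frequency-domain representation Y(omega)
    alpha_y = np.abs(Y)                    # amplitude spectrum alpha_y(omega)
    theta_y = np.angle(Y)                  # phase spectrum theta_y(omega)
    return alpha_y, theta_y
```

By construction the polar form is exact: `alpha_y * exp(j*theta_y)` reproduces `Y(ω)`.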
Step 103: carrying out microspur maximum value search on the amplitude spectrum by utilizing a phase adjustment algorithm to obtain a noisy speech amplitude spectrum when the phase difference between a pure speech signal x (n) and an additive noise signal v (n) is 0Namely, on the omega frequency point, the maximum value of the amplitude of the voice signal with noise is searched in a plurality of continuous moments as the amplitude spectrum of the voice with noise after phase adjustment. The detailed flowchart of step 103 is shown in fig. 2, and the specific steps are as follows:
step 201: the displacement point of the noise-containing speech signal is initialized to m 1.
Step 202: forward shifting the noisy speech signal by m sampling points to obtain a shifted noisy speech signal:
ym=[y(n-m),y(n-m-1),...,y(n-m-L+1)]T (4)
wherein [ ·]TIs a transposition operator, vector ymEach element in (a) represents a sample value of the noisy speech signal at time point i, i ═ n-m, n-m-1.
Step 203: apply the FFT to the shifted noisy speech signal to obtain its spectrum Y_m(ω), i.e.

Y_m(ω) = FFT(y_m)

where FFT(·) is the fast Fourier transform operator.
Step 204: take the absolute value of the spectrum of the shifted noisy speech signal to obtain its amplitude spectrum |Y_m(ω)|.
Step 205: judging the displacement point M, if M > MmaxIf the judgment result is no, go to step 206; otherwise, go to step 207. Wherein,
wherein,is an operator rounded up, fsIs the sampling rate and Ω is the length of the FFT.
Step 206: the displacement point m is incremented by 1 and step 202 is performed.
Step 207: performing a budget of taking the maximum value once, namely on the omega frequency point, and performing Y pair within the range of 0 < M < M (omega)m(omega) taking the maximum value as the amplitude spectrum of the noise-carrying voice signal after phase adjustment
<math> <mrow> <msub> <mover> <mi>&alpha;</mi> <mo>^</mo> </mover> <mi>y</mi> </msub> <mrow> <mo>(</mo> <mi>&omega;</mi> <mo>)</mo> </mrow> <mo>=</mo> <munder> <mi>max</mi> <mrow> <mn>0</mn> <mo>&le;</mo> <mi>m</mi> <mo>&le;</mo> <mi>M</mi> <mrow> <mo>(</mo> <mi>&omega;</mi> <mo>)</mo> </mrow> </mrow> </munder> <mo>|</mo> <msub> <mi>Y</mi> <mi>m</mi> </msub> <mrow> <mo>(</mo> <mi>&omega;</mi> <mo>)</mo> </mrow> <mo>|</mo> <mo>-</mo> <mo>-</mo> <mo>-</mo> <mrow> <mo>(</mo> <mn>6</mn> <mo>)</mo> </mrow> </mrow> </math>
Wherein,
wherein,is an integer operator up and Ω denotes the length of the fast fourier transform.
Through step 103, the phase angle θ between the clean speech signal x (n) and the additive noise signal v (n) is finally obtainedxvAmplitude spectrum of noisy speech signal when 0
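Steps 201-207 can be sketched as follows. Since the per-bin bound M(ω) of equation (5) is not reproduced in the text above, a single fixed `max_shift` is used here as a simplifying assumption; the function name is also illustrative:

```python
import numpy as np

def phase_adjusted_spectrum(y, n, L=256, n_fft=256, max_shift=8):
    """Shift the analysis frame by m = 0..max_shift samples, FFT each shifted
    frame, and keep the per-bin maximum magnitude as the phase-adjusted
    amplitude spectrum (eq. (6)).  ASSUMPTION: a fixed shift bound replaces
    the patent's frequency-dependent M(omega) of eq. (5)."""
    best = np.zeros(n_fft)
    for m in range(max_shift + 1):
        frame = y[n - m - L + 1 : n - m + 1]   # the L samples ending at y(n-m), eq. (4)
        Ym = np.fft.fft(frame, n_fft)          # step 203: spectrum of the shifted frame
        best = np.maximum(best, np.abs(Ym))    # steps 204/207: per-bin maximum
    return best
```

The sample ordering inside the frame does not affect the magnitude spectrum of a real signal, so the vector is kept in natural time order rather than the reversed order of equation (4).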
Step 104: update the estimate of the additive-noise power spectrum. Step 104 is described in detail below.
First, compute the Signal-to-Noise Ratio (SNR) of the entire band:

$$\mathrm{SNR}_k=10\log_{10}\left(\sum_\omega\hat{\alpha}_y^2(k,\omega)\Big/\sum_\omega\hat{\sigma}_v^2(k-1,\omega)\right)$$

where log_10 is the base-10 logarithm, Σ is the summation over the frequency bins, k is the frame index, ω the discrete frequency bin, α̂_y²(k,ω) the estimate of the noisy-speech power spectrum of the current frame k, and σ̂_v²(k-1,ω) the estimate of the additive-noise power spectrum of the previous frame k-1.
Then, following a Voice Activity Detection (VAD) scheme with the voiced-segment lower SNR threshold SNR_th, update the estimate of the additive-noise power spectrum as:

$$\hat{\sigma}_v^2(k,\omega)=\begin{cases}\hat{\sigma}_v^2(k-1,\omega) & \text{if }\mathrm{SNR}_k>\mathrm{SNR}_{th}\\ 0.98\,\hat{\sigma}_v^2(k-1,\omega)+0.02\,\hat{\alpha}_y^2(k,\omega) & \text{else}\end{cases} \qquad (7)$$

where σ̂_v²(k-1,ω) is the estimate of the additive-noise power spectrum of the previous frame k-1, α̂_y²(k,ω) is the estimate of the noisy-speech power spectrum of the current frame k, and k is the frame index.
The additive-noise amplitude spectrum α̂_v(k,ω) is then obtained as the square root of the additive-noise power spectrum estimate.
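A minimal sketch of the update rule of step 104 (equation (7)). The text does not fix the value of SNR_th, so the 3 dB default below is an assumption borrowed from the SNR_0 threshold used later; the function name is illustrative:

```python
import numpy as np

def update_noise_psd(sigma_v2_prev, alpha_y2, snr_th=3.0):
    """Recursively update the additive-noise power spectrum (eq. (7)).
    ASSUMPTION: snr_th = 3 dB stands in for the unspecified SNR_th."""
    # full-band SNR of the current frame against the previous noise estimate
    snr_k = 10.0 * np.log10(alpha_y2.sum() / sigma_v2_prev.sum())
    if snr_k > snr_th:
        # speech present: freeze the noise estimate
        sigma_v2 = sigma_v2_prev.copy()
    else:
        # speech absent: smooth the estimate toward the current frame
        sigma_v2 = 0.98 * sigma_v2_prev + 0.02 * alpha_y2
    alpha_v = np.sqrt(sigma_v2)   # additive-noise amplitude spectrum
    return sigma_v2, alpha_v
```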
Step 105: divide the full frequency band into a plurality of sub-bands and compute the signal-to-noise ratio in each sub-band. The detailed flowchart of step 105 is shown in fig. 3; the specific steps are as follows:
Step 301: initialization; in this example the number of sub-bands is R = 8 and the sampling rate is f_s = 8000 Hz.
Step 302: compute the sub-band bandwidth; in this example a uniformly distributed, non-overlapping partition is used, i.e., every sub-band has the same bandwidth f_d = 500 Hz.
Step 303: compute the boundary bins of each sub-band as follows.
Start bin:

$$b_r=\frac{(r-1)f_d}{f_s}\cdot\Omega+1 \qquad (8)$$

Cut-off bin:

$$e_r=\frac{r f_d}{f_s}\cdot\Omega \qquad (9)$$

where f_d is the sub-band bandwidth, f_s the sampling rate, Ω the FFT length, and r the index of the r-th sub-band.
Step 304: compute the SNR of the r-th sub-band, SNR_r, r = 1, 2, ..., R:

$$\mathrm{SNR}_r=10\log_{10}\left(\sum_{\omega=b_r}^{e_r}\hat{\alpha}_y^2(k,\omega)\Big/\sum_{\omega=b_r}^{e_r}\hat{\sigma}_v^2(k,\omega)\right) \qquad (10)$$

where log_10 is the base-10 logarithm, k the frame index, ω the discrete frequency bin, α̂_y²(k,ω) the estimate of the noisy-speech power spectrum of the current frame k, and σ̂_v²(k,ω) the estimate of the additive-noise power spectrum of the current frame k.
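Steps 301-304 (equations (8)-(10)) can be sketched as follows; the function names are illustrative:

```python
import numpy as np

def subband_edges(R=8, fd=500, fs=8000, n_fft=256):
    """Start/cut-off FFT bins of R uniform, non-overlapping sub-bands
    (eqs. (8) and (9)); with fd = 500 Hz each sub-band spans 16 bins."""
    r = np.arange(1, R + 1)
    b = (r - 1) * fd * n_fft // fs + 1    # b_r = (r-1) f_d / f_s * Omega + 1
    e = r * fd * n_fft // fs              # e_r = r f_d / f_s * Omega
    return b, e

def subband_snr(alpha_y2, sigma_v2, b, e):
    """Per-sub-band SNR in dB (eq. (10)), summing power over bins b_r..e_r."""
    return np.array([10.0 * np.log10(alpha_y2[bi:ei + 1].sum() /
                                     sigma_v2[bi:ei + 1].sum())
                     for bi, ei in zip(b, e)])
```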
Step 106: apply spectral subtraction with the over-subtraction rule to the noisy-speech amplitude spectrum α̂_y(k,ω) of the current frame k to obtain the clean-speech amplitude spectrum α̂_x(k,ω) of the current frame k. The detailed flowchart of step 106 is shown in fig. 4; the specific steps are as follows:
Step 401: initialization: set r = 1, i.e., the algorithm starts from the first sub-band.
Step 402: a subtraction gain factor is calculated as follows:
<math> <mrow> <msub> <mi>&delta;</mi> <mi>r</mi> </msub> <mo>=</mo> <mfenced open='{' close=''> <mtable> <mtr> <mtd> <mn>1</mn> </mtd> <mtd> <mi>for</mi> </mtd> <mtd> <msub> <mi>e</mi> <mi>r</mi> </msub> <mo>&le;</mo> <mi>&Omega;</mi> <mo>&CenterDot;</mo> <mn>1000</mn> <mo>/</mo> <msub> <mi>f</mi> <mi>s</mi> </msub> </mtd> </mtr> <mtr> <mtd> <mn>1.5</mn> </mtd> <mtd> <mi>for</mi> </mtd> <mtd> <mi>&Omega;</mi> <mo>&CenterDot;</mo> <mn>1000</mn> <mo>/</mo> <msub> <mi>f</mi> <mi>s</mi> </msub> <mo>&lt;</mo> <msub> <mi>e</mi> <mi>r</mi> </msub> <mo>&le;</mo> <mi>&Omega;</mi> <mo>/</mo> <mn>2</mn> <mo>-</mo> <mi>&Omega;</mi> <mo>&CenterDot;</mo> <mn>3000</mn> <mo>/</mo> <msub> <mi>f</mi> <mi>s</mi> </msub> </mtd> </mtr> <mtr> <mtd> <mn>2</mn> </mtd> <mtd> <mi>for</mi> </mtd> <mtd> <msub> <mi>e</mi> <mi>r</mi> </msub> <mo>></mo> <mi>&Omega;</mi> <mo>/</mo> <mn>2</mn> <mo>-</mo> <mi>&Omega;</mi> <mo>&CenterDot;</mo> <mn>3000</mn> <mo>/</mo> <msub> <mi>f</mi> <mi>s</mi> </msub> </mtd> </mtr> </mtable> </mfenced> <mo>-</mo> <mo>-</mo> <mo>-</mo> <mrow> <mo>(</mo> <mn>11</mn> <mo>)</mo> </mrow> </mrow> </math>
wherein, brAnd erRespectively, a start frequency point and a cut-off frequency point of the r-th sub-band.
Step 403: the over-subtraction rule coefficients are calculated as follows:
step 404: the regular spectral subtraction is performed on the r-th subband as follows:
wherein,is the additive noise amplitude spectrum of the current frame k,is the noisy speech amplitude spectrum for the current frame k.
Step 405: check whether r < R. If so, go to step 406; otherwise go to step 407.
Step 406: set r = r + 1 and return to step 402.
Step 407: apply half-wave rectification, which sets any negative amplitude produced by equation (13) to a small floor and at the same time suppresses musical noise caused by inaccurate noise estimates:

$$\hat{\alpha}_x(k,\omega)=\begin{cases}\hat{\alpha}_x(k,\omega) & \text{if }\hat{\alpha}_x(k,\omega)>\phi\,\hat{\alpha}_y(k,\omega)\\ \phi\cdot\hat{\alpha}_y(k,\omega) & \text{else}\end{cases} \qquad (14)$$

where φ is the maximum attenuation threshold, set to 0.02.
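Steps 401-407 can be sketched in Python. The band gain δ_r follows equation (11) and the floor φ = 0.02 follows equation (14); since the over-subtraction formula (12) did not survive in the text above, `oversub_coeff` below uses a Berouti-style rule as a stand-in ASSUMPTION, not the patent's exact coefficient:

```python
import numpy as np

def band_gain_delta(e_r, fs=8000, n_fft=256):
    """Eq. (11): band-dependent subtraction gain delta_r, keyed on the
    sub-band's cut-off bin e_r.  Note that at fs = 8 kHz the two thresholds
    coincide (both equal bin 32), so the middle region is empty."""
    lo = n_fft * 1000 // fs                  # bin of 1 kHz
    hi = n_fft // 2 - n_fft * 3000 // fs     # Omega/2 minus the bin of 3 kHz
    if e_r <= lo:
        return 1.0
    elif e_r <= hi:
        return 1.5
    return 2.0

def oversub_coeff(snr_r):
    """Stand-in for eq. (12): a Berouti-style over-subtraction coefficient
    falling linearly from 4.75 at low SNR to 1 above 20 dB (ASSUMED rule)."""
    if snr_r < -5.0:
        return 4.75
    if snr_r > 20.0:
        return 1.0
    return 4.0 - 3.0 * snr_r / 20.0

def subtract_and_floor(alpha_y, alpha_v, gamma, delta, phi=0.02):
    """Eqs. (13)/(14): over-subtraction in one sub-band, then half-wave
    rectification against the spectral floor phi * alpha_y."""
    alpha_x = alpha_y - gamma * delta * alpha_v
    return np.maximum(alpha_x, phi * alpha_y)
```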
Step 107: compensate the clean-speech amplitude spectrum estimate α̂_x with a second-order amplitude compensation factor μ_{2,r} and a preset first-order amplitude compensation factor μ_{1,r} to obtain the enhanced speech spectrum α̃_x. The detailed flowchart of step 107 is shown in fig. 5; the specific steps are as follows:
Step 501: initialization: set r = 1, i.e., the algorithm starts from the first sub-band.
Step 502: compute the second-order compensation factor as follows:

$$\mu_{2,r}=\begin{cases}0 & \text{if }\mathrm{SNR}_r<\mathrm{SNR}_0\\ \dfrac{\mathrm{SNR}_r-\mathrm{SNR}_0}{4}\cdot e^{-\frac{(\mathrm{SNR}_r-\mathrm{SNR}_0)^2}{8}} & \text{else}\end{cases} \qquad (15)$$

where SNR_0 is the VAD threshold indicating whether voice activity is present, set to 3 dB.
Step 503: apply the amplitude spectrum compensation:

$$\tilde{\alpha}_x(\omega)=\hat{\alpha}_x(\omega)+\mu_{1,r}\cdot\hat{\alpha}_y(\omega)+\mu_{2,r}\cdot\hat{\alpha}_y^2(\omega) \qquad (16)$$

where the first-order compensation factor μ_{1,r} = 0.05 serves mainly to further suppress musical noise.
Step 504: check whether r < R. If so, go to step 505; otherwise go to step 506.
Step 505: set r = r + 1 and return to step 502.
Step 506: the algorithm ends.
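The compensation of step 107 can be sketched as follows. The values μ_{1,r} = 0.05 and SNR_0 = 3 dB are those stated in the text; the function names are illustrative:

```python
import numpy as np

def second_order_factor(snr_r, snr0=3.0):
    """Eq. (15): second-order amplitude-compensation factor mu_{2,r};
    zero below the 3 dB VAD threshold SNR_0, then a scaled Gaussian bump."""
    if snr_r < snr0:
        return 0.0
    d = snr_r - snr0
    return (d / 4.0) * np.exp(-d * d / 8.0)

def compensate(alpha_x, alpha_y, mu2, mu1=0.05):
    """Eq. (16): add back a small first- and second-order share of the noisy
    amplitude spectrum to mask residual musical noise."""
    return alpha_x + mu1 * alpha_y + mu2 * alpha_y ** 2
```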
Step 108: apply the Inverse Fast Fourier Transform (IFFT) to the enhanced clean-speech amplitude spectrum α̃_x(ω), and superimpose the resulting short-time time-domain clean-speech signals with a 75% overlap ratio to obtain the enhanced clean speech signal x̃(n).
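A minimal overlap-add reconstruction for step 108, assuming the enhanced frame spectra are stored row-wise; the constant window-sum normalization required after overlap-add is omitted for brevity, and the function name is illustrative:

```python
import numpy as np

def overlap_add(frames, hop):
    """Inverse-FFT each enhanced frame spectrum and overlap-add the
    time-domain pieces; a hop of L/4 gives the 75% overlap used in the text."""
    L = frames.shape[1]
    out = np.zeros(hop * (len(frames) - 1) + L)
    for i, F in enumerate(frames):
        x = np.real(np.fft.ifft(F))        # short-time clean-speech estimate
        out[i * hop : i * hop + L] += x    # accumulate at the frame offset
    return out
```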
The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are only exemplary embodiments of the present invention and are not intended to limit the present invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A speech enhancement method using multi-sub-band spectral subtraction based on phase adjustment and amplitude compensation, comprising the steps of:
step a: collecting a noisy speech signal y(n) and obtaining its amplitude spectrum α_y(ω), where n denotes the discrete time index and ω the discrete frequency bin;
step b: performing a small-shift maximum search on the noisy-speech amplitude spectrum with a phase adjustment algorithm to obtain the adjusted noisy-speech amplitude spectrum α̂_y(ω), corresponding to a phase difference of 0 between the clean speech signal and the additive noise signal;
step c: updating the additive-noise amplitude spectrum α̂_v(ω) with a noise estimation algorithm;
step d: applying amplitude spectral subtraction to α̂_y(ω) using the over-subtraction rule coefficients and the additive-noise amplitude spectrum α̂_v(ω) to obtain the clean-speech amplitude spectrum α̂_x(ω);
step e: compensating α̂_x(ω) with a second-order amplitude compensation factor and a preset first-order amplitude compensation factor to obtain the enhanced clean-speech amplitude spectrum α̃_x(ω), and from it the enhanced clean speech signal x̃(n).
2. The method according to claim 1, wherein the small-shift maximum search on the noisy-speech amplitude spectrum in step b comprises:
at frequency bin ω, searching the maximum of the noisy-speech amplitude over M(ω) consecutive shifts as the phase-adjusted amplitude spectrum, i.e., finding the noisy-speech amplitude spectrum for a phase difference of 0 between the clean speech signal and the additive noise signal:

$$\hat{\alpha}_y(\omega)=\max_{0\le m\le M(\omega)}|Y_m(\omega)|$$

where Y_m(ω) is the fast Fourier transform of the speech signal shifted by m sampling points, and M(ω) takes a different value at each frequency bin, per the formula defined with the round-up operator ⌈·⌉, the fast Fourier transform length Ω, and the discrete frequency bin ω.
3. The method of claim 1, wherein the step c updates an additive noise magnitude spectrumFurther comprising:
step c 1: calculating the signal-to-noise ratio SNR of the full frequency band:
<math> <mrow> <msub> <mi>SNR</mi> <mi>k</mi> </msub> <mo>=</mo> <mn>10</mn> <msub> <mi>log</mi> <mn>10</mn> </msub> <mrow> <mo>(</mo> <mover> <mi>&Sigma;</mi> <mi>&omega;</mi> </mover> <msubsup> <mover> <mi>&alpha;</mi> <mo>^</mo> </mover> <mi>y</mi> <mn>2</mn> </msubsup> <mrow> <mo>(</mo> <mi>k</mi> <mo>,</mo> <mi>&omega;</mi> <mo>)</mo> </mrow> <mo>/</mo> <mover> <mi>&Sigma;</mi> <mi>&omega;</mi> </mover> <msubsup> <mover> <mi>&sigma;</mi> <mo>^</mo> </mover> <mi>v</mi> <mn>2</mn> </msubsup> <mrow> <mo>(</mo> <mi>k</mi> <mo>-</mo> <mn>1</mn> <mo>,</mo> <mi>&omega;</mi> <mo>)</mo> </mrow> <mo>)</mo> </mrow> </mrow> </math>
wherein $\log_{10}$ denotes the base-10 logarithm, $\sum_\omega$ denotes summation over the discrete frequency bins, k is the frame index, ω is the discrete frequency bin, $\hat{\alpha}_y^2(k,\omega)$ is the estimate of the noisy speech power spectrum of the current frame k, and $\hat{\sigma}_v^2(k-1,\omega)$ is the estimate of the additive noise power spectrum of the previous frame k−1;
step c2: using a voice activity detection (VAD) decision with the voiced-segment lower threshold $\mathrm{SNR}_{th}$, update the estimate of the additive noise power spectrum:
$$\hat{\sigma}_v^2(k,\omega) = \begin{cases} \hat{\sigma}_v^2(k-1,\omega), & \text{if } \mathrm{SNR}_k > \mathrm{SNR}_{th} \\ 0.98 \cdot \hat{\sigma}_v^2(k-1,\omega) + 0.02 \cdot \hat{\alpha}_y^2(k,\omega), & \text{otherwise} \end{cases}$$
wherein $\hat{\sigma}_v^2(k-1,\omega)$ represents the estimate of the additive noise power spectrum of the previous frame k−1, $\hat{\alpha}_y^2(k,\omega)$ represents the estimate of the noisy speech power spectrum of the current frame k, and k is the frame index;
step c3: obtaining the additive noise amplitude spectrum $\hat{\sigma}_v(k,\omega)$ as the square root of the estimated additive noise power spectrum $\hat{\sigma}_v^2(k,\omega)$.
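Steps c1–c3 amount to an SNR-gated recursive average. A minimal Python sketch with illustrative parameter names (`snr_th` standing in for $\mathrm{SNR}_{th}$; the 0.98/0.02 smoothing weights are taken from the update formula above):

```python
import numpy as np

def update_noise_psd(noise_psd_prev, noisy_psd, snr_th=3.0):
    """Sketch of steps c1-c3 (names illustrative): compare the full-band SNR
    with a voiced-segment threshold; keep the old noise estimate in speech
    frames, otherwise smooth it toward the current noisy power spectrum."""
    snr_k = 10.0 * np.log10(noisy_psd.sum() / noise_psd_prev.sum())   # step c1
    if snr_k > snr_th:                                                # step c2
        noise_psd = noise_psd_prev.copy()
    else:
        noise_psd = 0.98 * noise_psd_prev + 0.02 * noisy_psd
    return noise_psd, np.sqrt(noise_psd)   # step c3: magnitude from power

# usage: a low-SNR (noise-only) frame triggers the recursive update
psd, mag = update_noise_psd(np.full(8, 1.0), np.full(8, 1.5))
```

The slow 0.98/0.02 smoothing keeps the estimate stable against brief SNR dips while still tracking slowly varying noise.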
4. The method according to claim 1, wherein step d is preceded by dividing the full frequency band into a plurality of sub-bands and calculating the signal-to-noise ratio in each sub-band, comprising the steps of:
step 1: dividing the full frequency band into a plurality of sub-bands and calculating the sub-band bandwidth $f_d$:
$$f_d = f_s / (2R)$$
wherein $f_s$ is the sampling rate and R is the number of sub-bands;
step 2: calculating the starting frequency bin $b_r$ and the cut-off frequency bin $e_r$ of each sub-band:
$$\begin{cases} b_r = \dfrac{(r-1)\, f_d}{f_s} \cdot \Omega + 1 \\[4pt] e_r = \dfrac{r f_d}{f_s} \cdot \Omega \end{cases}$$
wherein r = 1, 2, ..., R, and Ω represents the length of the fast Fourier transform;
step 3: calculating the signal-to-noise ratio $\mathrm{SNR}_r$ in the r-th sub-band, r = 1, 2, ..., R:
$$\mathrm{SNR}_r = 10 \log_{10}\!\left( \sum_{\omega=b_r}^{e_r} \hat{\alpha}_y^2(k,\omega) \,\Big/ \sum_{\omega=b_r}^{e_r} \hat{\sigma}_v^2(k,\omega) \right)$$
wherein $\log_{10}$ denotes the base-10 logarithm, k is the frame index, ω is the discrete frequency bin, $\hat{\alpha}_y^2(k,\omega)$ is the estimate of the noisy speech power spectrum of the current frame k, and $\hat{\sigma}_v^2(k,\omega)$ is the estimate of the additive noise power spectrum of the current frame k.
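Steps 1–3 of claim 4 can be sketched directly in Python; the 1-based bin indexing follows the claim, while the usage values (R = 4, 256-point FFT, 8 kHz sampling) are illustrative only.

```python
import numpy as np

def subband_edges(fs, R, fft_len):
    """Steps 1-2 (illustrative): uniform sub-bands of width fd = fs/(2R),
    mapped to 1-based FFT bin indices b_r and e_r as in the claim."""
    fd = fs / (2 * R)
    b = [int((r - 1) * fd / fs * fft_len) + 1 for r in range(1, R + 1)]
    e = [int(r * fd / fs * fft_len) for r in range(1, R + 1)]
    return b, e

def subband_snr(noisy_psd, noise_psd, b, e):
    """Step 3: per-band SNR in dB over bins b_r..e_r (converted to 0-based)."""
    return [10 * np.log10(noisy_psd[bb - 1:ee].sum() / noise_psd[bb - 1:ee].sum())
            for bb, ee in zip(b, e)]

# usage: 4 sub-bands of a 256-point FFT at fs = 8 kHz
b, e = subband_edges(fs=8000, R=4, fft_len=256)
snrs = subband_snr(np.ones(128), np.ones(128), b, e)  # equal spectra -> 0 dB
```

With these values each sub-band covers 32 bins of the lower half-spectrum, so $e_R = \Omega/2$ lands exactly on the Nyquist bin.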
5. The method of claim 4, wherein the over-subtraction rule coefficient in step d is calculated as follows:
wherein the resulting coefficients are the over-subtraction rule coefficients of the respective sub-bands.
6. The method of claim 5, wherein the spectral subtraction of the amplitude spectrum in step d is performed by:
performing spectral subtraction on the r-th sub-band to obtain the clean speech amplitude spectrum of the current frame k:
wherein $\hat{\sigma}_v(k,\omega)$ is the additive noise amplitude spectrum of the current frame k, $\hat{\alpha}_y(k,\omega)$ is the noisy speech amplitude spectrum of the current frame k, r = 1, 2, ..., R, and $\delta_r$ is the subtraction gain factor, calculated as follows:
$$\delta_r = \begin{cases} 1, & e_r \le \Omega \cdot 1000 / f_s \\ 1.5, & \Omega \cdot 1000 / f_s < e_r \le \Omega/2 - \Omega \cdot 3000 / f_s \\ 2, & e_r > \Omega/2 - \Omega \cdot 3000 / f_s \end{cases}$$
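The piecewise gain $\delta_r$ of claim 6 is a simple frequency-dependent lookup; a sketch follows, with illustrative usage values Ω = 256 and $f_s$ = 16 kHz (giving band boundaries at bins 16 and 80).

```python
def subtraction_gain(e_r, fft_len, fs):
    """Sketch of the delta_r table: the subtraction gain grows with the band's
    cut-off bin e_r (1 below ~1 kHz, 1.5 in the mid range, 2 near Nyquist)."""
    lo = fft_len * 1000 / fs
    hi = fft_len / 2 - fft_len * 3000 / fs
    if e_r <= lo:
        return 1.0
    if e_r <= hi:
        return 1.5
    return 2.0

# usage with illustrative values fft_len = 256, fs = 16 kHz (lo = 16, hi = 80)
gains = [subtraction_gain(e_r, 256, 16000) for e_r in (10, 50, 100)]
```

Larger gains at high frequencies subtract noise more aggressively where speech energy is typically weaker.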
7. The method of claim 4, wherein the second-order amplitude compensation factor in step e is calculated as follows:
calculating the second-order amplitude compensation factor $\mu_{2,r}$ on each sub-band:
$$\mu_{2,r} = \begin{cases} 0, & \text{if } \mathrm{SNR}_r < \mathrm{SNR}_0 \\ \dfrac{\mathrm{SNR}_r - \mathrm{SNR}_0}{4} \cdot e^{-\frac{(\mathrm{SNR}_r - \mathrm{SNR}_0)^2}{8}}, & \text{otherwise} \end{cases}$$
wherein $\mathrm{SNR}_0$ is the threshold for determining whether voice activity is present.
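The factor in claim 7 can be sketched as a small Python function; the default threshold value here is illustrative, not from the patent.

```python
import math

def second_order_factor(snr_r, snr0=0.0):
    """Sketch of the claim-7 factor: zero below the activity threshold SNR_0,
    otherwise a Gaussian-shaped bump in (SNR_r - SNR_0) that fades again at
    very high SNR, where little compensation is needed."""
    d = snr_r - snr0
    if d < 0:
        return 0.0
    return (d / 4.0) * math.exp(-d * d / 8.0)

# usage: the factor is zero below threshold and peaks at SNR_r - SNR_0 = 2
mu_low = second_order_factor(-3.0)
mu_peak = second_order_factor(2.0)
```

Setting the derivative of $(d/4)\,e^{-d^2/8}$ to zero confirms the maximum at $d = 2$ dB above the threshold.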
8. The method of claim 7, wherein the compensation of the clean speech amplitude spectrum in step e is performed by:
performing amplitude compensation on the r-th sub-band to obtain the enhanced clean speech amplitude spectrum $\tilde{\alpha}_x(\omega)$:
$$\tilde{\alpha}_x(\omega) = \hat{\alpha}_x(\omega) + \mu_{1,r} \cdot \hat{\alpha}_y(\omega) + \mu_{2,r} \cdot \hat{\alpha}_y^2(\omega)$$
wherein r = 1, 2, ..., R; $\mu_{1,r}$ is a predetermined first-order compensation factor, and $\mu_{2,r}$ is the second-order amplitude compensation factor.
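The compensation of claim 8 is a one-line combination of the subtracted spectrum with first- and second-order terms in the noisy magnitude; the sketch below treats the factors as given scalars (the μ values in the usage are illustrative).

```python
def compensate(alpha_x, alpha_y, mu1, mu2):
    """Sketch of the claim-8 compensation: add to the subtracted spectrum a
    first-order term in the noisy magnitude (predetermined mu1) and a
    second-order term weighted by the SNR-dependent mu2."""
    return alpha_x + mu1 * alpha_y + mu2 * alpha_y ** 2

# usage with illustrative scalar magnitudes and factors
out = compensate(alpha_x=1.0, alpha_y=2.0, mu1=0.1, mu2=0.05)
```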
9. The method as claimed in claim 6, wherein said step d further performs half-wave rectification on the clean speech amplitude spectrum $\hat{\alpha}_x(k,\omega)$ of the current frame k obtained by said spectral subtraction:
$$\hat{\alpha}_x(k,\omega) = \begin{cases} \hat{\alpha}_x(k,\omega), & \text{if } \hat{\alpha}_x(k,\omega) > \phi\, \hat{\alpha}_y(k,\omega) \\ \phi \cdot \hat{\alpha}_y(k,\omega), & \text{otherwise} \end{cases}$$
where φ is the maximum attenuation threshold.
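The flooring of claim 9 can be sketched as an element-wise clamp; the value of φ in the usage is illustrative, not from the patent.

```python
import numpy as np

def spectral_floor(alpha_x, alpha_y, phi=0.02):
    """Sketch of the claim-9 flooring (phi illustrative): bins driven below
    phi times the noisy magnitude by subtraction are clamped to that floor,
    which limits musical noise from over-subtraction."""
    floor = phi * alpha_y
    return np.where(alpha_x > floor, alpha_x, floor)

# usage: a negative post-subtraction bin is clamped to 2% of the noisy magnitude
floored = spectral_floor(np.array([0.5, -0.1]), np.array([1.0, 1.0]))
```

Leaving a small noise floor rather than zeroing bins is what distinguishes this rule from plain half-wave rectification.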
10. The method of claim 1, wherein step e performs an inverse fast Fourier transform on the enhanced clean speech amplitude spectrum $\tilde{\alpha}_x(\omega)$ to obtain the time-domain short-time clean speech signal, and overlap-adds the short-time signals at a 75% overlap rate to obtain the enhanced clean speech signal.
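The resynthesis in claim 10 is standard overlap-add; a minimal sketch follows, where a hop of one quarter of the frame length gives the claimed 75% overlap (windowing and the inverse FFT itself are omitted for brevity).

```python
import numpy as np

def overlap_add(frames, hop):
    """Sketch of the claim-10 resynthesis: short-time frames are summed at a
    hop of one quarter of the frame length, i.e. 75% overlap."""
    n, flen = len(frames), len(frames[0])
    out = np.zeros((n - 1) * hop + flen)
    for i, f in enumerate(frames):
        out[i * hop:i * hop + flen] += f
    return out

# usage: four 8-sample frames at hop = 2 (75% overlap)
sig = overlap_add([np.ones(8)] * 4, hop=2)
```

In a full implementation the frames would be the inverse FFTs of $\tilde{\alpha}_x(\omega)$ combined with a phase spectrum, with an analysis/synthesis window satisfying the constant-overlap-add condition.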
CN201210513075.0A 2012-12-04 2012-12-04 Speech enhancement method of multi-sub-band spectral subtraction based on phase adjustment and amplitude compensation Active CN103021420B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210513075.0A CN103021420B (en) 2012-12-04 2012-12-04 Speech enhancement method of multi-sub-band spectral subtraction based on phase adjustment and amplitude compensation

Publications (2)

Publication Number Publication Date
CN103021420A CN103021420A (en) 2013-04-03
CN103021420B true CN103021420B (en) 2015-02-25

Family

ID=47969950

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210513075.0A Active CN103021420B (en) 2012-12-04 2012-12-04 Speech enhancement method of multi-sub-band spectral subtraction based on phase adjustment and amplitude compensation

Country Status (1)

Country Link
CN (1) CN103021420B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106128480A (en) * 2016-06-21 2016-11-16 安徽师范大学 A kind of method that noisy speech is carried out voice activity detection

Families Citing this family (18)

Publication number Priority date Publication date Assignee Title
CN103745729B (en) * 2013-12-16 2017-01-04 深圳百科信息技术有限公司 A kind of audio frequency denoising method and system
US9240819B1 (en) * 2014-10-02 2016-01-19 Bose Corporation Self-tuning transfer function for adaptive filtering
CN106328151B (en) * 2015-06-30 2020-01-31 芋头科技(杭州)有限公司 ring noise eliminating system and application method thereof
CN108022595A (en) * 2016-10-28 2018-05-11 电信科学技术研究院 A kind of voice signal noise-reduction method and user terminal
CN107437417B (en) * 2017-08-02 2020-02-14 中国科学院自动化研究所 Voice data enhancement method and device based on recurrent neural network voice recognition
WO2019024008A1 (en) * 2017-08-02 2019-02-07 中国科学院自动化研究所 Voice data enhancing method and device in voice recognition based on recurrent neural network
CN107833579B (en) * 2017-10-30 2021-06-11 广州酷狗计算机科技有限公司 Noise elimination method, device and computer readable storage medium
CN108430074A (en) * 2018-01-24 2018-08-21 深圳市科虹通信有限公司 A kind of measurement method and its system of the interference of LTE system subband
CN108735213B (en) * 2018-05-29 2020-06-16 太原理工大学 Voice enhancement method and system based on phase compensation
CN108831500B (en) * 2018-05-29 2023-04-28 平安科技(深圳)有限公司 Speech enhancement method, device, computer equipment and storage medium
CN109643554B (en) * 2018-11-28 2023-07-21 深圳市汇顶科技股份有限公司 Adaptive voice enhancement method and electronic equipment
CN109319351A (en) * 2018-11-28 2019-02-12 广州市煌子辉贸易有限公司 A kind of intelligent garbage bin with sound identifying function
CN110797041B (en) * 2019-10-21 2023-05-12 珠海市杰理科技股份有限公司 Speech noise reduction processing method and device, computer equipment and storage medium
CN111508514A (en) * 2020-04-10 2020-08-07 江苏科技大学 Single-channel speech enhancement algorithm based on compensation phase spectrum
CN112416120B (en) * 2020-10-13 2023-08-25 深圳供电局有限公司 Intelligent multimedia interaction system based on wearable equipment
CN112951262B (en) * 2021-02-24 2023-03-10 北京小米松果电子有限公司 Audio recording method and device, electronic equipment and storage medium
CN113851151A (en) * 2021-10-26 2021-12-28 北京融讯科创技术有限公司 Masking threshold estimation method, device, electronic equipment and storage medium
CN115602191A (zh) * 2022-12-12 2023-01-13 Hangzhou Zhaohua Electronics Co., Ltd. (CN) Noise elimination method of transformer voiceprint detection system

Citations (2)

Publication number Priority date Publication date Assignee Title
US6564184B1 (en) * 1999-09-07 2003-05-13 Telefonaktiebolaget Lm Ericsson (Publ) Digital filter design method and apparatus
CN101320566A (en) * 2008-06-30 2008-12-10 中国人民解放军第四军医大学 Non-air conduction speech reinforcement method based on multi-band spectrum subtraction


Non-Patent Citations (2)

Title
Multi-band spectral subtraction speech enhancement algorithm based on the auditory masking effect; Mou Haiwei et al.; Journal of Daqing Petroleum Institute; 2009-10-31; Vol. 33, No. 5; pp. 103-106, 126 *
Improved spectral subtraction speech enhancement algorithm based on non-stationary noise estimation; Sun Jinsong et al.; Computer Engineering and Applications; 2010-12-31; Vol. 46, No. 5; pp. 120-122 *


Similar Documents

Publication Publication Date Title
CN103021420B (en) Speech enhancement method of multi-sub-band spectral subtraction based on phase adjustment and amplitude compensation
CN106340292B (en) A kind of sound enhancement method based on continuing noise estimation
EP2031583B1 (en) Fast estimation of spectral noise power density for speech signal enhancement
CA2732723C (en) Apparatus and method for processing an audio signal for speech enhancement using a feature extraction
JP4958303B2 (en) Noise suppression method and apparatus
JP5528538B2 (en) Noise suppressor
CN101976566B (en) Voice enhancement method and device using same
CN111128213B (en) Noise suppression method and system for processing in different frequency bands
US7957964B2 (en) Apparatus and methods for noise suppression in sound signals
KR101737824B1 (en) Method and Apparatus for removing a noise signal from input signal in a noisy environment
JP2004502977A (en) Subband exponential smoothing noise cancellation system
CN110808059A (en) Speech noise reduction method based on spectral subtraction and wavelet transform
Islam et al. Speech enhancement based on a modified spectral subtraction method
US7917359B2 (en) Noise suppressor for removing irregular noise
US11622208B2 (en) Apparatus and method for own voice suppression
US10297272B2 (en) Signal processor
Bahadur et al. Performance measurement of a hybrid speech enhancement technique
Rao et al. Speech enhancement using sub-band cross-correlation compensated Wiener filter combined with harmonic regeneration
JP3849679B2 (en) Noise removal method, noise removal apparatus, and program
Kauppinen et al. Improved noise reduction in audio signals using spectral resolution enhancement with time-domain signal extrapolation
Sunnydayal et al. Speech enhancement using sub-band wiener filter with pitch synchronous analysis
Zhang et al. Fundamental frequency estimation combining air-conducted speech with bone-conducted speech in noisy environment
CN109346106B (en) Cepstrum domain pitch period estimation method based on sub-band signal-to-noise ratio weighting
Yang et al. Environment-Aware Reconfigurable Noise Suppression
Shimamura et al. Noise estimation with an inverse comb filter in non-stationary noise environments

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant