CN101778322A - Microphone array postfiltering sound enhancement method based on multi-models and hearing characteristic - Google Patents


Info

Publication number
CN101778322A
Authority
CN
China
Prior art keywords
noise
signal
power spectrum
voice signal
target voice
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN200910250393A
Other languages
Chinese (zh)
Other versions
CN101778322B (en
Inventor
刘文举
程宁
李超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Automation of Chinese Academy of Science
Original Assignee
Institute of Automation of Chinese Academy of Science
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Automation of Chinese Academy of Science filed Critical Institute of Automation of Chinese Academy of Science
Priority to CN2009102503930A priority Critical patent/CN101778322B/en
Publication of CN101778322A publication Critical patent/CN101778322A/en
Application granted granted Critical
Publication of CN101778322B publication Critical patent/CN101778322B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Circuit For Audible Band Transducer (AREA)

Abstract

The invention discloses a microphone array post-filtering speech enhancement method based on multiple models and auditory characteristics. It addresses two factors that strongly affect the performance of microphone array post-filtering: accurate estimation of the signal parameters, and a suitable trade-off between stronger noise reduction and lower speech distortion. The method comprises the following steps: time-aligning the signals collected by the microphone array, applying the short-time Fourier transform, and performing an eigenvalue analysis of the power spectral matrix; determining the dimension of the signal subspace by maximizing the presence probability of the target speech signal in the noisy speech signal; adaptively selecting a distribution model for the noise power spectrum in the noisy speech signal; estimating the noise power spectrum using conditional probabilities; estimating an auditory masking threshold based on the signal subspace; and estimating the post-filter with a Lagrange multiplier according to auditory perception characteristics.

Description

Microphone array post-filtering speech enhancement method based on multiple models and auditory characteristics
Technical field
The present invention relates to signal subspace methods, the auditory masking effect, and the design of post-filters for microphone arrays.
Background art
Speech in real environments is usually corrupted by ambient noise, and multichannel speech enhancement has received wide attention in recent years. Compared with single-channel methods, microphone array speech enhancement can exploit the correlation among the channels to estimate signal characteristics more accurately, and therefore achieves better enhancement. Among these methods, microphone array post-filtering has become widely used because of its strong noise reduction performance. Simmer et al. (Reference 1: K. Uwe Simmer, et al., "Post-filtering techniques", in Microphone Arrays, M. Brandstein and D. Ward, Eds. New York: Springer, ch. 3, pp. 36-60, 2001.) proved that the optimal multichannel speech enhancement solution in the minimum mean square error sense can be decomposed into a minimum variance distortionless response (MVDR) beamformer followed by a single-channel Wiener post-filter. Although post-filtering is optimal in theory, in practice it is difficult to estimate the power spectra of the speech and noise signals accurately enough to obtain the ideal post-filter, and this limits its performance. A sound post-filter design and accurate power spectrum estimation can therefore improve speech enhancement performance significantly.
Zelinski (Reference 2: R. Zelinski, "A microphone array with adaptive post-filtering for noise reduction in reverberant rooms", in Proc. of ICASSP-88, 1988, Vol. 5, pp. 2578-2581.) proposed a post-filter design under the assumption that the noise signals at the array elements are uncorrelated. In real environments, however, the noise at different array elements is correlated to some degree, so this method performs poorly. McCowan (Reference 3: Iain A. McCowan, Hervé Bourlard, "Microphone array post-filter based on noise field coherence", IEEE Transactions on Speech and Audio Processing, Vol. 11, pp. 709-715, Nov. 2003.) took the correlation between the noise signals into account and, using the properties of a diffuse noise field, proposed a post-filter design with better speech enhancement performance. Because that method relies on the diffuse noise field assumption, its performance degrades significantly when the actual noise field is not diffuse.
The present invention exploits the auditory masking effect of the human ear and proposes a post-filter design based on auditory perception characteristics. To estimate the noise power spectrum more accurately, the invention decomposes the noisy signal space into a signal subspace and a noise subspace, proposes estimating the subspace dimension by maximizing the presence probability of the target speech signal so that the dimensions of the signal and noise subspaces are estimated reasonably, and proposes estimating the noise power spectrum on the noise subspace using conditional probabilities. Experiments show that the proposed noise estimation method is more accurate than earlier noise estimation methods, and that the proposed post-filter based on auditory perception characteristics is more effective than traditional post-filters.
Suppose the frequency-domain representation of the noisy speech signal vector received by an array of L microphones is $X = [X_1, \ldots, X_L]^H$. The frequency-domain representation of the enhanced speech signal, obtained as a weighted sum of the array input signals, is:
$$Y = w^H X = w^H [S d + N] \tag{1}$$
where $w$ is the vector of array weight coefficients, $S$ is the target signal, $d = [d_1, \ldots, d_L]^T$ is the propagation vector, $N = [N_1, \ldots, N_L]^H$ is the noise signal vector, and $[\cdot]^H$ is the conjugate transpose operator.
The power of the error signal $e = S - w^H X$ is:
$$\phi_{ee} = E\left[\{S - w^H X\}\{S^H - X^H w\}\right] = \phi_{SS} - w^H \phi_{XS} - \phi_{XS}^H w + w^H \Phi_{XX} w \tag{2}$$
where $\Phi_{XX}$ is the cross-power spectral matrix of the multichannel noisy speech signal $X$, $\phi_{XS}$ is the cross-power spectrum of the multichannel noisy speech signal $X$ and the single-channel target signal $S$, and $\phi_{SS}$ is the power spectrum of the single-channel target speech signal $S$.
Differentiating $\phi_{ee}$ with respect to the weights $w$ and setting the derivative to zero gives the optimal weight coefficients:
$$w_{opt} = \Phi_{XX}^{-1} \phi_{XS} \tag{3}$$
Under the assumption that the target speech signal and the noise are uncorrelated, formula (3) becomes:
$$w_{opt} = \Phi_{XX}^{-1} \phi_{SS} d = \left[\phi_{SS} d d^H + \Phi_{NN}\right]^{-1} \phi_{SS} d \tag{4}$$
Using the Sherman-Morrison-Woodbury identity, this can be rewritten as:
$$w_{opt} = \left[\frac{\phi_{SS}}{\phi_{SS} + \left(d^H \Phi_{NN}^{-1} d\right)^{-1}}\right] \frac{\Phi_{NN}^{-1} d}{d^H \Phi_{NN}^{-1} d} = \left[\frac{\phi_{SS}}{\phi_{SS} + \phi_{NN}}\right] \frac{\Phi_{NN}^{-1} d}{d^H \Phi_{NN}^{-1} d} \tag{5}$$
where $\phi_{NN}$ is the auto-power spectrum of the single-channel noise and $\Phi_{NN}$ is the multichannel noise cross-power spectral matrix. Formula (5) can be regarded as a minimum variance distortionless response (MVDR) beamformer $\Phi_{NN}^{-1} d / (d^H \Phi_{NN}^{-1} d)$ followed by a single-channel Wiener post-filter $\phi_{SS} / (\phi_{SS} + \phi_{NN})$.
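The factorization in formula (5) can be checked numerically. The following is a minimal sketch, assuming NumPy and randomly generated test data; the function names `mmse_weights` and `mvdr_times_wiener` are illustrative and do not come from the patent.

```python
import numpy as np

def mmse_weights(phi_ss, d, Phi_nn):
    """Formula (4): w_opt = [phi_SS d d^H + Phi_NN]^{-1} phi_SS d."""
    Phi_xx = phi_ss * np.outer(d, d.conj()) + Phi_nn
    return np.linalg.solve(Phi_xx, phi_ss * d)

def mvdr_times_wiener(phi_ss, d, Phi_nn):
    """Formula (5): MVDR beamformer scaled by a single-channel Wiener gain."""
    pinv_d = np.linalg.solve(Phi_nn, d)      # Phi_NN^{-1} d
    denom = (d.conj() @ pinv_d).real         # d^H Phi_NN^{-1} d (real for Hermitian Phi_NN)
    w_mvdr = pinv_d / denom                  # MVDR weights
    phi_nn = 1.0 / denom                     # noise power at the beamformer output
    return phi_ss / (phi_ss + phi_nn) * w_mvdr

# Synthetic test data (assumed values, for illustration only).
rng = np.random.default_rng(0)
L = 4
B = rng.standard_normal((L, L)) + 1j * rng.standard_normal((L, L))
Phi_nn = B @ B.conj().T + L * np.eye(L)      # Hermitian positive definite noise matrix
d = np.exp(-1j * rng.uniform(0, 2 * np.pi, L))
phi_ss = 2.0

w_direct = mmse_weights(phi_ss, d, Phi_nn)
w_factored = mvdr_times_wiener(phi_ss, d, Phi_nn)
```

Both weight vectors agree to numerical precision, which is the statement that the multichannel optimum splits into a beamformer and a scalar post-filter.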
Summary of the invention
To solve the problems of the prior art, the object of the invention is to design the single-channel post-filter, using adaptive selection among multiple distribution models together with auditory characteristics to obtain a new post-filter. The design of a single-channel post-filter must consider two aspects: good noise reduction performance and low distortion of the target speech signal. In general, a post-filter may increase the distortion of the target speech signal while reducing noise, so a reasonable trade-off between the two is a problem that any post-filter design must address.
To achieve this object, the invention provides a microphone array post-filtering speech enhancement method based on multiple models and auditory characteristics. The steps of the method are as follows:
Step a: a microphone array of L microphones collects multichannel noisy speech signals; the noisy speech signal of each channel is time-aligned; the short-time discrete Fourier transform is used to represent each aligned channel as a complex-valued frequency-domain signal; the power spectral matrix of the multichannel array signals is computed and its eigenvalue decomposition yields the eigenvalue matrix and the eigenvector matrix;
Step b: the dimension Q of the signal subspace, with Q ≤ L, is determined by maximizing the presence probability of the target speech signal in the noisy speech signal;
Step c: the distribution model of the noise power spectrum in the noisy speech signal is selected adaptively, based on the stationarity of the spectrum;
Step d: the noise power spectrum is estimated using conditional probabilities;
Step e: from the signal subspace dimension and the noise power spectrum estimate, and using the auditory masking effect, the auditory masking threshold at each frequency bin is estimated based on the signal subspace;
Step f: from the noise power spectrum and the auditory masking threshold, the post-filter is estimated with a Lagrange multiplier so that the residual noise in the enhanced speech stays below the auditory masking threshold of the human ear, which removes the influence of the residual noise while keeping the distortion of the target speech signal as small as possible; this completes the microphone array post-filtering speech enhancement.
The eigenvalue decomposition of the power spectral matrix comprises:
Using eigenvalue decomposition to divide the noisy speech signal space into two subspaces: the signal subspace, which contains the target speech signal and noise, and the noise subspace, which contains only noise. The eigenvalue decomposition of the power spectral matrix $\Phi_{XX}(k,t)$ of the noisy speech signal $X$ at time frame $t$ and frequency bin $k$ is:
$$\Phi_{XX}(k,t) = U \Lambda_{XX} U^H = U\left(\Lambda_{SS} + \phi_{NN}(k,t) I\right) U^H$$
where $X = S + N$, $X$ is the noisy speech signal, $S$ is the target speech signal, and $N$ is the noise; $\Lambda_{XX}$ is the eigenvalue matrix of the noisy speech power spectrum with eigenvalues in descending order, $\Lambda_{SS}$ is the eigenvalue matrix of the target speech power spectrum with eigenvalues in descending order, $U$ is the eigenvector matrix, $\phi_{NN}(k,t)$ is the noise power at time frame $t$ and frequency bin $k$, $I$ is the identity matrix of order L, and $[\cdot]^H$ is the conjugate transpose operator.
Determining the signal subspace dimension means choosing the value of Q that maximizes the probability that the target speech signal is present in the noisy speech. The computation uses conditional probabilities and comprises:
Defining the mutually exclusive events $H_0$ and $H_1$:
Event $H_0$: the noisy speech signal contains only noise and no target speech signal;
Event $H_1$: the noisy speech signal contains both the target speech signal and noise.
The signal subspace dimension Q is defined as:
$$\arg\max_{Q} P\left(S(k,t) \mid H_1\right)$$
where $S(k,t)$ is the power spectrum of the target speech signal at frequency bin $k$ of frame $t$, $P(\cdot)$ is the distribution function of the target speech spectrum, and $\arg\max(\cdot)$ is the operator that returns the parameter value with the maximum score.
Adaptively selecting the distribution model of the noise power spectrum in the noisy speech signal, based on the stationarity of the spectrum, comprises the following steps:
Step c1: define a discriminant function Ω that expresses the stationarity of the power spectrum:
$$\Omega = \frac{\left(\prod_{i=Q+1}^{L} \lambda_X^i\right)^{\frac{1}{L-Q}}}{\frac{1}{L-Q}\sum_{i=Q+1}^{L} \lambda_X^i}$$
That is, Ω is the ratio of the geometric mean $\left(\prod_{i=Q+1}^{L} \lambda_X^i\right)^{1/(L-Q)}$ to the arithmetic mean $\frac{1}{L-Q}\sum_{i=Q+1}^{L} \lambda_X^i$, where $\lambda_X^i$ is the i-th eigenvalue of the noisy speech power spectrum eigenvalue matrix $\Lambda_{XX}$, $i \in \{Q+1, \ldots, L\}$ is the eigenvalue index, and the value of Ω lies between 0 and 1;
Step c2: compare the value of the discriminant function with preset thresholds to determine the noise power spectrum distribution model that applies to the noisy speech signal.
The comparison of the discriminant value with the preset thresholds comprises:
Step c21: choose two preset thresholds $\Omega_1$ and $\Omega_2$, with $\Omega_1 < \Omega_2$;
Step c22: compare the discriminant function with the preset thresholds. Specifically, if the discriminant function is less than $\Omega_1$, select the zero-mean Gaussian distribution; if it is greater than $\Omega_2$, select the Gamma distribution; otherwise select the Laplacian distribution.
Estimating the noise power spectrum using conditional probabilities comprises:
For each frame of the noisy speech signal, the probability that it contains only noise is $P(H_0 \mid X)$, and the probability that it contains both noise and the target speech signal is $P(H_1 \mid X)$. For these two cases the noise power spectrum is estimated as follows:
$$H_0: \ \phi_{NN}^{0} = \frac{1}{L}\sum_{i=1}^{L} \lambda_X^i \qquad H_1: \ \phi_{NN}^{1} = \frac{1}{L-Q}\sum_{i=Q+1}^{L} \lambda_X^i$$
where $\phi_{NN}^{0}$ and $\phi_{NN}^{1}$ are the noise power spectra when the mutually exclusive events $H_0$ and $H_1$ occur, respectively, and $i \in \{1, \ldots, L\}$ is the eigenvalue index;
According to the conditional probability formula, the noise power spectrum is estimated as:
$$\tilde{\phi}_{NN} = P(H_0 \mid X)\,\phi_{NN}^{0} + P(H_1 \mid X)\,\phi_{NN}^{1}$$
Estimating the auditory masking threshold comprises:
Step f1: divide the auditory frequency range 0-15500 Hz into several critical sub-bands;
Step f2: compute the auditory masking threshold in each sub-band.
Computing the auditory masking threshold in each sub-band means: computing the energy at each frequency bin in each sub-band; computing the spreading coefficient of the basilar membrane of the human ear for the sound in each band; multiplying the energy at each frequency bin in each sub-band by the spreading coefficient of each band to obtain the excitation energy on the basilar membrane; and then computing the masking threshold from the functional relation between the excitation energy on the basilar membrane and the auditory masking threshold.
Estimating the post-filter G with a Lagrange multiplier comprises the following steps:
Step fa: set up an optimization problem that minimizes the distortion of the target speech signal under the constraint that the residual noise power is below the masking threshold;
Step fb: solve it with a Lagrange multiplier to obtain the optimal estimate of the post-filter;
Step fc: substitute the auditory masking threshold and the noise power spectrum estimate to complete the post-filter design.
Beneficial effects of the invention: the invention uses the auditory masking effect of the human ear to obtain a reasonable trade-off and designs a new post-filter based on auditory perception characteristics. Traditional noise estimation relies on voice activity detection (VAD): the noise-only frames in the noisy speech are detected, and the average power spectrum over those frames is used as the noise power spectrum estimate on the frames that contain both speech and noise. Because the noise varies over time, the noise on each frame is in fact different, so using the average noise power spectrum of the noise-only frames as the estimate for all frames can cause large estimation errors. To address this, the invention proposes a noise power spectrum estimation method based on subspace decomposition of the noisy signal, which estimates the noise power spectrum on every frame and greatly reduces the noise estimation error. The invention then designs the post-filter using the auditory masking effect of the human ear, so that the residual noise in the enhanced speech is masked by the target speech, and the distortion of the target speech is reduced while the noise is suppressed.
Description of drawings
Further features and advantages of the invention are described below with reference to the illustrative drawings.
Fig. 1 is an example flow chart of an application of the microphone array post-filtering speech enhancement method based on multiple models and auditory characteristics;
Fig. 2 is a flow chart of a method for determining the signal subspace dimension;
Fig. 3 is a flow chart for determining the noise power spectrum distribution model in the noisy speech signal;
Fig. 4 is a flow chart for estimating the noise power spectrum using conditional probabilities;
Fig. 5 is a flow chart for computing the auditory masking threshold;
Fig. 6 is a flow chart for designing the post-filter.
Embodiment
It should be understood that the following detailed description of the examples and drawings is not intended to limit the invention to the particular illustrative embodiments. The illustrative embodiments described here only illustrate the individual steps of the invention, whose scope is defined by the appended claims.
The invention uses the auditory masking effect of the human ear to obtain a reasonable trade-off and designs a new post-filter based on auditory perception characteristics. The auditory masking effect means the following: under normal conditions the target speech signal is strong and the background noise is comparatively weak, so the auditory system determines a masking threshold in the frequency domain that depends on the specific target speech signal. If the residual noise after filtering is kept below the auditory masking threshold of the human ear, the noise is not perceived, and the noisy speech signal is thereby enhanced. The specific steps are as follows:
A new microphone array post-filtering speech enhancement method based on multiple models and auditory characteristics comprises the following steps:
Step a: a microphone array of L microphones collects multichannel noisy speech signals; the noisy speech signal of each channel is time-aligned; the short-time discrete Fourier transform is used to represent each aligned channel as a complex-valued frequency-domain signal; the power spectral matrix of the multichannel array signals is computed and its eigenvalue decomposition yields the eigenvalue matrix and the eigenvector matrix;
Step b: the dimension Q of the signal subspace is determined by maximizing the presence probability of the target speech signal in the noisy speech signal;
Step c: the distribution model of the noise power spectrum in the noisy speech signal is selected adaptively, based on the stationarity of the spectrum;
Step d: the noise power spectrum is estimated using conditional probabilities;
Step e: from the signal subspace dimension and the noise power spectrum estimate, and using the auditory masking effect, the auditory masking threshold at each frequency bin is estimated based on the signal subspace;
Step f: from the noise power spectrum and the auditory masking threshold, the post-filter is estimated with a Lagrange multiplier so that the residual noise in the enhanced speech stays below the auditory masking threshold of the human ear, which removes the influence of the residual noise while keeping the distortion of the target speech signal as small as possible; this completes the microphone array post-filtering speech enhancement.
The commonly used noise estimation method is based on voice activity detection (VAD): the noise-only frames in the noisy speech are detected, and the average power spectrum over those frames is used as the noise power spectrum estimate on the frames that contain both speech and noise. Because the noise varies over time, the noise on each frame is in fact different, so using the average noise power spectrum of the noise-only frames as the estimate for all frames can cause large estimation errors.
To address this, steps b) and d) of the invention use a method based on subspace decomposition of the noisy signal to estimate the dimension of the noise subspace and the noise power spectrum. The noise power spectrum is estimated on every frame, which greatly reduces the noise estimation error.
Under the assumption that the target speech signal and the noise are uncorrelated, the power spectral matrix $\Phi_{XX}(k,t)$ of the noisy speech signal at time frame $t$ and frequency bin $k$ can be expressed as the sum of the target speech power spectral matrix $\Phi_{SS}(k,t)$ and the noise power spectral matrix $\Phi_{NN}(k,t)$:
$$\Phi_{XX}(k,t) = \Phi_{SS}(k,t) + \Phi_{NN}(k,t) \tag{6}$$
For microphone array signals, it can be assumed that the auto-power spectrum of the noise is the same at every array element and that the noise is uncorrelated between array elements. Then the following holds:
$$\Phi_{NN}(k,t) = \phi_{NN}(k,t) I \tag{7}$$
where $I$ is the identity matrix of order L and $\phi_{NN}(k,t)$ is the auto-power spectrum of the single-channel noise.
Let the eigenvalue decomposition of the target speech power spectral matrix be:
$$\Phi_{SS}(k,t) = U \Lambda_{SS} U^H \tag{8}$$
where $\Lambda_{SS}$ is the eigenvalue matrix with eigenvalues in descending order, $U$ is the corresponding eigenvector matrix, and Q is the rank of the matrix, with Q ≤ L.
Eigenvalue decomposition can be used to divide the noisy signal space into two subspaces: the signal subspace (containing the target speech signal and noise) and the noise subspace (containing only noise). The eigenvalue decomposition of the noisy signal power spectral matrix is:
$$\Phi_{XX}(k,t) = U \Lambda_{XX} U^H = U\left(\Lambda_{SS} + \phi_{NN}(k,t) I\right) U^H \tag{9}$$
where $\Lambda_{XX}$ is the eigenvalue matrix of the noisy speech power spectrum with eigenvalues in descending order and $I$ is the identity matrix of order L.
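Formulas (6) to (9) can be illustrated on synthetic data. The sketch below assumes NumPy, a randomly generated rank-Q speech matrix, and spatially white noise as in formula (7); the helper name `descending_eig` and all numeric values are illustrative assumptions, not part of the patent.

```python
import numpy as np

def descending_eig(Phi):
    """Eigendecomposition of a Hermitian matrix with eigenvalues sorted in descending order."""
    vals, vecs = np.linalg.eigh(Phi)      # eigh returns eigenvalues in ascending order
    order = np.argsort(vals)[::-1]
    return vals[order], vecs[:, order]

rng = np.random.default_rng(0)
L, Q, phi_nn = 6, 2, 0.3                  # assumed array size, signal rank, noise power

A = rng.standard_normal((L, Q)) + 1j * rng.standard_normal((L, Q))
Phi_ss = A @ A.conj().T                   # rank-Q target speech matrix, as in formula (8)
Phi_xx = Phi_ss + phi_nn * np.eye(L)      # formulas (6) and (7)

lam_x, U = descending_eig(Phi_xx)         # Lambda_XX and U of formula (9)
lam_s, _ = descending_eig(Phi_ss)         # Lambda_SS
```

In this idealized setting every eigenvalue of the noisy matrix is the corresponding speech eigenvalue shifted by the noise power, and the trailing L - Q eigenvalues equal the noise power itself, which is what makes the noise subspace usable for noise estimation.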
The invention proposes a method for estimating the noise auto-power spectrum $\phi_{NN}$ from the noise subspace. First, the dimension Q of the signal subspace and the dimension P of the noise subspace must be determined.
Step b) provides a method for determining Q by maximizing the presence probability of the target speech signal in the noisy speech signal, that is, by choosing the value of Q that maximizes the probability that the target speech signal is present.
The computation uses conditional probabilities. Define the mutually exclusive events $H_0$ and $H_1$:
Event $H_0$: the noisy speech signal contains only noise and no target speech signal;
Event $H_1$: the noisy speech signal contains both the target speech signal and noise.
The signal subspace dimension Q is defined as:
$$\arg\max_{Q} P\left(S(k,t) \mid H_1\right) \tag{10}$$
where $S(k,t)$ is the power spectrum of the target speech signal at frequency bin $k$ of frame $t$, $P(\cdot)$ is the distribution function of the target speech spectrum, and $\arg\max(\cdot)$ is the operator that returns the parameter value with the maximum score.
Step c) provides an adaptive method for selecting the distribution model of the noise power spectrum in the noisy speech signal based on the stationarity of the spectrum. The method comprises the following steps:
First, define the discriminant function Ω:
$$\Omega = \frac{\left(\prod_{i=Q+1}^{L} \lambda_X^i\right)^{\frac{1}{L-Q}}}{\frac{1}{L-Q}\sum_{i=Q+1}^{L} \lambda_X^i} \tag{11}$$
That is, Ω is the ratio of the geometric mean $\left(\prod_{i=Q+1}^{L} \lambda_X^i\right)^{1/(L-Q)}$ to the arithmetic mean $\frac{1}{L-Q}\sum_{i=Q+1}^{L} \lambda_X^i$, where $\lambda_X^i$ is the i-th eigenvalue of the noisy speech power spectrum eigenvalue matrix $\Lambda_{XX}$, $i \in \{Q+1, \ldots, L\}$ is the eigenvalue index, and the value of Ω lies between 0 and 1.
Then, choose two preset thresholds $\Omega_1$ and $\Omega_2$ ($\Omega_1 < \Omega_2$) and compare the discriminant function with them. Specifically, if the discriminant function is less than $\Omega_1$, select the zero-mean Gaussian distribution; if it is greater than $\Omega_2$, select the Gamma distribution; otherwise select the Laplacian distribution.
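The discriminant of formula (11) and the three-way model choice can be sketched as follows. This is an illustration only: the text does not give values for the two thresholds, so they are plain function parameters here, and the function names and returned labels are assumptions. The sketch also assumes strictly positive eigenvalues so that the logarithm is defined.

```python
import numpy as np

def stationarity_measure(lam_x, Q):
    """Formula (11): geometric mean over arithmetic mean of the noise-subspace eigenvalues."""
    noise_eigs = np.asarray(lam_x, dtype=float)[Q:]   # eigenvalues Q+1 .. L (descending order assumed)
    geometric_mean = np.exp(np.mean(np.log(noise_eigs)))
    arithmetic_mean = np.mean(noise_eigs)
    return geometric_mean / arithmetic_mean

def select_noise_model(omega, omega1, omega2):
    """Three-way choice described for step c); omega1 < omega2 are user-supplied thresholds."""
    if omega < omega1:
        return "gaussian"      # zero-mean Gaussian distribution
    if omega > omega2:
        return "gamma"         # Gamma distribution
    return "laplacian"         # Laplacian distribution
```

For example, `stationarity_measure([5.0, 1.0, 1.0, 1.0], Q=1)` evaluates a perfectly flat noise subspace, for which the ratio reaches its upper bound of 1.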
Step d) provides a method for estimating the noise power spectrum using conditional probabilities. For each frame of the noisy speech signal, the probability that it contains only noise is $P(H_0 \mid X)$, and the probability that it contains both noise and the target speech signal is $P(H_1 \mid X)$. For these two cases the noise power spectrum is estimated as follows:
$$H_0: \ \phi_{NN}^{0} = \frac{1}{L}\sum_{i=1}^{L} \lambda_X^i \qquad H_1: \ \phi_{NN}^{1} = \frac{1}{L-Q}\sum_{i=Q+1}^{L} \lambda_X^i \tag{12}$$
where $i \in \{1, \ldots, L\}$ is the eigenvalue index, and $\phi_{NN}^{0}$ and $\phi_{NN}^{1}$ are the noise power spectra when the mutually exclusive events $H_0$ and $H_1$ occur, respectively.
According to the conditional probability formula, the noise power spectrum is estimated as:
$$\tilde{\phi}_{NN} = P(H_0 \mid X) \cdot \phi_{NN}^{0} + P(H_1 \mid X) \cdot \phi_{NN}^{1} \tag{13}$$
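Formulas (12) and (13) amount to a probability-weighted mix of two eigenvalue averages. The sketch below assumes that the speech presence probability $P(H_1 \mid X)$ is already available as an input (how it is obtained from the selected distribution model is not shown here) and that $P(H_0 \mid X) = 1 - P(H_1 \mid X)$; the function name is illustrative.

```python
import numpy as np

def estimate_noise_power(lam_x, Q, p_h1):
    """Noise power estimate of formula (13) from eigenvalues sorted in descending order.

    p_h1 is P(H1 | X); P(H0 | X) is taken as 1 - p_h1 (an assumption of this sketch).
    """
    lam = np.asarray(lam_x, dtype=float)
    phi_nn_0 = lam.mean()          # H0: average of all L eigenvalues, formula (12)
    phi_nn_1 = lam[Q:].mean()      # H1: average of the L - Q noise-subspace eigenvalues
    return (1.0 - p_h1) * phi_nn_0 + p_h1 * phi_nn_1
```

With eigenvalues `[5, 1, 1, 1]` and Q = 1, the two hypotheses give 2 and 1 respectively, so a presence probability of 0.5 yields an estimate halfway between them.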
Step e) provides a method that, from the signal subspace dimension and the noise power spectrum estimate and using the auditory masking effect, estimates the auditory masking threshold at each frequency bin based on the signal subspace.
The auditory frequency range is 0 to 15500 Hz and covers 24 critical sub-bands; the auditory masking threshold must be computed in each sub-band. First, the energy at each frequency bin in each sub-band is computed. Next, the spreading coefficient of the basilar membrane of the human ear is computed for the sound in each band, and the energy at each frequency bin in each sub-band is multiplied by the spreading coefficient of each band to obtain the excitation energy on the basilar membrane. Finally, the masking threshold is computed from the functional relation between the excitation energy on the basilar membrane and the auditory masking threshold.
Step f) provides a method for estimating the post-filter $G(e^{j\omega})$ with a Lagrange multiplier from the noise power spectrum and the auditory masking threshold, so that the residual noise in the enhanced speech stays below the auditory masking threshold of the human ear. This removes the influence of the residual noise while keeping the distortion of the target speech signal as small as possible, and completes the microphone array post-filtering speech enhancement.
Suppose the output signal of the MVDR beamformer is $\tilde{S}(e^{j\omega})$ and the target speech signal is $S(e^{j\omega})$. The error between the speech signal after post-filtering and the target speech signal can be expressed as:
$$E(e^{j\omega}) = G(e^{j\omega})\tilde{S}(e^{j\omega}) - S(e^{j\omega}) = \left[G(e^{j\omega}) - 1\right] S(e^{j\omega}) + G(e^{j\omega})\tilde{N}(e^{j\omega}) \tag{14}$$
where $\tilde{N}(e^{j\omega})$ is the noise in $\tilde{S}(e^{j\omega})$.
The first term in formula (14) describes the distortion of the target speech signal in the enhanced speech, and the second term describes the amount of residual noise in the enhanced speech. A suitable post-filter $G(e^{j\omega})$ can be computed so that the residual noise in the enhanced speech stays below the auditory masking threshold of the human ear, which removes its influence. Based on formula (14), the invention proposes the following objective:
$$\min E_T = \left[G(e^{j\omega}) - 1\right]^2 S(e^{j\omega})^2 + G(e^{j\omega})^2 \tilde{N}(e^{j\omega})^2 \tag{15}$$
subject to the constraint:
$$G(e^{j\omega})^2 \tilde{N}(e^{j\omega})^2 \le C_{thr} \tag{16}$$
where $C_{thr}$ is the auditory masking threshold.
To solve this with the method of Lagrange multipliers, let:
$$J = E_T + \mu\left(G(e^{j\omega})^2 \tilde{N}(e^{j\omega})^2 - C_{thr}\right) \tag{17}$$
where $\mu$ is the Lagrange multiplier.
Differentiating $J$ with respect to $G(e^{j\omega})$ and setting the derivative to zero gives:
$$G(e^{j\omega}) = \frac{S(e^{j\omega})^2}{S(e^{j\omega})^2 + (1 + \mu)\tilde{N}(e^{j\omega})^2} \tag{18}$$
Formula (18) shows that, under the objective and constraint of the invention, the post-filter based on auditory perception characteristics has the form of a Wiener filter with a more reasonable noise estimate.
Differentiating $J$ with respect to $\mu$ and setting the derivative to zero gives:
$$G(e^{j\omega}) = \frac{C_{thr}}{\tilde{N}(e^{j\omega})^2} \tag{19}$$
Equating formulas (18) and (19) gives:
$$1 + \mu = \frac{S(e^{j\omega})^2}{\tilde{N}(e^{j\omega})^2} \max\left(\frac{\tilde{N}(e^{j\omega})^2}{C_{thr}} - 1,\ 0\right) \tag{20}$$
Substituting (20) into (18) and replacing $\tilde{N}(e^{j\omega})^2$ with $\tilde{\phi}_{NN}$ from formula (13) gives the proposed post-filter based on auditory perception characteristics:
$$G(e^{j\omega}) = \frac{1}{1 + \max\left(\dfrac{\tilde{\phi}_{NN}}{C_{thr}} - 1,\ 0\right)} \tag{21}$$
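Formula (21) is simple to evaluate per frequency bin. The following is a minimal sketch, assuming NumPy arrays that hold the noise power estimate of formula (13) and the masking threshold $C_{thr}$ for each bin; the function name and the example values are illustrative.

```python
import numpy as np

def perceptual_postfilter_gain(phi_nn_est, c_thr):
    """Post-filter gain of formula (21), evaluated element-wise per frequency bin."""
    ratio = np.asarray(phi_nn_est, dtype=float) / np.asarray(c_thr, dtype=float)
    return 1.0 / (1.0 + np.maximum(ratio - 1.0, 0.0))

# Assumed example values: three bins with the same masking threshold.
gain = perceptual_postfilter_gain([0.5, 2.0, 4.0], [1.0, 1.0, 1.0])
```

The expression is algebraically the same as `min(1, c_thr / phi_nn_est)`: bins whose estimated noise is already below the masking threshold pass unchanged, and the remaining bins are attenuated in proportion to how far the noise exceeds the threshold.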
Fig. 1 shows a flow chart of an application of the microphone array post-filtering speech enhancement method based on multiple models and auditory characteristics. The system comprises a microphone array of at least two microphones 101.
The microphones of the array can be arranged in different ways. In particular, the microphones 101 may be placed in a row, with each microphone at a preset distance from its neighbors; for example, the distance between two microphones may be approximately 5 cm. The microphone array may be positioned as appropriate for different application environments and technical requirements.
The speech signals collected by the microphones 101 are sent to the signal processing unit 102. Before being sent to the signal processing unit, the speech signals may be pre-processed by a low-pass filter.
The signal processing unit 102 applies delay compensation to the speech signals collected by the different microphones to achieve time alignment. The short-time discrete Fourier transform is used to represent each aligned microphone signal as a complex-valued frequency-domain signal. The power spectral matrix $\Phi_{XX}(k,t)$ of the multichannel noisy speech signal collected by the array is computed at time frame $t$ and frequency bin $k$, and its eigenvalue decomposition yields the eigenvalue matrix $\Lambda_{XX}$ and the eigenvector matrix $U$.
In the following step 103, the eigenvalue matrix $\Lambda_{XX}$ is used to determine the dimension Q of the signal subspace by maximizing the presence probability of the target speech signal in the noisy speech signal.
Next, step 104 uses the signal subspace dimension Q to adaptively select the distribution model of the noise power spectrum in the noisy speech signal, based on the stationarity of the spectrum.
Step 105 uses the signal subspace dimension Q and the noise power spectrum distribution model to estimate the noise power spectrum from conditional probabilities.
Step 106 uses the signal subspace dimension and the noise power spectrum estimate, together with the auditory masking effect, to estimate the auditory masking threshold at each frequency bin based on the signal subspace.
Finally, step 107 uses the noise power spectrum estimate and the auditory masking threshold to design the post-filter with a Lagrange multiplier.
Fig. 2 describes the flow of a method for determining the signal subspace dimension. This method corresponds to step 103 in Fig. 1.
After steps 101 and 102, the speech signals collected by the microphone array have been aligned in the time domain and transformed by the short-time Fourier transform, and eigenvalue decomposition of the power spectrum $\Phi_{XX}$ of the multichannel noisy speech signal has yielded the eigenvalue matrix $\Lambda_{XX}$ and the eigenvector matrix U. By formula (9), the eigenvalue matrix of the noisy signal power spectrum decomposes into the sum of the signal power spectrum eigenvalues and the noise power spectrum eigenvalues, where Q is the dimension of the signal subspace.
In the first step 201, the signal subspace dimension Q is initialized to 1.
Next, step 202 is upgraded noise power spectrum and target voice signal power spectrum.Because Noisy Speech Signal power spectrum characteristic value matrix Λ XXBe descending, and the hypothesis signal strength signal intensity is greater than noise, so when the dimension of signal subspace was Q, the power of noise was
φ NN = 1 L - Q Σ i = Q + 1 L λ X i - - - ( 22 )
Wherein, i ∈ Q+1 ..., L} is the subscript of characteristic value.
And the power of target voice signal is
S = 1 Q Σ i = 1 Q ( λ X i - φ NN ) 1 2 - - - ( 23 )
Wherein, i ∈ 1 ..., Q} is the subscript of characteristic value.
So, the variance of target voice signal is
v s = λ X 1 - φ NN Q = 1 1 Q Σ i = 1 Q [ ( λ X i - φ NN ) 1 2 - S ] 2 Q > 1 - - - ( 24 )
Wherein, wherein, i ∈ 1 ..., Q} is the subscript of characteristic value.
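Formulas (22) to (24) can be checked on a small set of eigenvalues. The sketch below is illustrative: the function name `subspace_stats` and the sample eigenvalues are assumptions, and the clamp at zero before the square root is an added guard against rounding error rather than part of the text.

```python
import math


def subspace_stats(eigenvalues, q):
    """Return (noise_power, speech_level, speech_variance) for dimension q.

    eigenvalues: eigenvalues of the noisy power spectral matrix, sorted in
    descending order. Requires 1 <= q < len(eigenvalues).
    """
    num = len(eigenvalues)
    if not 1 <= q < num:
        raise ValueError("q must satisfy 1 <= q < L")
    # Formula (22): mean of the L - q smallest eigenvalues.
    noise_power = sum(eigenvalues[q:]) / (num - q)
    # Formula (23): mean of the square roots of the noise-compensated
    # signal eigenvalues (clamped at zero as a rounding guard).
    roots = [math.sqrt(max(lam - noise_power, 0.0)) for lam in eigenvalues[:q]]
    speech_level = sum(roots) / q
    # Formula (24): special case for q == 1, mean squared deviation otherwise.
    if q == 1:
        speech_variance = eigenvalues[0] - noise_power
    else:
        speech_variance = sum((r - speech_level) ** 2 for r in roots) / q
    return noise_power, speech_level, speech_variance


# Four microphones, signal subspace dimension 2 (made-up eigenvalues).
noise, level, variance = subspace_stats([10.0, 5.0, 1.0, 1.0], q=2)
```

With these sample values the two smallest eigenvalues average to a noise power of 1, the compensated roots are 3 and 2, and their mean and mean squared deviation are 2.5 and 0.25.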
Step 203 selects any one of the Gaussian, Laplacian, and gamma models to describe the spectral distribution of the target speech signal, and computes the conditional probability $P_G(S(k,t) \mid H_1)$ of the target speech signal. In particular, when the Gaussian model is selected,
$$P_G(S(k,t) \mid H_1) = \frac{1}{\sqrt{2\pi v_s(k,t)}} \exp\left\{-\frac{S^2(k,t)}{2 v_s(k,t)}\right\}$$
Step 204 increments the variables Q and j:
Q=Q+1
Step 205 then tests the loop termination condition Q > L. If the condition is not satisfied, the flow returns to step 202; otherwise it proceeds to step 206.
Step 206 uses formula (10) of the invention to make the final determination of the signal subspace dimension Q, namely
$$\arg\max_{Q} P(S(k,t) \mid H_1).$$
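The loop of steps 201 to 206 might be sketched as follows for the Gaussian model. This is an illustration under stated assumptions, not the patented implementation: the names `gaussian_likelihood` and `select_dimension` are hypothetical, the search stops at Q = L - 1 because the noise average over L - Q eigenvalues is undefined at Q = L, and a small variance floor is added to avoid division by zero.

```python
import math


def gaussian_likelihood(level, variance):
    """Zero-mean Gaussian density used in step 203."""
    return math.exp(-level ** 2 / (2.0 * variance)) / math.sqrt(2.0 * math.pi * variance)


def select_dimension(eigenvalues, floor=1e-12):
    """Return the q in 1..L-1 that maximizes the Gaussian likelihood.

    eigenvalues: descending eigenvalues of the noisy power spectral matrix.
    floor: assumed lower bound on the variance (not from the source text).
    """
    num = len(eigenvalues)
    best_q, best_p = 1, -1.0
    for q in range(1, num):  # steps 202 to 205, stopping before q = L
        noise = sum(eigenvalues[q:]) / (num - q)
        roots = [math.sqrt(max(lam - noise, 0.0)) for lam in eigenvalues[:q]]
        level = sum(roots) / q
        if q == 1:
            variance = eigenvalues[0] - noise
        else:
            variance = sum((r - level) ** 2 for r in roots) / q
        p = gaussian_likelihood(level, max(variance, floor))
        if p > best_p:  # step 206: keep the maximizing dimension
            best_q, best_p = q, p
    return best_q


q_hat = select_dimension([10.0, 5.0, 1.0, 1.0])
```

Swapping `gaussian_likelihood` for a Laplacian or gamma density would cover the other two models named in step 203.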
Fig. 3 describes a flow chart for determining the noise power spectrum distribution model in the noisy speech signal. This method corresponds to step 104 in Fig. 1.
The Gaussian, Laplacian, and gamma models can all describe the spectral coefficients of speech and noise signals, but noise characteristics differ between noise types, so the model should be selected according to the characteristics of the target noise. This example gives a method for model selection based on the stationarity of the spectrum, derived from statistics of computer fan noise.
In step 301, the discriminant value Ω is computed by formula (11).
Step 302 tests whether the discriminant value Ω is less than $\Omega_1$. If so, the Gaussian model is selected; otherwise step 303 tests whether Ω is less than $\Omega_2$. If so, the Laplacian model is selected; otherwise the gamma model is selected.
The adaptive model selection algorithm of the invention is based on statistics from a large number of experiments on computer fan noise. The experiments found that the Gaussian model is optimal when Ω is small, that the Laplacian model is optimal when Ω is larger, and that the gamma model has the smallest overall average noise estimation error. Accordingly, the invention selects the model as follows:
Figure G2009102503930D0000133
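Steps 301 to 303 might be sketched as below. The discriminant follows the description in the claims, where Ω is the ratio of the geometric mean to the arithmetic mean of the L - Q noise eigenvalues. The function names and the threshold values passed in the example are placeholders, since the text does not give numerical values for Ω₁ and Ω₂.

```python
import math


def stationarity_measure(eigenvalues, q):
    """Ratio of geometric to arithmetic mean of the L - q noise eigenvalues."""
    tail = eigenvalues[q:]
    if not tail or min(tail) <= 0.0:
        raise ValueError("need at least one positive noise eigenvalue")
    # The geometric mean is computed through logarithms for numerical safety.
    log_mean = sum(math.log(lam) for lam in tail) / len(tail)
    return math.exp(log_mean) / (sum(tail) / len(tail))


def select_noise_model(omega, omega_1, omega_2):
    """Steps 302 and 303: compare omega with two thresholds, omega_1 < omega_2."""
    if omega < omega_1:
        return "gaussian"
    if omega < omega_2:
        return "laplacian"
    return "gamma"


omega = stationarity_measure([10.0, 5.0, 4.0, 1.0], q=2)   # noise eigenvalues 4 and 1
model = select_noise_model(omega, omega_1=0.3, omega_2=0.7)  # placeholder thresholds
```

Because the geometric mean never exceeds the arithmetic mean, the measure stays in the interval from 0 to 1 and equals 1 only when all noise eigenvalues are equal.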
Fig. 4 describes a flow chart of a method for estimating the noise power spectrum using conditional probabilities. This method corresponds to step 105 in Fig. 1.
Step 401 computes the average power spectrum of the pure-noise frames in the initial segment of the noisy speech signal:
Figure G2009102503930D0000134
Step 402 computes the power spectrum of the current frame
$$\phi_{NN}^{cur} = \frac{1}{L} \sum_{i=1}^{L} \lambda_X^i$$
where i ∈ {1, …, L} is the eigenvalue index.
Next, step 403 computes the ratio of the current-frame power spectrum to the pure-noise power spectrum
$$r = \frac{\phi_{NN}^{cur}}{\phi_{NN}^{pre}}$$
Steps 403 to 408 together compute the conditional probability $P(H_0 \mid X)$. First, r is compared with a set threshold α, a small value slightly greater than 1; in particular, α is taken as 1.2. When r < α, the current frame is more likely a pure-noise frame, so $P(H_0 \mid X)$ should take a large value; the invention sets its lower bound to 0.8. When r > α, the current frame is more likely a speech frame, and $P(H_0 \mid X)$ should take a suitable value. Because signal energy is unevenly distributed across frequencies, different values of $P(H_0 \mid X)$ are used for different frequencies. At low frequencies $P(H_0 \mid X)$ should be greater than at high frequencies, because most of the signal energy is concentrated in the low-frequency region. That is,
$$P(H_0 \mid X) = \begin{cases} \max\left(\dfrac{1}{1 + r^{\beta_1}},\ 0.8\right), & r \le 1.2 \\ \dfrac{1}{1 + r^{\beta_2}}, & r > 1.2 \text{ and } f \le f_{thr} \\ \dfrac{1}{1 + r^{\beta_3}}, & r > 1.2 \text{ and } f > f_{thr} \end{cases} \tag{26}$$
where $f_{thr}$ is the threshold frequency separating low and high frequencies, and $\beta_1$, $\beta_2$, and $\beta_3$ are weighting coefficients.
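A sketch of formula (26) follows. Two points are assumptions rather than statements of the source: the weights are read as exponents applied to r, and the default values of `f_thr`, `beta_1`, `beta_2`, and `beta_3` are placeholders, since the text gives no numbers for them. The function name `noise_only_probability` is likewise illustrative.

```python
def noise_only_probability(r, freq_hz, f_thr=1000.0,
                           beta_1=1.0, beta_2=1.0, beta_3=2.0,
                           alpha=1.2, floor=0.8):
    """P(H0 | X) from the power ratio r and the frequency, per formula (26).

    r: ratio of current-frame power to the pure-noise power.
    freq_hz: frequency of the bin being processed.
    f_thr, beta_1, beta_2, beta_3: placeholder values, not from the source.
    """
    if r <= alpha:
        # Likely a noise-only frame: never report less than the 0.8 floor.
        return max(1.0 / (1.0 + r ** beta_1), floor)
    # Likely a speech frame: use a frequency-dependent weight.
    beta = beta_2 if freq_hz <= f_thr else beta_3
    return 1.0 / (1.0 + r ** beta)


p_low = noise_only_probability(2.0, freq_hz=500.0)    # low-frequency bin
p_high = noise_only_probability(2.0, freq_hz=4000.0)  # high-frequency bin
```

Choosing `beta_2` smaller than `beta_3` makes the low-frequency probability the larger of the two for r > 1, which matches the ordering the text asks for.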
Step 409 computes the conditional probability $P(H_1 \mid X) = 1 - P(H_0 \mid X)$.
With the conditional probabilities $P(H_0 \mid X)$ and $P(H_1 \mid X)$ obtained, step 410 uses formula (13) to obtain the noise power spectrum estimate $\tilde{\phi}_{NN}$.
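Steps 409 and 410 can be illustrated with the two hypothesis-conditional noise estimates given in the claims: the mean of all L eigenvalues under H0 and the mean of the L - Q smallest eigenvalues under H1. The function name `estimate_noise_power` and the sample inputs are assumptions made for this sketch.

```python
def estimate_noise_power(eigenvalues, q, p_h0):
    """Blend the two hypothesis-conditional noise estimates.

    eigenvalues: descending eigenvalues of the noisy power spectral matrix.
    q: signal subspace dimension, 1 <= q < L.
    p_h0: probability that the frame contains noise only.
    """
    num = len(eigenvalues)
    if not 1 <= q < num:
        raise ValueError("q must satisfy 1 <= q < L")
    if not 0.0 <= p_h0 <= 1.0:
        raise ValueError("p_h0 must be a probability")
    phi_0 = sum(eigenvalues) / num            # H0: every eigenvalue is noise
    phi_1 = sum(eigenvalues[q:]) / (num - q)  # H1: only the L - q smallest are noise
    p_h1 = 1.0 - p_h0                         # step 409
    return p_h0 * phi_0 + p_h1 * phi_1        # step 410


phi_tilde = estimate_noise_power([10.0, 5.0, 1.0, 1.0], q=2, p_h0=0.8)
```

With these sample values the H0 estimate is 4.25 and the H1 estimate is 1, so the blended estimate is 0.8 × 4.25 + 0.2 × 1 = 3.6.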
Fig. 5 describes a flow chart of a method for computing the auditory masking threshold. This method corresponds to step 106 in Fig. 1. To mask the noise in the signal, and thereby enhance the target speech signal, the noise must be kept below this threshold.
Step 501 divides the human auditory range of 0 to 15500 Hz into 24 subbands, so that the auditory masking threshold can be computed in each subband.
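The text does not list the subband edges. The sketch below assumes the standard critical-band (Bark) edges from Zwicker's table, which also span 0 to 15500 Hz in 24 bands; the names `BAND_UPPER_EDGES_HZ` and `band_index` are illustrative.

```python
import bisect

# Upper edges (Hz) of the 24 critical bands, following Zwicker's Bark table.
# Assumed here; the source only states the range and the number of bands.
BAND_UPPER_EDGES_HZ = [
    100, 200, 300, 400, 510, 630, 770, 920, 1080, 1270, 1480, 1720,
    2000, 2320, 2700, 3150, 3700, 4400, 5300, 6400, 7700, 9500, 12000, 15500,
]


def band_index(freq_hz):
    """Return the 1-based subband index j for a frequency in [0, 15500] Hz."""
    if not 0 <= freq_hz <= BAND_UPPER_EDGES_HZ[-1]:
        raise ValueError("frequency outside the 0 to 15500 Hz range")
    # bisect_left finds the first band whose upper edge is >= freq_hz.
    return bisect.bisect_left(BAND_UPPER_EDGES_HZ, freq_hz) + 1


j_1khz = band_index(1000.0)
```

Each DFT frequency can be mapped to its subband this way before the per-band energies are accumulated.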
In step 502, the signal subspace dimension obtained in step 206 is used to compute the energy at each frequency. H(j,b) denotes the energy at the b-th frequency in the j-th subband and can be computed from the signal subspace eigenvalues and eigenvectors:
$$H(j,b) = \operatorname{mean}\left(\frac{1}{L} \sum_{i=1}^{Q} \lambda_S^i \left|U_{1,i}\right|^2\right) \tag{27}$$
where $\lambda_S^i$ is the eigenvalue estimate of the target speech power spectral matrix, $U_{1,i}$ is the i-th basis vector of the signal subspace, i ∈ {1, …, Q} is the eigenvalue index, and mean(·) is the averaging operator.
SF(j) is the function expressing the spreading characteristic of the basilar membrane of the human ear in the j-th subband, j ∈ {1, …, 24}.
In step 503, the spreading function of each subband is computed:
$$SF(j) = 15.81 + 7.5\,(j + 0.474) - 17.5\sqrt{1 + (j + 0.474)^2}, \quad j \in \{1, \dots, 24\} \tag{28}$$
Next, step 504 computes the excitation energy on the basilar membrane of the human ear:
C(j,b)=SF(j)*H(j,b),j∈{1,…,24} (29)
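Formulas (28) and (29) might be evaluated as in the sketch below. It reads (28) with a square root over 1 + (j + 0.474)², as in Schroeder's spreading function, and takes the product in (29) literally as a per-band multiplication; the function names and the dictionary layout of the band energies are assumptions.

```python
import math


def spreading_function(j):
    """Formula (28) evaluated at subband index j."""
    x = j + 0.474
    return 15.81 + 7.5 * x - 17.5 * math.sqrt(1.0 + x * x)


def excitation(band_energies):
    """Formula (29): C(j, b) = SF(j) * H(j, b) for every subband j.

    band_energies: dict mapping the 1-based subband index j to the list of
    energies H(j, b) of the frequencies b inside that subband.
    """
    return {
        j: [spreading_function(j) * h for h in energies]
        for j, energies in band_energies.items()
    }


# Two subbands with made-up energies.
c = excitation({1: [2.0, 4.0], 2: [1.0]})
```

Over the indices 1 to 24 the expression in (28) is negative and decreasing, which is consistent with the absolute value taken around C(j,b) in the masking threshold formula.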
Step 505 computes the auditory masking threshold
$$C_{thr} = 10^{\,\log_{10}\left|C(j,b)\right| - \left|\frac{O(j)}{10}\right| - \left|\frac{\tilde{\phi}_{NN}}{10}\right|} \tag{30}$$
where O(j) is an offset and j ∈ {1, …, 24} denotes the j-th subband.
Fig. 6 describes a flow chart for designing the postfilter. This method corresponds to step 107 in Fig. 1.
The aim is to minimize the distortion of the target speech signal under the condition that the residual noise power in the enhanced speech stays below the auditory masking threshold.
Step 601 states the constrained optimization problem as follows:
Objective:
$$\min E_T = \left[G(e^{j\omega}) - 1\right]^2 \left|S(e^{j\omega})\right|^2 + G(e^{j\omega})^2 \left|\tilde{N}(e^{j\omega})\right|^2$$
Constraint:
$$G(e^{j\omega})^2 \left|\tilde{N}(e^{j\omega})\right|^2 \le C_{thr}$$
Step 602 solves the problem by the method of Lagrange multipliers. Let
$$J = E_T + \mu\left(G(e^{j\omega})^2 \left|\tilde{N}(e^{j\omega})\right|^2 - C_{thr}\right)$$
Setting the derivatives of J with respect to $G(e^{j\omega})$ and μ to zero gives
$$\begin{cases} G(e^{j\omega}) = \dfrac{\left|S(e^{j\omega})\right|^2}{\left|S(e^{j\omega})\right|^2 + (1 + \mu)\left|\tilde{N}(e^{j\omega})\right|^2} \\[2ex] G(e^{j\omega}) = \sqrt{\dfrac{C_{thr}}{\left|\tilde{N}(e^{j\omega})\right|^2}} \end{cases}$$
Step 603 solves these equations to obtain the optimal estimate of the postfilter, namely
$$G(e^{j\omega}) = \frac{1}{1 + \max\left(\sqrt{\dfrac{\tilde{\phi}_{NN}}{C_{thr}}} - 1,\ 0\right)}$$
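As a quick check of this closed form, the two regimes of the max(·) term can be written out. The lines below assume the ratio of noise estimate to masking threshold sits under a square root, which is the reading that makes the second regime meet the constraint of step 601 with equality.

```latex
% Case 1: the estimated noise is already at or below the masking threshold.
\tilde{\phi}_{NN} \le C_{thr}
\;\Rightarrow\; \max\!\left(\sqrt{\tilde{\phi}_{NN}/C_{thr}} - 1,\ 0\right) = 0
\;\Rightarrow\; G(e^{j\omega}) = 1 .

% Case 2: the estimated noise exceeds the masking threshold.
\tilde{\phi}_{NN} > C_{thr}
\;\Rightarrow\; G(e^{j\omega}) = \frac{1}{\sqrt{\tilde{\phi}_{NN}/C_{thr}}}
 = \sqrt{\frac{C_{thr}}{\tilde{\phi}_{NN}}}
\;\Rightarrow\; G(e^{j\omega})^{2}\,\tilde{\phi}_{NN} = C_{thr} .
```

In the first case the signal passes unchanged because the noise is taken to be inaudible; in the second the gain is just small enough to bring the residual noise down to the threshold.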
Substituting the noise power spectrum estimate $\tilde{\phi}_{NN}$ obtained in step 410 and the auditory masking threshold $C_{thr}$ obtained in step 505, step 604 completes the design of the postfilter.
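Once the noise estimate and the masking threshold are known for each frequency, the postfilter reduces to a per-frequency gain. The sketch below assumes the square-root reading of the closed-form gain; the function names `postfilter_gain` and `apply_postfilter` and the sample values are illustrative, not taken from the patent.

```python
import math


def postfilter_gain(noise_power, masking_threshold):
    """Closed-form postfilter gain for one frequency.

    Returns 1 when the noise estimate is at or below the threshold, and
    sqrt(masking_threshold / noise_power) otherwise.
    """
    if noise_power < 0.0 or masking_threshold <= 0.0:
        raise ValueError("noise power must be >= 0 and the threshold > 0")
    excess = math.sqrt(noise_power / masking_threshold) - 1.0
    return 1.0 / (1.0 + max(excess, 0.0))


def apply_postfilter(spectrum, noise_powers, thresholds):
    """Scale each complex spectral value by its own postfilter gain."""
    return [
        postfilter_gain(n, c) * x
        for x, n, c in zip(spectrum, noise_powers, thresholds)
    ]


# Two frequencies: the first has audible noise, the second does not.
enhanced = apply_postfilter([2 + 2j, 1 - 1j], [4.0, 1.0], [1.0, 4.0])
```

In the first frequency the gain is 0.5, so the residual noise power 0.5² × 4 equals the threshold of 1; in the second the gain is 1 and the value passes unchanged.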
In light of this specification, further modifications and variations of the invention will be apparent to those skilled in the art. Accordingly, this description is to be regarded as illustrative, and its purpose is to teach those of ordinary skill in the art the general manner of carrying out the invention. It should be understood that the forms of the invention shown and described herein are to be taken as the presently preferred embodiments.

Claims (9)

1. A microphone array postfiltering speech enhancement method based on multiple models and auditory characteristics, characterized in that it comprises the following steps:
Step a: a microphone array composed of L microphones collects multichannel noisy speech signals; the noisy speech signals of the channels are aligned in the time domain; a short-time discrete Fourier transform expresses each aligned channel as a complex-valued frequency-domain signal; the power spectral matrix of the multichannel microphone array signals is computed, and eigenvalue decomposition of this matrix yields an eigenvalue matrix and an eigenvector matrix;
Step b: the dimension Q of the signal subspace is determined by maximizing the probability that the target speech signal is present in the noisy speech signal, with Q ≤ L;
Step c: the noise power spectrum distribution model in the noisy speech signal is adaptively selected based on the stationarity of the spectrum;
Step d: the noise power spectrum is estimated using conditional probabilities;
Step e: from the signal subspace dimension and the noise power spectrum estimate, the auditory masking threshold at each frequency is estimated based on the signal subspace, using the auditory masking effect;
Step f: from the noise power spectrum and the auditory masking threshold, the postfilter is estimated with a Lagrange multiplier, so that the residual noise in the enhanced speech is below the auditory masking threshold of the human ear, thereby eliminating the influence of the residual noise while keeping the distortion of the target speech signal as small as possible, which completes the microphone array postfiltering speech enhancement.
2. The method of claim 1, characterized in that the eigenvalue decomposition of the power spectral matrix comprises:
using eigenvalue decomposition to divide the noisy speech signal space into two subspaces, namely a signal subspace, which contains the target speech signal and noise, and a noise subspace, which contains only noise; the eigenvalue decomposition of the power spectral matrix $\Phi_{XX}(k,t)$ of the noisy speech signal X at time frame t and frequency k is:
$$\Phi_{XX}(k,t) = U \Lambda_{XX} U^H = U\left(\Lambda_{SS} + \phi_{NN}(k,t)\, I\right) U^H$$
where X = S + N, X is the noisy speech signal, S is the target speech signal, and N is the noise; $\Lambda_{XX}$ is the noisy speech power spectrum eigenvalue matrix with eigenvalues in descending order, $\Lambda_{SS}$ is the target speech power spectrum eigenvalue matrix with eigenvalues in descending order, U is the eigenvector matrix, $\phi_{NN}(k,t)$ is the noise power at time frame t and frequency k, I is the identity matrix of order L, and $[\cdot]^H$ is the conjugate transpose operator.
3. The method of claim 1, characterized in that determining the signal subspace dimension means taking the most suitable value of Q, the one that maximizes the probability that the target speech signal is present in the noisy speech; it is computed using conditional probability, with steps comprising:
defining mutually exclusive events $H_0$ and $H_1$:
Event $H_0$: the noisy speech signal contains only noise and no target speech signal;
Event $H_1$: the noisy speech signal contains both the target speech signal and noise;
the signal subspace dimension Q is defined as:
$$\arg\max_{Q} P(S(k,t) \mid H_1)$$
where S(k,t) is the power spectrum of the target speech signal at the k-th frequency point of the t-th frame, P(·) is the distribution function of the target speech spectrum, and argmax(·) is the operator that finds the parameter value with the maximum score.
4. The method of claim 1, characterized in that adaptively selecting the noise power spectrum distribution model in the noisy speech signal based on the stationarity of the spectrum comprises the following steps:
Step c1: defining a discriminant function Ω that expresses the stationarity of the power spectrum:
$$\Omega = \frac{\sqrt[L-Q]{\prod_{i=Q+1}^{L} \lambda_X^i}}{\dfrac{1}{L-Q} \sum_{i=Q+1}^{L} \lambda_X^i}$$
that is, Ω is the ratio of the geometric mean $\sqrt[L-Q]{\prod_{i=Q+1}^{L} \lambda_X^i}$ to the arithmetic mean $\frac{1}{L-Q} \sum_{i=Q+1}^{L} \lambda_X^i$, where $\lambda_X^i$ is the i-th eigenvalue of the noisy speech power spectrum eigenvalue matrix $\Lambda_{XX}$, i ∈ {Q+1, …, L} is the eigenvalue index, and the value of Ω lies between 0 and 1;
Step c2: comparing the discriminant value with predetermined thresholds to determine the noise power spectrum distribution model suited to the noisy speech signal.
5. The method of claim 4, characterized in that the step of comparing the discriminant value with predetermined thresholds comprises:
Step c21: determining two predetermined thresholds $\Omega_1$ and $\Omega_2$, with $\Omega_1 < \Omega_2$;
Step c22: comparing the discriminant function with the predetermined thresholds; in particular, if the discriminant function is less than the predetermined threshold $\Omega_1$, the zero-mean Gaussian distribution is selected; if the discriminant function is greater than the predetermined threshold $\Omega_2$, the gamma distribution is selected; otherwise the Laplacian distribution is selected.
6. The method of claim 1, characterized in that the step of estimating the noise power spectrum using conditional probabilities comprises:
for each frame of the noisy speech signal, the probability that it contains only noise is $P(H_0 \mid X)$, and the probability that it contains both noise and the target speech signal is $P(H_1 \mid X)$; for these two cases the noise power spectrum is estimated respectively as follows:
$$\begin{cases} H_0:\ \phi_{NN}^0 = \dfrac{1}{L} \sum_{i=1}^{L} \lambda_X^i \\[2ex] H_1:\ \phi_{NN}^1 = \dfrac{1}{L-Q} \sum_{i=Q+1}^{L} \lambda_X^i \end{cases}$$
where $\phi_{NN}^0$ and $\phi_{NN}^1$ are the noise power spectra when the mutually exclusive events $H_0$ and $H_1$ occur, respectively, and i ∈ {1, …, L} is the eigenvalue index;
according to the conditional probability formula, the noise power spectrum is estimated as follows:
$$\tilde{\phi}_{NN} = P(H_0 \mid X)\,\phi_{NN}^0 + P(H_1 \mid X)\,\phi_{NN}^1.$$
7. The method of claim 1, characterized in that the step of estimating the auditory masking threshold comprises:
Step f1: dividing the auditory frequency range 0 to 15500 Hz into several critical subbands;
Step f2: computing the auditory masking threshold in each subband separately.
8. The method of claim 7, characterized in that computing the auditory masking threshold in each subband comprises: computing the energy at each frequency in each subband; computing the spreading coefficient of the basilar membrane of the human ear for the sound in each band; multiplying the energy at each frequency in each subband by the spreading coefficient of that band to obtain the excitation energy on the basilar membrane; and then computing the masking threshold from the functional relationship between the excitation energy on the basilar membrane and the auditory masking threshold.
9. The method of claim 1, characterized in that the step of estimating the postfilter G with a Lagrange multiplier is as follows:
Step fa: establishing an optimization problem that minimizes the distortion of the target speech signal under the constraint that the residual noise power is below the masking threshold;
Step fb: solving with a Lagrange multiplier to obtain the optimal estimate of the postfilter;
Step fc: substituting the auditory masking threshold and the noise power spectrum estimate to complete the design of the postfilter.
CN2009102503930A 2009-12-07 2009-12-07 Microphone array postfiltering sound enhancement method based on multi-models and hearing characteristic Expired - Fee Related CN101778322B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2009102503930A CN101778322B (en) 2009-12-07 2009-12-07 Microphone array postfiltering sound enhancement method based on multi-models and hearing characteristic

Publications (2)

Publication Number Publication Date
CN101778322A true CN101778322A (en) 2010-07-14
CN101778322B CN101778322B (en) 2013-09-25

Family

ID=42514612

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2009102503930A Expired - Fee Related CN101778322B (en) 2009-12-07 2009-12-07 Microphone array postfiltering sound enhancement method based on multi-models and hearing characteristic

Country Status (1)

Country Link
CN (1) CN101778322B (en)

Also Published As

Publication number Publication date
CN101778322B (en) 2013-09-25

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20130925
