CN102800322B - Method for estimating noise power spectrum and voice activity - Google Patents


Info

Publication number: CN102800322B
Authority: CN (China)
Legal status: Active
Application number: CN201110141137.5A
Other languages: Chinese (zh)
Other versions: CN102800322A (en)
Inventors: 应冬文, 颜永红, 付强, 潘接林, 李军锋
Current assignees: Institute of Acoustics CAS; Beijing Kexin Technology Co Ltd
Original assignees: Institute of Acoustics CAS; Beijing Kexin Technology Co Ltd
Application filed by Institute of Acoustics CAS and Beijing Kexin Technology Co Ltd, with priority to CN201110141137.5A
Publication of CN102800322A
Application granted
Publication of CN102800322B

Abstract

The invention relates to a method for estimating the noise power spectrum and detecting voice activity. A sequential hidden Markov model (SHMM) updated by first-order recursion describes the temporal correlation of speech in each frequency component, and from it the method derives the speech presence probability in each frequency subband and the power spectrum of the noise. The method comprises the following steps: 1) extracting the log-amplitude spectral envelope of the speech signal in each frequency component and building a corresponding two-state hidden Markov model in which each state is represented by a Gaussian distribution; 2) for a segment of speech data, setting up an M-frame buffer, storing the first M input frames in it, extracting their log-amplitude spectra, and building an initial model with a maximum likelihood estimation algorithm; and 3) once the initial model λ_M is obtained, starting from frame M+1, updating the HMM of each frequency band frame by frame through incremental learning, recursively obtaining the noise estimate and the speech presence probability.

Description

Method for noise power spectrum estimation and voice activity detection
Technical field
The present invention relates to the technical field of speech signal processing, and in particular to a noise spectrum estimation and voice activity detection method based on a sequential hidden Markov model. Voice activity detection is an algorithm that decides, along the time axis, whether speech is present; it can answer with a binary "yes" or "no", or it can describe the presence of speech by a speech presence probability.
Background art
Voice activity detection and noise power spectrum estimation are indispensable components of noise reduction algorithms. Their performance directly determines the performance of the noise reduction algorithm and, especially in severe noise environments, indirectly affects the performance of speech processing systems (such as speech recognition and speaker recognition systems).
Most speech application systems have to cope with interference from ambient noise. Many methods have been proposed to remove this interference, and nearly all of them rely on voice activity detection and noise power spectrum estimation. These two modules are closely linked, and their accuracy directly affects the overall noise robustness of the system. Although traditional estimation methods perform well, two aspects still leave room for improvement:
1. Making full use of the temporal correlation of continuous speech/non-speech signals in a given frequency component. Existing algorithms do not exploit this correlation sufficiently: they usually smooth the amplitude spectral envelope with a simple first-order recursive smoother whose smoothing factor is fixed. Speech, however, is a piecewise-stationary signal whose statistics, including its temporal correlation, change continuously over time, and a fixed model cannot reflect this time-varying behavior. Modeling the temporal correlation with an adaptive model should therefore improve the performance of the algorithm. This approach has not been described in the previous literature.
2. Parameter adaptation in the traditional sequential HMM uses a high-order recursive average: the current HMM parameter set depends on the model at the previous instant, the current observation and the observations at several past instants, which makes the parameter recursion computationally expensive. If this high-order recursion can be reduced to a first-order recursion with little loss of accuracy, the computational efficiency of the algorithm improves greatly. A sequential HMM algorithm based on first-order recursion has not been described in the previous literature either.
In addition, traditional solutions rely on semi-supervised learning. In the initial phase, the system has to assume that the signal starts with noise, i.e. that each sentence begins with a segment of non-speech signal. This segment can be regarded as manually labeled background-noise samples from which the initial noise model is built, which is a supervised learning method. Its drawback is that the assumption is hard to satisfy in some applications: if a sentence begins with speech, the noise model fails to initialize, and both speech detection and noise power spectrum estimation become inaccurate. This initialization method is disclosed in the patent with Chinese application number 201010178166.4.
Summary of the invention
The object of the invention is to provide a noise spectrum estimation and voice activity detection method based on a sequential hidden Markov model. The method uses a hidden Markov model to model the temporal correlation of the speech signal in a given frequency component: the log power spectral envelope in that component is regarded as a Markov chain that switches between two states, speech "present" and speech "absent", and the power spectrum distribution of each state is described by a Gaussian distribution. From the forward factor of the HMM, the speech presence probability at a given time-frequency point can then be derived.
To achieve this object, the invention provides a noise power spectrum estimation and voice activity detection method. The method describes the temporal correlation of speech in each frequency component with a sequential hidden Markov model (SHMM) updated by first-order recursion, updates the SHMM progressively by incremental learning, and finally derives the speech presence probability in each frequency subband and the power spectrum of the noise, so as to accurately reflect the temporal statistics of speech. The method comprises the following steps:
1) For the speech signal, extract the log-amplitude spectral envelope in each frequency component and build a corresponding two-state hidden Markov model, in which one component represents the distribution of speech energy, the other represents the distribution of noise energy, and each state is represented by a Gaussian distribution;
2) For a segment of speech data, set up an M-frame buffer, store the first M input frames in the buffer, extract the log-amplitude spectra of these M frames, and build an initial model with a maximum likelihood estimation algorithm;
3) After the initial model λ_M is obtained, starting from frame M+1, update the HMM of each frequency band frame by frame by incremental learning, and recursively obtain the noise estimate and the speech presence probability.
The method comprises the following concrete steps:
1) For the speech signal, extract the log-amplitude spectral envelope in each frequency component. For the log-amplitude time series $x^l = \{x_1, x_2, \ldots, x_l\}$ in one frequency component, build a hidden Markov model with state sequence $s^l = \{s_1, s_2, \ldots, s_l\}$, $s_t \in \{0, 1\}$, where 1 denotes the speech-present state and 0 the noise-only state. Let $\lambda_l$ denote the model parameters estimated from $x^l$. For a given parameter set $\lambda_l$, the probability density of the observation sequence $x^l$ is
$$p(x^l \mid \lambda_l) = \sum_{s^l} p(s^l \mid \lambda_l)\, p(x^l \mid \lambda_l, s^l),$$
where $p(s^l \mid \lambda_l)$ is the prior probability of the state sequence $s^l$:
$$p(s^l \mid \lambda_l) = \prod_{t=1}^{l} a_{s_{t-1}, s_t}.$$
Here $a_{y,z} = p(s_t = z \mid s_{t-1} = y)$ denotes the state transition probability and $a_{s_0, s_1} = \pi_{s_1}$ denotes the initial state probability. $p(x^l \mid \lambda_l, s^l)$ is the likelihood of the observation sequence $x^l$ given the states $s^l$ and the parameter set $\lambda_l$:
$$p(x^l \mid \lambda_l, s^l) = \prod_{t=1}^{l} b(x_t \mid s_t, \lambda_l),$$
where
$$b(x_t \mid s_t, \lambda_l) = \frac{1}{\sqrt{2\pi \kappa_{s_t,l}}} \exp\left\{-\frac{1}{2}\frac{(x_t - \mu_{s_t,l})^2}{\kappa_{s_t,l}}\right\},$$
with $\mu_{s_t,l}$ and $\kappa_{s_t,l}$ the mean and variance of the Gaussian for state $s_t$; the initial probabilities $\pi_z$ do not change over time.
In this model, $\mu_{0,l}$ is the noise value to be estimated. At the same time, the probability that speech is present in a given frequency of frame $l$ can be derived as $\gamma_{l|\lambda_l}(1) = p(s_l = 1 \mid x^l, \lambda_l)$.
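To make the model of step 1) concrete, the following minimal Python sketch (standard library only) evaluates $p(x^l \mid \lambda_l)$ two ways for one frequency bin: by brute-force summation over all $2^l$ state sequences, and by the forward recursion $F_t(z) = \sum_y F_{t-1}(y)\,a_{yz}\,b(x_t \mid z)$. It also returns the filtered speech probability $F_l(1)/\sum_z F_l(z)$. The parameter values, the function names and the convention $a_{s_0,s_1} = \pi_{s_1}$ are illustrative assumptions, not taken from the patent.

```python
import itertools
import math


def gauss(x, mu, var):
    """Gaussian emission density b(x | z) with mean mu and variance var."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def likelihood_bruteforce(xs, pi, a, mu, var):
    """p(x^l | lambda): sum over every state sequence s^l of p(s^l) * p(x^l | s^l)."""
    total = 0.0
    for s in itertools.product((0, 1), repeat=len(xs)):
        p = pi[s[0]] * gauss(xs[0], mu[s[0]], var[s[0]])  # a_{s0,s1} taken as pi_{s1}
        for t in range(1, len(xs)):
            p *= a[s[t - 1]][s[t]] * gauss(xs[t], mu[s[t]], var[s[t]])
        total += p
    return total


def forward(xs, pi, a, mu, var):
    """Unnormalized forward factors F_l(z) = sum_y F_{l-1}(y) a_yz b(x_l | z)."""
    f = [pi[z] * gauss(xs[0], mu[z], var[z]) for z in (0, 1)]
    for x in xs[1:]:
        f = [sum(f[y] * a[y][z] for y in (0, 1)) * gauss(x, mu[z], var[z]) for z in (0, 1)]
    return f


def speech_presence(xs, pi, a, mu, var):
    """gamma_l(1) = F_l(1) / (F_l(0) + F_l(1)), the filtered speech posterior."""
    f = forward(xs, pi, a, mu, var)
    return f[1] / sum(f)


# Illustrative parameters: state 0 = noise, state 1 = speech (log-amplitude units).
PI, A = [0.5, 0.5], [[0.9, 0.1], [0.2, 0.8]]
MU, VAR = [0.0, 3.0], [1.0, 2.0]
XS = [0.1, 2.5, 3.2, -0.4, 0.3]
```

The brute-force sum is exponential in $l$, which is why a practical implementation would rely on the forward recursion.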
2) For a segment of speech data, set up an M-frame buffer, store the first M input frames in the buffer, extract the log-amplitude spectra of these M frames, and substitute them into the HMM of step 1) to initialize a hidden Markov model $\lambda_M$ in each frequency bin; the subscript M denotes the length of the initialization window, and $l \ge M$.
3) After the initial model $\lambda_M$ is obtained, starting from frame M+1, update the SHMM frame by frame by incremental learning, recursively obtaining $\lambda_l$, and from it the noise estimate $\mu_{0,l}$ and the speech presence probability in each frequency of frame $l$.
As an improvement of the above technical solution, extracting the amplitude spectrum of one frame in step 1) comprises:
First, the digitized speech signal of the frame is preprocessed. Let each frame contain F samples; it is zero-padded to N points, where $N \ge F$, $N = 2^j$, and j is an integer with $j \ge 8$. An N-point discrete Fourier transform then gives the discrete spectrum
$$Y_{l,k} = \sum_{n=0}^{N-1} y_{l,n}\, e^{-\mathrm{i}\, 2\pi n k / N}, \quad k = 0, 1, \ldots, N-1,$$
where $y_{l,n}$ is the n-th sample of frame l in the buffer and $Y_{l,k}$ is the k-th Fourier coefficient of frame l. Its amplitude can then be calculated, where b(r) in the amplitude formula is the window function. The preprocessing comprises windowing and/or pre-emphasis, and the window function is a Hanning or Hamming window.
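A sketch of the frame analysis described above, assuming a symmetric Hann window for b(r), an optional pre-emphasis coefficient `preemph` (a hypothetical parameter), and the natural logarithm of $|Y_{l,k}|$ as the log-amplitude feature. The patent's own amplitude formula is not available in this text, so the log step is an assumption. A direct O(N²) DFT is used for transparency; a real system would use an FFT.

```python
import cmath
import math


def hann(F):
    """Symmetric Hann (Hanning) window of length F, standing in for b(r)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / (F - 1)) for n in range(F)]


def frame_log_amplitude(frame, N, preemph=0.0, floor=1e-12):
    """Optional pre-emphasis, windowing, zero-padding to N, N-point DFT, then log|Y|."""
    F = len(frame)
    j = N.bit_length() - 1
    if F < 2 or N != 1 << j or j < 8 or N < F:
        raise ValueError("need F >= 2 and N = 2**j with j >= 8 and N >= F")
    if preemph:
        frame = [frame[0]] + [frame[n] - preemph * frame[n - 1] for n in range(1, F)]
    y = [s * w for s, w in zip(frame, hann(F))] + [0.0] * (N - F)  # window, zero-pad
    # Direct DFT: Y_k = sum_n y_n * exp(-i 2 pi n k / N).
    Y = [sum(y[n] * cmath.exp(-2j * math.pi * n * k / N) for n in range(N)) for k in range(N)]
    # Natural-log amplitude; the floor avoids log(0) in empty bins.
    return [math.log(abs(c) + floor) for c in Y]
```

For a cosine at 16 cycles per 256 samples, the largest of the first N/2 + 1 log amplitudes falls in bin 16.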
As an improvement of the above technical solution, the initialization of the HMM in step 2) comprises, for each frequency, the following steps:
Step 201): Divide the M samples into two classes by clustering: $\{x^{(0)}_1, \ldots, x^{(0)}_{M_0}\}$ and $\{x^{(1)}_1, \ldots, x^{(1)}_{M_1}\}$, where $M_0 + M_1 = M$; the class with the larger mean is labeled with superscript (1) and the other with superscript (0). The clustering in step 201) uses unsupervised LBG clustering or fuzzy clustering;
The means of the two classes are $\bar\mu_{z,M} = \frac{1}{M_z}\sum_{j=1}^{M_z} x^{(z)}_j$, $z \in \{0,1\}$, so that the lower-energy class has the smaller mean, $\bar\mu_{0,M} \le \bar\mu_{1,M}$;
The variances of the two classes are $\bar\kappa_{z,M} = \frac{1}{M_z}\sum_{j=1}^{M_z} \left(x^{(z)}_j - \bar\mu_{z,M}\right)^2$, $z \in \{0,1\}$;
The initial transition probabilities are $\bar a_{00,M} = \bar a_{01,M} = \bar a_{11,M} = \bar a_{10,M} = 0.5$;
Compute the likelihood of the model, $\bar L = \log p(x^M \mid \bar\lambda_M)$, and start the iteration. In the iterations below, $\bar\lambda_M$ denotes the model from the previous iteration and $\lambda'_M$ the re-estimated model. Before the first iteration, the previous likelihood $L'$ is set to a very large negative number, and the forward factors $\bar F$ and backward factors $\bar B$ are initialized.
Step 202): Compute the forward factor: $\bar F_l(z) = \sum_y \bar F_{l-1}(y)\, \bar a_{yz,M}\, b(x_l \mid z, \bar\lambda_M)$, $z, y \in \{0,1\}$;
Step 203): Compute the backward factor: $\bar B_l(z) = \sum_y \bar B_{l+1}(y)\, \bar a_{zy,M}\, b(x_{l+1} \mid y, \bar\lambda_M)$, $z, y \in \{0,1\}$;
Step 204): Compute the noise and speech presence probabilities: $p(z \mid x_l, \bar\lambda_M) = \frac{\bar F_l(z)\, \bar B_l(z)}{\sum_z \bar F_l(z)\, \bar B_l(z)}$, $z \in \{0,1\}$;
Step 205): If $\frac{1}{M}\sum_{t=1}^{M} p(s_t = 1 \mid x_t, \bar\lambda_M) < \zeta$, stop iterating, where ζ is a small positive number close to zero;
Step 206): Compute the transition posteriors:
$$p(s_{l-1}=y, s_l=z \mid x_l, \bar\lambda_M) = \frac{\bar F_{l-1}(y)\, \bar B_l(z)\, \bar a_{yz,M}\, b(x_l \mid z, \bar\lambda_M)}{\sum_{y,z} \bar F_{l-1}(y)\, \bar B_l(z)\, \bar a_{yz,M}\, b(x_l \mid z, \bar\lambda_M)};$$
Step 207): Compute the new initial probabilities $\pi'_z = p(s_1 = z \mid x_1, \bar\lambda_M)$;
Step 208): Compute the new means $\mu'_{z,M} = \frac{\sum_{t=1}^{M} p(s_t = z \mid x_t, \bar\lambda_M)\, x_t}{\sum_{t=1}^{M} p(s_t = z \mid x_t, \bar\lambda_M)}$;
Step 209): Constrain the new means: $\mu'_{1,M} = \max\{\mu'_{1,M}, \mu'_{0,M} + \delta\}$, where δ is a constant between 0 and 100;
Step 210): Compute the new variances $\kappa'_{z,M} = \frac{\sum_{t=1}^{M} p(s_t = z \mid x_t, \bar\lambda_M)\, (x_t - \bar\mu_{z,M})^2}{\sum_{t=1}^{M} p(s_t = z \mid x_t, \bar\lambda_M)}$;
Step 211): Constrain the new variances: $\kappa'_{1,M} = \max\{\kappa'_{0,M}, \kappa'_{1,M}\}$;
Step 212): Compute the new transition probabilities $a'_{yz,M} = \frac{\sum_{t=1}^{M} p(s_{t-1}=y, s_t=z \mid x_t, \bar\lambda_M)}{\sum_{t=1}^{M} \sum_z p(s_{t-1}=y, s_t=z \mid x_t, \bar\lambda_M)}$;
Step 213): Compute the likelihood of the new model $\bar L = \log p(x^M \mid \lambda'_M)$;
Step 214): If the convergence condition on the change between the likelihoods $\bar L$ and $L'$ is satisfied, where ε is a very small number, terminate the iteration; otherwise, jump back to step 202).
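The initialization of steps 201) to 214) can be sketched for a single frequency bin as below. Assumptions, all hypothetical: plain 1-D 2-means stands in for LBG or fuzzy clustering; the initial π is (0.5, 0.5); the forward and backward factors are scaled so that the log-likelihood is the sum of the log scale factors; and step 214) is read as "stop when the log-likelihood gain is below ε". Following step 210) as written, the variances are computed around the previous iteration's means.

```python
import math


def gauss(x, mu, var):
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def two_means(xs, iters=20):
    """Hypothetical stand-in for step 201): 1-D 2-means instead of LBG/fuzzy clustering."""
    c = [min(xs), max(xs)]
    for _ in range(iters):
        groups = ([x for x in xs if abs(x - c[0]) <= abs(x - c[1])],
                  [x for x in xs if abs(x - c[0]) > abs(x - c[1])])
        if not groups[0] or not groups[1]:
            raise ValueError("data cannot be split into two classes")
        c = [sum(g) / len(g) for g in groups]
    return groups  # groups[0] has the smaller mean, i.e. class (0)


def forward_backward(xs, p):
    """Scaled forward-backward (steps 202-204, 206): gamma, xi and log-likelihood."""
    M, a = len(xs), p["a"]
    b = [[gauss(x, p["mu"][z], p["var"][z]) for z in (0, 1)] for x in xs]
    F, c = [], []
    for t in range(M):
        if t == 0:
            f = [p["pi"][z] * b[0][z] for z in (0, 1)]
        else:
            f = [sum(F[t - 1][y] * a[y][z] for y in (0, 1)) * b[t][z] for z in (0, 1)]
        c.append(sum(f))
        F.append([v / c[t] for v in f])
    B = [[1.0, 1.0] for _ in range(M)]
    for t in range(M - 2, -1, -1):
        B[t] = [sum(a[z][y] * b[t + 1][y] * B[t + 1][y] for y in (0, 1)) / c[t + 1] for z in (0, 1)]
    gamma = [[F[t][z] * B[t][z] for z in (0, 1)] for t in range(M)]
    xi = [[[F[t - 1][y] * a[y][z] * b[t][z] * B[t][z] / c[t] for z in (0, 1)] for y in (0, 1)]
          for t in range(1, M)]
    return gamma, xi, sum(math.log(v) for v in c)


def em_init(xs, delta=1.0, zeta=1e-3, eps=1e-6, max_iter=100):
    """Constrained EM initialization of one frequency bin over an M-frame window."""
    groups = two_means(xs)
    mu = [sum(g) / len(g) for g in groups]
    var = [max(sum((x - m) ** 2 for x in g) / len(g), 1e-6) for g, m in zip(groups, mu)]
    p = {"pi": [0.5, 0.5], "a": [[0.5, 0.5], [0.5, 0.5]], "mu": mu, "var": var}  # pi assumed
    prev = -math.inf
    for _ in range(max_iter):
        gamma, xi, loglik = forward_backward(xs, p)
        if sum(g[1] for g in gamma) / len(xs) < zeta:  # step 205: no speech in window
            break
        w = [max(sum(g[z] for g in gamma), 1e-12) for z in (0, 1)]
        mu = [sum(g[z] * x for g, x in zip(gamma, xs)) / w[z] for z in (0, 1)]  # step 208
        mu[1] = max(mu[1], mu[0] + delta)  # step 209
        var = [max(sum(g[z] * (x - p["mu"][z]) ** 2 for g, x in zip(gamma, xs)) / w[z], 1e-6)
               for z in (0, 1)]  # step 210, around the previous means
        var[1] = max(var[0], var[1])  # step 211
        rows = [[sum(step[y][z] for step in xi) + 1e-12 for z in (0, 1)] for y in (0, 1)]
        a = [[r / sum(row) for r in row] for row in rows]  # step 212
        p = {"pi": list(gamma[0]), "a": a, "mu": mu, "var": var}  # step 207 gives pi
        if loglik - prev < eps:  # step 214 (assumed form of the test)
            break
        prev = loglik
    return p
```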
In the above HMM parameter estimation, the means, weights, variances and transition probabilities of the HMM are each constrained. Note that during initialization the transition probabilities here play a role comparable to the weight coefficients in patent 201010178166.4. Because the weight coefficients in 201010178166.4 appear as denominators during initialization, they must be constrained there; the transition probabilities in this patent do not have this problem.
As an improvement of the above technical solution, the sequential update of the HMM in step 3) proceeds as follows. After the initial model $\lambda_M$ is built, starting from frame M+1, the HMM is updated frame by frame by incremental learning. Each iteration can be stated as: in each frequency bin, given $\lambda_l$ and the current observation $x_{l+1}$, infer $\lambda_{l+1}$. For frame l+1, the Fourier transform gives $Y_{l+1,k}$, $0 \le k < N$, and the amplitude is computed in each frequency bin. In each frequency bin, the parameters are updated at frame l+1 as follows:
Step 301): Compute the forward factor: $F_{l+1|\lambda_l}(z) = \sum_y F_{l|\lambda_{l-1}}(y)\, a_{yz,l}\, b(x_{l+1} \mid s_{l+1}=z, \lambda_l)$, $z \in \{0,1\}$;
Step 302): Compute the speech and noise presence probabilities: $\gamma_{l+1|\lambda_l}(z) = \frac{F_{l+1|\lambda_l}(z)}{\sum_z F_{l+1|\lambda_l}(z)}$, $z \in \{0,1\}$;
Step 303): Compute the conditional transition probabilities:
$$\xi_{l+1|\lambda_l}(y,z) = \frac{F_{l|\lambda_{l-1}}(y)\, a_{yz,l}\, b(x_{l+1} \mid s_{l+1}=z, \lambda_l)}{\sum_{y,z} F_{l|\lambda_{l-1}}(y)\, a_{yz,l}\, b(x_{l+1} \mid s_{l+1}=z, \lambda_l)};$$
Step 304): Compute the averaged state probabilities: $\tilde\gamma_{l+1}(z) = \alpha\,\tilde\gamma_l(z) + (1-\alpha)\,\gamma_{l+1|\lambda_l}(z)$;
Step 305): Compute the time-dependent smoothing factors: $\tilde\alpha_{l+1}(z) = \frac{\alpha\,\tilde\gamma_l(z)}{\alpha\,\tilde\gamma_l(z) + (1-\alpha)\,\gamma_{l+1|\lambda_l}(z)}$;
Step 306): Compute the state means: $\mu_{z,l+1} = \tilde\alpha_{l+1}(z)\,\mu_{z,l} + [1-\tilde\alpha_{l+1}(z)]\,x_{l+1}$;
Step 307): Constrain the new state means: $\mu_{1,l+1} = \max\{\mu_{1,l+1}, \mu_{0,l+1} + \delta\}$, $l \ge M$;
Step 308): Compute the new state variances: $\kappa_{z,l+1} = \tilde\alpha_{l+1}(z)\,\kappa_{z,l} + [1-\tilde\alpha_{l+1}(z)]\,(x_{l+1} - \mu_{z,l})^2$;
Step 309): Constrain the new state variances: $\kappa_{1,l+1} = \max\{\kappa_{0,l+1}, \kappa_{1,l+1}\}$, $l \ge M$;
Step 310): Compute the averaged transition posteriors: $\tilde\xi_{l+1}(y,z) = \alpha\,\tilde\xi_l(y,z) + (1-\alpha)\,\xi_{l+1|\lambda_l}(y,z)$;
Step 311): Compute the state transition probabilities:
$$a_{yz,l+1} = a_{yz,l} + \frac{\dfrac{\xi_{l+1|\lambda_l}(y,z)}{a_{yz,l}} - \dfrac{\xi_{l+1|\lambda_l}(y,1-z)}{1-a_{yz,l}}}{\dfrac{K}{a_{yz,l}^2}\,\tilde\xi_{l+1}(y,z) + \dfrac{K}{(1-a_{yz,l})^2}\,\tilde\xi_{l+1}(y,1-z)};$$
Step 312): Constrain the new transition probabilities: $a_{01,l} = \max\{a_{01,l}, \eta\}$, $a_{00,l} = 1 - a_{01,l}$, $a_{10,l} = \max\{a_{10,l}, \eta\}$, $a_{11,l} = 1 - a_{10,l}$, $l \ge M$;
The above sub-steps yield all parameters of $\lambda_{l+1}$, and hence the speech presence probability $\gamma_{l+1|\lambda_l}(1)$ and the noise power spectrum estimate $\mu_{0,l+1}$.
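One frame of steps 301) to 312) for a single frequency bin could look like the sketch below. Assumptions, labeled as such: the state dict and its keys are hypothetical; alpha = 0.98, K = 1, δ = 1 and η = 0.01 are illustrative values (the text does not fix K); ξ is formed from the previous frame's normalized forward factor $F_{l|\lambda_{l-1}}(y)$; and the transition probabilities are additionally capped at 1 − η to keep the Newton-type update of step 311) inside (0, 1), which the patent does not state.

```python
import math


def gauss(x, mu, var):
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def shmm_update(s, x, alpha=0.98, K=1.0, delta=1.0, eta=0.01):
    """One frame (l -> l+1) of steps 301-312 for a single frequency bin.

    s holds: F (normalized forward factor of frame l), mu, var, a (2x2),
    gbar (averaged state probability) and xibar (averaged transition posterior).
    Returns (speech presence probability, noise estimate mu_0).
    """
    b = [gauss(x, s["mu"][z], s["var"][z]) for z in (0, 1)]
    # Steps 301/303: joint terms F_l(y) a_yz b(x | z); their column sums are F_{l+1}(z).
    joint = [[s["F"][y] * s["a"][y][z] * b[z] for z in (0, 1)] for y in (0, 1)]
    total = sum(map(sum, joint))
    xi = [[joint[y][z] / total for z in (0, 1)] for y in (0, 1)]
    gamma = [xi[0][z] + xi[1][z] for z in (0, 1)]  # step 302
    for z in (0, 1):
        g_new = alpha * s["gbar"][z] + (1 - alpha) * gamma[z]  # step 304
        at = alpha * s["gbar"][z] / g_new  # step 305
        mu_old = s["mu"][z]
        s["mu"][z] = at * mu_old + (1 - at) * x  # step 306
        s["var"][z] = at * s["var"][z] + (1 - at) * (x - mu_old) ** 2  # step 308
        s["gbar"][z] = g_new
    s["mu"][1] = max(s["mu"][1], s["mu"][0] + delta)  # step 307
    s["var"][1] = max(s["var"][0], s["var"][1])  # step 309
    a = s["a"]
    for y in (0, 1):
        for z in (0, 1):
            s["xibar"][y][z] = alpha * s["xibar"][y][z] + (1 - alpha) * xi[y][z]  # step 310
    new = {}
    for y, z in ((0, 1), (1, 0)):  # off-diagonals; step 311 keeps each row summing to 1
        grad = xi[y][z] / a[y][z] - xi[y][1 - z] / a[y][1 - z]
        hess = K * (s["xibar"][y][z] / a[y][z] ** 2 + s["xibar"][y][1 - z] / a[y][1 - z] ** 2)
        new[y] = a[y][z] + grad / hess  # step 311
    # Step 312 floor at eta; the upper cap 1 - eta is an extra safeguard (assumption).
    a01 = min(max(new[0], eta), 1 - eta)
    a10 = min(max(new[1], eta), 1 - eta)
    s["a"] = [[1 - a01, a01], [a10, 1 - a10]]
    s["F"] = gamma  # normalized F_{l+1}, reused at the next frame
    return gamma[1], s["mu"][0]
```

The state would be seeded from the initialization window (for example F from the posterior of frame M); the seed values used when testing such a sketch are themselves assumptions.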
The incremental learning algorithm used by the HMM in step 3) comprises recursive weight coefficients, recursive means and recursive variances;
The recursive mean is $\mu_{z,l+1} = \tilde\alpha_{l+1}(z)\,\mu_{z,l} + [1-\tilde\alpha_{l+1}(z)]\,x_{l+1}$, where $\tilde\alpha_{l+1}(z)$ is a smoothing factor that depends on the speech presence probability and is less than but close to 1;
The recursive variance is $\kappa_{z,l+1} = \tilde\alpha_{l+1}(z)\,\kappa_{z,l} + [1-\tilde\alpha_{l+1}(z)]\,(x_{l+1} - \mu_{z,l})^2$;
The recursive transition probability is given by the update of step 311), or alternatively by $a_{yz,l+1} = \beta\, a_{yz,l} + (1-\beta)\,\xi_{l+1|\lambda_l}(y,z)$, where β is a smoothing factor less than but close to 1, for example β = 0.99.
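The simpler β-smoothed alternative for the transition probabilities can be sketched as follows. Because $\xi_{l+1|\lambda_l}(y,z)$ is a joint posterior over (y, z), the smoothed rows no longer sum exactly to one, so this sketch renormalizes each row; that renormalization, like the function name, is an assumption rather than something the text specifies.

```python
def smooth_transitions(a, xi, beta=0.99):
    """a_yz <- beta * a_yz + (1 - beta) * xi(y, z), then renormalize each row (assumption)."""
    new = [[beta * a[y][z] + (1.0 - beta) * xi[y][z] for z in (0, 1)] for y in (0, 1)]
    # Keep every row a probability distribution over the next state.
    return [[v / sum(row) for v in row] for row in new]
```

A frame whose joint posterior sits entirely on the noise-to-noise transition nudges $a_{00}$ upward while leaving the speech row essentially unchanged.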
The parameter recursion of the sequential hidden Markov model based on first-order recursion is as follows:
Compute the forward factor of the HMM: $F_{l+1|\lambda_l}(z) = \sum_y F_{l|\lambda_{l-1}}(y)\, a_{yz,l}\, b(x_{l+1} \mid s_{l+1}=z, \lambda_l)$, $z \in \{0,1\}$;
Compute the speech and noise presence probabilities: $\gamma_{l+1|\lambda_l}(z) = \frac{F_{l+1|\lambda_l}(z)}{\sum_z F_{l+1|\lambda_l}(z)}$, $z \in \{0,1\}$;
Compute the conditional transition probabilities: $\xi_{l+1|\lambda_l}(y,z) = \frac{F_{l|\lambda_{l-1}}(y)\, a_{yz,l}\, b(x_{l+1} \mid s_{l+1}=z, \lambda_l)}{\sum_{y,z} F_{l|\lambda_{l-1}}(y)\, a_{yz,l}\, b(x_{l+1} \mid s_{l+1}=z, \lambda_l)}$;
Compute the averaged state probabilities: $\tilde\gamma_{l+1}(z) = \alpha\,\tilde\gamma_l(z) + (1-\alpha)\,\gamma_{l+1|\lambda_l}(z)$;
Compute the time-dependent smoothing factors: $\tilde\alpha_{l+1}(z) = \frac{\alpha\,\tilde\gamma_l(z)}{\alpha\,\tilde\gamma_l(z) + (1-\alpha)\,\gamma_{l+1|\lambda_l}(z)}$;
Compute the means: $\mu_{z,l+1} = \tilde\alpha_{l+1}(z)\,\mu_{z,l} + [1-\tilde\alpha_{l+1}(z)]\,x_{l+1}$;
Compute the new variances: $\kappa_{z,l+1} = \tilde\alpha_{l+1}(z)\,\kappa_{z,l} + [1-\tilde\alpha_{l+1}(z)]\,(x_{l+1} - \mu_{z,l})^2$;
Compute the averaged transition posteriors: $\tilde\xi_{l+1}(y,z) = \alpha\,\tilde\xi_l(y,z) + (1-\alpha)\,\xi_{l+1|\lambda_l}(y,z)$;
Compute the transition probabilities:
$$a_{yz,l+1} = a_{yz,l} + \frac{\dfrac{\xi_{l+1|\lambda_l}(y,z)}{a_{yz,l}} - \dfrac{\xi_{l+1|\lambda_l}(y,1-z)}{1-a_{yz,l}}}{\dfrac{K}{a_{yz,l}^2}\,\tilde\xi_{l+1}(y,z) + \dfrac{K}{(1-a_{yz,l})^2}\,\tilde\xi_{l+1}(y,1-z)}.$$
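The time-dependent smoothing factor has a simple interpretation. Multiplying the mean recursion by $\tilde\gamma_{l+1}(z)$ and using $\tilde\gamma_{l+1}(z)\,\tilde\alpha_{l+1}(z) = \alpha\,\tilde\gamma_l(z)$ gives $\tilde\gamma_{l+1}(z)\,\mu_{z,l+1} = \alpha\,\tilde\gamma_l(z)\,\mu_{z,l} + (1-\alpha)\,\gamma_{l+1|\lambda_l}(z)\,x_{l+1}$, so $\mu_{z,l}$ is an exponentially forgetting average of the observations weighted by the state probability. The sketch below checks this numerically for one state; the function names and values are hypothetical.

```python
def recursive_mean(xs, gammas, mu0, g0, alpha=0.95):
    """Mean recursion with the time-dependent smoothing factor, for one state z."""
    mu, g = mu0, g0
    for x, gam in zip(xs, gammas):
        g_new = alpha * g + (1 - alpha) * gam  # averaged state probability
        at = alpha * g / g_new  # time-dependent smoothing factor
        mu = at * mu + (1 - at) * x
        g = g_new
    return mu


def closed_form_mean(xs, gammas, mu0, g0, alpha=0.95):
    """The same value as an exponentially forgetting, gamma-weighted average."""
    L = len(xs)
    num = alpha ** L * g0 * mu0
    den = alpha ** L * g0
    for t, (x, gam) in enumerate(zip(xs, gammas)):
        w = (1 - alpha) * alpha ** (L - 1 - t) * gam
        num += w * x
        den += w
    return num / den
```

A frame with zero probability for a state leaves that state's mean untouched, which is how speech frames are kept out of the noise estimate.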
In the above technical solution, the constraint mechanisms adopted to keep the two-state statistical model running robustly and stably comprise:
1) In the initial phase, when the average speech presence probability falls below a fixed threshold ζ, i.e. $\frac{1}{M}\sum_{t=1}^{M} p(s_t = 1 \mid x_t, \bar\lambda_M) < \zeta$, the estimation algorithm stops iterating. This constraint is implemented in step 205).
2) To prevent the state transitions of the hidden Markov model from stalling, the transition probabilities are constrained: $a_{01,l} = \max\{a_{01,l}, \eta\}$, $a_{00,l} = 1 - a_{01,l}$, $a_{10,l} = \max\{a_{10,l}, \eta\}$, $a_{11,l} = 1 - a_{10,l}$, $l \ge M$. This constraint is implemented in step 312).
3) During tracking, the means are constrained: $\mu_{1,l+1} = \max\{\mu_{1,l+1}, \mu_{0,l+1} + \delta\}$, $l \ge M$. This constraint is implemented in step 307).
4) The variances are constrained: $\kappa_{1,l+1} = \max\{\kappa_{0,l+1}, \kappa_{1,l+1}\}$, $l \ge M$. This constraint is implemented in step 309).
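A small sketch of constraints 2) to 4), and of why constraint 2) matters: if $a_{01}$ reaches zero, a filter that starts in the noise state can never move to the speech state, whatever the observations. The parameter values and function names are illustrative assumptions.

```python
import math


def gauss(x, mu, var):
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def apply_constraints(mu, var, a, delta=1.0, eta=0.01):
    """Constraints 2)-4) for one bin: separated means, ordered variances, live transitions."""
    mu = [mu[0], max(mu[1], mu[0] + delta)]
    var = [var[0], max(var[0], var[1])]
    a01, a10 = max(a[0][1], eta), max(a[1][0], eta)
    return mu, var, [[1 - a01, a01], [a10, 1 - a10]]


def speech_posterior(xs, a, mu=(0.0, 3.0), var=(0.1, 1.0)):
    """Filtered speech probability after xs, starting with certainty in the noise state."""
    f = [1.0, 0.0]
    for x in xs:
        f = [sum(f[y] * a[y][z] for y in (0, 1)) * gauss(x, mu[z], var[z]) for z in (0, 1)]
        total = sum(f)
        f = [v / total for v in f]
    return f[1]
```

With $a_{01} = 0$ the speech posterior stays exactly zero even for clearly speech-like frames; after the η floor is applied, the same frames are detected as speech.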
The present invention relates to a noise power spectrum estimation and voice activity detection method based on a sequential hidden Markov model (SHMM), comprising the following steps: 1) build an SHMM for the log-amplitude feature of the speech signal in each frequency component; 2) for a segment of speech data, set up an M-frame buffer, store the first M input frames in it, extract the log-amplitude spectra of these M frames, and substitute them into the SHMM of step 1) for initialization, obtaining the initial model $\lambda_M$; 3) after the initial model $\lambda_M$ is obtained, starting from frame M+1, update the SHMM frame by frame by incremental learning. The mean of the noise state in the model is the current noise estimate, and the speech presence probability computed during estimation describes speech activity in the time-frequency domain. The recursion estimates the model parameter set $\lambda_l$ at the current instant from the current observation $x_l$ and the parameter set $\lambda_{l-1}$ at the previous instant. In this way, the noise power spectrum and the speech presence probability in each frequency component are obtained at every instant. The invention is a tightly coupled solution for noise spectrum estimation and voice activity detection that improves the adaptability of speech application systems to noisy environments; it does not rely on the "noise-first" assumption; and it describes voice activity in the two-dimensional time-frequency space. This patent builds on the earlier patent application No. 201010178166.4; because it uses a more accurate model, its performance is better than that of 201010178166.4, but its computational complexity is higher.
Compared with the prior art, the present invention has the following technical effects:
Because the speech signal exhibits temporal correlation in each frequency component, the present invention models this correlation with a hidden Markov model: the log power spectral envelope in a frequency component is regarded as a Markov chain that switches between two states, speech "present" and speech "absent", and the power spectrum distribution of each state is described by a Gaussian distribution. To simplify computation, the invention also proposes a sequential HMM tracking method based on first-order recursion, whose parameters change continuously with the input signal. The mean of the speech-absent state of the HMM is the estimate of the noise power spectrum, and the speech presence probability at each time-frequency point can be derived from the forward factor of the HMM.
The present invention is a tightly coupled scheme for voice activity detection and noise power spectrum estimation that improves the adaptability of speech application systems to noisy environments; it also describes voice activity in the two-dimensional time-frequency space, which facilitates further refined noise processing.
Description of the drawings
Fig. 1 is a flow chart of the noise spectrum estimation and voice activity detection method of the present invention;
Fig. 2 shows an example comparing the performance of the SHMM noise estimation algorithm of the present invention with the classical minimum statistics (MS) algorithm, the minima-controlled recursive averaging (MCRA) algorithm and its improved version, IMCRA.
Detailed description of the embodiments
The present invention proposes a noise power spectrum estimation and voice activity detection method based on a sequential hidden Markov model.
As shown in Fig. 1, the method comprises the following steps:
1) Build an HMM for the log-amplitude feature of the speech signal in each frequency bin:
$$p(x^l \mid \lambda_l) = \sum_{s^l} \prod_{t=1}^{l} a_{s_{t-1}, s_t} \prod_{t=1}^{l} b(x_t \mid s_t, \lambda_l)$$
Here $a_{y,z}$ denotes the state transition probability and $a_{s_0,s_1} = \pi_{s_1}$ the initial state probability, and the Gaussian component is
$$b(x_t \mid s_t = z, \lambda_l) = \frac{1}{\sqrt{2\pi\kappa_{z,l}}}\exp\left\{-\frac{1}{2}\frac{(x_t - \mu_{z,l})^2}{\kappa_{z,l}}\right\}$$
where $x_l$ is the log amplitude in a given frequency of frame l, z = 0 denotes the speech-absent state and z = 1 the speech-present state, $\mu_{z,l}$ and $\kappa_{z,l}$ are the mean and variance, and the parameter set is $\lambda_l = \{\mu_{0,l}, \mu_{1,l}, \kappa_{1,l}, \kappa_{0,l}, a_{01,l}, a_{10,l}, a_{00,l}, a_{11,l}, \pi_0, \pi_1\}$.
2) For a segment of speech data, set up an M-frame buffer, store the first M input frames in it, extract the log-amplitude spectra of these M frames, and substitute them into the HMM of step 1) for initialization, obtaining the initial model $\lambda_M$; the initialization uses a constrained EM algorithm, and M is the length of the initialization window.
3) After the initial model $\lambda_M$ is obtained, starting from frame M+1, update the HMM frame by frame by incremental learning, recursively obtaining $\lambda_l$, and from it the noise estimate $\mu_{0,l}$ and the speech presence probability in each frequency of frame l (frames are indexed l = 1, 2, 3, …).
The incremental learning algorithm of the HMM comprises recursive weight coefficients, recursive means and recursive variances, where:
The forward factor recursion is: $F_{l+1|\lambda_l}(z) = \sum_y F_{l|\lambda_{l-1}}(y)\, a_{yz,l}\, b(x_{l+1} \mid s_{l+1}=z, \lambda_l)$, $z \in \{0,1\}$.
The speech and noise presence probability recursion is: $\gamma_{l+1|\lambda_l}(z) = \frac{F_{l+1|\lambda_l}(z)}{\sum_z F_{l+1|\lambda_l}(z)}$, $z \in \{0,1\}$.
The conditional transition probability recursion is: $\xi_{l+1|\lambda_l}(y,z) = \frac{F_{l|\lambda_{l-1}}(y)\, a_{yz,l}\, b(x_{l+1} \mid s_{l+1}=z, \lambda_l)}{\sum_{y,z} F_{l|\lambda_{l-1}}(y)\, a_{yz,l}\, b(x_{l+1} \mid s_{l+1}=z, \lambda_l)}$.
The averaged state probability recursion is: $\tilde\gamma_{l+1}(z) = \alpha\,\tilde\gamma_l(z) + (1-\alpha)\,\gamma_{l+1|\lambda_l}(z)$.
The time-dependent smoothing factor recursion is: $\tilde\alpha_{l+1}(z) = \frac{\alpha\,\tilde\gamma_l(z)}{\alpha\,\tilde\gamma_l(z) + (1-\alpha)\,\gamma_{l+1|\lambda_l}(z)}$.
The state mean recursion is: $\mu_{z,l+1} = \tilde\alpha_{l+1}(z)\,\mu_{z,l} + [1-\tilde\alpha_{l+1}(z)]\,x_{l+1}$.
The state variance recursion is: $\kappa_{z,l+1} = \tilde\alpha_{l+1}(z)\,\kappa_{z,l} + [1-\tilde\alpha_{l+1}(z)]\,(x_{l+1} - \mu_{z,l})^2$.
The averaged transition posterior recursion is: $\tilde\xi_{l+1}(y,z) = \alpha\,\tilde\xi_l(y,z) + (1-\alpha)\,\xi_{l+1|\lambda_l}(y,z)$.
The transition probability recursion is:
$$a_{yz,l+1} = a_{yz,l} + \frac{\dfrac{\xi_{l+1|\lambda_l}(y,z)}{a_{yz,l}} - \dfrac{\xi_{l+1|\lambda_l}(y,1-z)}{1-a_{yz,l}}}{\dfrac{K}{a_{yz,l}^2}\,\tilde\xi_{l+1}(y,z) + \dfrac{K}{(1-a_{yz,l})^2}\,\tilde\xi_{l+1}(y,1-z)}$$
The main feature of the sequential hidden Markov model is that it can track online the temporal correlation of speech occurrence in a frequency component: it regards the power envelope in a frequency component as a Markov chain switching between speech and non-speech states, and it builds the initial model in an unsupervised way. Specifically, it has the following features:
● Because an HMM is used, Viterbi decoding can provide the optimal estimate of whether speech is present over a time series.
● The initialization phase does not rely on the "noise-first" assumption, so the invention has a wider range of application than the earlier solution.
● Voice activity is described as two-dimensional "time-frequency" information, whereas other voice activity detection algorithms describe the presence of speech only along the time dimension.
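A minimal sketch of the Viterbi decoding mentioned in the first feature, for one frequency bin and in the log domain, assuming fixed, illustrative parameters (in the method they would come from the current model $\lambda_l$); the function names are hypothetical.

```python
import math


def log_gauss(x, mu, var):
    """log b(x | z) for a Gaussian with mean mu and variance var."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)


def viterbi(xs, pi, a, mu, var):
    """Most likely state path (0 = noise, 1 = speech) for one frequency bin."""
    la = [[math.log(p) for p in row] for row in a]
    score = [math.log(pi[z]) + log_gauss(xs[0], mu[z], var[z]) for z in (0, 1)]
    back = []
    for x in xs[1:]:
        ptr, new = [], []
        for z in (0, 1):
            best = max((0, 1), key=lambda y: score[y] + la[y][z])  # best predecessor of z
            ptr.append(best)
            new.append(score[best] + la[best][z] + log_gauss(x, mu[z], var[z]))
        back.append(ptr)
        score = new
    state = max((0, 1), key=lambda z: score[z])
    path = [state]
    for ptr in reversed(back):  # follow the back-pointers from the last frame
        state = ptr[state]
        path.append(state)
    return path[::-1]
```

Unlike the frame-by-frame presence probability, Viterbi decoding needs the whole sequence (or a look-ahead window) before committing to a decision.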
In one embodiment, the unsupervised learning framework is carried by a two-state hidden Markov model (HMM). One component represents the distribution of speech energy and the other the distribution of noise energy. The log-amplitude spectral envelope is extracted in each frequency component and a corresponding HMM is built. The HMM is first initialized with the EM algorithm and then progressively updated by incremental learning. From the HMM, the speech presence probability in each subband and the power spectrum of the noise are derived. During HMM parameter estimation, the means, weights, variances and transition probabilities of the HMM are each constrained. The sequential estimation of the HMM parameters comprises the recursion of the weight coefficients, the means and the variances.
1) Recursive mean: $\mu_{z,l+1} = \tilde\alpha_{l+1}(z)\,\mu_{z,l} + [1-\tilde\alpha_{l+1}(z)]\,x_{l+1}$, where $\tilde\alpha_{l+1}(z)$ is a smoothing factor that depends on the speech presence probability and is less than but close to 1.
2) Recursive variance: $\kappa_{z,l+1} = \tilde\alpha_{l+1}(z)\,\kappa_{z,l} + [1-\tilde\alpha_{l+1}(z)]\,(x_{l+1} - \mu_{z,l})^2$;
3) Recursive transition probability: given by the update of step 311), or alternatively by $a_{yz,l+1} = \beta\, a_{yz,l} + (1-\beta)\,\xi_{l+1|\lambda_l}(y,z)$, where β is a smoothing factor less than but close to 1, for example β = 0.99.
The present invention is further described below with reference to a preferred embodiment.
The principle of the present invention is as follows:
Noise estimation procedure parallel running in each frequency component, so, in following description, dispense frequency component index k.Logarithm amplitude spectrum time series x for voice signal in each frequency component l={ x 1, x 2..., x l, set up a hidden Markov model, s l={ s 1, s 2..., s l, s t{ 0,1} is its corresponding status switch to ∈, and 1 represents that voice go out present condition, and 0 represents that noise goes out present condition, λ lexpression is from sequence x lin the model parameter valuation of obtaining, so for a given parameter set λ l, corresponding observed value sequence x lprobability density function can be expressed as:
$$p(\mathbf{x}_l \mid \lambda_l) = \sum_{\mathbf{s}_l} p(\mathbf{s}_l \mid \lambda_l)\, p(\mathbf{x}_l \mid \lambda_l, \mathbf{s}_l)$$
where $p(\mathbf{s}_l \mid \lambda_l)$, the prior probability of the state sequence $\mathbf{s}_l$, is:
$$p(\mathbf{s}_l \mid \lambda_l) = \prod_{t=1}^{l} a_{s_{t-1}s_t}$$
Here $a_{s_{t-1}s_t}$ denotes the state transition probability, and for $t = 1$ the term $a_{s_0 s_1} = \pi_{s_1}$ denotes the initial state probability. $p(\mathbf{x}_l \mid \lambda_l, \mathbf{s}_l)$ is the likelihood of the observation sequence $\mathbf{x}_l$ given the state sequence $\mathbf{s}_l$ and the parameter set $\lambda_l$:
$$p(\mathbf{x}_l \mid \lambda_l, \mathbf{s}_l) = \prod_{t=1}^{l} b(x_t \mid s_t, \lambda_l)$$
where, for $s_t = z$,
$$b(x_t \mid s_t, \lambda_l) = \frac{1}{\sqrt{2\pi\kappa_{z,l}}}\exp\left\{-\frac{1}{2}\,\frac{(x_t-\mu_{z,l})^2}{\kappa_{z,l}}\right\}$$
Here $\kappa_{z,l}$ is the variance and $\mu_{z,l}$ the mean of the Gaussian distribution of state z, and $\lambda_l = \{\mu_{0,l}, \mu_{1,l}, \kappa_{0,l}, \kappa_{1,l}, a_{00,l}, a_{01,l}, a_{10,l}, a_{11,l}, \pi_0, \pi_1\}$. The initial probabilities $\pi_z$ in the parameter set do not change over time.
In this model, $\mu_{0,l}$ is the noise level to be estimated. At the same time, the probability that the signal is in state z at a given frequency of frame t can be derived as $\gamma_{t|\lambda_l}(z) = p(s_t = z \mid x_t, \lambda_l)$; for z = 1 this is the speech presence probability.
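A small numerical check makes the defining sum concrete. The Python sketch below is a minimal illustration (the function names and toy parameters are hypothetical): it evaluates $p(\mathbf{x}_l \mid \lambda_l)$ for a two-state Gaussian HMM in two ways, by brute force over all $2^l$ state sequences exactly as in the formula above, and with the standard forward recursion, which gives the same value in time linear in l. It assumes the convention that the t = 1 transition term is the initial probability $\pi_{s_1}$.

```python
import itertools
import math


def gauss(x, mu, var):
    """Emission density b(x | z): Gaussian with mean mu and variance var (kappa)."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def likelihood_bruteforce(xs, pi, a, mu, var):
    """p(x_l | lambda) as the explicit sum over all 2**l state sequences."""
    total = 0.0
    for states in itertools.product((0, 1), repeat=len(xs)):
        # Prior p(s_l | lambda) times likelihood p(x_l | lambda, s_l), built frame by frame.
        p = pi[states[0]] * gauss(xs[0], mu[states[0]], var[states[0]])
        for t in range(1, len(xs)):
            p *= a[states[t - 1]][states[t]] * gauss(xs[t], mu[states[t]], var[states[t]])
        total += p
    return total


def likelihood_forward(xs, pi, a, mu, var):
    """Same quantity via F_t(z) = sum_y F_{t-1}(y) * a[y][z] * b(x_t | z)."""
    f = [pi[z] * gauss(xs[0], mu[z], var[z]) for z in (0, 1)]
    for x in xs[1:]:
        f = [sum(f[y] * a[y][z] for y in (0, 1)) * gauss(x, mu[z], var[z]) for z in (0, 1)]
    return sum(f)


# Toy model: state 0 = noise, state 1 = speech (log-amplitude values in dB).
xs = [1.0, 5.0, 4.5, 0.5, 1.2]
pi, a = [0.5, 0.5], [[0.9, 0.1], [0.2, 0.8]]
mu, var = [1.0, 5.0], [1.0, 2.0]
print(likelihood_bruteforce(xs, pi, a, mu, var), likelihood_forward(xs, pi, a, mu, var))
```

The brute-force version grows exponentially with l and is only useful as a check; the forward recursion is the form used in Steps 202 and 301.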
Based on this principle, the noise power spectrum estimation and voice activity detection method according to one embodiment of the invention comprises the following steps:
Step 100: Set up an M-frame buffer, store the first M input frames in it, and extract the amplitude spectra of the M buffered frames. The amplitude spectrum of one frame is extracted as follows:
First, pre-process the digitized speech signal of the frame (depending on the system, this may include windowing, pre-emphasis, etc.). Let each frame be F samples long. Zero-pad it to N points (where N ≥ F, N = 2^j, and j is an integer with j ≥ 8) and compute the N-point discrete Fourier transform to obtain the discrete spectrum
$$Y_{l,k} = \sum_{n=0}^{N-1} y_{l,n}\, e^{-\mathrm{i}2\pi nk/N}, \quad k = 0, 1, \dots, N-1,$$
where $y_{l,n}$ is the n-th sample of the l-th buffered frame and $Y_{l,k}$ is the k-th Fourier coefficient of the l-th buffered frame. The amplitude value is then computed as
$$x_l = 10\log_{10}\left[\sum_{r=-w}^{w} b(r)\,|Y_{l,k-r}|^2\right],$$
where b(r) is a window function (e.g. a Hanning or Hamming window). The index k is omitted in the following description.
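The frame-level computation of Step 100 can be sketched as follows, assuming the frame has already been pre-processed. This is an illustrative sketch, not the patent's implementation: the function name is hypothetical, a direct O(N^2) DFT stands in for an FFT to keep the code dependency-free, and the default smoothing weights b(r) form a flat (boxcar) window, whereas the text suggests a Hanning or Hamming window.

```python
import cmath
import math


def log_amplitude_spectrum(frame, n_fft=256, w=1, b=None):
    """Sketch of Step 100 for one (already pre-processed) frame.

    Zero-pads the F-sample frame to n_fft points, takes an n_fft-point DFT and
    returns x_k = 10*log10(sum_{r=-w..w} b(r) * |Y_{k-r}|^2) for k = 0..n_fft-1.
    """
    if n_fft < max(len(frame), 256) or n_fft & (n_fft - 1):
        raise ValueError("n_fft must be a power of two, >= 256 and >= the frame length")
    y = list(frame) + [0.0] * (n_fft - len(frame))            # zero padding to N points
    Y = [sum(y[n] * cmath.exp(-2j * math.pi * n * k / n_fft) for n in range(n_fft))
         for k in range(n_fft)]                               # direct O(N^2) DFT, for clarity
    if b is None:
        b = [1.0 / (2 * w + 1)] * (2 * w + 1)                 # assumed flat smoothing weights
    floor = 1e-12                                             # keeps log10 finite for empty bins
    out = []
    for k in range(n_fft):
        # The DFT is N-periodic in k, so Y_{k-r} for k - r < 0 equals Y_{N+k-r}.
        power = sum(b[r + w] * abs(Y[(k - r) % n_fft]) ** 2 for r in range(-w, w + 1))
        out.append(10.0 * math.log10(power + floor))
    return out
```

In practice an FFT (for example `numpy.fft.fft`) would replace the direct DFT loop; it returns the same coefficients up to rounding.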
Step 200: HMM initialization. A hidden Markov model $\lambda_M$ is initialized at each frequency, where the subscript M denotes the length of the initialization time window. The initialization uses a constrained EM algorithm; at a given frequency the steps are as follows:
Step 201: Divide the M samples into two classes by clustering (for example LBG unsupervised clustering or fuzzy clustering): one class of $M_1$ samples and one of $M_0$ samples, where $M_0 + M_1 = M$. The class with the larger mean is denoted by the subscript (1) and the other by (0). The means of the two classes are
$$\bar{\mu}_{z,M} = \frac{1}{M_z}\sum_{x_t \in \text{class } z} x_t, \quad z \in \{0,1\},$$
so the lower-energy class has the smaller mean, $\bar{\mu}_{0,M} < \bar{\mu}_{1,M}$. The variances of the two classes are
$$\bar{\kappa}_{z,M} = \frac{1}{M_z}\sum_{x_t \in \text{class } z} (x_t - \bar{\mu}_{z,M})^2,$$
and the weight coefficients (transition probabilities) are initialized to $\bar{a}_{00,M} = \bar{a}_{01,M} = \bar{a}_{10,M} = \bar{a}_{11,M} = 0.5$.
In the iteration below, $\bar{\lambda}_M$ denotes the model used in the current iteration and $\lambda'_M$ the re-estimated model. Before the iteration starts, $\bar{\lambda}_M$ is set to the initial parameters above and the previous log-likelihood L' is set to a very large negative number, for example L' = -10000. The forward factor is initialized, and the backward factor is initialized to $\bar{B}_M(z) = 1$.
The iteration then begins.
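Step 201 can be sketched with plain one-dimensional 2-means clustering standing in for the LBG or fuzzy clustering named in the text (a substitution made for brevity). The function name and the variance floor are hypothetical additions; the outputs follow the definitions above: class 0 has the smaller mean, the variances are the per-class mean squared deviations, and all four transition probabilities start at 0.5.

```python
def init_two_classes(xs, max_iter=50, var_floor=1e-6):
    """Sketch of Step 201 using 1-D 2-means as a stand-in for LBG/fuzzy clustering.

    Returns (mu_bar, kappa_bar, a_bar, sizes); index 0 is the lower-mean (noise)
    class and index 1 the higher-mean (speech) class.
    """
    if len(set(xs)) < 2:
        raise ValueError("need at least two distinct samples to form two classes")
    centres = [min(xs), max(xs)]          # start from the extremes so class 1 has the larger mean
    for _ in range(max_iter):
        groups = ([], [])
        for x in xs:
            # Assign each sample to the nearer centre (ties go to class 0).
            groups[0 if abs(x - centres[0]) <= abs(x - centres[1]) else 1].append(x)
        new = [sum(g) / len(g) for g in groups]
        if new == centres:                # assignments are stable
            break
        centres = new
    mu = [sum(g) / len(g) for g in groups]
    kappa = [max(sum((x - m) ** 2 for x in g) / len(g), var_floor) for g, m in zip(groups, mu)]
    a = [[0.5, 0.5], [0.5, 0.5]]          # a_bar_{yz,M} = 0.5 for all y, z
    return mu, kappa, a, [len(g) for g in groups]
```

Starting the centres at the sample minimum and maximum keeps both classes non-empty in one dimension and guarantees that the class labelled 1 is the higher-energy one.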
Step 202: Compute the forward factor: $\bar{F}_t(z) = \sum_y \bar{F}_{t-1}(y)\,\bar{a}_{yz,M}\, b(x_t \mid z, \bar{\lambda}_M)$, $z \in \{0,1\}$.
Step 203: Compute the backward factor: $\bar{B}_t(z) = \sum_y \bar{B}_{t+1}(y)\,\bar{a}_{zy,M}\, b(x_{t+1} \mid y, \bar{\lambda}_M)$, $z \in \{0,1\}$.
Step 204: Compute the noise and speech presence probabilities: $p(s_t = z \mid x_t, \bar{\lambda}_M) = \dfrac{\bar{F}_t(z)\,\bar{B}_t(z)}{\sum_z \bar{F}_t(z)\,\bar{B}_t(z)}$, $z \in \{0,1\}$.
Step 205: If $\frac{1}{M}\sum_{t=1}^{M} p(s_t = 1 \mid x_t, \bar{\lambda}_M) < \zeta$, set $\lambda_M = \bar{\lambda}_M$ and stop iterating, where ζ is a small positive number close to zero.
Step 206: Compute the pairwise (transition) state probabilities:
$$p(s_{t-1}=y, s_t=z \mid x_t, \bar{\lambda}_M) = \frac{\bar{F}_{t-1}(y)\,\bar{a}_{yz,M}\, b(x_t \mid z, \bar{\lambda}_M)\,\bar{B}_t(z)}{\sum_{y,z}\bar{F}_{t-1}(y)\,\bar{a}_{yz,M}\, b(x_t \mid z, \bar{\lambda}_M)\,\bar{B}_t(z)}.$$
Step 207: Compute the new initial probabilities: $\pi'_z = p(s_1 = z \mid x_1, \bar{\lambda}_M)$.
Step 208: Compute the new means: $\mu'_{z,M} = \dfrac{\sum_{t=1}^{M} p(s_t = z \mid x_t, \bar{\lambda}_M)\, x_t}{\sum_{t=1}^{M} p(s_t = z \mid x_t, \bar{\lambda}_M)}$.
Step 209: Constrain the new means: $\mu'_{1,M} = \max\{\mu'_{1,M},\ \mu'_{0,M} + \delta\}$, where δ is a constant in the range 0 to 100.
Step 210: Compute the new variances: $\kappa'_{z,M} = \dfrac{\sum_{t=1}^{M} p(s_t = z \mid x_t, \bar{\lambda}_M)\,(x_t - \bar{\mu}_{z,M})^2}{\sum_{t=1}^{M} p(s_t = z \mid x_t, \bar{\lambda}_M)}$.
Step 211: Constrain the new variances: $\kappa'_{1,M} = \max\{\kappa'_{0,M}, \kappa'_{1,M}\}$.
Step 212: Compute the new transition probabilities: $a'_{yz,M} = \dfrac{\sum_{t=2}^{M} p(s_{t-1} = y, s_t = z \mid x_t, \bar{\lambda}_M)}{\sum_{t=2}^{M}\sum_z p(s_{t-1} = y, s_t = z \mid x_t, \bar{\lambda}_M)}$.
Step 213: Compute the log-likelihood of the new model: $\bar{L} = \log p(\mathbf{x}_M \mid \lambda'_M)$.
Step 214: If $|L' - \bar{L}| \le \varepsilon$, terminate the iteration, where ε is a very small number, for example ε = 0.1. Otherwise, if $|L' - \bar{L}| > \varepsilon$, set $\bar{\lambda}_M = \lambda'_M$ and $L' = \bar{L}$, and return to Step 202.
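A compact sketch of the constrained EM loop of Steps 202 to 214 follows. It is an illustration under stated assumptions rather than the patented implementation: it uses the standard scaled forward-backward recursion (the forward factor is normalized every frame and the log-likelihood is the sum of the log scale factors), takes the log-likelihood of each re-estimated model from the next forward pass instead of a separate Step 213 pass, starts the forward recursion from $\pi_z\, b(x_1 \mid z)$, and adds small floors against division by zero. The function names, the starting π = (0.5, 0.5) and the values of δ, ζ and the floors are hypothetical; ε = 0.1 and L' = -10000 come from the text. As in Step 210, the variances are computed around the previous means $\bar{\mu}_{z,M}$.

```python
import math


def _gauss(x, mu, var):
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def forward_backward(xs, pi, a, mu, var):
    """Scaled forward-backward (Steps 202-204, 206) for a 2-state Gaussian HMM.

    Returns gamma[t][z], xi[t][y][z] (for t >= 1) and log p(x | model).
    """
    T = len(xs)
    b = [[_gauss(x, mu[z], var[z]) for z in (0, 1)] for x in xs]
    F, scale = [], []
    for t in range(T):
        if t == 0:
            f = [pi[z] * b[0][z] for z in (0, 1)]
        else:
            f = [sum(F[t - 1][y] * a[y][z] for y in (0, 1)) * b[t][z] for z in (0, 1)]
        scale.append(sum(f))
        F.append([v / scale[t] for v in f])              # normalised forward factor
    B = [[1.0, 1.0] for _ in range(T)]                    # backward factor, B_M(z) = 1
    for t in range(T - 2, -1, -1):
        B[t] = [sum(a[z][y] * b[t + 1][y] * B[t + 1][y] for y in (0, 1)) / scale[t + 1]
                for z in (0, 1)]
    gamma = []
    for t in range(T):
        g = [F[t][z] * B[t][z] for z in (0, 1)]
        gamma.append([v / sum(g) for v in g])
    xi = [None]
    for t in range(1, T):
        m = [[F[t - 1][y] * a[y][z] * b[t][z] * B[t][z] for z in (0, 1)] for y in (0, 1)]
        s = m[0][0] + m[0][1] + m[1][0] + m[1][1]
        xi.append([[v / s for v in row] for row in m])
    return gamma, xi, sum(math.log(c) for c in scale)


def constrained_em(xs, mu, var, a, pi=(0.5, 0.5), delta=3.0, zeta=1e-3, eps=0.1,
                   max_iter=100, floor=1e-6):
    """Sketch of Steps 202-214; returns (pi, a, mu, var) of the initial model."""
    if len(xs) < 2:
        raise ValueError("need at least two frames")
    mu, var, pi, a = list(mu), list(var), list(pi), [list(r) for r in a]
    T, prev_ll = len(xs), -1e4                            # L' starts very negative
    for _ in range(max_iter):
        gamma, xi, ll = forward_backward(xs, pi, a, mu, var)
        if sum(g[1] for g in gamma) / T < zeta:          # Step 205: hardly any speech
            break
        if abs(ll - prev_ll) <= eps:                      # Step 214: converged
            break
        prev_ll = ll
        occ = [max(sum(g[z] for g in gamma), floor) for z in (0, 1)]
        pi = list(gamma[0])                                                           # 207
        new_mu = [sum(g[z] * x for g, x in zip(gamma, xs)) / occ[z] for z in (0, 1)]  # 208
        new_mu[1] = max(new_mu[1], new_mu[0] + delta)                                 # 209
        new_var = [max(sum(g[z] * (x - mu[z]) ** 2 for g, x in zip(gamma, xs)) / occ[z], floor)
                   for z in (0, 1)]                       # 210, around the previous means
        new_var[1] = max(new_var[0], new_var[1])                                      # 211
        a = [[sum(xi[t][y][z] for t in range(1, T)) /
              max(sum(xi[t][y][0] + xi[t][y][1] for t in range(1, T)), floor)
              for z in (0, 1)] for y in (0, 1)]                                       # 212
        mu, var = new_mu, new_var
    return pi, a, mu, var
```

On a buffer with clearly separated noise and speech frames the loop converges in a few iterations, with $\mu_0$ near the noise level and $\mu_1$ near the speech level.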
Step 300: Sequential update of the HMM. After the initial model $\lambda_M$ has been built, the HMM is updated frame by frame from frame M+1 onward by incremental learning. The iteration can be expressed as follows: at each frequency, given $\lambda_l$ and the current observation $x_{l+1}$, infer $\lambda_{l+1}$. For frame l+1, compute the Fourier transform $Y_{l+1,k}$, $0 \le k < N$, and compute the amplitude value at each frequency as in Step 100. For each frequency, the parameters are updated at frame l+1 as follows:
Step 301: Compute the forward factor: $F_{l+1|\lambda_l}(z) = \sum_y F_{l|\lambda_{l-1}}(y)\, a_{yz,l}\, b(x_{l+1} \mid s_{l+1} = z, \lambda_l)$, $z \in \{0,1\}$.
Step 302: Compute the speech and noise presence probabilities: $\gamma_{l+1|\lambda_l}(z) = \dfrac{F_{l+1|\lambda_l}(z)}{\sum_z F_{l+1|\lambda_l}(z)}$, $z \in \{0,1\}$.
Step 303: Compute the conditional transition probabilities:
$$\xi_{l+1|\lambda_l}(y,z) = \frac{F_{l|\lambda_{l-1}}(y)\, a_{yz,l}\, b(x_{l+1} \mid s_{l+1} = z, \lambda_l)}{\sum_{y,z} F_{l|\lambda_{l-1}}(y)\, a_{yz,l}\, b(x_{l+1} \mid s_{l+1} = z, \lambda_l)}$$
Step 304: Compute the time-averaged noise/speech presence probability: $\tilde{\gamma}_{l+1}(z) = \alpha\,\tilde{\gamma}_l(z) + (1-\alpha)\,\gamma_{l+1|\lambda_l}(z)$.
Step 305: Compute the time-dependent smoothing factor: $\tilde{\alpha}_{l+1}(z) = \dfrac{\alpha\,\tilde{\gamma}_l(z)}{\alpha\,\tilde{\gamma}_l(z) + (1-\alpha)\,\gamma_{l+1|\lambda_l}(z)}$.
Step 306: Compute the means: $\mu_{z,l+1} = \tilde{\alpha}_{l+1}(z)\,\mu_{z,l} + [1-\tilde{\alpha}_{l+1}(z)]\,x_{l+1}$.
Step 307: Constrain the new means: $\mu_{1,l+1} = \max\{\mu_{1,l+1},\ \mu_{0,l+1} + \delta\}$.
Step 308: Compute the new variances: $\kappa_{z,l+1} = \tilde{\alpha}_{l+1}(z)\,\kappa_{z,l} + [1-\tilde{\alpha}_{l+1}(z)]\,(x_{l+1}-\mu_{z,l})^2$.
Step 309: Constrain the new variances: $\kappa_{1,l+1} = \max\{\kappa_{0,l+1}, \kappa_{1,l+1}\}$.
Step 310: Compute the mean transition probability: $\tilde{\xi}_{l+1}(y,z) = \alpha\,\tilde{\xi}_l(y,z) + (1-\alpha)\,\xi_{l+1|\lambda_l}(y,z)$.
Step 311: Compute the transition probabilities:
$$a_{yz,l+1} = a_{yz,l} + \frac{\xi_{l+1|\lambda_l}(y,z)/a_{yz,l} - \xi_{l+1|\lambda_l}(y,1-z)/(1-a_{yz,l})}{K\,\tilde{\xi}_{l+1}(y,z)/a_{yz,l}^2 + K\,\tilde{\xi}_{l+1}(y,1-z)/(1-a_{yz,l})^2}$$
Step 312: Constrain the new transition probabilities: $a_{01,l+1} = \max\{a_{01,l+1}, \eta\}$, $a_{00,l+1} = 1 - a_{01,l+1}$, $a_{10,l+1} = \max\{a_{10,l+1}, \eta\}$, $a_{11,l+1} = 1 - a_{10,l+1}$, where η is a small positive lower bound on the switching probabilities.
The sub-steps above yield all parameters of $\lambda_{l+1}$, and hence the corresponding speech presence probability and the noise power spectrum estimate $\mu_{0,l+1}$.
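One frame of the sequential update (Steps 301 to 312) for a single frequency bin might look like the Python sketch below. It is a minimal, hedged illustration, not the patented code: the class and function names are hypothetical, and the values α = 0.98, δ = 3, K = 50 and η = 0.01 are illustrative, because the text fixes none of them except the range of δ. It uses the previous frame's normalized forward factor in Steps 301 and 303, applies Step 311 to the switching probabilities $a_{01}$ and $a_{10}$ and sets the complementary entries as in Step 312, and adds two safeguards of its own: the update is skipped when the curvature term is essentially zero, and the switching probabilities are capped at 1 - η. How $\tilde\gamma$ and $\tilde\xi$ are initialized at frame M is not specified in the text and is left to the caller.

```python
import math
from dataclasses import dataclass


def _gauss(x, mu, var):
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


@dataclass
class SHMMState:
    F: list          # normalised forward factor of the previous frame, F[z]
    gamma_bar: list  # time-averaged state probability gamma~_l(z)
    xi_bar: list     # time-averaged transition probability xi~_l(y, z), 2x2
    mu: list         # state means; mu[0] is the noise estimate
    var: list        # state variances kappa
    a: list          # transition matrix a[y][z]


def shmm_update(st, x, alpha=0.98, delta=3.0, K=50.0, eta=0.01):
    """One frame of Steps 301-312 for one frequency bin; returns the new state."""
    b = [_gauss(x, st.mu[z], st.var[z]) for z in (0, 1)]
    joint = [[st.F[y] * st.a[y][z] * b[z] for z in (0, 1)] for y in (0, 1)]
    total = sum(map(sum, joint))
    gamma = [(joint[0][z] + joint[1][z]) / total for z in (0, 1)]          # Steps 301-302
    xi = [[v / total for v in row] for row in joint]                       # Step 303
    g_new = [alpha * st.gamma_bar[z] + (1 - alpha) * gamma[z] for z in (0, 1)]  # Step 304
    at = [alpha * st.gamma_bar[z] / g_new[z] for z in (0, 1)]              # Step 305
    mu = [at[z] * st.mu[z] + (1 - at[z]) * x for z in (0, 1)]              # Step 306
    mu[1] = max(mu[1], mu[0] + delta)                                      # Step 307
    var = [at[z] * st.var[z] + (1 - at[z]) * (x - st.mu[z]) ** 2 for z in (0, 1)]  # Step 308
    var[1] = max(var[0], var[1])                                           # Step 309
    xb = [[alpha * st.xi_bar[y][z] + (1 - alpha) * xi[y][z] for z in (0, 1)]
          for y in (0, 1)]                                                 # Step 310
    a = [list(row) for row in st.a]
    for y, z in ((0, 1), (1, 0)):          # switching probabilities a_01 and a_10
        p = st.a[y][z]
        grad = xi[y][z] / p - xi[y][1 - z] / (1 - p)
        curv = K * xb[y][z] / p ** 2 + K * xb[y][1 - z] / (1 - p) ** 2
        if curv > 1e-12:                   # our safeguard: skip if state y is never visited
            p += grad / curv                                               # Step 311
        p = min(max(p, eta), 1 - eta)      # Step 312 floor; the 1 - eta cap is our safeguard
        a[y][z], a[y][1 - z] = p, 1 - p
    return SHMMState(gamma, g_new, xb, mu, var, a)
```

A typical loop initializes the state from the EM result of Step 200 and calls `shmm_update` once per frame and bin; `st.mu[0]` is then the noise estimate and `st.F[1]` the speech presence probability of the current frame.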
The noise power spectrum estimation performance of the algorithm of the above embodiment was evaluated using 8 utterances each from male and female speakers in the TIMIT database, mixed with white Gaussian noise, F16 cockpit noise and babble noise from the NOISEX-92 noise database at signal-to-noise ratios (SNRs) such as 0, 5 and 10 dB. The first evaluation metric, the linear segmental error, is defined as:
$$\varepsilon_n = \frac{1}{L}\sum_{l=1}^{L}\left\{10\log_{10}\frac{\sum_{k=1}^{N}\left[D_{k,l} - \hat{D}_{k,l}\right]^2}{\sum_{k=1}^{N} D_{k,l}^2}\right\}$$
where $D_{k,l}$ is the actual noise amplitude spectrum and $\hat{D}_{k,l}$ the estimated noise amplitude spectrum. A smaller error value means the estimate is closer to the actual value, i.e. more accurate. The second evaluation metric, the logarithmic segmental error, is defined as:
$$\varepsilon_r = \frac{1}{L}\sum_{l=1}^{L}\left\{\frac{1}{N}\sum_{k=1}^{N}\left[20\log_{10}|D_{k,l}| - 20\log_{10}|\hat{D}_{k,l}|\right]^2\right\}^{1/2}$$
Likewise, a smaller error value indicates a more accurate noise estimate.
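The two evaluation metrics can be computed as below. This is a direct transcription of the formulas above into Python with hypothetical function names; the inputs are assumed to be lists of L frames, each holding N positive noise amplitudes, and the small floor in the linear error is our addition so that an exactly estimated frame does not raise on log10(0).

```python
import math


def linear_segmental_error(D, D_hat, floor=1e-300):
    """epsilon_n: frame average of 10*log10(sum_k (D - D_hat)^2 / sum_k D^2).

    D and D_hat are lists of L frames, each a list of N noise amplitude values.
    """
    total = 0.0
    for d, dh in zip(D, D_hat):
        num = sum((x - y) ** 2 for x, y in zip(d, dh))
        den = sum(x ** 2 for x in d)
        total += 10.0 * math.log10(max(num, floor) / den)
    return total / len(D)


def log_segmental_error(D, D_hat):
    """epsilon_r: frame average of the RMS difference of 20*log10|D| and 20*log10|D_hat|."""
    total = 0.0
    for d, dh in zip(D, D_hat):
        sq = [(20.0 * math.log10(abs(x)) - 20.0 * math.log10(abs(y))) ** 2
              for x, y in zip(d, dh)]
        total += math.sqrt(sum(sq) / len(sq))
    return total / len(D)
```

For example, an estimate that is uniformly 1 dB too high in amplitude gives a logarithmic segmental error of exactly 1 dB.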
The algorithm is compared with three mainstream noise power spectrum estimation algorithms: MS (minimum statistics), MCRA (minima-controlled recursive averaging) and IMCRA (improved minima-controlled recursive averaging); SHMM denotes the algorithm of the invention. Table 1 lists the linear segmental error (SegError).
Table 1: Linear segmental error of noise estimation under various noise conditions
Table 2: Logarithmic segmental error of noise estimation under various noise conditions
As the tables show, the proposed algorithm has a clear advantage over all three current mainstream algorithms.
In addition, Fig. 2 compares, on one example, the SHMM noise estimation algorithm with the classical minimum statistics algorithm (MS), the minima-controlled recursive averaging algorithm (MCRA) and its improved version IMCRA. In this example, the SNR of the noisy speech signal drops abruptly from 10 dB to 4 dB at 3.75 s and rises from 4 dB back to 10 dB at 13.1 s. Panel (a) shows the power spectrum of the noisy speech in one subband; (b) shows the noise power spectrum envelope estimated by MS, with the dashed line giving the true noise power spectrum envelope; (c), (d) and (e) show the estimates of MCRA, IMCRA and the proposed algorithm, respectively. The figure shows that the other three algorithms react slowly to the abrupt rise in noise and cannot keep up with the rapid jump, whereas the SHMM algorithm tracks it better.
Over the past few decades, many algorithms have been proposed for estimating voice activity and the noise power spectrum. The temporal correlation of the speech signal in the frequency domain is an important cue, and because speech is non-stationary, this correlation also changes over time. Past algorithms, however, paid insufficient attention to this temporal correlation: they used it only in a simple way and did not describe its time variation adaptively. The invention uses a sequential hidden Markov model (SHMM) to describe the temporal correlation of speech in each frequency component. This sequential estimation is based on first-order recursion, so the temporal correlation and its parameter set change together with the input signal. The statistical model therefore reflects the sequential statistical characteristics of speech accurately, and the proposed estimator outperforms current mainstream algorithms (e.g. minimum statistics and minima-controlled recursive averaging).
In addition, previous SHMMs were all based on higher-order recursion. Compared with a higher-order recursive SHMM, the first-order recursive SHMM proposed here greatly reduces computation and storage. This is another innovation of the invention.
Finally, it should be noted that the above embodiments are intended only to illustrate, not to limit, the technical solution of the invention. Although the invention has been described in detail with reference to the embodiments, those of ordinary skill in the art should understand that modifications or equivalent replacements of the technical solution that do not depart from its spirit and scope shall all be covered by the scope of the claims of the invention.

Claims (9)

1. A noise power spectrum estimation and voice activity detection method, which uses a sequential hidden Markov model (SHMM) based on first-order recursion to describe the temporal correlation of speech in each frequency component, updates the SHMM step by step by incremental learning, and finally derives the speech presence probability in each frequency subband and the noise power spectrum, so as to accurately reflect the sequential statistical characteristics of speech; the method comprises the following steps:
1) for the speech signal, extracting a log-amplitude spectral envelope in each frequency component and building a corresponding binary (two-state) hidden Markov model, in which one component represents the distribution of speech energy, the other represents the distribution of noise energy, and each state is represented by a Gaussian distribution;
2) for a segment of speech data, setting up an M-frame buffer, storing the first M input frames in the buffer, extracting the log-amplitude spectra of the M buffered frames, and building an initial model with a maximum likelihood estimation algorithm;
3) after the initial model $\lambda_M$ is obtained, updating the HMM of each frequency band frame by frame from frame M+1 onward by incremental learning, and obtaining the noise value and the speech presence probability recursively;
wherein the sequential update of the HMM in step 3) comprises: after the initial model $\lambda_M$ is built, updating the HMM frame by frame from frame M+1 onward by incremental learning, the iteration being expressed as: at each frequency, given $\lambda_l$ and the current observation $x_{l+1}$, inferring $\lambda_{l+1}$; for frame l+1, computing the Fourier transform $Y_{l+1,k}$, where $0 \le k < N$, and computing the amplitude value at each frequency; for each frequency, the parameters are updated at frame l+1 as follows:
Step 301): computing the forward factor, $F_{l+1|\lambda_l}(z) = \sum_y F_{l|\lambda_{l-1}}(y)\, a_{yz,l}\, b(x_{l+1} \mid s_{l+1} = z, \lambda_l)$, $z \in \{0,1\}$;
Step 302): computing the speech and noise presence probabilities, $\gamma_{l+1|\lambda_l}(z) = \dfrac{F_{l+1|\lambda_l}(z)}{\sum_z F_{l+1|\lambda_l}(z)}$, $z \in \{0,1\}$;
Step 303): computing the conditional transition probabilities,
$$\xi_{l+1|\lambda_l}(y,z) = \frac{F_{l|\lambda_{l-1}}(y)\, a_{yz,l}\, b(x_{l+1} \mid s_{l+1} = z, \lambda_l)}{\sum_{y,z} F_{l|\lambda_{l-1}}(y)\, a_{yz,l}\, b(x_{l+1} \mid s_{l+1} = z, \lambda_l)};$$
Step 304): computing the time-averaged noise/speech presence probability,
$\tilde{\gamma}_{l+1}(z) = \alpha\,\tilde{\gamma}_l(z) + (1-\alpha)\,\gamma_{l+1|\lambda_l}(z)$;
Step 305): computing the time-dependent smoothing factor, $\tilde{\alpha}_{l+1}(z) = \dfrac{\alpha\,\tilde{\gamma}_l(z)}{\alpha\,\tilde{\gamma}_l(z) + (1-\alpha)\,\gamma_{l+1|\lambda_l}(z)}$;
Step 306): computing the state means, $\mu_{z,l+1} = \tilde{\alpha}_{l+1}(z)\,\mu_{z,l} + [1-\tilde{\alpha}_{l+1}(z)]\,x_{l+1}$;
Step 307): constraining the new state means: $\mu_{1,l+1} = \max\{\mu_{1,l+1},\ \mu_{0,l+1} + \delta\}$, $l \ge M$;
Step 308): computing the new state variances, $\kappa_{z,l+1} = \tilde{\alpha}_{l+1}(z)\,\kappa_{z,l} + [1-\tilde{\alpha}_{l+1}(z)]\,(x_{l+1}-\mu_{z,l})^2$;
Step 309): constraining the new state variances: $\kappa_{1,l+1} = \max\{\kappa_{0,l+1}, \kappa_{1,l+1}\}$, $l \ge M$;
Step 310): computing the mean transition probability, $\tilde{\xi}_{l+1}(y,z) = \alpha\,\tilde{\xi}_l(y,z) + (1-\alpha)\,\xi_{l+1|\lambda_l}(y,z)$;
Step 311): computing the transition probabilities,
$$a_{yz,l+1} = a_{yz,l} + \frac{\xi_{l+1|\lambda_l}(y,z)/a_{yz,l} - \xi_{l+1|\lambda_l}(y,1-z)/(1-a_{yz,l})}{K\,\tilde{\xi}_{l+1}(y,z)/a_{yz,l}^2 + K\,\tilde{\xi}_{l+1}(y,1-z)/(1-a_{yz,l})^2};$$
Step 312): constraining the new transition probabilities: $a_{01,l+1} = \max\{a_{01,l+1}, \eta\}$, $a_{00,l+1} = 1 - a_{01,l+1}$, $a_{10,l+1} = \max\{a_{10,l+1}, \eta\}$, $a_{11,l+1} = 1 - a_{10,l+1}$, $l \ge M$;
from the above sub-steps, all parameters of $\lambda_{l+1}$ are obtained, and thereby the corresponding speech presence probability $\gamma_{l+1|\lambda_l}(1)$ and the noise power spectrum estimate $\mu_{0,l+1}$.
2. The noise power spectrum estimation and voice activity detection method according to claim 1, whose specific steps comprise:
1) for the speech signal, extracting a log-amplitude spectral envelope in each frequency component; for the log-amplitude spectrum time series $\mathbf{x}_l = \{x_1, x_2, \dots, x_l\}$ in one frequency component, building a hidden Markov model with state sequence $\mathbf{s}_l = \{s_1, s_2, \dots, s_l\}$, $s_t \in \{0,1\}$, where 1 denotes the speech-present state and 0 the noise-only state; $\lambda_l$ denotes the model parameters estimated from $\mathbf{x}_l$, so that for a given parameter set $\lambda_l$ the probability density of the observation sequence $\mathbf{x}_l$ is:
$$p(\mathbf{x}_l \mid \lambda_l) = \sum_{\mathbf{s}_l} p(\mathbf{s}_l \mid \lambda_l)\, p(\mathbf{x}_l \mid \lambda_l, \mathbf{s}_l);$$
where $p(\mathbf{s}_l \mid \lambda_l)$, the prior probability of the state sequence $\mathbf{s}_l$, is:
$$p(\mathbf{s}_l \mid \lambda_l) = \prod_{t=1}^{l} a_{s_{t-1}s_t};$$
here $a_{s_{t-1}s_t}$ denotes the state transition probability and $a_{s_0 s_1} = \pi_{s_1}$ the initial state probability; $p(\mathbf{x}_l \mid \lambda_l, \mathbf{s}_l)$ is the likelihood of the observation sequence $\mathbf{x}_l$ given the states $\mathbf{s}_l$ and the parameter set $\lambda_l$:
$$p(\mathbf{x}_l \mid \lambda_l, \mathbf{s}_l) = \prod_{t=1}^{l} b(x_t \mid s_t, \lambda_l);$$
where
$$b(x_t \mid s_t, \lambda_l) = \frac{1}{\sqrt{2\pi\kappa_{s_t,l}}}\exp\left\{-\frac{1}{2}\,\frac{(x_t - \mu_{s_t,l})^2}{\kappa_{s_t,l}}\right\};$$
here $\kappa_{s_t,l}$ denotes the variance of the Gaussian distribution of state $s_t$ and $\mu_{s_t,l}$ the corresponding mean, $\lambda_l = \{\mu_{0,l}, \mu_{1,l}, \kappa_{0,l}, \kappa_{1,l}, a_{00,l}, a_{01,l}, a_{10,l}, a_{11,l}, \pi_0, \pi_1\}$, and the initial probabilities $\pi_i$ in the parameter set do not change over time;
in this model, $\mu_{0,l}$ is the noise to be estimated; at the same time, the probability that speech is present at a given frequency of frame l can be derived as $\gamma_{l|\lambda_l}(1) = p(s_l = 1 \mid x_l, \lambda_l)$;
2) for a segment of speech data, setting up an M-frame buffer, storing the first M input frames in the buffer, extracting the log-amplitude spectra of the M buffered frames, and substituting them into the HMM of step 1) to initialize a hidden Markov model $\lambda_M$ at each frequency, where the subscript M denotes the initialization time-window length, $l \ge M$;
3) after the initial model $\lambda_M$ is obtained, updating the SHMM frame by frame from frame M+1 onward by incremental learning, obtaining $\lambda_l$ recursively, and deriving the noise value $\mu_{0,l}$ and the speech presence probability at each frequency of frame l.
3. The noise power spectrum estimation and voice activity detection method according to claim 1 or 2, characterized in that extracting the amplitude spectrum of one frame in step 1) comprises:
first, pre-processing the digitized speech signal of the frame; letting each frame be F points long, zero-padding it to N points, where N ≥ F, N = 2^j, j is an integer and j ≥ 8, and computing the N-point discrete Fourier transform to obtain the discrete spectrum
$$Y_{l,k} = \sum_{n=0}^{N-1} y_{l,n}\, e^{-\mathrm{i}2\pi nk/N},$$
where $y_{l,n}$ is the n-th sample of the l-th buffered frame and $Y_{l,k}$ is the k-th Fourier coefficient of the l-th buffered frame (k = 0, 1, ..., N-1); the amplitude value is then computed as $x_l = 10\log_{10}\left[\sum_{r=-w}^{w} b(r)\,|Y_{l,k-r}|^2\right]$, where b(r) is a window function.
4. The noise power spectrum estimation and voice activity detection method according to claim 3, characterized in that the pre-processing comprises windowing and/or pre-emphasis.
5. The noise power spectrum estimation and voice activity detection method according to claim 3, characterized in that the window function is a Hanning window or a Hamming window.
6. The noise power spectrum estimation and voice activity detection method according to claim 1 or 2, characterized in that, for the HMM initialization in step 2), the specific initialization steps at a given frequency comprise:
Step 201): dividing the M samples into two classes by clustering, one class of $M_1$ samples and one of $M_0$ samples, where $M_0 + M_1 = M$; the class with the larger mean is denoted by the subscript (1) and the other by (0);
the means of the two classes are $\bar{\mu}_{z,M} = \frac{1}{M_z}\sum_{x_t \in \text{class } z} x_t$, the class with lower energy having the smaller mean, i.e. $\bar{\mu}_{0,M} < \bar{\mu}_{1,M}$;
the variances of the two classes are $\bar{\kappa}_{z,M} = \frac{1}{M_z}\sum_{x_t \in \text{class } z} (x_t - \bar{\mu}_{z,M})^2$, $z \in \{0,1\}$;
the initial weight coefficients (transition probabilities) of the two classes are $\bar{a}_{00,M} = \bar{a}_{01,M} = \bar{a}_{11,M} = \bar{a}_{10,M} = 0.5$;
computing the likelihood of the model and starting the iteration; in the following iteration, $\bar{\lambda}_M$ denotes the model used in the current iteration and $\lambda'_M$ the re-estimated model; before the iteration starts, $\bar{\lambda}_M$ is set to the initial parameters above, L' is set to a very large negative number, the forward factor is initialized, and the backward factor is initialized to $\bar{B}_M(z) = 1$;
Step 202): computing the forward factor: $\bar{F}_t(z) = \sum_y \bar{F}_{t-1}(y)\,\bar{a}_{yz,M}\, b(x_t \mid z, \bar{\lambda}_M)$, $z, y \in \{0,1\}$;
Step 203): computing the backward factor: $\bar{B}_t(z) = \sum_y \bar{B}_{t+1}(y)\,\bar{a}_{zy,M}\, b(x_{t+1} \mid y, \bar{\lambda}_M)$, $z, y \in \{0,1\}$;
Step 204): computing the noise and speech presence probabilities: $p(s_t = z \mid x_t, \bar{\lambda}_M) = \dfrac{\bar{F}_t(z)\,\bar{B}_t(z)}{\sum_z \bar{F}_t(z)\,\bar{B}_t(z)}$, $z \in \{0,1\}$;
Step 205): if $\frac{1}{M}\sum_{t=1}^{M} p(s_t = 1 \mid x_t, \bar{\lambda}_M) < \zeta$, setting $\lambda_M = \bar{\lambda}_M$ and stopping the iteration, where ζ is a small positive number close to zero;
Step 206): computing the pairwise (transition) state probabilities:
$$p(s_{t-1}=y, s_t=z \mid x_t, \bar{\lambda}_M) = \frac{\bar{F}_{t-1}(y)\,\bar{a}_{yz,M}\, b(x_t \mid z, \bar{\lambda}_M)\,\bar{B}_t(z)}{\sum_{y,z}\bar{F}_{t-1}(y)\,\bar{a}_{yz,M}\, b(x_t \mid z, \bar{\lambda}_M)\,\bar{B}_t(z)};$$
Step 207): computing the new initial probabilities $\pi'_z = p(s_1 = z \mid x_1, \bar{\lambda}_M)$;
Step 208): computing the new means $\mu'_{z,M} = \dfrac{\sum_{t=1}^{M} p(s_t = z \mid x_t, \bar{\lambda}_M)\, x_t}{\sum_{t=1}^{M} p(s_t = z \mid x_t, \bar{\lambda}_M)}$;
Step 209): constraining the new means: $\mu'_{1,M} = \max\{\mu'_{1,M},\ \mu'_{0,M} + \delta\}$, where δ is a constant in the range 0 to 100;
Step 210): computing the new variances $\kappa'_{z,M} = \dfrac{\sum_{t=1}^{M} p(s_t = z \mid x_t, \bar{\lambda}_M)\,(x_t - \bar{\mu}_{z,M})^2}{\sum_{t=1}^{M} p(s_t = z \mid x_t, \bar{\lambda}_M)}$;
Step 211): constraining the new variances: $\kappa'_{1,M} = \max\{\kappa'_{0,M}, \kappa'_{1,M}\}$;
Step 212): computing the new transition probabilities, $a'_{yz,M} = \dfrac{\sum_{t=2}^{M} p(s_{t-1} = y, s_t = z \mid x_t, \bar{\lambda}_M)}{\sum_{t=2}^{M}\sum_z p(s_{t-1} = y, s_t = z \mid x_t, \bar{\lambda}_M)}$;
Step 213): computing the log-likelihood of the new model $\bar{L} = \log p(\mathbf{x}_M \mid \lambda'_M)$;
Step 214): if $|L' - \bar{L}| \le \varepsilon$, terminating the iteration, where ε is a very small number; if $|L' - \bar{L}| > \varepsilon$, setting $\bar{\lambda}_M = \lambda'_M$ and $L' = \bar{L}$, and returning to step 202).
7. The noise power spectrum estimation and voice activity detection method according to claim 6, characterized in that the clustering in step 201) uses LBG unsupervised clustering or fuzzy clustering.
8. The noise power spectrum estimation and voice activity detection method according to claim 1, characterized in that the incremental learning algorithm adopted for the HMM in step 3) comprises recursive weight coefficients (transition probabilities), a recursive mean and a recursive variance;
wherein the recursive mean is $\mu_{z,l+1} = \tilde{\alpha}_{l+1}(z)\,\mu_{z,l} + [1-\tilde{\alpha}_{l+1}(z)]\,x_{l+1}$, in which $\tilde{\alpha}_{l+1}(z)$ is a smoothing factor that depends on the speech presence probability, less than 1 but close to 1;
the recursive variance is $\kappa_{z,l+1} = \tilde{\alpha}_{l+1}(z)\,\kappa_{z,l} + [1-\tilde{\alpha}_{l+1}(z)]\,(x_{l+1}-\mu_{z,l})^2$;
the recursive transition probability is
$$a_{yz,l+1} = a_{yz,l} + \frac{\xi_{l+1|\lambda_l}(y,z)/a_{yz,l} - \xi_{l+1|\lambda_l}(y,1-z)/(1-a_{yz,l})}{K\,\tilde{\xi}_{l+1}(y,z)/a_{yz,l}^2 + K\,\tilde{\xi}_{l+1}(y,1-z)/(1-a_{yz,l})^2}$$
or $a_{yz,l+1} = \beta\,a_{yz,l} + (1-\beta)\,\xi_{l+1|\lambda_l}(y,z)$, in which β is a smoothing factor less than 1 but close to 1.
9. The noise power spectrum estimation and voice activity detection method according to claim 1, characterized in that the parameter recursion of the sequential hidden Markov model based on first-order recursion is:
computing the forward factor of the HMM: $F_{l+1|\lambda_l}(z) = \sum_y F_{l|\lambda_{l-1}}(y)\, a_{yz,l}\, b(x_{l+1} \mid s_{l+1} = z, \lambda_l)$, $z \in \{0,1\}$;
computing the speech and noise presence probabilities, $\gamma_{l+1|\lambda_l}(z) = \dfrac{F_{l+1|\lambda_l}(z)}{\sum_z F_{l+1|\lambda_l}(z)}$, $z \in \{0,1\}$;
computing the conditional transition probabilities, $$\xi_{l+1|\lambda_l}(y,z) = \frac{F_{l|\lambda_{l-1}}(y)\, a_{yz,l}\, b(x_{l+1} \mid s_{l+1} = z, \lambda_l)}{\sum_{y,z} F_{l|\lambda_{l-1}}(y)\, a_{yz,l}\, b(x_{l+1} \mid s_{l+1} = z, \lambda_l)};$$
computing the time-averaged noise/speech presence probability, $\tilde{\gamma}_{l+1}(z) = \alpha\,\tilde{\gamma}_l(z) + (1-\alpha)\,\gamma_{l+1|\lambda_l}(z)$;
computing the time-dependent smoothing factor, $\tilde{\alpha}_{l+1}(z) = \dfrac{\alpha\,\tilde{\gamma}_l(z)}{\alpha\,\tilde{\gamma}_l(z) + (1-\alpha)\,\gamma_{l+1|\lambda_l}(z)}$;
computing the means, $\mu_{z,l+1} = \tilde{\alpha}_{l+1}(z)\,\mu_{z,l} + [1-\tilde{\alpha}_{l+1}(z)]\,x_{l+1}$;
computing the new variances, $\kappa_{z,l+1} = \tilde{\alpha}_{l+1}(z)\,\kappa_{z,l} + [1-\tilde{\alpha}_{l+1}(z)]\,(x_{l+1}-\mu_{z,l})^2$;
computing the mean transition probability, $\tilde{\xi}_{l+1}(y,z) = \alpha\,\tilde{\xi}_l(y,z) + (1-\alpha)\,\xi_{l+1|\lambda_l}(y,z)$;
computing the transition probabilities, $$a_{yz,l+1} = a_{yz,l} + \frac{\xi_{l+1|\lambda_l}(y,z)/a_{yz,l} - \xi_{l+1|\lambda_l}(y,1-z)/(1-a_{yz,l})}{K\,\tilde{\xi}_{l+1}(y,z)/a_{yz,l}^2 + K\,\tilde{\xi}_{l+1}(y,1-z)/(1-a_{yz,l})^2}.$$
CN201110141137.5A 2011-05-27 2011-05-27 Method for estimating noise power spectrum and voice activity Active CN102800322B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201110141137.5A CN102800322B (en) 2011-05-27 2011-05-27 Method for estimating noise power spectrum and voice activity

Publications (2)

Publication Number Publication Date
CN102800322A CN102800322A (en) 2012-11-28
CN102800322B true CN102800322B (en) 2014-03-26

Family

ID=47199411

Country Status (1)

Country Link
CN (1) CN102800322B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1242553A (en) * 1998-03-24 2000-01-26 松下电器产业株式会社 Speech detection system for noisy conditions
US20020188445A1 (en) * 2001-06-01 2002-12-12 Dunling Li Background noise estimation method for an improved G.729 annex B compliant voice activity detection circuit
CN102044243A (en) * 2009-10-15 2011-05-04 华为技术有限公司 Method and device for voice activity detection (VAD) and encoder
CN101853661A (en) * 2010-05-14 2010-10-06 中国科学院声学研究所 Noise spectrum estimation and voice activity detection method based on unsupervised learning

Non-Patent Citations (1)

Title
Zhao Li (赵力). Speech recognition. In: Speech Signal Processing (语音信号处理). China Machine Press, 2003, pp. 100-102 and p. 33. *

Also Published As

Publication number Publication date
CN102800322A (en) 2012-11-28

Similar Documents

Publication Publication Date Title
CN102800322B (en) Method for estimating noise power spectrum and voice activity
CN102800316B (en) Optimal codebook design method for voiceprint recognition system based on neural network
CN100543842C (en) Method for background noise suppression based on multiple statistical models and minimum mean-square error
CN102496363B (en) Correction method for Chinese speech synthesis tone
Cui et al. Noise robust speech recognition using feature compensation based on polynomial regression of utterance SNR
CN109192200B (en) Speech recognition method
CN101853661B (en) Noise spectrum estimation and voice activity detection method based on unsupervised learning
US10762417B2 (en) Efficient connectionist temporal classification for binary classification
CN105023580A (en) Unsupervised noise estimation and speech enhancement method based on separable deep autoencoder technology
Frey et al. Algonquin-learning dynamic noise models from noisy speech for robust speech recognition
US11087213B2 (en) Binary and multi-class classification systems and methods using one spike connectionist temporal classification
CN103345920A (en) Voice conversion and reconstruction method using an adaptive interpolation weighted spectrum model based on Mel-KSVD sparse representation
US7454336B2 (en) Variational inference and learning for segmental switching state space models of hidden speech dynamics
Liang et al. An improved noise-robust voice activity detector based on hidden semi-Markov models
CN116189671B (en) Data mining method and system for language teaching
Yuan et al. Speech recognition on DSP: issues on computational efficiency and performance analysis
JP5288378B2 (en) Acoustic model speaker adaptation apparatus and computer program therefor
Pham et al. Using artificial neural network for robust voice activity detection under adverse conditions
Arslan et al. Noise robust voice activity detection based on multi-layer feed-forward neural network
Van Dalen Statistical models for noise-robust speech recognition
Yu Adaptive training for large vocabulary continuous speech recognition
Gao et al. DNN Speech Separation Algorithm Based on Improved Segmented Masking Target
Zweig et al. Speech recognition with segmental conditional random fields: final report from the 2010 JHU summer workshop
Sarkar et al. Supervector-based approaches in a discriminative framework for speaker verification in noisy environments
Djamel et al. Optimisation of multiple feature stream weights for distributed speech processing in mobile environments

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant