CN101853661B - Noise spectrum estimation and voice mobility detection method based on unsupervised learning - Google Patents

Info

Publication number
CN101853661B
Authority
CN
China
Legal status
Expired - Fee Related
Application number
CN2010101781664A
Other languages
Chinese (zh)
Other versions
CN101853661A (en)
Inventor
应冬文
颜永红
付强
潘接林
Current Assignee
Institute of Acoustics CAS
Original Assignee
Institute of Acoustics CAS
Priority date
2010-05-14
Filing date
2010-05-14
Publication date
2012-05-30
Application filed by Institute of Acoustics CAS
Priority to CN2010101781664A
Publication of CN101853661A
Application granted
Publication of CN101853661B
Expired - Fee Related
Anticipated expiration

Landscapes

  • Circuit For Audible Band Transducer (AREA)

Abstract

The invention relates to a noise power spectrum estimation and voice activity detection method based on unsupervised learning, comprising the following steps: 1) for the log-magnitude feature of the speech signal at each frequency bin, establish a GMM; 2) for a segment of speech data, set up an M-frame buffer, store the first M input frames in the buffer, extract the log-magnitude spectra of the M buffered frames, and use them to initialize the GMM of step 1), giving the initialized model $\lambda_{0,k}$; 3) after obtaining $\lambda_{0,k}$, update the GMM frame by frame by incremental learning starting from frame M+1, recursively obtaining $\lambda_{i,k}$, the noise estimate $\mu^{(0)}_{i,k}$, and the probability that speech is present at the k-th frequency bin of the i-th frame. The invention tightly couples noise power spectrum estimation and voice activity detection, which improves the adaptability of speech application systems to noisy environments. It does not rely on the "noise-start" assumption, and it also describes voice activity in the two-dimensional time-frequency plane.

Description

Noise spectrum estimation and voice activity detection method based on unsupervised learning
Technical field
The invention relates to the field of speech processing, and more specifically to a noise power spectrum estimation and voice activity detection method based on unsupervised learning. Voice activity detection is an algorithm that decides, along the time axis, whether speech is present. It can answer with a binary "yes" or "no", or it can describe the presence of speech with a speech presence probability.
Background art
Most speech application systems must cope with interference from ambient noise. Many methods have been proposed to remove the effect of noise on speech systems, and almost all of them rely on voice activity detection and noise power spectrum estimation. These two modules are closely related, and their accuracy directly determines the overall noise robustness of the system. Traditional solutions have the following problems:
1. In typical noise-robust algorithms, voice activity detection and noise power spectrum estimation are loosely coupled in cascade: speech activity is computed first, and the noise power spectrum is then estimated from that activity decision. The sensitivity of the voice activity detector to speech therefore directly affects the accuracy of the noise power spectrum estimate.
If the detector is too sensitive, the noise power spectrum tends to be underestimated; if it is too insensitive, the noise power spectrum tends to be overestimated. Traditional schemes therefore usually have to retune the detector's sensitivity for each noise environment, which limits the system's adaptability to different noise conditions.
2. Traditional solutions are based on semi-supervised learning. In the initial stage, a typical system makes the "noise-start" assumption, that is, it assumes that every utterance begins with a stretch of non-speech signal.
This non-speech segment can be regarded as a manually labeled sample of ambient noise, from which the initial noise model is built; this is a form of supervised learning. Its weakness is that the assumption is hard to satisfy in some applications. For example, when an utterance begins with speech, the noise model is initialized incorrectly, which in turn makes both speech detection and noise power spectrum estimation inaccurate. In the subsequent stage, after the initial noise model has been built, most traditional solutions update the model with their own detection and estimation results; this decision-directed learning is a form of unsupervised learning.
Decision-directed learning feeds the output of the estimator or detector back to update the model. However, it easily feeds incorrect results back into the model, which degrades the model's accuracy, and the degraded model in turn further degrades estimation and detection. Errors thus accumulate over time, and system performance gradually deteriorates. Supervised learning in the initial stage combined with unsupervised learning in the subsequent stage forms a semi-supervised learning process, and the problems of both stages stem from this semi-supervised approach.
3. Most conventional voice activity detectors describe voice activity only along the time axis and give no description along the frequency axis, so the noise cannot be processed further at per-frequency resolution.
Summary of the invention
Addressing the shortcomings of conventional voice activity detectors and noise power spectrum estimators, the invention proposes a tightly coupled solution that unifies voice activity detection and noise power spectrum estimation within a single unsupervised learning framework, thereby improving the adaptability of speech application systems to noisy environments. In addition, the invention does not rely on the "noise-start" assumption and is therefore more practical than traditional methods. The invention also describes voice activity in the time-frequency plane, which enables finer subsequent processing of the noise.
To achieve the above objects, the invention provides a noise power spectrum estimation and voice activity detection method based on unsupervised learning which, as shown in Fig. 2, comprises the following steps:
1) For the log-magnitude feature of the speech signal at each frequency bin, establish a GMM, expressed as:
$$p(x_{i,k}\mid\lambda_{i,k}) = w^{(0)}_{i,k}\,p(x_{i,k}\mid h=0,\lambda_{i,k}) + w^{(1)}_{i,k}\,p(x_{i,k}\mid h=1,\lambda_{i,k});$$
where each Gaussian component of the GMM is
$$p(x_{i,k}\mid h,\lambda_{i,k}) = \frac{1}{\sqrt{2\pi\kappa^{(h)}_{i,k}}}\exp\left\{-\frac{\left(x_{i,k}-\mu^{(h)}_{i,k}\right)^2}{2\kappa^{(h)}_{i,k}}\right\},$$
where $x_{i,k}$ is the log-magnitude spectrum at the k-th frequency bin of the i-th frame, $h\in\{0,1\}$, $w^{(h)}_{i,k}$ are the GMM weights, and $\mu^{(h)}_{i,k}$ and $\kappa^{(h)}_{i,k}$ are the means and variances, respectively; h = 1 denotes the speech component and h = 0 the noise component. $\lambda_{i,k}=\{\mu^{(1)}_{i,k},\mu^{(0)}_{i,k},\kappa^{(1)}_{i,k},\kappa^{(0)}_{i,k},w^{(1)}_{i,k},w^{(0)}_{i,k}\}$ denotes the parameter set of the Gaussian mixture model;
2) For a segment of speech data, set up an M-frame buffer, store the first M input frames in the buffer, extract the log-magnitude spectra of the M buffered frames, and use them to initialize the GMM of step 1), giving the initialized model $\lambda_{0,k}$. The initialization uses a constrained EM algorithm;
3) After obtaining the initialized model $\lambda_{0,k}$, update the GMM frame by frame by incremental learning starting from frame M+1, recursively obtaining $\lambda_{i,k}=\{\mu^{(1)}_{i,k},\mu^{(0)}_{i,k},\kappa^{(1)}_{i,k},\kappa^{(0)}_{i,k},w^{(1)}_{i,k},w^{(0)}_{i,k}\}$, and derive the noise estimate $\mu^{(0)}_{i,k}$ and the probability that speech is present at the k-th frequency bin of the i-th frame:
$$p(h=1\mid x_{i,k},\lambda_{i,k}) = \frac{w^{(1)}_{i,k}\,p(x_{i,k}\mid h=1,\lambda_{i,k})}{w^{(0)}_{i,k}\,p(x_{i,k}\mid h=0,\lambda_{i,k}) + w^{(1)}_{i,k}\,p(x_{i,k}\mid h=1,\lambda_{i,k})},$$
where i = 1, 2, 3, ....
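The component density and the speech presence probability can be evaluated per bin in a few lines. The following is a minimal Python sketch, not the patent's implementation: it assumes κ is the variance in both the normalizing constant and the exponent, works in the log domain so that distant components do not underflow, and its function names are hypothetical.

```python
import math


def log_gaussian(x: float, mean: float, var: float) -> float:
    """log N(x; mean, var), where var plays the role of kappa."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def speech_presence_probability(x: float, w1: float, mu1: float, var1: float,
                                mu0: float, var0: float) -> float:
    """Posterior p(h=1 | x, lambda) of the two-component GMM for one bin.

    h = 1 is the speech component and h = 0 the noise component; the noise
    weight is taken as w0 = 1 - w1, so 0 < w1 < 1 is required.
    """
    a1 = math.log(w1) + log_gaussian(x, mu1, var1)        # log of speech term
    a0 = math.log(1.0 - w1) + log_gaussian(x, mu0, var0)  # log of noise term
    d = a0 - a1
    # p = e^a1 / (e^a0 + e^a1) = 1 / (1 + e^d), evaluated without overflow
    if d > 0.0:
        e = math.exp(-d)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(d))


# A bin halfway between a noise mean of 0 dB and a speech mean of 10 dB:
print(speech_presence_probability(5.0, 0.5, 10.0, 4.0, 0.0, 4.0))  # 0.5
```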
The incremental learning algorithm consists of recursive updates of the weights, the means and the variances:
Recursive weight update:
$$w^{(h)}_{i+1,k} = \alpha\,w^{(h)}_{i,k} + (1-\alpha)\,p(h\mid x_{i+1,k},\lambda_{i,k});$$
Recursive mean update:
$$\mu^{(h)}_{i+1,k} = \frac{\alpha\,w^{(h)}_{i,k}\,\mu^{(h)}_{i,k} + (1-\alpha)\,p(h\mid x_{i+1,k},\lambda_{i,k})\,x_{i+1,k}}{w^{(h)}_{i+1,k}};$$
or
$$\mu^{(h)}_{i+1,k} = \alpha_\mu\,\mu^{(h)}_{i,k} + (1-\alpha_\mu)\,p(h\mid x_{i+1,k},\lambda_{i,k})\,x_{i+1,k};$$
Recursive variance update:
$$\kappa^{(h)}_{i+1,k} = \frac{\alpha\,w^{(h)}_{i,k}\,\kappa^{(h)}_{i,k} + (1-\alpha)\,p(h\mid x_{i+1,k},\lambda_{i,k})\left(x_{i+1,k}-\mu^{(h)}_{i+1,k}\right)^2}{w^{(h)}_{i+1,k}};$$
or
$$\kappa^{(h)}_{i+1,k} = \alpha_\kappa\,\kappa^{(h)}_{i,k} + (1-\alpha_\kappa)\,p(h\mid x_{i+1,k},\lambda_{i,k})\left(x_{i+1,k}-\mu^{(h)}_{i+1,k}\right)^2;$$
or
$$\kappa^{(h)}_{i+1,k} = \alpha_\kappa\,\kappa^{(h)}_{i,k} + (1-\alpha_\kappa)\,p(h\mid x_{i+1,k},\lambda_{i,k})\left(x_{i+1,k}-\mu^{(h)}_{i,k}\right)^2;$$
where α, α_μ and α_κ are smoothing factors less than but close to 1.
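A minimal sketch of one incremental step for one bin, using the first (weight-normalized) variant of each recursion and the example value α = 0.99 given in the embodiment. The `BinGMM` container and the function names are hypothetical, no constraints are applied, and the posterior is computed with the old model $\lambda_{i,k}$ as the formulas require.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class BinGMM:
    """Parameters lambda_{i,k} of one bin; index 0 = noise (h=0), 1 = speech (h=1)."""
    w: tuple[float, float]
    mu: tuple[float, float]
    var: tuple[float, float]


def posteriors(m: BinGMM, x: float) -> tuple[float, float]:
    """p(h | x, lambda) for h = 0, 1, normalized in the log domain."""
    logs = [math.log(m.w[h])
            - 0.5 * (math.log(2.0 * math.pi * m.var[h]) + (x - m.mu[h]) ** 2 / m.var[h])
            for h in (0, 1)]
    top = max(logs)
    e = [math.exp(v - top) for v in logs]
    return e[0] / (e[0] + e[1]), e[1] / (e[0] + e[1])


def incremental_update(m: BinGMM, x: float, alpha: float = 0.99) -> BinGMM:
    """One frame of the weight-normalized weight, mean and variance recursions."""
    p = posteriors(m, x)  # evaluated with the old model lambda_{i,k}
    w, mu, var = [], [], []
    for h in (0, 1):
        w_h = alpha * m.w[h] + (1 - alpha) * p[h]
        mu_h = (alpha * m.w[h] * m.mu[h] + (1 - alpha) * p[h] * x) / w_h
        # The variance is centred on the new mean mu_{i+1,k}
        var_h = (alpha * m.w[h] * m.var[h] + (1 - alpha) * p[h] * (x - mu_h) ** 2) / w_h
        w.append(w_h)
        mu.append(mu_h)
        var.append(var_h)
    return BinGMM((w[0], w[1]), (mu[0], mu[1]), (var[0], var[1]))


m = incremental_update(BinGMM(w=(0.5, 0.5), mu=(0.0, 10.0), var=(4.0, 4.0)), 5.0)
print(m.w, m.mu)
```

Because the new weights sum to $\alpha(w^{(0)}+w^{(1)})+(1-\alpha)(p^{(0)}+p^{(1)})=1$, they stay normalized without an explicit renormalization step.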
Compared with the prior art, the invention has the following technical effects:
The invention tightly couples voice activity detection and noise power spectrum estimation, which improves the adaptability of speech application systems to noisy environments. It does not rely on the "noise-start" assumption and is therefore more practical. It also describes voice activity in the two-dimensional time-frequency plane, which enables finer subsequent processing of the noise.
Brief description of the drawings
Fig. 1 shows the time-domain waveform and spectrogram of a noise-corrupted speech segment.
(a) is the spectrogram of a speech segment corrupted by white noise at an SNR of 0 dB; (b) is the speech presence probability map, in which the gray level indicates the probability that speech is present. Comparing (a) and (b) shows that the presence probability output by this method accurately captures the structure of the spectrogram.
Fig. 2 is a flow chart of the noise power spectrum estimation and voice activity detection method based on unsupervised learning according to the invention.
Embodiment
The invention proposes a noise power spectrum estimation and voice activity detection method based on an unsupervised learning framework. The defining feature of this framework is that the models of noise and speech are built in an unsupervised way: neither model initialization nor model updating relies on manually labeled information. Specifically, the framework has the following features:
● In the initialization stage it does not rely on the noise-start assumption, so the invention applies to a wider range of scenarios than typical solutions.
● The update process needs no decision feedback, so the problem of error accumulation is alleviated to some extent.
● Voice activity and the noise power spectrum are tightly coupled: the system provides both from the same model and can be tuned with only a few parameters. In a loosely coupled system, by contrast, the voice activity module and the noise estimation module each have their own tuning parameters, and the larger number of parameters makes the system hard to tune.
● Voice activity is represented as two-dimensional time-frequency information, whereas other voice activity detection algorithms describe only the presence of speech along the time axis.
In one embodiment, the unsupervised learning framework is carried by a two-component Gaussian mixture model (GMM). One component models the distribution of speech energy and the other the distribution of noise energy. The invention divides the frequency band into 8 subbands on the mel scale, extracts the energy envelope of each subband, and builds a corresponding GMM for each subband. The GMMs are first initialized with the EM algorithm and then progressively updated by incremental learning. From each GMM, the voice activity in that subband and the noise power spectrum are derived.
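The embodiment groups FFT bins into 8 mel-spaced subbands and fits one GMM per subband energy envelope. The sketch below assumes the common mel formula $2595\log_{10}(1+f/700)$, equal spacing on the mel axis from 0 Hz to fs/2, and 10·log10 of the summed bin power as the "energy envelope"; the text states none of these details, and the function names are hypothetical.

```python
import math


def hz_to_mel(f: float) -> float:
    # A widely used mel formula; the patent does not say which variant it uses
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m: float) -> float:
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_subband_edges(fs: float, n_fft: int, n_bands: int = 8) -> list[int]:
    """FFT-bin boundaries of n_bands subbands equally spaced on the mel scale.

    Subband b covers bins edges[b] <= k < edges[b + 1], for k in 0..n_fft//2.
    """
    top = hz_to_mel(fs / 2.0)
    edges = [round(mel_to_hz(top * b / n_bands) / fs * n_fft) for b in range(n_bands + 1)]
    edges[-1] = n_fft // 2 + 1  # make the last band include the Nyquist bin
    return edges


def subband_log_energies(power: list[float], edges: list[int]) -> list[float]:
    """10*log10 of the summed power in each subband (an assumed energy envelope)."""
    return [10.0 * math.log10(sum(power[edges[b]:edges[b + 1]]) + 1e-12)
            for b in range(len(edges) - 1)]


edges = mel_subband_edges(8000.0, 256)
print(edges)
print(subband_log_energies([1.0] * 129, edges))
```

Each of the 8 values returned per frame would then play the role of $x_{i,k}$ for its own two-component GMM.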
The invention fits the spectral envelope of speech with a constrained GMM.
During fitting, the GMM means, weights and variances are constrained. Both in the EM algorithm and during incremental learning, the following are required:
[formula image]
$\kappa^{(1)}_{i+1,k}=\max\{\kappa^{(0)}_{i+1,k},\kappa^{(1)}_{i+1,k}\}$, $w^{(1)}_{i+1,k}=\max\{w^{(1)}_{i+1,k},\epsilon\}$, and $w^{(0)}_{i+1,k}=1-w^{(1)}_{i+1,k}$.
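The constraints that are legible above translate directly into code. This sketch applies them to one bin; the floor value `eps = 0.05` is a hypothetical choice (the text gives no value for this ε), and the mean constraint, whose formula is not reproduced in the text, is left out.

```python
def apply_constraints(w1: float, var0: float, var1: float,
                      eps: float = 0.05) -> tuple[float, float, float]:
    """Return (w0, w1, var1) after the weight and variance constraints."""
    var1 = max(var0, var1)  # kappa^(1) = max{kappa^(0), kappa^(1)}
    w1 = max(w1, eps)       # w^(1) = max{w^(1), eps}
    w0 = 1.0 - w1           # w^(0) = 1 - w^(1)
    return w0, w1, var1


# A collapsing speech weight is lifted to the floor, and the speech
# variance is raised to the noise variance.
print(apply_constraints(0.01, 3.0, 2.0))
```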
The incremental learning algorithm for the GMM specifically comprises the computation of the recursive weights, means and variances.
1) Recursive weights: $$w^{(h)}_{i+1,k} = \alpha\,w^{(h)}_{i,k} + (1-\alpha)\,p(h\mid x_{i+1,k},\lambda_{i,k}),$$ where α is a smoothing factor less than but close to 1, e.g. α = 0.99.
2) Recursive means: $$\mu^{(h)}_{i+1,k} = \frac{\alpha\,w^{(h)}_{i,k}\,\mu^{(h)}_{i,k} + (1-\alpha)\,p(h\mid x_{i+1,k},\lambda_{i,k})\,x_{i+1,k}}{w^{(h)}_{i+1,k}};$$ or $$\mu^{(h)}_{i+1,k} = \alpha_\mu\,\mu^{(h)}_{i,k} + (1-\alpha_\mu)\,p(h\mid x_{i+1,k},\lambda_{i,k})\,x_{i+1,k},$$ where α_μ is a smoothing factor less than but close to 1, e.g. α_μ = 0.99.
3) Recursive variances: $$\kappa^{(h)}_{i+1,k} = \frac{\alpha\,w^{(h)}_{i,k}\,\kappa^{(h)}_{i,k} + (1-\alpha)\,p(h\mid x_{i+1,k},\lambda_{i,k})\left(x_{i+1,k}-\mu^{(h)}_{i+1,k}\right)^2}{w^{(h)}_{i+1,k}};$$ or $$\kappa^{(h)}_{i+1,k} = \alpha_\kappa\,\kappa^{(h)}_{i,k} + (1-\alpha_\kappa)\,p(h\mid x_{i+1,k},\lambda_{i,k})\left(x_{i+1,k}-\mu^{(h)}_{i+1,k}\right)^2;$$ or $$\kappa^{(h)}_{i+1,k} = \alpha_\kappa\,\kappa^{(h)}_{i,k} + (1-\alpha_\kappa)\,p(h\mid x_{i+1,k},\lambda_{i,k})\left(x_{i+1,k}-\mu^{(h)}_{i,k}\right)^2,$$ where α_κ is a smoothing factor less than but close to 1, e.g. α_κ = 0.99.
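The alternative mean and variance recursions replace the weight normalization with fixed factors α_μ and α_κ. A minimal sketch for one component h of one bin, assuming the posterior p = p(h | x_{i+1,k}, λ_{i,k}) has already been computed; `centre_on_new_mean=False` selects the third variance variant, read here as centring on the previous mean $\mu^{(h)}_{i,k}$. Function and parameter names are hypothetical.

```python
def fixed_factor_update(mu: float, var: float, p: float, x: float,
                        alpha_mu: float = 0.99, alpha_kappa: float = 0.99,
                        centre_on_new_mean: bool = True) -> tuple[float, float]:
    """One step of the fixed-factor mean/variance recursions for one component."""
    # mu_{i+1} = a_mu * mu_i + (1 - a_mu) * p * x
    mu_new = alpha_mu * mu + (1 - alpha_mu) * p * x
    # Second variance variant centres on mu_{i+1}; the third (as read here) on mu_i
    centre = mu_new if centre_on_new_mean else mu
    var_new = alpha_kappa * var + (1 - alpha_kappa) * p * (x - centre) ** 2
    return mu_new, var_new


# With p held at 1, repeated updates pull the mean to x and the variance toward 0
mu, var = 0.0, 1.0
for _ in range(2000):
    mu, var = fixed_factor_update(mu, var, p=1.0, x=7.0)
print(mu, var)
```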
The invention is further described below with reference to a preferred embodiment.
The principle of the invention is as follows:
For the log-magnitude feature of the speech signal at each frequency bin, a Gaussian mixture model (GMM) is established; the model changes over time as the input signal changes. The model is expressed as:
$$p(x_{i,k}\mid\lambda_{i,k}) = w^{(0)}_{i,k}\,p(x_{i,k}\mid h=0,\lambda_{i,k}) + w^{(1)}_{i,k}\,p(x_{i,k}\mid h=1,\lambda_{i,k})$$
where each Gaussian component of the GMM is
$$p(x_{i,k}\mid h,\lambda_{i,k}) = \frac{1}{\sqrt{2\pi\kappa^{(h)}_{i,k}}}\exp\left\{-\frac{\left(x_{i,k}-\mu^{(h)}_{i,k}\right)^2}{2\kappa^{(h)}_{i,k}}\right\}$$
Here $x_{i,k}$ is the log-magnitude spectrum at the k-th frequency bin of the i-th frame, h denotes the class of the Gaussian component, $h\in\{0,1\}$, $w^{(h)}_{i,k}$ are the GMM weights, and $\mu^{(h)}_{i,k}$ and $\kappa^{(h)}_{i,k}$ are the means and variances, respectively. h = 1 denotes the speech component and h = 0 the noise component. $\lambda_{i,k}=\{\mu^{(1)}_{i,k},\mu^{(0)}_{i,k},\kappa^{(1)}_{i,k},\kappa^{(0)}_{i,k},w^{(1)}_{i,k},w^{(0)}_{i,k}\}$ denotes the parameter set of the Gaussian mixture model.
In this model, $\mu^{(0)}_{i,k}$ is exactly the noise to be estimated. In addition, the probability that speech is present at the k-th frequency bin of the i-th frame can be derived:
$$p(h=1\mid x_{i,k},\lambda_{i,k}) = \frac{w^{(1)}_{i,k}\,p(x_{i,k}\mid h=1,\lambda_{i,k})}{w^{(0)}_{i,k}\,p(x_{i,k}\mid h=0,\lambda_{i,k}) + w^{(1)}_{i,k}\,p(x_{i,k}\mid h=1,\lambda_{i,k})}$$
Based on this principle, according to one embodiment of the invention and as shown in Fig. 2, the noise power spectrum estimation and voice activity detection method comprises the following steps:
Step 100: Set up an M-frame buffer, store the first M input frames in the buffer, and extract the magnitude spectra of the M buffered frames. The magnitude spectrum of a frame is extracted as follows:
First, preprocess the digitized speech signal of the frame (depending on the system, this may include windowing, pre-emphasis, etc.). If the frame length is F samples, zero-pad it to N points (where N ≥ F, N = 2^j, j an integer, j ≥ 8) and apply an N-point discrete Fourier transform to obtain the discrete spectrum
$$Y_{i,k}=\sum_{n=0}^{N-1}y_{i,n}\exp\left(-\sqrt{-1}\,\frac{2\pi nk}{N}\right),$$
where $y_{i,n}$ is the n-th sample of the i-th frame in the buffer and $Y_{i,k}$ is the k-th Fourier coefficient of the i-th frame in the buffer (k = 0, 1, ..., N-1). Its log-magnitude is then $x_{i,k}=20\log_{10}|Y_{i,k}|$.
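Step 100 can be sketched for one frame as follows. Pre-emphasis with coefficient 0.97, a Hamming window, and the small floor inside the logarithm are assumptions (the text only says preprocessing "may include windowing, pre-emphasis, etc."); the direct DFT keeps the example dependency-free, and a real system would use an FFT. `min_fft = 256` corresponds to j ≥ 8.

```python
import cmath
import math


def log_magnitude_spectrum(frame: list[float], pre_emphasis: float = 0.97,
                           min_fft: int = 256) -> list[float]:
    """x_{i,k} = 20*log10|Y_{i,k}| for k = 0..N-1 of one frame.

    The frame (F >= 1 samples) is pre-emphasized, Hamming-windowed, zero-padded
    to N = 2^j >= max(F, min_fft) points and transformed; min_fft must be a power of 2.
    """
    F = len(frame)
    N = min_fft
    while N < F:
        N *= 2
    # Pre-emphasis y[n] - a*y[n-1]; a = 0.97 is an assumed, commonly used value
    emph = [frame[0]] + [frame[n] - pre_emphasis * frame[n - 1] for n in range(1, F)]
    # Symmetric Hamming window (the window type is an assumption)
    win = ([0.54 - 0.46 * math.cos(2 * math.pi * n / (F - 1)) for n in range(F)]
           if F > 1 else [1.0])
    y = [e * w for e, w in zip(emph, win)] + [0.0] * (N - F)  # zero-pad to N
    # Direct O(N^2) DFT for clarity
    spectrum = [sum(y[n] * cmath.exp(-2j * math.pi * n * k / N) for n in range(N))
                for k in range(N)]
    # A tiny floor avoids log10(0) on silent bins (also an assumption)
    return [20.0 * math.log10(abs(Y) + 1e-12) for Y in spectrum]


# An on-bin cosine at bin 32 of a 256-point frame peaks at k = 32
frame = [math.cos(2 * math.pi * 32 * n / 256) for n in range(256)]
spec = log_magnitude_spectrum(frame, pre_emphasis=0.0)
print(max(range(129), key=lambda k: spec[k]))
```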
Step 200: GMM initialization. A two-component Gaussian mixture model $\lambda_{i,k}$ is initialized at each frequency bin k, where the subscript i denotes time and $\lambda_{i=0,k}$ denotes the initialized model. The initialization uses a constrained EM algorithm; at a given bin k, it proceeds as follows:
Step 201: Divide the M+1 samples into two classes by clustering (e.g., LBG unsupervised clustering or fuzzy clustering): $\{x^{(1)}_{i_j,k}\mid j=0,1,\ldots,M_1\}$ and $\{x^{(0)}_{i_j,k}\mid j=0,1,\ldots,M_0\}$, where $M_0+M_1+1=M$; the class with the larger mean is labeled with superscript (1) and the other with superscript (0).
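The mean of the higher class is
$$\bar\mu^{(1)}_{0,k}=\frac{1}{M_1+1}\sum_{j=0}^{M_1}x^{(1)}_{i_j,k},$$
and the mean of the lower-energy class is
$$\bar\mu^{(0)}_{0,k}=\frac{1}{M_0+1}\sum_{j=0}^{M_0}x^{(0)}_{i_j,k},$$
where [formula image].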
The variances of the two classes are
$$\bar\kappa^{(0)}_{0,k}=\frac{1}{M_0+1}\sum_{j=0}^{M_0}\left(x^{(0)}_{i_j,k}-\bar\mu^{(0)}_{0,k}\right)^2,\qquad\bar\kappa^{(1)}_{0,k}=\frac{1}{M_1+1}\sum_{j=0}^{M_1}\left(x^{(1)}_{i_j,k}-\bar\mu^{(1)}_{0,k}\right)^2.$$
The initial weights of the two classes are: [formula image]
The likelihood of the new model is computed as: [formula image]
In the iteration below, the old model parameter set is denoted $\lambda'_{0,k}$ and the new model parameters are $\bar\lambda_{0,k}=\{\bar\mu^{(1)}_{0,k},\bar\mu^{(0)}_{0,k},\bar\kappa^{(1)}_{0,k},\bar\kappa^{(0)}_{0,k},\bar w^{(1)}_{0,k},\bar w^{(0)}_{0,k}\}$. Before the iteration starts, [formula image], and $L'_k$ is set to a very small (large negative) number, e.g. $L'_k=-10000$. The iteration then begins.
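Step 201 leaves the clustering method open (LBG or fuzzy clustering are named only as examples). The sketch below swaps in plain one-dimensional two-means (Lloyd's algorithm) started from the minimum and maximum sample, then applies the class mean and variance formulas of Step 201. The initial weights, whose formula is not reproduced in the text, are assumed to be the class-size fractions; `var_floor` and the function name are hypothetical.

```python
def two_class_init(samples: list[float], max_iter: int = 50, var_floor: float = 1e-3):
    """Initial ((w0, w1), (mu0, mu1), (var0, var1)) for one bin's buffered values.

    Index 0 is the low (noise) class and index 1 the high (speech) class.
    """
    lo, hi = min(samples), max(samples)
    if lo == hi:
        raise ValueError("need at least two distinct sample values")
    for _ in range(max_iter):
        # Assign each sample to the nearer centre (ties go to the noise class)
        c0 = [x for x in samples if abs(x - lo) <= abs(x - hi)]
        c1 = [x for x in samples if abs(x - lo) > abs(x - hi)]
        new_lo, new_hi = sum(c0) / len(c0), sum(c1) / len(c1)
        if (new_lo, new_hi) == (lo, hi):
            break
        lo, hi = new_lo, new_hi
    # Class variances with the 1/(M_h + 1) normalization, i.e. divide by class size
    var0 = max(sum((x - lo) ** 2 for x in c0) / len(c0), var_floor)
    var1 = max(sum((x - hi) ** 2 for x in c1) / len(c1), var_floor)
    w1 = len(c1) / len(samples)  # assumed: fraction of samples in the speech class
    return (1.0 - w1, w1), (lo, hi), (var0, var1)


w, mu, var = two_class_init([0.0, 1.0, -1.0, 0.5, 20.0, 21.0, 19.0])
print(w, mu, var)
```

In one dimension the minimum sample always stays in the low class and the maximum in the high class, so neither class can become empty during the iteration.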
Step 202: Compute the probabilities of noise and speech:
$$p(h\mid x_{i,k},\lambda'_{0,k})=\frac{w^{(h)}_{0,k}\,p(x_{i,k}\mid h,\lambda'_{0,k})}{\sum_h w^{(h)}_{0,k}\,p(x_{i,k}\mid h,\lambda'_{0,k})},\quad h\in\{0,1\};$$
Step 203: Compute the new weights:
$$\bar w^{(h)}_{0,k}=\frac{1}{M+1}\sum_{j=0}^{M}p(h\mid x_j,\lambda'_{0,k});$$
Step 204: If [formula image], stop the iteration and set $\lambda_{0,k}=\lambda'_{0,k}$, where υ is a small positive number close to 0, e.g. υ = 0.05.
Step 205: Compute the new means:
$$\bar\mu^{(h)}_{0,k}=\frac{\sum_{j=0}^{M}x_j\,p(h\mid x_j,\lambda'_{0,k})}{(M+1)\,\bar w^{(h)}_{0,k}};$$
Step 206: Constrain the new means: [formula image], where δ is a constant in the range 1 to 10.
Step 207: Compute the new variances:
$$\bar\kappa^{(h)}_{0,k}=\frac{\sum_{j=0}^{M}\left(x_j-\bar\mu^{(h)}_{0,k}\right)^2 p(h\mid x_j,\lambda'_{0,k})}{(M+1)\,\bar w^{(h)}_{0,k}};$$
Step 208: Constrain the new variances: [formula image]
Step 209: Compute the likelihood of the new model: [formula image]
Step 210: If the condition [formula image] is met, terminate the iteration, where ε is a very small number, e.g. ε = 0.1. If [formula image] [formula image] [formula image], the iteration returns to Step 202.
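A minimal sketch of the constrained EM loop of Steps 202 to 210 for one bin. It applies only the constraints that are legible in the text ($\kappa^{(1)}\ge\kappa^{(0)}$, a speech-weight floor, $w^{(0)}=1-w^{(1)}$); the stopping rule of Step 204 and the mean and variance constraints of Steps 206 and 208 are not reproduced in the text and are omitted. The total log-likelihood as $L_k$, the $|L_k-L'_k|<\epsilon$ stopping test, `w_floor`, `var_floor` and the function names are all assumptions.

```python
import math


def log_gauss(x: float, mu: float, var: float) -> float:
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)


def log_mixture(x: float, w, mu, var) -> float:
    """log p(x | lambda) of the two-component mixture, via log-sum-exp."""
    a = [math.log(w[h]) + log_gauss(x, mu[h], var[h]) for h in (0, 1)]
    top = max(a)
    return top + math.log(sum(math.exp(v - top) for v in a))


def constrained_em(samples, w, mu, var, eps=0.1, w_floor=0.05,
                   var_floor=1e-3, max_iter=100):
    w, mu, var = list(w), list(mu), list(var)
    n = len(samples)  # M + 1 buffered values
    prev_ll = -10000.0  # L'_k, using the example starting value from the text
    for _ in range(max_iter):
        # Step 202: posteriors p(h | x_j, lambda') under the current model
        post = []
        for x in samples:
            a = [math.log(w[h]) + log_gauss(x, mu[h], var[h]) for h in (0, 1)]
            top = max(a)
            e = [math.exp(v - top) for v in a]
            post.append((e[0] / (e[0] + e[1]), e[1] / (e[0] + e[1])))
        # Steps 203, 205, 207: new weights, means and variances
        for h in (0, 1):
            s = sum(p[h] for p in post)  # equals (M + 1) * w_bar^(h)
            w[h] = s / n
            if s < 1e-12:  # component has no support; keep its old mean/variance
                continue
            mu[h] = sum(p[h] * x for p, x in zip(post, samples)) / s
            var[h] = max(sum(p[h] * (x - mu[h]) ** 2
                             for p, x in zip(post, samples)) / s, var_floor)
        # Constraints stated in the text; the upper clamp on w1 is an added
        # safeguard that keeps log(w0) finite
        var[1] = max(var[0], var[1])
        w[1] = min(max(w[1], w_floor), 1.0 - w_floor)
        w[0] = 1.0 - w[1]
        # Step 209 (assumed form): total log-likelihood of the buffered samples
        ll = sum(log_mixture(x, w, mu, var) for x in samples)
        # Step 210 (assumed form): stop once the likelihood changes by less than eps
        if abs(ll - prev_ll) < eps:
            break
        prev_ll = ll
    return tuple(w), tuple(mu), tuple(var)


samples = [0.0, 1.0, -1.0, 0.5, 20.0, 21.0, 19.0]
print(constrained_em(samples, (0.5, 0.5), (0.0, 15.0), (4.0, 4.0)))
```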
Step 300: Incremental GMM update. After the initialized model $\lambda_{0,k}$ has been built, update the GMM frame by frame by incremental learning starting from frame M+1. Each iteration can be stated as: at each bin k, given $\lambda_{i,k}$ and the current observation $x_{i+1,k}$, infer $\lambda_{i+1,k}$. Apply the Fourier transform to frame i+1 to obtain $Y_{i+1,k}$, 0 ≤ k < N, and compute the log-magnitude spectrum $x_{i+1,k}=20\log_{10}|Y_{i+1,k}|$ at each bin k. For the k-th bin, the iteration steps are as follows:
Step 301: Compute the probabilities of noise and speech:
$$p(h\mid x_{i+1,k},\lambda_{i,k})=\frac{w^{(h)}_{i,k}\,p(x_{i+1,k}\mid h,\lambda_{i,k})}{\sum_h w^{(h)}_{i,k}\,p(x_{i+1,k}\mid h,\lambda_{i,k})},\quad h\in\{0,1\}.$$
Step 302: Compute the new weights:
$$w^{(h)}_{i+1,k} = \alpha\,w^{(h)}_{i,k} + (1-\alpha)\,p(h\mid x_{i+1,k},\lambda_{i,k}),$$
where α is a smoothing factor less than but close to 1, e.g. α = 0.99.
Step 303: Constrain the new weights: $w^{(1)}_{i+1,k}=\max\{w^{(1)}_{i+1,k},\epsilon\}$ and $w^{(0)}_{i+1,k}=1-w^{(1)}_{i+1,k}$.
Step 304: Compute the new means:
$$\mu^{(h)}_{i+1,k} = \frac{\alpha\,w^{(h)}_{i,k}\,\mu^{(h)}_{i,k} + (1-\alpha)\,p(h\mid x_{i+1,k},\lambda_{i,k})\,x_{i+1,k}}{w^{(h)}_{i+1,k}}.$$
Step 305: Constrain the new means: [formula image]
Step 306: Compute the new variances:
$$\kappa^{(h)}_{i+1,k} = \frac{\alpha\,w^{(h)}_{i,k}\,\kappa^{(h)}_{i,k} + (1-\alpha)\,p(h\mid x_{i+1,k},\lambda_{i,k})\left(x_{i+1,k}-\mu^{(h)}_{i+1,k}\right)^2}{w^{(h)}_{i+1,k}}$$
Step 307: Constrain the new variances: $\kappa^{(1)}_{i+1,k}=\max\{\kappa^{(0)}_{i+1,k},\kappa^{(1)}_{i+1,k}\}$.
From the above sub-steps, all parameters of $\lambda_{i+1,k}$ are obtained, and with them the corresponding speech presence probability $p(h=1\mid x_{i+1,k},\lambda_{i,k})$ and the noise power spectrum estimate $\mu^{(0)}_{i+1,k}$.
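Putting Steps 301 to 307 together, the sketch below tracks one bin over a stream of frames and returns, after each frame, the noise estimate $\mu^{(0)}$ (in the same log-magnitude units as $x$) and the speech presence probability. It assumes a hypothetical weight floor `w_floor = 0.05` for Step 303, omits the mean constraint of Step 305 (not reproduced in the text), and adds an upper clamp on $w^{(1)}$ as a safeguard; the synthetic input is illustrative only.

```python
import math


def log_gauss(x: float, mu: float, var: float) -> float:
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)


def track_bin(xs, w, mu, var, alpha=0.99, w_floor=0.05):
    """Apply Steps 301-307 to successive log-magnitudes x_{i+1,k} of one bin.

    Returns one (noise estimate mu0, speech presence probability) pair per frame.
    """
    w, mu, var = list(w), list(mu), list(var)
    out = []
    for x in xs:
        # Step 301: posteriors under the current model lambda_{i,k}
        a = [math.log(w[h]) + log_gauss(x, mu[h], var[h]) for h in (0, 1)]
        top = max(a)
        e = [math.exp(v - top) for v in a]
        p = [e[0] / (e[0] + e[1]), e[1] / (e[0] + e[1])]
        w_old = w[:]
        # Step 302: new weights; Step 303: speech-weight floor and w0 = 1 - w1
        w = [alpha * w_old[h] + (1 - alpha) * p[h] for h in (0, 1)]
        w[1] = min(max(w[1], w_floor), 1.0 - w_floor)  # upper clamp: added safeguard
        w[0] = 1.0 - w[1]
        for h in (0, 1):
            # Step 304: new mean (the right-hand side still uses the old mean)
            mu[h] = (alpha * w_old[h] * mu[h] + (1 - alpha) * p[h] * x) / w[h]
            # Step 306: new variance, centred on the new mean
            var[h] = (alpha * w_old[h] * var[h]
                      + (1 - alpha) * p[h] * (x - mu[h]) ** 2) / w[h]
        # Step 307: speech variance no smaller than noise variance
        var[1] = max(var[0], var[1])
        out.append((mu[0], p[1]))
    return out


# Synthetic bin: noise frames near 0 dB alternate with speech frames near 20 dB
noise_vals, speech_vals = [0.0, 1.0, -1.0], [20.0, 21.0, 19.0]
xs = []
for n in range(600):
    xs += [noise_vals[n % 3], speech_vals[n % 3]]
track = track_bin(xs, w=(0.5, 0.5), mu=(2.0, 15.0), var=(10.0, 10.0))
print(track[-2], track[-1])
```

No labeled noise frames are needed: the noise component settles on the low-energy frames purely because they are assigned to it by the posterior, which is the unsupervised behavior the text describes.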
The noise power spectrum estimation performance of the algorithm in the above embodiment was evaluated using 8 sentences each from male and female speakers in the TIMIT database, mixed with white Gaussian noise, F-16 cockpit noise and babble noise from the NOISEX-92 database at SNRs of 0, 5 and 10 dB. The evaluation metric is the segmental error, defined as:
$$\mathrm{SegError}=\frac{1}{M}\sum_{l=1}^{M}\left\{10\log_{10}\frac{\sum_{k=0}^{N-1}D^2(k,l)}{\sum_{k=0}^{N-1}\left[D(k,l)-\hat D(k,l)\right]^2}\right\}$$
where D(k,l) is the actual noise magnitude spectrum and $\hat D(k,l)$ is the estimated noise magnitude spectrum. A smaller SegError value means the estimate is closer to the actual value, i.e. more accurate. The algorithm was compared with three current mainstream noise power spectrum estimation algorithms: MS denotes the minimum statistics algorithm, MCRA the minima-controlled recursive averaging algorithm, and IMCRA the improved minima-controlled recursive averaging algorithm; TV-GMM is the algorithm of the invention. Table 1 lists the SegError results.
Table 1
[table image]
As the table shows, the proposed algorithm has a clear advantage over all three mainstream algorithms.

Claims (2)

1. A noise power spectrum estimation and voice activity detection method based on unsupervised learning, comprising the following steps:
1) for the log-magnitude feature of the speech signal at each frequency bin, establishing a GMM, expressed as:
$$p(x_{i,k}\mid\lambda_{i,k}) = w^{(0)}_{i,k}\,p(x_{i,k}\mid h=0,\lambda_{i,k}) + w^{(1)}_{i,k}\,p(x_{i,k}\mid h=1,\lambda_{i,k});$$
where each Gaussian component of the GMM is
$$p(x_{i,k}\mid h,\lambda_{i,k}) = \frac{1}{\sqrt{2\pi\kappa^{(h)}_{i,k}}}\exp\left\{-\frac{\left(x_{i,k}-\mu^{(h)}_{i,k}\right)^2}{2\kappa^{(h)}_{i,k}}\right\},$$
where $x_{i,k}$ is the log-magnitude spectrum at the k-th frequency bin of the i-th frame, $h\in\{0,1\}$, $w^{(h)}_{i,k}$ are the GMM weights, and $\mu^{(h)}_{i,k}$ and $\kappa^{(h)}_{i,k}$ are the means and variances, respectively; h = 1 denotes the speech component and h = 0 the noise component; $\lambda_{i,k}=\{\mu^{(1)}_{i,k},\mu^{(0)}_{i,k},\kappa^{(1)}_{i,k},\kappa^{(0)}_{i,k},w^{(1)}_{i,k},w^{(0)}_{i,k}\}$ denotes the parameter set of the Gaussian mixture model;
2) for a segment of speech data, setting up an M-frame buffer, storing the first M input frames in the buffer, extracting the log-magnitude spectra of the M buffered frames, and using them to initialize the GMM of step 1), giving the initialized model $\lambda_{0,k}$, wherein the initialization uses a constrained EM algorithm;
3) after obtaining the initialized model $\lambda_{0,k}$, updating the GMM of each frequency band frame by frame by incremental learning starting from frame M+1, recursively obtaining $\lambda_{i,k}$, and deriving the noise estimate $\mu^{(0)}_{i,k}$ and the probability that speech is present at the k-th frequency bin of the i-th frame:
$$p(h=1\mid x_{i,k},\lambda_{i,k}) = \frac{w^{(1)}_{i,k}\,p(x_{i,k}\mid h=1,\lambda_{i,k})}{w^{(0)}_{i,k}\,p(x_{i,k}\mid h=0,\lambda_{i,k}) + w^{(1)}_{i,k}\,p(x_{i,k}\mid h=1,\lambda_{i,k})},$$
where i = 1, 2, 3, ....
2. The noise power spectrum estimation and voice activity detection method according to claim 1, characterized in that the incremental learning algorithm comprises recursive weights, recursive means and recursive variances;
the recursive weights are:
$$w^{(h)}_{i+1,k} = \alpha\,w^{(h)}_{i,k} + (1-\alpha)\,p(h\mid x_{i+1,k},\lambda_{i,k});$$
the recursive means are:
$$\mu^{(h)}_{i+1,k} = \frac{\alpha\,w^{(h)}_{i,k}\,\mu^{(h)}_{i,k} + (1-\alpha)\,p(h\mid x_{i+1,k},\lambda_{i,k})\,x_{i+1,k}}{w^{(h)}_{i+1,k}};$$
or
$$\mu^{(h)}_{i+1,k} = \alpha_\mu\,\mu^{(h)}_{i,k} + (1-\alpha_\mu)\,p(h\mid x_{i+1,k},\lambda_{i,k})\,x_{i+1,k};$$
the recursive variances are:
$$\kappa^{(h)}_{i+1,k} = \frac{\alpha\,w^{(h)}_{i,k}\,\kappa^{(h)}_{i,k} + (1-\alpha)\,p(h\mid x_{i+1,k},\lambda_{i,k})\left(x_{i+1,k}-\mu^{(h)}_{i+1,k}\right)^2}{w^{(h)}_{i+1,k}};$$
or
$$\kappa^{(h)}_{i+1,k} = \alpha_\kappa\,\kappa^{(h)}_{i,k} + (1-\alpha_\kappa)\,p(h\mid x_{i+1,k},\lambda_{i,k})\left(x_{i+1,k}-\mu^{(h)}_{i+1,k}\right)^2;$$
or
$$\kappa^{(h)}_{i+1,k} = \alpha_\kappa\,\kappa^{(h)}_{i,k} + (1-\alpha_\kappa)\,p(h\mid x_{i+1,k},\lambda_{i,k})\left(x_{i+1,k}-\mu^{(h)}_{i+1,k}\right)^2;$$
where α_κ, α_μ and α are smoothing factors less than but close to 1.
CN2010101781664A 2010-05-14 2010-05-14 Noise spectrum estimation and voice mobility detection method based on unsupervised learning Expired - Fee Related CN101853661B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2010101781664A CN101853661B (en) 2010-05-14 2010-05-14 Noise spectrum estimation and voice mobility detection method based on unsupervised learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2010101781664A CN101853661B (en) 2010-05-14 2010-05-14 Noise spectrum estimation and voice mobility detection method based on unsupervised learning

Publications (2)

Publication Number Publication Date
CN101853661A CN101853661A (en) 2010-10-06
CN101853661B true CN101853661B (en) 2012-05-30

Family

ID=42805116

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2010101781664A Expired - Fee Related CN101853661B (en) 2010-05-14 2010-05-14 Noise spectrum estimation and voice mobility detection method based on unsupervised learning

Country Status (1)

Country Link
CN (1) CN101853661B (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102800322B (en) * 2011-05-27 2014-03-26 中国科学院声学研究所 Method for estimating noise power spectrum and voice activity
CN103839544B (en) * 2012-11-27 2016-09-07 展讯通信(上海)有限公司 Voice-activation detecting method and device
CN104575513B (en) * 2013-10-24 2017-11-21 展讯通信(上海)有限公司 The processing system of burst noise, the detection of burst noise and suppressing method and device
CN104464728A (en) * 2014-11-26 2015-03-25 河海大学 Speech enhancement method based on Gaussian mixture model (GMM) noise estimation
CN105989843A (en) * 2015-01-28 2016-10-05 中兴通讯股份有限公司 Method and device of realizing missing feature reconstruction
CN106571146B (en) 2015-10-13 2019-10-15 阿里巴巴集团控股有限公司 Noise signal determines method, speech de-noising method and device
CN107731230A (en) * 2017-11-10 2018-02-23 北京联华博创科技有限公司 A kind of court's trial writing-record system and method
CN107818780B (en) * 2017-11-13 2020-09-18 河海大学 Robust speech recognition method based on nonlinear feature compensation
CN110675885B (en) * 2019-10-17 2022-03-22 浙江大华技术股份有限公司 Sound mixing method, device and storage medium
CN111739562B (en) * 2020-07-22 2022-12-23 上海大学 Voice activity detection method based on data selectivity and Gaussian mixture model

Citations (2)

Publication number Priority date Publication date Assignee Title
CN101226742A (en) * 2007-12-05 2008-07-23 浙江大学 Method for recognizing sound-groove based on affection compensation
CN101464950A (en) * 2009-01-16 2009-06-24 北京航空航天大学 Video human face identification and retrieval method based on on-line learning and Bayesian inference

Non-Patent Citations (1)

Title
Yongxin Zhang et al. Effective online unsupervised adaptation of Gaussian mixture models and its application to speech classification. Pattern Recognition Letters, 2007, no. 29. *

Also Published As

Publication number Publication date
CN101853661A (en) 2010-10-06

Similar Documents

Publication Publication Date Title
CN101853661B (en) Noise spectrum estimation and voice mobility detection method based on unsupervised learning
CN100543842C (en) Realize the method that ground unrest suppresses based on multiple statistics model and least mean-square error
Lu et al. Estimators of the magnitude-squared spectrum and methods for incorporating SNR uncertainty
KR100919223B1 (en) The method and apparatus for speech recognition using uncertainty information in noise environment
US9245524B2 (en) Speech recognition device, speech recognition method, and computer readable medium
CN104464728A (en) Speech enhancement method based on Gaussian mixture model (GMM) noise estimation
CN102800322A (en) Method for estimating noise power spectrum and voice activity
CN104485103A (en) Vector Taylor series-based multi-environment model isolated word identifying method
CN105355198A (en) Multiple self-adaption based model compensation type speech recognition method
CN104732972A (en) HMM voiceprint recognition signing-in method and system based on grouping statistics
Hu et al. An iterative model-based approach to cochannel speech separation
CN103345920B (en) Self-adaptation interpolation weighted spectrum model voice conversion and reconstructing method based on Mel-KSVD sparse representation
CN115758082A (en) Fault diagnosis method for rail transit transformer
US7236930B2 (en) Method to extend operating range of joint additive and convolutive compensating algorithms
Sunnydayal Speech enhancement using posterior regularized NMF with bases update
Katsir et al. Evaluation of a speech bandwidth extension algorithm based on vocal tract shape estimation
Astudillo et al. Uncertainty propagation
He et al. Spectrum enhancement with sparse coding for robust speech recognition
Badiezadegan et al. A wavelet-based thresholding approach to reconstructing unreliable spectrogram components
Dat et al. On-line Gaussian mixture modeling in the log-power domain for signal-to-noise ratio estimation and speech enhancement
You et al. Sparse representation with optimized learned dictionary for robust voice activity detection
Molla et al. Voiced/non-voiced speech classification using adaptive thresholding with bivariate EMD
Yechuri et al. Single channel speech enhancement using iterative constrained NMF based adaptive wiener gain
Chehresa et al. MMSE speech enhancement based on GMM and solving an over-determined system of equations
Islam et al. Speech enhancement based on noise compensated magnitude spectrum

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20120530
