CN104064179B

CN104064179B - A method for improving speech recognition accuracy based on a dynamic number of HMM events

Info

Publication number: CN104064179B
Application number: CN201410281284.6A
Authority: CN
Inventors: 刘明; 王明江
Original assignee: Shenzhen Graduate School Harbin Institute of Technology
Current assignee: Shenzhen Graduate School Harbin Institute of Technology
Priority date: 2014-06-20
Filing date: 2014-06-20
Publication date: 2018-06-08
Anticipated expiration: 2034-06-20
Also published as: CN104064179A

Abstract

The present invention provides a method for improving the recognition accuracy of large-vocabulary isolated-word speech recognition. It establishes a mechanism by which the parameters of the hidden Markov model (HMM) adapt to each isolated word. This solves the problem that recognition accuracy and robustness are low when different isolated words are modelled with the same number of events in the HMM probability model. Experimental results show that, at the cost of a slight increase in recognition computation, the method effectively improves the accuracy of large-vocabulary isolated-word speech recognition. With 5120 isolated words to be recognized, the average accuracy over multiple recognition runs rose from 91% to 97.3%; with 10240 isolated words, it rose from 87% to 96.3%. Compared with traditional speech recognition based on static statistical probability models, the method has the advantage of adaptively adjusting the parameters of the recognition model for different users, thereby improving recognition accuracy.

Description

A method for improving speech recognition accuracy based on a dynamic number of HMM events

Technical field

The present invention relates to the field of isolated-word speech recognition, and in particular to a method for improving the accuracy of large-vocabulary isolated-word speech recognition.

Background art

After feature parameters have been extracted from the speech and cluster-coded, deciding which dictionary word a word to be recognized belongs to by Euclidean distance alone is very inaccurate. The regularities of speech are statistical, that is, probabilistic, whereas the Euclidean distance only reflects how far a vector lies from a cluster-center vector. The obtained parameters and codebook therefore need further training to build a more accurate statistical probability model that better reflects how the feature parameters embody the regularities of speech. The hidden Markov model (HMM) is a mathematical model well suited to describing event transition probabilities and the probabilities of observed samples, so the speech feature parameters are processed with a suitable algorithm to obtain an HMM probability model.
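The cluster coding mentioned above maps each feature vector to the index of its nearest codebook centroid, producing the discrete observation symbols a discrete HMM consumes. The following is a minimal sketch, assuming a codebook has already been trained by some clustering method (the embodiment uses M = 32 clusters) and that feature frames are plain lists of floats; the function names are illustrative, not taken from the patent. Euclidean distance is used here only to encode individual frames, not to decide which word was spoken, which is the shortcut the paragraph above warns against.

```python
import math
from typing import List, Sequence


def nearest_codeword(vector: Sequence[float], codebook: Sequence[Sequence[float]]) -> int:
    """Index of the codebook centroid with the smallest Euclidean distance to `vector`."""
    return min(range(len(codebook)), key=lambda k: math.dist(vector, codebook[k]))


def encode_frames(frames: Sequence[Sequence[float]],
                  codebook: Sequence[Sequence[float]]) -> List[int]:
    """Turn a sequence of feature vectors into a sequence of discrete observation symbols."""
    return [nearest_codeword(frame, codebook) for frame in frames]
```

For example, with a three-entry codebook `[[0, 0], [1, 1], [5, 5]]`, the frames `[[0.1, 0.2], [4.8, 5.1], [0.9, 1.2]]` encode to the symbol sequence `[0, 2, 1]`.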

The hidden Markov model is a parametric probabilistic model that describes the statistical properties of a random process. It evolved from the Markov chain, has long been a research focus in speech recognition, and is widely used throughout speech processing. Building the HMM probability template of a word requires cluster coding of the speech feature vectors; once the speech vectors are encoded, the template training process repeatedly performs forward and backward probability calculations until a converged probability model is obtained.

An acoustic model is typically generated by training the acquired speech feature parameters with a specific probabilistic algorithm. In HMM-based speech recognition, an acoustic model is simply an HMM: the acquired speech feature parameters are usually trained with an HMM transition-probability algorithm to produce a set of HMMs. For the speech to be recognized, feature parameters of the same kind as those used for the HMMs are extracted, and a backward Bayesian probability algorithm is used to compute posterior probabilities; the speech sample represented by the HMM template with the maximum posterior probability is taken as the recognized speech.
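The maximum-posterior decision can be illustrated with Bayes' rule over the vocabulary, P(w | O) ∝ P(O | λ_w) P(w). The sketch below assumes the per-word log-likelihoods log P(O | λ_w) have already been computed (for example by a forward pass over each word's HMM); the dictionary inputs, the function names and the uniform default prior are illustrative assumptions.

```python
import math
from typing import Dict, Optional


def posteriors(log_likelihoods: Dict[str, float],
               log_priors: Optional[Dict[str, float]] = None) -> Dict[str, float]:
    """Bayes' rule over the vocabulary: P(w|O) = P(O|w) P(w) / sum_v P(O|v) P(v)."""
    if log_priors is None:
        uniform = -math.log(len(log_likelihoods))  # assumption: every word equally likely a priori
        log_priors = {w: uniform for w in log_likelihoods}
    joint = {w: ll + log_priors[w] for w, ll in log_likelihoods.items()}
    peak = max(joint.values())
    # log-sum-exp keeps the normalisation stable when the log-likelihoods are very negative
    log_norm = peak + math.log(sum(math.exp(v - peak) for v in joint.values()))
    return {w: math.exp(v - log_norm) for w, v in joint.items()}


def recognize(log_likelihoods: Dict[str, float],
              log_priors: Optional[Dict[str, float]] = None) -> str:
    """Return the word whose model has the maximum posterior probability."""
    post = posteriors(log_likelihoods, log_priors)
    return max(post, key=post.get)
```

With a uniform prior the maximum-posterior word is simply the word with the highest likelihood, so ranking words by P(O | λ_w) gives the same answer; a non-uniform prior only matters when some words are known to be more frequent.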

For speech data, utterances whose time-frequency characteristics differ even slightly, mainly as a result of frequency sampling and spectral transformation, can each be given their own hidden Markov model. Model training then adjusts the HMM parameters using existing samples so that the model accurately describes the probabilistic features of each utterance. Building a model for speech is in effect mathematical modelling of speech, under the assumption that the recognition probability of the corresponding speech can be computed from the model and has an extreme value. For an HMM, this mainly means determining the basic topology of the model, including the number of events, the event transition pattern and the transition probabilities.

Different words correspond to HMMs with different numbers of events. Even speech that the human ear perceives as identical (the same word) yields different HMM parameters, that is, a different number of HMM events, because of differences in speakers' pronunciation, tone and accent. As the number of isolated words grows, using the same number of HMM events for all of them evidently lowers accuracy.

Summary of the invention

To solve the problems of the prior art, the present invention proposes a method that improves large-vocabulary isolated-word speech recognition accuracy by dynamically changing the number of events of the HMM, thereby solving the problem that recognition accuracy drops as the number of isolated words to be recognized increases.

The invention is implemented through the following technical solution:

A method for improving large-vocabulary isolated-word speech recognition accuracy based on a dynamic number of HMM events, comprising the following steps:

A. Provide the parameters of an initial HMM, including the number of events N and the number of observation symbols M; the HMM uses a left-to-right structure without skips. The initial number of events is 40, the number of observation symbols is 32 and the number of observation sequences is 20. The HMM event transition probability matrix is 40 × 20; the number of observation sequences and the number of observation symbols give a 20 × 32 observation sequence probability matrix; and the initial event probability vector is a 1 × 20 row matrix;

B. Using the HMM trained with the Baum-Welch algorithm from the initial number of HMM events, the number of observation sequences and the number of observation symbols, perform isolated-word speech recognition and observe the recognition accuracy and robustness;

C. Dynamically change the number of HMM events N in steps of 2, continue training to obtain a new HMM, and perform isolated-word speech recognition on the speech in the dictionary used for training. After all words have been recognized, compute, for each change of the number of HMM events, the resulting recognition accuracy and the variance of the recognition probabilities. Repeat this step to find the number of HMM events N that gives the maximum accuracy and the minimum probability variance;

D. After the user's speech is entered, extract its feature parameters and, using the HMM parameters obtained in step C, perform forward-backward probability calculation and output the recognition result. Then automatically train on the user's entered speech for the word together with the dictionary speech of the same word, change the number of HMM events again, and compute the HMM for that particular speaker and its optimal number of events N.
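The search in step C amounts to evaluating a series of candidate event numbers spaced 2 apart and keeping the best one. In the sketch below, `evaluate` is a hypothetical callback standing in for "train with N events, recognize the training dictionary, and return (accuracy, variance)". Because the highest accuracy and the lowest variance need not occur at the same N, the sketch assumes one possible rule: highest accuracy first, with lower variance breaking ties.

```python
from typing import Callable, Dict, Iterable, Tuple

Result = Tuple[float, float]  # (accuracy in %, variance)


def select_event_number(candidates: Iterable[int],
                        evaluate: Callable[[int], Result]) -> Tuple[int, Dict[int, Result]]:
    """Evaluate every candidate N and return the best one with the full result table.

    Ranking (an assumption): highest accuracy first; among equal accuracies, lowest variance.
    """
    results = {n: evaluate(n) for n in candidates}
    best = max(results, key=lambda n: (results[n][0], -results[n][1]))
    return best, results


# Usage with made-up numbers standing in for real training runs:
toy = {36: (95.0, 0.004), 38: (97.0, 0.003), 40: (97.0, 0.002), 42: (96.0, 0.002)}
best_n, table = select_event_number(range(36, 44, 2), lambda n: toy[n])
```

Here N = 38 and N = 40 tie on accuracy, and N = 40 is kept because its variance is lower. Returning the whole table also makes it easy to check the claim that moving N in either direction from the optimum lowers accuracy.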

The beneficial effects of the invention are as follows: the present invention provides a method for improving the recognition accuracy of large-vocabulary isolated-word speech recognition, establishing a mechanism by which the hidden Markov model (HMM) parameters adapt to each isolated word. This solves the problem that recognition accuracy and robustness are low when different isolated words are modelled with the same number of events in the HMM probability model. Experimental results show that, at the cost of a slight increase in recognition computation, the method effectively improves the accuracy of large-vocabulary isolated-word speech recognition. With 5120 isolated words to be recognized, the average accuracy over multiple recognition runs rose from 91% to 97.3%; with 10240 isolated words, it rose from 87% to 96.3%. Compared with traditional speech recognition based on static statistical probability models, the method has the advantage of adaptively adjusting the parameters of the recognition model for different users, thereby improving recognition accuracy.

Description of the drawings

Fig. 1 is a flow chart of the method of the present invention for improving speech recognition accuracy based on a dynamic number of HMM events.

Detailed description of the embodiments

The present invention is further described below with reference to the accompanying drawing and a specific embodiment.

The HMM probability parameters used in the present invention are as follows:

(1) N, the number of events in the HMM. The events are hidden in the HMM. In what follows, the events of the model are denoted $\{S_1, S_2, \ldots, S_N\}$, and the event occupied at time $t$ is denoted $q_t$.

(2) M, the number of distinct elements that can appear in an observation sequence, that is, the number of observation symbols per event in the HMM. The observation symbols are denoted $V = \{v_1, v_2, \ldots, v_M\}$ and the observation sequence $O = \{o_1, o_2, \ldots, o_T\}$, where each $o_t$ is one of the symbols in $V$ and $T$ is the length of the observation sequence.

(3) The event transition probability distribution $A = [a_{ij}]$, where

$$a_{ij} = P[q_{t+1} = S_j \mid q_t = S_i], \quad 1 \le i \le N,\ 1 \le j \le N.$$

(4) The observation sequence probability distribution $B = [b_j(k)]$, where

$$b_j(k) = P[o_t = v_k \mid q_t = S_j], \quad 1 \le k \le M,\ 1 \le j \le N.$$

(5) The initial event probability distribution $\pi = [\pi_i]$, where

$$\pi_i = P[q_1 = S_i], \quad 1 \le i \le N.$$
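One way to read definitions (1) to (5) is generatively: π picks the first event, each row of A picks the next event, and the row of B for the current event picks the observed symbol. The sketch below samples an event path and an observation sequence from λ = (A, B, π) under that reading. It uses 0-based indices (event i stands for S_{i+1}, symbol k for v_{k+1}), and the function name is illustrative.

```python
import random
from typing import List, Sequence, Tuple


def sample_hmm(A: Sequence[Sequence[float]], B: Sequence[Sequence[float]],
               pi: Sequence[float], T: int, rng: random.Random) -> Tuple[List[int], List[int]]:
    """Draw events q_1..q_T and observations o_1..o_T from lambda = (A, B, pi)."""
    n, m = len(pi), len(B[0])
    events: List[int] = []
    symbols: List[int] = []
    q = rng.choices(range(n), weights=pi)[0]                    # q_1 ~ pi
    for t in range(T):
        if t > 0:
            q = rng.choices(range(n), weights=A[q])[0]          # q_{t+1} ~ a_{q_t j}
        events.append(q)
        symbols.append(rng.choices(range(m), weights=B[q])[0])  # o_t ~ b_{q_t}(k)
    return events, symbols
```

Only the symbols would be visible to a recognizer; the event path is what makes the model "hidden".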

The recognition accuracy is the number of correctly recognized words among the isolated words to be recognized, divided by the total number of isolated words to be recognized, expressed as a percentage. Each recognition run over the isolated words yields one accuracy value, which changes when the HMM parameters change. The variance of the recognition accuracy represents the robustness of recognition under different HMM parameters: the smaller the variance, the better the robustness.
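The two measures can be computed as follows. This is a sketch under one reading of the paragraph: several recognition runs each produce an accuracy, and the variance across runs measures robustness. The patent does not say whether population or sample variance is meant; `pvariance` is an assumption, and the function names are illustrative.

```python
from statistics import mean, pvariance
from typing import Sequence, Tuple


def accuracy_percent(predicted: Sequence[str], expected: Sequence[str]) -> float:
    """Correctly recognised words divided by all words to be recognised, as a percentage."""
    if not expected or len(predicted) != len(expected):
        raise ValueError("need one prediction per word to be recognised")
    correct = sum(p == e for p, e in zip(predicted, expected))
    return 100.0 * correct / len(expected)


def summarize_runs(run_accuracies: Sequence[float]) -> Tuple[float, float]:
    """Mean accuracy over repeated runs and its population variance (lower = more robust)."""
    return mean(run_accuracies), pvariance(run_accuracies)
```

For instance, three runs at 90%, 92% and 94% give a mean of 92% and a variance of 8/3 (about 2.67 squared percentage points).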

When the HMM parameters are computed with the Baum-Welch algorithm from the codes obtained by clustering the speech feature parameters, the initial event probability distribution is unimportant: as long as its probabilities sum to 1, it has only a minimal effect on the number of iterations. The present invention therefore sets the initial event probability distribution to $\pi_i = 1/N$.
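An initial model combining the left-to-right, no-skip structure of step A with $\pi_i = 1/N$ could be built as below. The sketch takes the matrix shapes from definitions (3) to (5), so A is N × N, B is N × M and π has N entries. The self-loop probability of 0.5 and the uniform initial emissions are assumptions, since the patent gives no initial values for A or B.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def init_left_to_right(n_events: int, n_symbols: int,
                       self_loop: float = 0.5) -> Tuple[Matrix, Matrix, List[float]]:
    """Initial (A, B, pi) for a left-to-right HMM with no skipped events.

    Event i may only stay in S_i or move to S_{i+1}; the last event is absorbing.
    Zero entries of A stay zero under Baum-Welch re-estimation, so the topology is kept.
    """
    if n_events < 1 or n_symbols < 1 or not 0.0 < self_loop < 1.0:
        raise ValueError("invalid HMM size or self-loop probability")
    A = [[0.0] * n_events for _ in range(n_events)]
    for i in range(n_events - 1):
        A[i][i] = self_loop               # assumed initial self-transition probability
        A[i][i + 1] = 1.0 - self_loop
    A[-1][-1] = 1.0
    B = [[1.0 / n_symbols] * n_symbols for _ in range(n_events)]  # assumed uniform emissions
    pi = [1.0 / n_events] * n_events                              # pi_i = 1/N
    return A, B, pi


A, B, pi = init_left_to_right(40, 32)  # N = 40 events, M = 32 symbols, as in step A
```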

In the concrete computation, the present invention uses Bayesian forward-backward probability calculation and the Baum-Welch algorithm. Fig. 1 is a flow chart of the implementation of the method, which proceeds as follows:

1. First determine the parameters of the initial HMM and train the model; the speech recognition process then yields an initial HMM, which is not optimal for every word. The structure of the model comprises the number of events N and the number of observation symbols M for each event. For isolated-word recognition, a suitable number of HMM events can be chosen according to the length of the speech; experiments show that too large a number of events lowers recognition accuracy. For a discrete HMM, the number of observation symbols is in principle determined by the sample space but is limited by computational cost, and is generally chosen between 16 and 64. Experiments show that, apart from certain words, the recognition accuracy of most words does not fluctuate much for M between 24 and 50.

The present invention uses an initial number of HMM events of 40 and 32 observation symbols; the number of observation symbols corresponds to the number of vector clusters of the feature parameters, i.e., M = 32. The number of observation sequences is set to 20. From these values a 40 × 20 event transition probability matrix is obtained, the number of observation sequences and the number of observation symbols give a 20 × 32 observation sequence probability matrix, and the initial event probability vector is a 1 × 20 row matrix.

2. Using the initial number of HMM events and the observation sequences, perform isolated-word speech recognition with the HMM trained by the Baum-Welch algorithm, and observe the recognition accuracy and robustness. During training, change the number of events of the HMM, changing N by a step of 2 each time. Continue training to obtain a new HMM, and perform isolated-word speech recognition on the speech in the dictionary used for training. After all words have been recognized, compute, for each change of the number of HMM events, the resulting recognition accuracy and the variance of the recognition probabilities. Repeat this step to find the number of HMM events that gives the maximum accuracy and the minimum probability variance.

In the present invention, with 5120 isolated words to be recognized, the maximum accuracy over multiple recognition runs is 97.3%; with 10240 isolated words, it is 96.3%. At this point the number of HMM events for each word is optimal: if the number of events continues to change, whether increased or decreased, the accuracy falls below this maximum. At the same time, the variance of the recognition probability of each word is also at its minimum.

3. In practical application, adaptive learning is achieved through the words the user speaks. When the user enters the speech of a word, its parameters are extracted and forward-backward probability calculation is performed against the HMM of each word in the dictionary to obtain all the probabilities. These are sorted to find the maximum, and the word entered by the user is taken to be the dictionary word with the maximum probability. The system then combines the user's speech with the dictionary speech of that word and retrains to obtain a new HMM for the word, i.e., a new number of events.
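The recognize-then-adapt loop of step 3 can be outlined as below. `score` and `retrain` are hypothetical callbacks: `score` might return the forward log-likelihood of the utterance under a word's HMM, and `retrain` might repeat the event-number search and Baum-Welch training on the enlarged sample set. The data structures are assumptions chosen for illustration.

```python
from typing import Any, Callable, Dict, List, Sequence, Tuple

Scorer = Callable[[Any, Sequence[int]], float]


def rank_vocabulary(obs: Sequence[int], models: Dict[str, Any],
                    score: Scorer) -> List[Tuple[str, float]]:
    """Score the utterance against every word model, highest probability first."""
    ranked = [(word, score(model, obs)) for word, model in models.items()]
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked


def recognize_and_adapt(obs: Sequence[int], models: Dict[str, Any],
                        training_data: Dict[str, List[List[int]]], score: Scorer,
                        retrain: Callable[[List[List[int]]], Any]) -> str:
    """Recognise `obs`, add it to that word's samples, and retrain that word's model."""
    word = rank_vocabulary(obs, models, score)[0][0]
    training_data[word].append(list(obs))        # the user's utterance joins the dictionary samples
    models[word] = retrain(training_data[word])  # hypothetical: new HMM and new number of events
    return word
```

Only the recognized word's model is retrained, so the cost of adaptation per utterance is one training run rather than a full rebuild of the dictionary. If the recognition is wrong, the utterance is added to the wrong word; a real system would likely ask the user to confirm before adapting.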

In this way, the method of the invention enables the speech recognition algorithm to adjust its model parameters through adaptive learning.

The above is a further detailed description of the present invention in combination with specific preferred embodiments, and the specific implementation of the present invention shall not be regarded as limited to these descriptions. For those of ordinary skill in the art to which the present invention belongs, several simple deductions or substitutions may be made without departing from the concept of the present invention, and all of them shall be regarded as falling within the protection scope of the present invention.

Claims

1. A method for improving speech recognition accuracy based on a dynamic number of HMM events, characterized in that the method comprises the following steps:

A. providing the parameters of an initial HMM, including the number of events N and the number of observation symbols M, the HMM using a left-to-right structure without skips; wherein the initial number of events is 40, the number of observation symbols is 32, the number of observation sequences is 20, the HMM event transition probability matrix is 40 × 20, the number of observation sequences and the number of observation symbols give a 20 × 32 observation sequence probability matrix, and the initial event probability vector is a 1 × 20 row matrix;

B. according to the initial number of HMM events, the number of observation sequences and the number of observation symbols, performing isolated-word speech recognition with the HMM trained by the Baum-Welch algorithm, and observing the recognition accuracy and robustness;

C. for each word in the dictionary used for training, dynamically changing the number of HMM events N in steps of 2, continuing training to obtain a new HMM, and performing isolated-word speech recognition on the speech in the dictionary used for training; after all words have been recognized, computing, for each change of the number of HMM events, the resulting recognition accuracy and the variance of the recognition probabilities; and repeating this step to find the number of HMM events N corresponding to the maximum accuracy and the minimum probability variance;

D. after the user's speech is entered, extracting its feature parameters and, using the HMM parameters obtained in step C, performing forward probability calculation and outputting the recognition result; then automatically training on the user's entered speech for the word together with the dictionary speech of the corresponding word, changing the number of HMM events again, and computing the HMM for that particular speaker and its optimal number of events N, so that the parameters of the recognition model are adaptively adjusted for different users, thereby improving recognition accuracy.
2. The method according to claim 1, characterized in that the number of observation symbols M in step A takes a value between 24 and 50.

3. The method according to claim 1, characterized in that the accuracy is the number of correctly recognized words among the isolated words to be recognized divided by the total number of isolated words to be recognized, expressed as a percentage.

4. The method according to claim 1, characterized in that, in step C, finding the number of HMM events N corresponding to the maximum accuracy and the minimum probability variance specifically comprises: when there are 5120 isolated words to be recognized, the maximum accuracy over multiple recognition runs is 97.3%; when there are 10240 isolated words to be recognized, the maximum accuracy over multiple recognition runs is 96.3%.