CN104064179B - Method for improving speech recognition accuracy based on dynamic HMM event numbers - Google Patents

Publication number
CN104064179B
Authority
CN
China
Prior art keywords: hmm, word, event, identification, voice
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201410281284.6A
Other languages
Chinese (zh)
Other versions
CN104064179A (en)
Inventor
刘明
王明江
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Graduate School Harbin Institute of Technology
Original Assignee
Shenzhen Graduate School Harbin Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Graduate School Harbin Institute of Technology
Priority to CN201410281284.6A
Publication of CN104064179A
Application granted
Publication of CN104064179B
Legal status: Expired - Fee Related

Landscapes

  • Machine Translation (AREA)
  • Image Analysis (AREA)

Abstract

The present invention provides a method for improving recognition accuracy in large-scale isolated-word speech recognition. It establishes a mechanism by which the parameters of the Hidden Markov Model (HMM) vary adaptively for different isolated words, addressing the low recognition accuracy and poor robustness that result when different isolated words all use the same number of events in the HMM probabilistic model. Experimental results show that the method effectively improves the accuracy of large-scale isolated-word speech recognition at the cost of a slight increase in recognition computation. With 5120 isolated words to be recognized, the average accuracy over multiple recognition runs rose from 91% to 97.3%; with 10240 isolated words, it rose from 87% to 96.3%. Compared with conventional speech recognition based on static statistical probability models, the method has the advantage of adaptively adjusting the parameters of the recognition model for different users, thereby improving recognition accuracy.

Description

A method for improving speech recognition accuracy based on dynamic HMM event numbers
Technical field
The present invention relates to the field of isolated-word speech recognition, and in particular to a method for improving the accuracy of large-scale isolated-word speech recognition.
Background art
After feature parameter extraction, the speech is cluster-coded. At this stage, deciding which dictionary word's cluster a word to be recognized belongs to by Euclidean distance alone is very inaccurate. The regularities of speech follow a statistical probability model, whereas the Euclidean distance only reflects how far a vector lies from a cluster-center vector. The obtained parameters and codebook therefore need further training to build a more accurate statistical probability model that better reflects how the feature parameters embody the regularities of speech. The hidden Markov model (HMM) is a good mathematical model of event transition probabilities and of the probabilities with which observation samples occur, so the speech feature parameters are processed by a suitable algorithm to obtain an HMM probabilistic model.
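As a rough illustration of the cluster coding mentioned above, the sketch below maps each feature vector to the index of its nearest codeword by squared Euclidean distance. The function name `quantise`, the list-of-lists data layout, and the toy codebook are assumptions made for this example; the text does not specify how the codebook is built or stored.

```python
def quantise(frames, codebook):
    """Map each feature vector to the index of its nearest codeword.

    frames   -- list of feature vectors (lists of floats), hypothetical layout
    codebook -- list of cluster-center vectors of the same dimension
    """
    def sq_dist(u, v):
        # Squared Euclidean distance; the square root is not needed for ranking.
        return sum((a - b) ** 2 for a, b in zip(u, v))

    return [
        min(range(len(codebook)), key=lambda k: sq_dist(frame, codebook[k]))
        for frame in frames
    ]


# Toy data for illustration only.
toy_codebook = [[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]]
toy_frames = [[0.1, 0.2], [3.5, 3.9], [0.9, 1.2]]
toy_symbols = quantise(toy_frames, toy_codebook)
```

The resulting index sequence plays the role of the discrete observation sequence that the HMM is trained on.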
The hidden Markov model is a parametric probabilistic model for describing the statistical properties of a random process. It developed from the Markov chain, has long been a research focus in speech recognition, and is widely used in every area of speech processing. Building an HMM probability template for speech requires cluster coding of the speech feature parameter vectors and speech vector coding; the template training process then performs forward and backward probability calculations until a converged probabilistic model is obtained.
An acoustic model is usually generated by training on the acquired speech feature parameters with a specific probabilistic algorithm. In HMM-based speech recognition, an acoustic model is simply an HMM; typically, the set of HMMs is generated by training on the acquired speech feature parameters with the HMM transition-probability algorithm. For the speech to be recognized, feature parameters consistent with the HMMs are extracted, the posterior probability is calculated with the backward Bayesian probability algorithm, and the speech sample represented by the HMM probability template with the maximum posterior probability is taken as the speech to be recognized.
For speech data, the main operations are time-frequency sampling and spectral transformation; an HMM can be built for any speech that differs even slightly in its time-frequency characteristics. Next, model training adjusts the HMM parameters using existing samples so that the model accurately describes the probabilistic characteristics of different speech. Building a model for speech is in effect mathematical modeling of the speech, under the assumption that the recognition probability of the corresponding speech features can be calculated from these mathematical models and has an extremum. For an HMM, the main task is to determine the basic topology of the model, including the number of events, the event transition pattern, and the transition probabilities.
Different words correspond to HMMs with different numbers of events. Even speech that the human ear judges identical (the same word) yields different HMM parameters because of differences in speakers' pronunciation, tone, and accent; that is, the number of events its HMM contains differs. As the number of isolated words grows, using the same number of HMM events for all of them will evidently reduce accuracy.
Summary of the invention
To solve the problems in the prior art, the present invention proposes a method that improves the accuracy of large-scale isolated-word speech recognition by dynamically changing the number of events of the HMM, addressing the decline in recognition accuracy as the number of isolated words to be recognized grows.
The invention is realized by the following technical scheme:
A method for improving the accuracy of large-scale isolated-word speech recognition based on dynamic HMM event numbers, comprising the following steps:
A. Provide the parameters of the initial HMM, including the number of events N and the number of observation symbols M. The HMM uses a left-to-right structure without skips. The initial number of events is 40, the number of observation symbols is 32, and the number of observation sequences is 20; the HMM event transition probability matrix is 40 × 20; a 20 × 32 observation sequence probability matrix is obtained from the number of observation sequences and the number of observation symbols; and the initial event probability vector is a 1 × 20 row matrix;
B. According to the initial number of HMM events, the number of observation sequences, and the number of observation symbols, perform isolated-word speech recognition using the HMM obtained by training with the Baum-Welch algorithm, and observe the recognition accuracy and robustness;
C. Dynamically change the number of HMM events N in steps of 2 and continue training to obtain a new HMM; perform isolated-word speech recognition on the speech in the dictionary used for training; once all words have been recognized, record the recognition accuracy and the variance of the recognition probability obtained for each change in the number of HMM events. Repeat this step to find the number of HMM events N at which the accuracy is highest and the probability variance lowest;
D. After the user's speech is recorded, extract its feature parameters and, using the HMM parameters obtained in step C, carry out the forward probability calculation and output the recognition result. Then automatically train on the speech of the word recorded by the user together with the speech of the same word in the dictionary, change the number of HMM events again, and calculate the HMM for the specific speaker and its optimal event number M.
The beneficial effects of the invention are as follows. The present invention provides a method for improving recognition accuracy in large-scale isolated-word speech recognition. It establishes a mechanism by which the parameters of the Hidden Markov Model (HMM) vary adaptively for different isolated words, addressing the low recognition accuracy and poor robustness that result when different isolated words all use the same number of events in the HMM probabilistic model. Experimental results show that the method effectively improves the accuracy of large-scale isolated-word speech recognition at the cost of a slight increase in recognition computation. With 5120 isolated words to be recognized, the average accuracy over multiple recognition runs rose from 91% to 97.3%; with 10240 isolated words, it rose from 87% to 96.3%. Compared with conventional speech recognition based on static statistical probability models, the method has the advantage of adaptively adjusting the parameters of the recognition model for different users, thereby improving recognition accuracy.
Description of the drawings
Fig. 1 is a flow chart of the method of the present invention for improving speech recognition accuracy based on dynamic HMM event numbers.
Detailed description of the embodiments
The present invention is further described below with reference to the drawing and specific embodiments.
The probability parameters of the hidden Markov model (HMM) used in the present invention are as follows:
(1) N, the number of events in the HMM. The events are hidden in the HMM. In what follows, the events of the model are denoted $\{S_1, S_2, \ldots, S_N\}$, and the event occupied at time $t$ is $q_t$.
(2) M, the number of distinct elements that can be observed in a sequence, i.e. the number of observation symbols per event in the HMM. The observation symbols are denoted $V = \{v_1, v_2, \ldots, v_M\}$ and the observation sequence is $O = \{o_1, o_2, \ldots, o_T\}$, where each $o_t$ is one of the observation symbols in the set $V$ and $T$ is the length of the observation sequence.
(3) The event transition probability distribution $A = [a_{ij}]$, where
$a_{ij} = P[q_{t+1} = S_j \mid q_t = S_i], \quad 1 \le i \le N,\ 1 \le j \le N.$
(4) The observation sequence probability distribution $B = [b_j(k)]$, where
$b_j(k) = P[o_t = v_k \mid q_t = S_j], \quad 1 \le k \le M,\ 1 \le j \le N.$
(5) The initial event probability distribution $\pi = [\pi_i]$, where
$\pi_i = P[q_1 = S_i], \quad 1 \le i \le N.$
Recognition accuracy is the number of correctly recognized words divided by the total number of isolated words to be recognized, expressed as a percentage. Each isolated-word recognition run yields one recognition accuracy, and this value changes when the HMM parameters change. The variance of the recognition accuracy across different HMM parameters represents the robustness of recognition: the smaller the variance, the better the robustness.
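The two measures just defined can be sketched in a few lines. The function names and the choice of population variance (rather than sample variance) are assumptions for illustration; the text does not say which variance estimator is used.

```python
from statistics import pvariance


def recognition_accuracy(predicted, reference):
    """Percentage of isolated words recognized correctly."""
    if not reference or len(predicted) != len(reference):
        raise ValueError("predicted and reference must be non-empty and equal in length")
    correct = sum(p == r for p, r in zip(predicted, reference))
    return 100.0 * correct / len(reference)


def accuracy_variance(accuracies):
    """Population variance of a list of accuracies; smaller means more robust."""
    return pvariance(accuracies)
```

For example, three correct words out of four give an accuracy of 75.0, and a set of runs whose accuracies barely differ yields a variance close to zero.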
When the HMM parameters are calculated with the Baum-Welch algorithm from the codes obtained by clustering the speech feature parameters, the initial event probability distribution is unimportant: provided the probabilities sum to 1, it has only a minor effect on the number of iterations of the calculation. The present invention therefore sets the initial event probability distribution to $\pi_i = 1/N$.
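A minimal sketch of how such a model might be initialized before training is given below, using the definitions of A, B and π in items (3) to (5) and the uniform choice $\pi_i = 1/N$. The left-to-right structure without skips comes from the description; the 0.5/0.5 split between staying and advancing, the uniform starting values for B, and the function name are placeholders assumed for this example only.

```python
def left_to_right_hmm(n_events, n_symbols):
    """Return starting parameters (A, B, pi) for a discrete left-to-right HMM without skips."""
    # A is N x N: from event i the model may only stay in i or move to i + 1.
    A = [[0.0] * n_events for _ in range(n_events)]
    for i in range(n_events):
        if i + 1 < n_events:
            A[i][i] = 0.5          # assumed starting value
            A[i][i + 1] = 0.5      # assumed starting value
        else:
            A[i][i] = 1.0          # the last event can only repeat

    # B is N x M: one distribution over the M observation symbols per event.
    B = [[1.0 / n_symbols] * n_symbols for _ in range(n_events)]

    # pi has N entries and is uniform, pi_i = 1/N.
    pi = [1.0 / n_events] * n_events
    return A, B, pi


A0, B0, pi0 = left_to_right_hmm(40, 32)
```

Each row of A and B, and the vector pi, sums to 1, which is the only constraint the description places on the initial distribution.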
In the concrete implementation, the algorithms used by the present invention are the Bayesian forward and backward probability calculation and the Baum-Welch algorithm. Fig. 1 is a flow chart of the implementation of the method, described in detail as follows:
1. First determine the parameters of the initial HMM and train the model. After a speech recognition pass, an initial HMM is obtained; this probabilistic model is not optimal for every word. The structure of the model comprises the number of events N and the number of observation symbols M per event. For isolated-word speech recognition, a suitable number of HMM events can be chosen according to the length of the speech; experiments show that too large a number of events reduces recognition accuracy. For a discrete HMM, the number of observation symbols is in principle determined by the sample space but is limited by the computational cost, and is generally taken as 16 to 64. In experiments, apart from certain words, recognition accuracy for most words does not fluctuate much when M lies between 24 and 50.
The present invention uses an initial number of HMM events of 40 and 32 observation symbols, matching the number of vector clusters of the feature parameters, i.e. M = 32, and sets the number of observation sequences to 20. Accordingly, a 40 × 20 event transition probability matrix is obtained from the number of events and the number of observation symbols, a 20 × 32 observation sequence probability matrix is obtained from the number of observation sequences and the number of observation symbols, and the initial event probability vector is a 1 × 20 row matrix.
2. According to the initial number of HMM events and the observation sequences, perform isolated-word speech recognition using the HMM trained with the Baum-Welch algorithm, and observe the recognition accuracy and robustness. During training, change the number of events of the HMM, with N changing in steps of 2 each time. Continue training to obtain a new HMM, and perform isolated-word speech recognition on the speech in the dictionary used for training. Once all words have been recognized, record the recognition accuracy and the variance of the recognition probability obtained for each change in the number of HMM events. Repeat this step to find the number of HMM events at which the accuracy is highest and the probability variance lowest.
In the present invention, with 5120 isolated words to be recognized, the maximum accuracy over multiple recognition runs is 97.3%; with 10240 isolated words, the maximum is 96.3%. At this point the number of HMM events for each word is optimal: if the number of HMM events is changed further, whether increased or decreased, the accuracy falls below this maximum. At the same time, the variance of each word's recognition probability is also at its minimum.
3. In practical use, the user inputs spoken words to achieve adaptive learning. The user records the speech of one word; after parameter extraction, the forward probability is calculated against the HMM of each word in the dictionary, giving the full set of probabilities. These are sorted to find the maximum, and the word recorded by the user is taken to be the dictionary word corresponding to that maximum probability. The system then combines the word recorded by the user with the same word in the dictionary and retrains to obtain a new HMM for that word, i.e. a new event number.
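The recognition part of step 3 can be sketched as a forward probability calculation per dictionary word followed by picking the maximum. This is a sketch under assumptions: the function names, the dictionary layout (word mapped to a tuple `(A, B, pi)`), and the use of per-step scaling with log probabilities to avoid numerical underflow are choices made for this example and are not stated in the text.

```python
import math


def forward_log_likelihood(obs, A, B, pi):
    """Return log P(O | model) for a discrete HMM using the scaled forward recursion."""
    if not obs:
        raise ValueError("observation sequence must not be empty")
    n = len(pi)
    # Initialization: alpha_1(i) = pi_i * b_i(o_1)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    log_prob = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Induction: alpha_t(j) = (sum_i alpha_{t-1}(i) * a_ij) * b_j(o_t)
            alpha = [
                sum(alpha[i] * A[i][j] for i in range(n)) * B[j][obs[t]]
                for j in range(n)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # the model cannot produce this sequence
        # The product of the scale factors equals P(O | model).
        log_prob += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_prob


def recognise(obs, word_models):
    """Return the dictionary word whose HMM gives the highest forward probability."""
    if not word_models:
        raise ValueError("dictionary must contain at least one word model")
    return max(
        word_models,
        key=lambda word: forward_log_likelihood(obs, *word_models[word]),
    )
```

Taking the maximum directly is equivalent to sorting all probabilities and reading off the largest, as the step describes, but avoids the sort.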
Thus the method of the present invention enables the speech recognition algorithm to adjust its model parameters through adaptive learning.
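The event-number selection used in step 2, and repeated during the per-user adaptation in step 3, could be sketched as a simple search over candidate values of N. The callback `evaluate`, which is assumed to retrain the model with a given N and return an `(accuracy, variance)` pair, is hypothetical, as is the rule for combining the two criteria: here the highest accuracy wins and the lowest variance breaks ties, which is only one possible reading of "accuracy maximum and probability variance minimum".

```python
def select_event_number(candidates, evaluate):
    """Pick the event number N with the highest accuracy, then the lowest variance.

    candidates -- iterable of event numbers to try, e.g. values spaced 2 apart
    evaluate   -- hypothetical callback: evaluate(n) -> (accuracy, variance)
    """
    best_n, best_key = None, None
    for n in candidates:
        accuracy, variance = evaluate(n)
        key = (accuracy, -variance)  # compare accuracy first, then prefer low variance
        if best_key is None or key > best_key:
            best_n, best_key = n, key
    return best_n


# Made-up evaluation results, for illustration only.
toy_results = {38: (0.95, 2.0), 40: (0.97, 1.5), 42: (0.97, 1.0), 44: (0.96, 0.5)}
toy_best = select_event_number(range(38, 46, 2), toy_results.get)
```

The text gives the starting value (40) and the step (2) but not the bounds of the search, so the candidate range is left to the caller.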
The above is a further detailed description of the present invention in connection with specific preferred embodiments, and the specific implementation of the invention should not be taken as limited to these descriptions. Those of ordinary skill in the art to which the invention belongs may make simple deductions or substitutions without departing from the inventive concept, and all of these shall be regarded as falling within the scope of protection of the invention.

Claims (4)

  1. A method for improving speech recognition accuracy based on dynamic HMM event numbers, characterized in that the method comprises the following steps:
    A. Provide the parameters of the initial HMM, including the number of events N and the number of observation symbols M, the HMM using a left-to-right structure without skips; wherein the initial number of events is 40, the number of observation symbols is 32, the number of observation sequences is 20, the HMM event transition probability matrix is 40 × 20, a 20 × 32 observation sequence probability matrix is obtained from the number of observation sequences and the number of observation symbols, and the initial event probability vector is a 1 × 20 row matrix;
    B. According to the initial number of HMM events, the number of observation sequences, and the number of observation symbols, perform isolated-word speech recognition using the HMM trained with the Baum-Welch algorithm, and observe the recognition accuracy and robustness;
    C. For each word in the dictionary used for training, dynamically change the number of HMM events N in steps of 2 and continue training to obtain a new HMM; perform isolated-word speech recognition on the speech in the dictionary used for training; once all words have been recognized, record the recognition accuracy and the variance of the recognition probability obtained for each change in the number of HMM events; repeat this step to find the number of HMM events N at which the accuracy is highest and the probability variance lowest;
    D. After the user's speech is recorded, extract its feature parameters and, using the HMM parameters obtained in step C, carry out the forward probability calculation and output the recognition result; then automatically train on the speech of the word recorded by the user together with the speech of the corresponding word in the dictionary, change the number of HMM events again, and calculate the HMM for the specific speaker and its optimal event number M, so that the parameters of the recognition model are adjusted adaptively for different users, thereby improving recognition accuracy.
  2. The method according to claim 1, characterized in that the number of observation symbols M in step A takes a value between 24 and 50.
  3. The method according to claim 1, characterized in that the number of correctly recognized words is divided by the total number of isolated words to be recognized, and the result, expressed as a percentage, is the accuracy.
  4. The method according to claim 1, characterized in that, in step C, finding the number of HMM events N at which the accuracy is highest and the probability variance lowest is specifically: when there are 5120 isolated words to be recognized, the maximum accuracy over multiple recognition runs is 97.3%; when there are 10240 isolated words to be recognized, the maximum accuracy over multiple recognition runs is 96.3%.
CN201410281284.6A 2014-06-20 2014-06-20 A kind of method of the raising speech recognition accuracy based on dynamic HMM event numbers Expired - Fee Related CN104064179B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410281284.6A CN104064179B (en) 2014-06-20 2014-06-20 A kind of method of the raising speech recognition accuracy based on dynamic HMM event numbers

Publications (2)

Publication Number Publication Date
CN104064179A CN104064179A (en) 2014-09-24
CN104064179B true CN104064179B (en) 2018-06-08

Family

ID=51551858

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410281284.6A Expired - Fee Related CN104064179B (en) 2014-06-20 2014-06-20 A kind of method of the raising speech recognition accuracy based on dynamic HMM event numbers

Country Status (1)

Country Link
CN (1) CN104064179B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111595234B (en) * 2020-04-24 2021-08-24 国网湖北省电力有限公司电力科学研究院 Intelligent diagnosis device and method for yield of pole material of power transmission tower structure
CN115730590A (en) * 2022-11-30 2023-03-03 金蝶软件(中国)有限公司 Intention recognition method and related equipment

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5920839A (en) * 1993-01-13 1999-07-06 Nec Corporation Word recognition with HMM speech, model, using feature vector prediction from current feature vector and state control vector values
CN101030369A (en) * 2007-03-30 2007-09-05 清华大学 Built-in speech discriminating method based on sub-word hidden Markov model
CN102254087A (en) * 2010-05-20 2011-11-23 索尼公司 Data processing device, data processing method and program

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
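Hu Lei et al.: "Application of an improved hidden Markov model in speech recognition" (一种改进的隐马尔可夫模型在语音识别中的应用); Information and Control (信息与控制); 2007-12-31; vol. 36, no. 6; full text *
Liu De: "Research and implementation of algorithm optimization for isolated-word speech recognition" (孤立词语音识别算法优化的研究和实现); China Master's Theses Full-text Database, Information Science and Technology (中国优秀硕士学位论文全文数据库 信息科技辑); 2012-03-15; no. 03; abstract, chapter 4 *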

Also Published As

Publication number Publication date
CN104064179A (en) 2014-09-24

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20180608

Termination date: 20210620
