CN104064179B - Method for improving speech recognition accuracy based on a dynamic number of HMM events - Google Patents
- Publication number: CN104064179B (application number CN201410281284.6A)
- Authority: CN (China)
- Legal status: Expired - Fee Related
Abstract
The present invention provides a method for improving recognition accuracy in large-vocabulary isolated-word speech recognition. It establishes a mechanism by which the Hidden Markov Model (HMM) parameters adapt to each isolated word, solving the problem that recognition accuracy and robustness are low when all isolated words use the same number of events in the HMM probability model. Experimental results show that, with only a slight increase in recognition computation, the method effectively improves the accuracy of large-vocabulary isolated-word recognition. With 5120 isolated words to be recognized, the average recognition accuracy over multiple runs rose from 91% to 97.3%; with 10240 isolated words, it rose from 87% to 96.3%. Compared with conventional speech recognition based on static statistical-probability models, the method has the advantage of adaptively adjusting the parameters of the recognition model for different users, thereby improving recognition accuracy.
Description
Technical field
The present invention relates to the field of isolated-word speech recognition, and in particular to a method for improving the accuracy of large-vocabulary isolated-word speech recognition.
Background art
After feature-parameter extraction, the speech is cluster-coded. Judging which dictionary word's cluster a word to be recognized belongs to purely by Euclidean distance is, at this stage, very inaccurate. The regularities of speech are statistical, i.e. they follow a probabilistic model, whereas Euclidean distance only reflects how far a vector lies from a cluster-centre vector. The resulting parameters and codebook therefore need further training to build a more accurate statistical probability model that better reflects how the feature parameters embody the regularities of speech. The Hidden Markov Model (HMM) is a good mathematical model of event transition probabilities and of the probabilities with which observation samples occur, so the speech feature parameters are processed by a suitable algorithm to obtain an HMM probability model.
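As an illustration of the cluster-coding step, the sketch below maps each feature vector to the index of its nearest codeword by Euclidean distance, producing the discrete observation symbols on which the HMM is later trained. It assumes a codebook has already been built (for example by k-means with M = 32 clusters); the function name `quantize` and the toy 2-D vectors are hypothetical. As the text notes, Euclidean distance is used here only for coding, not for the final word decision.

```python
import math


def quantize(frames, codebook):
    """Map each feature vector to the index of its nearest codeword (Euclidean distance)."""
    symbols = []
    for frame in frames:
        distances = [math.dist(frame, code) for code in codebook]
        symbols.append(distances.index(min(distances)))
    return symbols


# Toy example: a 3-entry codebook of 2-D vectors (a real system might use M = 32 codewords).
codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
frames = [(0.1, -0.2), (4.8, 5.1), (0.9, 1.2)]
print(quantize(frames, codebook))  # [0, 2, 1]
```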
The hidden Markov model is a parametric probabilistic model used to describe the statistical characteristics of random processes. It evolved from the Markov chain, has long been a research focus in speech recognition, and is widely used throughout speech processing. Building the HMM probability template for speech requires cluster coding of the speech feature vectors; training the probability template on the coded speech vectors performs forward and backward probability calculations iteratively until a converged probability model is obtained.
An acoustic model is typically generated by training the acquired speech feature parameters with a specific probabilistic algorithm. In HMM-based speech recognition an acoustic model is simply an HMM: the acquired speech feature parameters are trained with an HMM transition-probability algorithm to produce a set of HMMs. For the speech to be recognized, feature parameters consistent with those of the HMMs are extracted and the posterior probability is computed with a Bayesian probability algorithm; the speech sample represented by the HMM probability template with the maximum posterior probability is taken as the recognized speech.
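The recognition rule just described can be sketched with the forward algorithm: each word's discrete HMM scores the observation symbols, and the best-scoring word is returned. This minimal sketch assumes equal prior probabilities for all words, so the maximum posterior reduces to the maximum likelihood P(O | λ); the word models and their parameters are hypothetical toy values. Unscaled probabilities are adequate for short toy sequences, but longer utterances would normally need scaling or log-space arithmetic.

```python
def forward_likelihood(obs, pi, A, B):
    """Return P(obs | model) for a discrete HMM using the forward algorithm."""
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]  # initialization at t = 1
    for o in obs[1:]:  # induction over t = 2..T
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o] for j in range(n)]
    return sum(alpha)  # termination: sum over the final events


def recognize(obs, models):
    """Return the word whose HMM gives the highest likelihood (equal word priors assumed)."""
    return max(models, key=lambda word: forward_likelihood(obs, *models[word]))


# Two hypothetical 2-event, 2-symbol word models, each stored as (pi, A, B).
models = {
    "yes": ([1.0, 0.0], [[0.6, 0.4], [0.0, 1.0]], [[0.9, 0.1], [0.2, 0.8]]),
    "no": ([1.0, 0.0], [[0.6, 0.4], [0.0, 1.0]], [[0.1, 0.9], [0.8, 0.2]]),
}
print(recognize([0, 0, 1, 1], models))  # yes
```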
For speech data, chiefly through frequency sampling and spectral transformation, even utterances that differ only slightly in their time-frequency characteristics can each be given their own hidden Markov model. Model training then adjusts the HMM parameters using the available samples so that they accurately describe the probabilistic features of each distinct utterance. Building a model of speech is in effect mathematical modelling of speech: it assumes that the recognition probabilities of the corresponding speech features can be computed from these mathematical models and that an extremum exists. For an HMM, this mainly means determining the basic topology of the model, including the number of events, the event-skipping pattern and the transition probabilities.
Different words correspond to HMMs with different numbers of events. Even for sounds that the human ear perceives as identical (the same word), differences in speakers' pronunciation, tone and accent lead to different HMM parameters, i.e. to a different number of events. As the number of isolated words grows, using the same number of HMM events for every word clearly lowers accuracy.
Summary of the invention
To solve these problems in the prior art, the present invention proposes a method that improves large-vocabulary isolated-word recognition accuracy by dynamically changing the number of events of the HMM, addressing the decline in recognition accuracy as the number of isolated words to be recognized increases.
The invention is realized by the following technical scheme:
A method for improving large-vocabulary isolated-word speech recognition accuracy based on a dynamic number of HMM events, comprising the following steps:
A. Set the parameters of the initial HMM, namely the number of events N and the number of observation symbols M. The HMM uses a left-to-right structure without skips. The initial number of events is 40, the number of observation symbols is 32 and the number of observation sequences is 20; the HMM event-transition probability matrix is 40 × 20; a 20 × 32 observation-sequence probability matrix is obtained from the number of observation sequences and the number of observation symbols; and the initial event probability vector is a 1 × 20 row matrix.
B. Using the initial number of HMM events, the number of observation sequences and the number of observation symbols, train an HMM with the Baum-Welch algorithm, perform isolated-word recognition with it, and observe the recognition accuracy and robustness.
C. Dynamically change the number of HMM events N in steps of 2, train again to obtain a new HMM, and perform isolated-word recognition on the speech in the dictionary used for training. After all words have been recognized, record the recognition accuracy and the variance of the recognition probability obtained for each change in the number of HMM events. Repeat this step to find the number of HMM events N at which accuracy is maximal and the probability variance is minimal.
D. After the user's speech is recorded, extract its feature parameters, compute forward probabilities with the HMM parameters obtained in step C, and give the recognition result. Then automatically train on the user's recorded vocabulary together with the speech of the corresponding vocabulary item in the dictionary, change the number of HMM events again, and compute the HMM for this particular speaker and its optimal number of events N.
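Step C amounts to a one-dimensional search over N. The sketch below is a minimal illustration under stated assumptions: `evaluate(n)` is a hypothetical callback that trains the HMMs (for one word or for the whole dictionary) with n events, runs recognition several times and returns the list of accuracies in percent. The search bounds, and ranking candidates by highest mean accuracy with ties broken by lowest variance, are assumptions; the patent does not state how to reconcile the two criteria or where to stop the sweep.

```python
from statistics import mean, pvariance


def search_event_number(evaluate, n_start=40, n_min=2, n_max=80, step=2):
    """Sweep the number of events N in steps of `step` around n_start.

    `evaluate(n)` is a hypothetical callback returning recognition accuracies (in %)
    from repeated runs with n events per HMM. Candidates are ranked by highest mean
    accuracy, ties broken by lowest variance (population variance is assumed).
    """
    upward = range(n_start, n_max + 1, step)
    downward = range(n_start, n_min - 1, -step)
    best_key, best_n = None, None
    for n in sorted(set(upward) | set(downward)):
        runs = evaluate(n)
        key = (mean(runs), -pvariance(runs))  # larger is better on both components
        if best_key is None or key > best_key:
            best_key, best_n = key, n
    return best_n, best_key[0], -best_key[1]


def toy_evaluate(n):
    """Hypothetical accuracy curve peaking at n = 24, purely for illustration."""
    peak = 97.0 - 0.05 * (n - 24) ** 2
    return [peak - 0.5, peak, peak + 0.5]


print(search_event_number(toy_evaluate)[0])  # 24
```

In a real system `evaluate` would be the expensive part, since each candidate N requires retraining and re-running recognition.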
The beneficial effects of the invention are as follows. The invention provides a method for improving recognition accuracy in large-vocabulary isolated-word speech recognition. It establishes a mechanism by which the Hidden Markov Model (HMM) parameters adapt to each isolated word, solving the problem that recognition accuracy and robustness are low when all isolated words use the same number of events in the HMM probability model. Experimental results show that, with only a slight increase in recognition computation, the method effectively improves the accuracy of large-vocabulary isolated-word recognition. With 5120 isolated words to be recognized, the average recognition accuracy over multiple runs rose from 91% to 97.3%; with 10240 isolated words, it rose from 87% to 96.3%. Compared with conventional speech recognition based on static statistical-probability models, the method has the advantage of adaptively adjusting the parameters of the recognition model for different users, thereby improving recognition accuracy.
Description of the drawings
Fig. 1 is a flow chart of the method of the present invention for improving speech recognition accuracy based on a dynamic number of HMM events.
Specific embodiment
The present invention is further described below with reference to the accompanying drawing and a specific embodiment.
The hidden Markov model (HMM) probability parameters used in the present invention are as follows:
(1) N, the number of events in the HMM. The events are hidden, i.e. they are the hidden states of the model. In what follows, the events of the model are denoted $\{S_1, S_2, \ldots, S_N\}$, and the event occupied at time $t$ is $q_t$.
(2) M, the number of distinct symbols that can appear in an observation sequence, i.e. the number of observation symbols per event in the HMM. The observation symbols are denoted $V = \{v_1, v_2, \ldots, v_M\}$ and the observation sequence is $O = \{o_1, o_2, \ldots, o_T\}$, where each $o_t$ is one of the symbols in $V$ and $T$ is the length of the observation sequence.
(3) The event transition probability distribution $A = [a_{ij}]$, where
$a_{ij} = P[q_{t+1} = S_j \mid q_t = S_i], \quad 1 \le i \le N,\ 1 \le j \le N.$
(4) The observation probability distribution $B = [b_j(k)]$, where
$b_j(k) = P[o_t = v_k \mid q_t = S_j], \quad 1 \le k \le M,\ 1 \le j \le N.$
(5) The initial event probability distribution $\pi = [\pi_i]$, where
$\pi_i = P[q_1 = S_i], \quad 1 \le i \le N.$
Recognition accuracy is the number of correctly recognized words among the isolated words to be recognized, divided by the total number of isolated words to be recognized and expressed as a percentage. Each isolated-word recognition run yields one recognition accuracy, and this value changes when the HMM parameters change. The variance of the recognition accuracy under different HMM parameters represents the robustness of recognition: the smaller the variance, the better the robustness.
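A minimal sketch of the two figures of merit defined above, assuming the recognizer's outputs and the reference words are available as parallel lists. The patent does not say whether population or sample variance is meant; population variance (`statistics.pvariance`) is an assumption here, and the run accuracies in the example are toy values.

```python
from statistics import mean, pvariance


def accuracy_percent(predicted, reference):
    """Correctly recognized words divided by all words to be recognized, as a percentage."""
    correct = sum(p == r for p, r in zip(predicted, reference))
    return 100.0 * correct / len(reference)


def robustness(run_accuracies):
    """Mean accuracy and its variance over repeated runs; smaller variance = better robustness."""
    return mean(run_accuracies), pvariance(run_accuracies)


print(accuracy_percent(["yes", "no", "go", "stop"], ["yes", "no", "go", "top"]))  # 75.0
print(robustness([96.0, 97.0, 98.0]))
```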
When the HMM parameters are computed with the Baum-Welch algorithm from the codes obtained by clustering the speech feature parameters, the initial event probability distribution is unimportant: as long as its probabilities sum to 1, it has only a minimal effect on the number of iterations. The present invention therefore sets the initial event probability distribution to $\pi_i = 1/N$.
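The Baum-Welch training referred to above can be sketched as one re-estimation step for a discrete HMM on a single observation sequence, starting from $\pi_i = 1/N$ as the text prescribes. This is a textbook, unscaled version suitable only for short sequences; training on several utterances of a word, scaling against underflow and the convergence test are left out, and the toy model and symbol sequence are hypothetical.

```python
def baum_welch_step(obs, pi, A, B):
    """One Baum-Welch (EM) re-estimation of (pi, A, B) from a single symbol sequence.

    Returns the new parameters and P(obs | old model).
    """
    n, T, m = len(pi), len(obs), len(B[0])
    # Forward pass: alpha[t][i] = P(o_1..o_t, q_t = S_i)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(n)]]
    for t in range(1, T):
        alpha.append([sum(alpha[t - 1][i] * A[i][j] for i in range(n)) * B[j][obs[t]]
                      for j in range(n)])
    # Backward pass: beta[t][i] = P(o_{t+1}..o_T | q_t = S_i)
    beta = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] for j in range(n))
                   for i in range(n)]
    likelihood = sum(alpha[T - 1])
    # gamma[t][i] = P(q_t = S_i | O); xi[t][i][j] = P(q_t = S_i, q_{t+1} = S_j | O)
    gamma = [[alpha[t][i] * beta[t][i] / likelihood for i in range(n)] for t in range(T)]
    xi = [[[alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] / likelihood
            for j in range(n)] for i in range(n)] for t in range(T - 1)]
    new_pi = gamma[0][:]
    new_A = [[sum(xi[t][i][j] for t in range(T - 1)) / sum(gamma[t][i] for t in range(T - 1))
              for j in range(n)] for i in range(n)]
    new_B = [[sum(gamma[t][j] for t in range(T) if obs[t] == k) / sum(gamma[t][j] for t in range(T))
              for k in range(m)] for j in range(n)]
    return new_pi, new_A, new_B, likelihood


pi = [1 / 3] * 3  # initial event probabilities pi_i = 1/N
A = [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [0.0, 0.0, 1.0]]  # left to right, no skips
B = [[0.6, 0.4], [0.5, 0.5], [0.3, 0.7]]
obs = [0, 0, 1, 0, 1, 1]
for _ in range(5):
    pi, A, B, likelihood = baum_welch_step(obs, pi, A, B)
    print(round(likelihood, 6))  # non-decreasing from one iteration to the next
```

Because a zero entry of A produces zero expected transitions, re-estimation keeps the left-to-right, no-skip structure intact.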
In the concrete implementation, the algorithms used by the present invention are the Bayesian forward-backward probability calculation and the Baum-Welch algorithm. Fig. 1 is the flow chart of the implementation of the method; the details are as follows:
1. First determine the parameters of the initial HMM and train the model. After the speech recognition process, an initial HMM is obtained, but this probability model is not optimal for different vocabulary items. The structure of the model comprises the number of events N and the number of observation symbols M per event. For isolated-word recognition, a suitable number of HMM events can be chosen according to the length of the speech; experiments show that too many events cause recognition accuracy to decline. For a discrete HMM, the number of observation symbols is in principle determined by the sample space, but it is limited by the amount of computation and is generally taken between 16 and 64. Experiments show that, apart from certain words, recognition accuracy for most words does not fluctuate much for M between 24 and 50.
The present invention uses an initial number of HMM events of 40 and 32 observation symbols, the latter corresponding to the number of vector clusters of the feature parameters, i.e. M = 32, and sets the number of observation sequences to 20. Accordingly, a 40 × 20 event-transition probability matrix is obtained from the number of events and the number of observation symbols, a 20 × 32 observation-sequence probability matrix is obtained from the number of observation sequences and the number of observation symbols, and the initial event probability vector is a 1 × 20 row matrix.
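A minimal sketch of setting up the initial model of item 1 with N = 40 events and M = 32 observation symbols, $\pi_i = 1/N$, and a left-to-right structure without skips. The matrix shapes here follow the definitions of A (N × N), B (N × M) and π (1 × N) given earlier in this embodiment, which differ from the 40 × 20 and 20 × 32 sizes quoted in the text. The self-loop probability of 0.5 and the uniform emission probabilities are assumptions, since the patent gives no starting values, and the function name is hypothetical.

```python
def initial_left_to_right_hmm(n_events=40, n_symbols=32, self_loop=0.5):
    """Build (pi, A, B) for a left-to-right HMM without skips.

    Each event either stays put (probability `self_loop`) or moves to the next event;
    the last event is absorbing. Emission probabilities start uniform over the M symbols.
    """
    pi = [1.0 / n_events] * n_events                 # pi_i = 1/N
    A = [[0.0] * n_events for _ in range(n_events)]  # N x N transition matrix
    for i in range(n_events - 1):
        A[i][i] = self_loop
        A[i][i + 1] = 1.0 - self_loop
    A[-1][-1] = 1.0
    B = [[1.0 / n_symbols] * n_symbols for _ in range(n_events)]  # N x M emission matrix
    return pi, A, B


pi, A, B = initial_left_to_right_hmm()
print(len(A), len(A[0]), len(B), len(B[0]))  # 40 40 40 32
```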
2. Perform isolated-word recognition with the HMM trained by the Baum-Welch algorithm from the initial number of HMM events and the observation sequences, and observe the recognition accuracy and robustness. During training, change the number of events of the HMM, changing N in steps of 2 each time. Train again to obtain a new HMM, and perform isolated-word recognition on the speech in the dictionary used for training. After all words have been recognized, record the recognition accuracy and the variance of the recognition probability for each change in the number of HMM events. Repeat this step to find the number of HMM events corresponding to maximum accuracy and minimum probability variance.
In the present invention, with 5120 isolated words to be recognized, the maximum recognition accuracy over multiple runs is 97.3%; with 10240 isolated words, it is 96.3%. At this point the number of HMM events for each word is optimal: if the number of events continues to be changed, whether increased or decreased, accuracy falls below this maximum. At the same time, the variance of the recognition probability of each word is also at its minimum.
3. In practical application, the user enters spoken vocabulary to achieve adaptive learning. The user records one vocabulary item; after parameter extraction, forward probabilities are computed against the HMM of each vocabulary item in the dictionary to obtain all the probabilities, which are sorted to find the maximum. The user's recorded word is then the dictionary word corresponding to the maximum probability. The system then combines the user's recording with the speech of that vocabulary item in the dictionary and retrains to obtain a new HMM for the item, i.e. a new number of events.
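Item 3 can be sketched as a recognize-then-retrain loop. The callbacks `score`, `train` and `choose_n` below are hypothetical stand-ins (for instance a forward-likelihood routine, Baum-Welch training, and an event-number search), and keeping one list of training utterances per dictionary word is an assumed data layout; none of these names come from the patent.

```python
def adapt_to_user(utterance, models, training_data, score, train, choose_n):
    """Recognize a user's utterance, add it to that word's data and retrain the word's HMM."""
    # Rank every dictionary word by the probability its HMM assigns to the utterance.
    ranked = sorted(models, key=lambda word: score(utterance, models[word]), reverse=True)
    word = ranked[0]  # the word with the maximum probability is the recognition result
    # Combine the user's recording with the dictionary speech for that word.
    training_data[word].append(utterance)
    # Re-select the number of events and retrain the word model on the enlarged set.
    n = choose_n(word, training_data[word])
    models[word] = train(training_data[word], n)
    return word, n


# Toy callbacks, purely for illustration.
models = {"yes": "model-yes", "no": "model-no"}
data = {"yes": [[0, 0, 1]], "no": [[1, 1, 0]]}
toy_score = lambda utt, model: 1.0 if model == "model-yes" else 0.5
toy_train = lambda utts, n: f"model-trained-{len(utts)}-utts-{n}-events"
toy_choose = lambda word, utts: 24
print(adapt_to_user([0, 1, 1], models, data, toy_score, toy_train, toy_choose))  # ('yes', 24)
```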
The method of the invention thus enables the speech recognition algorithm to adjust its model parameters through adaptive learning.
The above is a further detailed description of the present invention in connection with specific preferred embodiments, and the specific implementation of the invention shall not be regarded as limited to these descriptions. For a person of ordinary skill in the art to which the invention belongs, several simple deductions or substitutions may be made without departing from the inventive concept, and all of them shall be regarded as falling within the scope of protection of the invention.
Claims (4)
- 1. A method for improving speech recognition accuracy based on a dynamic number of HMM events, characterized in that the method comprises the following steps: A. setting the parameters of an initial HMM, the parameters including the number of events N and the number of observation symbols M, the HMM using a left-to-right structure without skips, wherein the initial number of events is 40, the number of observation symbols is 32, the number of observation sequences is 20, the HMM event-transition probability matrix is 40 × 20, a 20 × 32 observation-sequence probability matrix is obtained from the number of observation sequences and the number of observation symbols, and the initial event probability vector is a 1 × 20 row matrix; B. according to the initial number of HMM events, the number of observation sequences and the number of observation symbols, performing isolated-word recognition with the HMM trained by the Baum-Welch algorithm, and observing the recognition accuracy and robustness; C. for each vocabulary item in the dictionary used for training, dynamically changing the number of HMM events N in steps of 2, training again to obtain a new HMM, and performing isolated-word recognition on the speech in the dictionary used for training; after all words have been recognized, recording the recognition accuracy and the variance of the recognition probability for each change in the number of HMM events; and repeating this step to find the number of HMM events N at which accuracy is maximal and the probability variance is minimal; D. after the user's speech is recorded, extracting its feature parameters, computing forward probabilities with the HMM parameters obtained in step C and giving the recognition result; then automatically training on the user's recorded vocabulary together with the speech of the corresponding vocabulary item in the dictionary, changing the number of HMM events again, and computing the HMM for the particular speaker and its optimal number of events N, so that the parameters of the recognition model are adaptively adjusted for different users, thereby improving recognition accuracy.
- 2. The method according to claim 1, characterized in that the number of observation symbols M in step A takes a value between 24 and 50.
- 3. The method according to claim 1, characterized in that the number of correctly recognized words among the isolated words to be recognized is divided by the total number of isolated words to be recognized, and the result, expressed as a percentage, is the accuracy.
- 4. The method according to claim 1, characterized in that finding, in step C, the number of HMM events N at which accuracy is maximal and the probability variance is minimal specifically comprises: with 5120 isolated words to be recognized, the maximum recognition accuracy over multiple runs is 97.3%; with 10240 isolated words to be recognized, the maximum recognition accuracy over multiple runs is 96.3%.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410281284.6A (CN104064179B) | 2014-06-20 | 2014-06-20 | Method for improving speech recognition accuracy based on a dynamic number of HMM events |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104064179A | 2014-09-24 |
CN104064179B | 2018-06-08 |
Family ID: 51551858
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
C10 | Entry into substantive examination | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 2018-06-08; termination date: 2021-06-20 |