CN104064183B - Method for improving speech recognition accuracy based on a dynamic number of HMM observation symbols - Google Patents

Method for improving speech recognition accuracy based on a dynamic number of HMM observation symbols

Info

Publication number
CN104064183B
Authority
CN
China
Prior art keywords
hmm
observation
event
symbolic
voice
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201410279788.4A
Other languages
Chinese (zh)
Other versions
CN104064183A (en)
Inventor
刘明
王明江
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Graduate School Harbin Institute of Technology
Original Assignee
Shenzhen Graduate School Harbin Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Graduate School Harbin Institute of Technology
Priority to CN201410279788.4A
Publication of CN104064183A
Application granted
Publication of CN104064183B
Legal status: Expired - Fee Related
Anticipated expiration

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

The present invention provides a method for improving recognition accuracy in large-vocabulary isolated-word speech recognition. The method dynamically determines the number of HMM observation symbols for each isolated word, which solves the problem of low recognition accuracy caused by using the same number of observation symbols for different isolated words. Test results show that, at the cost of a slight increase in recognition computation, the method effectively improves the accuracy of large-vocabulary isolated-word speech recognition. The method can dynamically adjust the parameters of the recognition model. Compared with conventional speech recognition based on static statistical-probability models, its advantage is that the model parameters are adjusted adaptively for different users, which improves recognition accuracy. With a vocabulary of 10240 isolated words to be recognized, test results show that the method raises the average overall recognition rate from 96.3% to 99.2%.

Description

Method for improving speech recognition accuracy based on a dynamic number of HMM observation symbols
Technical field
The present invention relates to the field of isolated-word speech recognition, and in particular to a method for improving the accuracy of large-vocabulary isolated-word speech recognition.
Background
The hidden Markov model (HMM) is a mathematical model that describes event transition probabilities and the probabilities with which observation samples occur. Speech feature parameters are processed by a suitable algorithm to obtain an HMM probability model. The HMM is derived from the Markov chain and is widely used in all areas of speech processing. Building the HMM probability template of a speech sample requires clustering and encoding the feature parameter vectors of the speech (speech vector coding); the template training process then computes forward and backward probabilities until a converged probability model is obtained.
An acoustic model is usually produced by training on the acquired speech feature parameters with a specific probabilistic algorithm. In HMM-based speech recognition, an acoustic model is simply an HMM; typically, the set of HMMs is generated by training on the acquired speech feature parameters with the HMM transition probability algorithm. For the speech to be recognized, feature parameters consistent with those of the HMMs are extracted and posterior probabilities are computed with the Bayesian backward probability algorithm; the speech sample represented by the HMM probability template that yields the largest posterior probability is taken as the recognized speech.
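The decision rule in the preceding paragraph can be stated compactly. As an illustrative notation that the patent itself does not use, let $\lambda_w$ denote the HMM trained for vocabulary word $w$, $\mathcal{W}$ the vocabulary, and $O$ the observation sequence extracted from the utterance:

```latex
\hat{w} \;=\; \arg\max_{w \in \mathcal{W}} P(\lambda_w \mid O)
        \;=\; \arg\max_{w \in \mathcal{W}} P(O \mid \lambda_w)\, P(\lambda_w)
```

The second equality follows from Bayes' rule, because $P(O)$ does not depend on $w$. If all words are assumed equally likely a priori, the rule reduces to choosing the model with the largest likelihood $P(O \mid \lambda_w)$.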
For speech data, the main steps are time-frequency sampling and spectral transformation; a corresponding HMM can be established for any speech sounds whose time-frequency characteristics differ even slightly. Model training then adjusts the HMM parameters using existing samples so that the model accurately describes the probabilistic characteristics of each speech sound. Building the model is in effect mathematical modeling of speech, under the assumption that the recognition probability of the corresponding speech features can be computed from these models and has an extremum. For an HMM, the main task is to determine the basic topology of the model, including the number of events, the transition pattern between events and the transition probabilities, the observation sequence probabilities, and so on.
The number of HMM observation symbols differs for each Chinese isolated word. The larger the number of isolated words, the more probability-statistics states the mathematical model contains; in terms of HMM parameters, the more HMM symbols are needed for the different vocabulary items. As the number of isolated words grows, using the same number of HMM symbols to represent the internal relations among speech frame vectors will clearly reduce recognition accuracy.
Summary of the invention
To solve the problems in the prior art, the present invention proposes a method that improves the accuracy of large-vocabulary isolated-word speech recognition by dynamically changing the number of observation symbols of the HMM, solving the problem that recognition accuracy declines as the number of isolated words to be recognized increases.
The present invention is achieved through the following technical solution:
A method for improving the accuracy of large-vocabulary isolated-word speech recognition based on a dynamic number of HMM observation symbols, comprising the following steps:
A. Specify the number of events and the number of observation symbols of the initial HMM and train the model; after the speech recognition process, an initial HMM is obtained. Here the initial number of events is 40, the number of observation symbols is 32, and the number of observation sequences is 20; the HMM event transition probability matrix is 40 × 20, and a 20 × 32 observation sequence probability matrix is obtained from the number of observation sequences and the number of observation symbols; the initial event probability vector is a 1 × 20 row matrix.
B. The initial number of events N is 40. N is changed dynamically with a step size of 2; the number of HMM events is varied during training, and the recognition accuracy of each vocabulary word is observed and recorded. For each word, the number of HMM events N that gives the highest recognition accuracy (in percent) is taken as the optimal number of HMM events for that word.
C. For each word in the dictionary used for training, fix the number of HMM events N obtained in step B. Change the number of HMM observation symbols M dynamically with a step size of 2, continue training to obtain a new HMM, and perform isolated-word speech recognition on the speech in the training dictionary. After all words have been recognized, compute the recognition accuracy obtained for each value of the number of HMM observation symbols. Repeat this step to find the number of HMM observation symbols M at which the accuracy is highest.
D. After the user's speech is recorded, feature parameters are extracted and, together with the HMM parameters obtained in step C, the forward probability is computed and the recognition result is given. The speech of the word recorded by the user is then automatically combined with the speech of the corresponding word in the dictionary for training, the number of HMM observation symbols M is varied again, and the optimal number of HMM observation symbols for that particular speaker is calculated.
The beneficial effects of the invention are as follows. Isolated-word speech recognition is performed with the HMM trained from the initial number of HMM events and number of symbols, and the recognition accuracy is observed. With the number of HMM events fixed, the number of HMM observation symbols is changed, training continues to obtain a new HMM, and isolated-word speech recognition is performed again. This process is repeated and the recognition rates for the different numbers of observation symbols are compared; the number of observation symbols corresponding to the maximum is optimal. When the user inputs non-standard speech for a vocabulary word, the algorithm learns from it and adaptively changes the number of HMM observation symbols to maximize recognition accuracy. Compared with conventional speech recognition based on static statistical-probability models, the advantage of the method is that the model parameters are adjusted adaptively for different users, which improves recognition accuracy. With a vocabulary of 10240 isolated words to be recognized, test results show that the method raises the average overall recognition rate from 96.3% to 99.2%.
Brief description of the drawings
Fig. 1 is a flow chart of the method of the present invention for improving speech recognition accuracy based on a dynamic number of HMM observation symbols.
Detailed description of the embodiments
The present invention is further described below with reference to the accompanying drawing and specific embodiments.
The probability parameters of the hidden Markov model (HMM) used in the present invention are as follows:
(1) N, the number of events (hidden states) in the HMM. The events are implicit in the HMM. In what follows, the events of the model are denoted $\{S_1, S_2, \ldots, S_N\}$, and the event at time $t$ is denoted $q_t$.
(2) M, the number of distinct elements that can appear in an observation sequence, that is, the number of observation symbols per event in the HMM. The observation symbols are denoted $V = \{v_1, v_2, \ldots, v_M\}$ and the observation sequence is $O = \{o_1, o_2, \ldots, o_T\}$, where each $o_t$ is one of the observation symbols in the set $V$ and $T$ is the length of the observation sequence.
(3) The event transition probability distribution $A = [a_{ij}]$, where
$a_{ij} = P[q_{t+1} = S_j \mid q_t = S_i], \quad 1 \le i \le N,\ 1 \le j \le N.$
(4) The observation sequence probability distribution $B = [b_j(k)]$, where
$b_j(k) = P[o_t = v_k \mid q_t = S_j], \quad 1 \le k \le M,\ 1 \le j \le N.$
(5) The initial event probability distribution $\pi = [\pi_i]$, where
$\pi_i = P[q_1 = S_i], \quad 1 \le i \le N.$
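Given the parameters π, A and B defined above, the probability of an observation sequence is commonly computed with the forward recursion. The following is a minimal sketch of the standard scaled forward algorithm in plain Python, not code taken from the patent; the function name and the two-event toy model are assumptions made for illustration.

```python
import math

def forward_log_likelihood(obs, pi, A, B):
    """Scaled forward pass: returns log P(obs | pi, A, B) for a discrete HMM.

    obs is a list of symbol indices in 0..M-1, pi has length N,
    A is N x N (row i = transitions out of event i), B is N x M.
    """
    n = len(pi)
    # Initialisation: alpha_1(i) = pi_i * b_i(o_1)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    log_prob = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Induction: alpha_t(j) = sum_i alpha_{t-1}(i) * a_ij * b_j(o_t)
            alpha = [
                sum(alpha[i] * A[i][j] for i in range(n)) * B[j][obs[t]]
                for j in range(n)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence is impossible under this model
        # Rescale to avoid underflow and accumulate the log of the scale
        alpha = [a / scale for a in alpha]
        log_prob += math.log(scale)
    return log_prob

# Toy model with N = 2 events and M = 2 symbols (illustrative values only).
toy_pi = [0.6, 0.4]
toy_A = [[0.7, 0.3], [0.4, 0.6]]
toy_B = [[0.5, 0.5], [0.1, 0.9]]
toy_ll = forward_log_likelihood([0, 1], toy_pi, toy_A, toy_B)
```

Working in the log domain with per-step rescaling keeps the computation stable for long observation sequences, which matters when comparing the scores of many word models.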
The recognition accuracy is expressed as the percentage obtained by dividing the number of correctly recognized words by the total number of isolated words to be recognized.
When the HMM parameters are computed with the Baum-Welch algorithm from the codes obtained by clustering the speech feature parameters, the initial event probability distribution is unimportant: as long as the probabilities sum to 1, it only slightly affects the number of iterations. The present invention therefore uses the initial event probability distribution $\pi_i = 1/N$.
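As a rough illustration of the starting point for Baum-Welch training, the sketch below builds an initial discrete HMM with the uniform initial distribution $\pi_i = 1/N$. The matrix shapes follow the definitions of A, B and π given above (A is N × N, B is N × M). The random row-stochastic initialisation of A and B and the function name `init_discrete_hmm` are assumptions, since the text only fixes π.

```python
import random

def init_discrete_hmm(n_events, n_symbols, seed=0):
    """Return (pi, A, B) with uniform pi and random row-stochastic A and B."""
    rng = random.Random(seed)

    def random_row(size):
        # Strictly positive weights, normalised so the row sums to 1
        weights = [rng.random() + 1e-3 for _ in range(size)]
        total = sum(weights)
        return [w / total for w in weights]

    pi = [1.0 / n_events] * n_events                      # pi_i = 1/N
    A = [random_row(n_events) for _ in range(n_events)]   # N x N transitions
    B = [random_row(n_symbols) for _ in range(n_events)]  # N x M emissions
    return pi, A, B

# Initial values used in the text: N = 40 events, M = 32 observation symbols.
pi, A, B = init_discrete_hmm(n_events=40, n_symbols=32)
```

Keeping every entry strictly positive is a common precaution, because Baum-Welch re-estimation cannot revive a probability that starts at exactly zero.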
In the concrete implementation, the algorithms used by the present invention are the Bayesian forward and backward probability computation and the Baum-Welch algorithm. Fig. 1 is a flow chart of the implementation of the method, which is described in detail below:
1. First, specify the number of events and the number of observation symbols of the initial HMM, then start training the model. After the speech recognition process, an initial HMM is obtained; at this stage the probability model is not optimal for the different vocabulary words. The model parameters include the number of events N and the number of observation symbols M per event. For a discrete HMM, the number of observation symbols is in principle determined by the sample space, but it is limited by the computational cost and is typically chosen between 16 and 64. Experiments show that, apart from a few words, the recognition accuracy of most words does not fluctuate much for M between 24 and 50.
The present invention uses an initial number of HMM events of 40, a number of observation symbols of 32 and a number of observation sequences of 20; the HMM event transition probability matrix has dimensions 40 × 20, and a 20 × 32 observation sequence probability matrix is obtained from the number of observation sequences and the number of observation symbols; the initial event probability vector is a 1 × 20 row matrix.
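The observation symbols of a discrete HMM are the codes produced by clustering the speech feature vectors, so M corresponds to the size of the codebook. The sketch below shows one plausible form of that encoding step, nearest-codeword vector quantisation. The function name, the squared Euclidean distance and the three-entry toy codebook are assumptions; the patent does not specify the clustering method.

```python
def quantize(frames, codebook):
    """Map each feature vector to the index of its nearest codeword."""
    def sq_dist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))

    return [
        min(range(len(codebook)), key=lambda k: sq_dist(frame, codebook[k]))
        for frame in frames
    ]

# Toy codebook with M = 3 codewords; a real codebook would hold M = 16..64
# centroids obtained by clustering the training feature vectors.
codebook = [(0.0, 0.0), (1.0, 1.0), (4.0, 0.0)]
frames = [(0.1, -0.2), (0.9, 1.2), (3.5, 0.3), (0.8, 0.7)]
symbols = quantize(frames, codebook)  # -> [0, 1, 2, 1]
```

Changing M therefore means rebuilding the codebook and re-encoding the training speech before the HMM is retrained, which is why a larger M increases the computational cost.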
2. Obtain the accuracy of the speech recognition result: the number of correctly recognized words divided by the total number of isolated words to be recognized, expressed as a percentage. The initial number of events N is 40, and the step size of the dynamic change is 2. The number of HMM events is varied during training, and the recognition accuracy of each word is observed and recorded. For each word, the number of HMM events with the highest recognition accuracy (in percent) is taken as the optimal number of HMM events for that word.
3. For each word, fix the number of events of the HMM obtained in step 2 and change the number of HMM observation symbols M dynamically, with a step size of 2. Continue training to obtain a new HMM, and perform isolated-word speech recognition on the speech in the training dictionary. After all words have been recognized, compute the recognition accuracy obtained for each value of the number of HMM observation symbols. Repeat this step to find the number of HMM observation symbols at which the accuracy is highest.
Experiments show that, once the optimal number of HMM events has been fixed for each word and the number of HMM observation symbols is varied, the maximum of the recognition accuracies over the repeated runs is 99.2% when the vocabulary to be recognized contains 10240 isolated words. At that point the number of HMM observation symbols of each word is optimal: changing it further, whether by increasing or decreasing it, brings the accuracy below 99.2%.
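Steps 2 and 3 amount to a two-stage search: first over N with M held at its initial value, then over M with N fixed at its best value. A minimal Python sketch follows, assuming a hypothetical `accuracy(n, m)` callback that trains an HMM for one word with N events and M symbols and returns its recognition accuracy in percent. The search range for N and the toy accuracy function are illustrative assumptions, not values from the experiments.

```python
def sweep(candidates, accuracy_of):
    """Return (value, accuracy) for the best candidate; first one wins ties."""
    best_value, best_acc = None, float("-inf")
    for value in candidates:
        acc = accuracy_of(value)
        if acc > best_acc:
            best_value, best_acc = value, acc
    return best_value, best_acc

def tune_word(accuracy, n_range, m_range, m_init=32):
    """Stage 1: sweep N with M = m_init. Stage 2: fix the best N and sweep M."""
    best_n, _ = sweep(n_range, lambda n: accuracy(n, m_init))
    best_m, best_acc = sweep(m_range, lambda m: accuracy(best_n, m))
    return best_n, best_m, best_acc

# Hypothetical stand-in for "train with (N, M) and measure accuracy in percent".
def toy_accuracy(n, m):
    return 100.0 - 0.05 * (n - 44) ** 2 - 0.02 * (m - 38) ** 2

# Step size 2 for both parameters; M is searched over the 16..64 range.
best = tune_word(toy_accuracy, range(30, 52, 2), range(16, 66, 2))
```

Searching one parameter at a time needs far fewer training runs than a full grid over (N, M), at the price of possibly missing a better combination when the two parameters interact.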
4. The user inputs speech for a vocabulary word to enable adaptive learning. After the user's speech is recorded and its parameters are extracted, the forward and backward probability computation is carried out with the HMM parameters of each word in the dictionary obtained in step 3, which yields a probability for every word. The probabilities are sorted and the largest is found; the word recorded by the user is then the dictionary word corresponding to that maximum probability. The system then combines the user's recorded speech for this word with the dictionary speech for the same word and retrains to obtain a new HMM for the word, that is, a new number of observation symbols.
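The adaptive step can be sketched as "recognise, then retrain the recognised word over several values of M". In the Python sketch below, `score`, `train` and `evaluate` are hypothetical callbacks standing in for the likelihood computation, Baum-Welch training and accuracy measurement; the toy versions exist only to make the example runnable and do not model speech.

```python
def recognize(utterance, models, score):
    """Return the dictionary word whose model gives the highest score."""
    return max(models, key=lambda word: score(utterance, models[word]))

def adapt(word, user_utterance, dictionary, m_candidates, train, evaluate):
    """Retrain the word's model for each M on dictionary + user speech."""
    data = dictionary[word] + [user_utterance]
    best = None
    for m in m_candidates:
        model = train(data, m)
        acc = evaluate(model, data)
        if best is None or acc > best[0]:
            best = (acc, m, model)
    return best[1], best[2]

# Toy stand-ins (all hypothetical): an utterance is a list of numbers and a
# "model" is the mean of its training data plus the M it was built with.
def toy_train(data, m):
    values = [x for utt in data for x in utt]
    return {"mean": sum(values) / len(values), "m": m}

def toy_score(utt, model):
    return -abs(sum(utt) / len(utt) - model["mean"])

def toy_evaluate(model, data):
    return -abs(model["m"] - 36)  # pretend M = 36 suits this speaker best

dictionary = {"yes": [[1.0, 1.2]], "no": [[5.0, 5.5]]}
models = {w: toy_train(utts, 32) for w, utts in dictionary.items()}
user = [1.4, 1.6]
word = recognize(user, models, toy_score)
best_m, models[word] = adapt(word, user, dictionary, range(30, 42, 2),
                             toy_train, toy_evaluate)
```

Only the model of the recognised word is retrained, so the cost of adaptation grows with the number of M candidates rather than with the vocabulary size.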
At this point, the method of the invention enables the speech recognition algorithm to adjust its model parameters through adaptive learning.
The above is a further detailed description of the invention in connection with specific preferred embodiments, and the specific implementation of the invention shall not be regarded as limited to these descriptions. A person of ordinary skill in the art to which the invention belongs may make simple deductions or substitutions without departing from the concept of the invention, and all of these shall be regarded as falling within the scope of protection of the invention.

Claims (3)

  1. A method for improving speech recognition accuracy based on a dynamic number of HMM observation symbols, characterized in that the method comprises the following steps:
    A. Specify the parameters of the initial HMM, the parameters including the number of events and the number of observation symbols, and train the model; after the speech recognition process, an initial HMM is obtained. Here the initial number of events is 40, the number of observation symbols is 32, and the number of observation sequences is 20; the HMM event transition probability matrix is 40 × 20, and a 20 × 32 observation sequence probability matrix is obtained from the number of observation sequences and the number of observation symbols; the initial event probability vector is a 1 × 20 row matrix;
    B. The initial number of events N is 40. N is changed dynamically with a step size of 2; the number of HMM events is varied during training, and the recognition accuracy of each vocabulary word is observed and recorded. For each word, the number of HMM events N with the highest recognition accuracy (in percent) is taken as the optimal number of HMM events for that word;
    C. For each word in the dictionary used for training, fix the number of HMM events N obtained in step B. Change the number of HMM observation symbols M dynamically with a step size of 2, continue training to obtain a new HMM, and perform isolated-word speech recognition on the speech in the training dictionary. After all words have been recognized, compute the recognition accuracy obtained for each value of the number of HMM observation symbols. Repeat this step to find the number of HMM observation symbols M at which the accuracy is highest;
    D. After the user's speech is recorded, feature parameters are extracted and, together with the HMM parameters obtained in step C, the forward probability is computed and the recognition result is given. The non-standard speech of the word recorded by the user is then automatically combined with the speech of the corresponding word in the dictionary for training, the number of HMM observation symbols M is varied again, and the optimal number of HMM observation symbols for that particular speaker is calculated.
  2. The method according to claim 1, characterized in that the accuracy is the number of correctly recognized words divided by the total number of isolated words to be recognized, expressed as a percentage.
  3. The method according to claim 1, characterized in that, in step C, finding the number of HMM observation symbols M at which the accuracy is highest is specifically: when the vocabulary to be recognized contains 10240 isolated words, the maximum of the recognition accuracies over the repeated runs is 99.2%.
CN201410279788.4A 2014-06-20 2014-06-20 Method for improving speech recognition accuracy based on a dynamic number of HMM observation symbols Expired - Fee Related CN104064183B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410279788.4A CN104064183B (en) 2014-06-20 2014-06-20 Method for improving speech recognition accuracy based on a dynamic number of HMM observation symbols

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410279788.4A CN104064183B (en) 2014-06-20 2014-06-20 Method for improving speech recognition accuracy based on a dynamic number of HMM observation symbols

Publications (2)

Publication Number Publication Date
CN104064183A CN104064183A (en) 2014-09-24
CN104064183B true CN104064183B (en) 2017-12-08

Family

ID=51551862

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410279788.4A Expired - Fee Related CN104064183B (en) 2014-06-20 2014-06-20 Method for improving speech recognition accuracy based on a dynamic number of HMM observation symbols

Country Status (1)

Country Link
CN (1) CN104064183B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106384587B (en) * 2015-07-24 2019-11-15 科大讯飞股份有限公司 A kind of audio recognition method and system

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5920839A (en) * 1993-01-13 1999-07-06 Nec Corporation Word recognition with HMM speech, model, using feature vector prediction from current feature vector and state control vector values
CN101030369A (en) * 2007-03-30 2007-09-05 清华大学 Built-in speech discriminating method based on sub-word hidden Markov model
CN102254087A (en) * 2010-05-20 2011-11-23 索尼公司 Data processing device, data processing method and program

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
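Application of an improved hidden Markov model in speech recognition; 胡磊 (Hu Lei) et al.; Information and Control (《信息与控制》); 2007-12-31; Vol. 36 (No. 6); full text *
Research and implementation of algorithm optimization for isolated-word speech recognition; 刘德 (Liu De); China Master's Theses Full-text Database, Information Science and Technology (《中国优秀硕士学位论文全文数据库 信息科技辑》); 2012-03-15 (No. 03); abstract, Chapter 4 *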

Also Published As

Publication number Publication date
CN104064183A (en) 2014-09-24

Similar Documents

Publication Publication Date Title
CN107492382B (en) Voiceprint information extraction method and device based on neural network
CN104036774B (en) Tibetan dialect recognition methods and system
CN103117060A (en) Modeling approach and modeling system of acoustic model used in speech recognition
CN109493874A (en) A kind of live pig cough sound recognition methods based on convolutional neural networks
CN110263322A (en) Audio for speech recognition corpus screening technique, device and computer equipment
CN102238190A (en) Identity authentication method and system
CN106340297A (en) Speech recognition method and system based on cloud computing and confidence calculation
CN105261246B (en) A kind of Oral English Practice error correction system based on big data digging technology
CN106919897A (en) A kind of facial image age estimation method based on three-level residual error network
CN104485103B (en) A kind of multi-environment model isolated word recognition method based on vector Taylor series
CN103280224A (en) Voice conversion method under asymmetric corpus condition on basis of adaptive algorithm
CN107146615A (en) Audio recognition method and system based on the secondary identification of Matching Model
CN102201236A (en) Speaker recognition method combining Gaussian mixture model and quantum neural network
CN108090038A (en) Text punctuate method and system
CN104347071B (en) Method and system for generating reference answers of spoken language test
CN106782603A (en) Intelligent sound evaluating method and system
CN110349588A (en) A kind of LSTM network method for recognizing sound-groove of word-based insertion
CN111128211B (en) Voice separation method and device
CN105139856B (en) Probability linear discriminant method for distinguishing speek person based on the regular covariance of priori knowledge
CN107818795A (en) The assessment method and device of a kind of Oral English Practice
CN108109615A (en) A kind of construction and application method of the Mongol acoustic model based on DNN
CN112307130B (en) Document-level remote supervision relation extraction method and system
WO2015134579A1 (en) System and method to correct for packet loss in asr systems
CN105280181A (en) Training method for language recognition model and language recognition method
CN111091809B (en) Regional accent recognition method and device based on depth feature fusion

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20171208

Termination date: 20210620
