CN102013253A - Speech recognition method based on speed difference of voice unit and system thereof - Google Patents

Speech recognition method based on speed difference of voice unit and system thereof Download PDF

Info

Publication number
CN102013253A
CN102013253A CN2009101728759A CN200910172875A
Authority
CN
China
Prior art keywords
recognition result
voice
voice unit
word speed
candidate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2009101728759A
Other languages
Chinese (zh)
Other versions
CN102013253B (en)
Inventor
赵蕤
鄢翔
何磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Toshiba Corp
Original Assignee
Toshiba Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Toshiba Corp filed Critical Toshiba Corp
Priority to CN2009101728759A priority Critical patent/CN102013253B/en
Publication of CN102013253A publication Critical patent/CN102013253A/en
Application granted granted Critical
Publication of CN102013253B publication Critical patent/CN102013253B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Landscapes

  • Machine Translation (AREA)

Abstract

The present invention relates to a speech recognition method based on the speed difference of voice units, comprising: preprocessing input speech; extracting acoustic features of the speech; decoding the speech according to a pre-trained acoustic model and the extracted acoustic features to obtain a plurality of candidate recognition results, wherein each candidate recognition result has an acoustic score and the segment lengths (durations) of the voice units it contains; calculating, for each candidate recognition result, a voice-unit speed difference based on the durations of the contained voice units; calculating a comprehensive score for the candidate recognition result based on the speed difference and the acoustic score; and selecting, from the plurality of candidate recognition results, the candidate with the highest comprehensive score as the final recognition result of the speech. The present invention also provides a corresponding speech recognition system.

Description

Speech recognition method and speech recognition system based on differences in voice-unit speech rate
Technical field
The present invention relates to speech recognition technology, and in particular to a method of performing speech recognition according to differences in the speech rate of voice units, and to a corresponding speech recognition system.
Background technology
Typically, speech recognition comprises pre-processing of the speech signal, extraction of acoustic features, and search/decoding. The input speech signal is first pre-processed, which includes pre-filtering, sampling and quantization, windowing and framing, endpoint detection, pre-emphasis, and so on. Feature extraction is then performed on the pre-processed signal to obtain acoustic features such as linear prediction coefficients (LPC), cepstral coefficients (CEP), Mel-frequency cepstral coefficients (MFCC), and perceptual linear prediction (PLP) features. Based on the extracted features and a pre-trained acoustic model, a search strategy such as the Viterbi algorithm decodes the signal to produce the corresponding recognition result.
During speech recognition, duration (segment-length) information is not affected by noise or channel distortion and is therefore very important for robustness. Existing methods that exploit duration information commonly model the duration of a voice unit (e.g. a state, phoneme, or word) explicitly with a statistical distribution (e.g. a normal distribution, a gamma distribution, or a Gaussian mixture model, GMM) and then combine the duration score with the acoustic score during decoding. Such methods can improve recognition performance to some extent.
For example, the article by David Burshtein, "Robust Parametric Modeling of Durations in Hidden Markov Models" (International Conference on Acoustics, Speech and Signal Processing (ICASSP), 1995), describes in detail a scheme that models state durations with a gamma distribution. The article by D. Povey, "Phone Duration Modeling for LVCSR" (ICASSP, 2004), describes in detail a scheme that models phoneme durations with discrete distributions.
However, duration information is itself easily affected by speech rate, so adding speech-rate information to the duration model can further improve recognition performance. How to account for both duration and speech rate in speech recognition without increasing time and memory consumption has therefore become a focus of research.
The basic idea of existing methods that add speech-rate information to a duration model is to remove the negative effect of speech rate on the model.
One common method normalizes durations by the speech rate, where the speech rate is defined as the average duration of all voice units within a sentence. However, because this rate can only be computed once the whole sentence is available, duration normalization cannot be performed in real time during recognition. This rate-based duration-normalization method is described in detail in the article by V. R. R. Gadde, "Modeling Word Duration for Better Speech Recognition" (Proc. of the Speech Transcription Workshop, 2000).
Another method builds separate duration models for different speech rates, for example one model each for fast, medium, and slow speech, and selects the highest-scoring model during recognition. However, the accuracy of these models is not high, and because the probabilities of three models must be computed separately, the amount of computation and the computing time increase significantly. This approach is described in detail in the article by Yun Tang, Wenju Liu, and Bo Xu, "Trigram Duration Modeling in Speech Recognition" (International Symposium on Chinese Spoken Language Processing, 2004), and in the article by Wern-Jun Wang and Chun-Jen Lee, "Duration Modeling for Mandarin Speech Recognition Using Prosodic Information" (Speech Prosody, 2004).
Yet another duration-normalization method uses the duration of the preceding voice unit to normalize that of the current voice unit. However, this method requires pre-computing and storing normalized duration models for all possible two-unit contexts, so its memory consumption is large. It is described in detail in the U.S. patent by Masahide Arui, Shinichi Tanaka, and Takashi Masuko, "Apparatus, Method and Computer Program Product for Speech Recognition".
Summary of the invention
The present invention has been made in view of the above technical problems. Its object is to provide a speech recognition method and a speech recognition system based on differences in voice-unit speech rate that take into account the influence of speech rate on duration and can improve recognition performance, yet require no duration modeling and consume very little memory and computing time.
According to one aspect of the present invention, there is provided a speech recognition method based on differences in voice-unit speech rate, comprising: pre-processing input speech; extracting acoustic features of the speech; decoding the speech based on a pre-trained acoustic model and the extracted acoustic features to obtain a plurality of candidate recognition results of the speech, wherein each of the plurality of candidates has an acoustic score and the durations of the voice units it contains; for each of the plurality of candidates, calculating the candidate's voice-unit speech-rate difference value based on the durations of the contained voice units, and calculating the candidate's combined score based on the calculated speech-rate difference value and the acoustic score; and selecting, from the plurality of candidates, the candidate with the highest combined score as the final recognition result of the speech.
According to another aspect of the present invention, there is provided a speech recognition system based on differences in voice-unit speech rate, comprising: a pre-processing module for pre-processing input speech; a feature extraction module for extracting acoustic features of the speech; a decoding module for decoding the speech based on a pre-trained acoustic model and the extracted acoustic features to obtain a plurality of candidate recognition results of the speech, wherein each of the plurality of candidates has an acoustic score and the durations of the voice units it contains; a voice-unit speech-rate difference calculation module for calculating, for each of the plurality of candidates, the candidate's speech-rate difference value based on the durations of the contained voice units; a combined-score calculation module for calculating, for each of the plurality of candidates, the candidate's combined score based on the calculated speech-rate difference value and the acoustic score; and a selection module for selecting, from the plurality of candidates, the candidate with the highest combined score as the final recognition result of the speech.
Description of drawings
Fig. 1 is a flowchart of a speech recognition method based on differences in voice-unit speech rate according to an embodiment of the present invention;
Fig. 2 is a schematic block diagram of a speech recognition system based on differences in voice-unit speech rate according to a first embodiment of the present invention;
Fig. 3 is a schematic block diagram of a speech recognition system based on differences in voice-unit speech rate according to a second embodiment of the present invention;
Fig. 4 is a schematic block diagram of a speech recognition system based on differences in voice-unit speech rate according to a third embodiment of the present invention;
Fig. 5 is a schematic block diagram of a speech recognition system based on differences in voice-unit speech rate according to a fourth embodiment of the present invention.
Embodiment
The above and other objects, technical features, and advantages of the present invention will become more apparent from the following detailed description of specific embodiments in conjunction with the accompanying drawings.
Fig. 1 shows a flowchart of a speech recognition method based on differences in voice-unit speech rate according to an embodiment of the present invention. This embodiment is described in detail below with reference to the drawing.
This embodiment assumes that the speech rate within a sentence is stable, i.e. that each voice unit in the sentence is spoken at substantially the same rate. Therefore, among candidate recognition results with similar acoustic scores, a candidate whose voice units differ little in speech rate is more likely to be the correct result than one whose voice units differ greatly. Based on this observation, the present embodiment uses the speech-rate difference among voice units, combined with the acoustic score, to select the best recognition result.
As shown in Fig. 1, at step S101 the input speech is pre-processed and its acoustic features are then extracted. Speech pre-processing and feature extraction are well known to those of ordinary skill in the art, so their detailed explanation is omitted here. Step S101 yields acoustic features of the speech such as linear prediction coefficients (LPC), cepstral coefficients (CEP), Mel-frequency cepstral coefficients (MFCC), and perceptual linear prediction (PLP) features.
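As a rough illustration of the pre-processing mentioned in step S101, the following NumPy sketch applies pre-emphasis and splits the signal into overlapping windowed frames. The filter coefficient, frame length, and hop size are conventional illustrative values, not parameters specified by the patent.

```python
import numpy as np

def preemphasize(signal: np.ndarray, alpha: float = 0.97) -> np.ndarray:
    """Apply the pre-emphasis filter y[n] = x[n] - alpha * x[n-1]."""
    return np.append(signal[0], signal[1:] - alpha * signal[:-1])

def frame_signal(signal: np.ndarray, frame_len: int, hop: int) -> np.ndarray:
    """Split the signal into overlapping Hamming-windowed frames."""
    n_frames = 1 + max(0, len(signal) - frame_len) // hop
    window = np.hamming(frame_len)
    return np.stack([signal[i * hop : i * hop + frame_len] * window
                     for i in range(n_frames)])

# Example: 1 s of a 100 Hz tone at 8 kHz, 25 ms frames with a 10 ms hop
sr = 8000
t = np.arange(sr) / sr
speech = np.sin(2 * np.pi * 100 * t)
frames = frame_signal(preemphasize(speech), frame_len=200, hop=80)
print(frames.shape)  # (98, 200)
```

Feature vectors such as MFCCs would then be computed per frame; that step is omitted here since the patent treats it as standard prior art.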
Next, at step S105, the speech is decoded based on the pre-trained acoustic model and the extracted acoustic features to obtain a plurality of candidate recognition results of the speech. Decoding searches for the word sequence of the input speech according to a search strategy such as the Viterbi algorithm, N-best search, or multi-pass search; it is well known to those of ordinary skill in the art, so its detailed description is omitted here. In this embodiment, the Viterbi algorithm may be adopted as the search strategy. Each candidate obtained by decoding has a corresponding acoustic score and the durations of the voice units it contains.
Then, at step S110, for each of the plurality of candidates obtained in step S105, the candidate's voice-unit speech-rate difference value is calculated based on the durations of the voice units it contains.
In this embodiment, a voice unit may be any one of a state, a phoneme, a syllable, a word, or a phrase. The speech rate of a voice unit is defined as the ratio of the actual duration obtained in step S105 to the average duration of the corresponding voice unit in the speech corpus, i.e.
r_u = d_u / m_u    (1)
where r_u denotes the speech rate of the u-th voice unit, d_u denotes the duration of the u-th voice unit, and m_u denotes the average duration of the voice unit in the speech corpus corresponding to the u-th voice unit.
In step S110, the speech rate of each voice unit in the candidate is first calculated according to formula (1), and the candidate's voice-unit speech-rate difference value is then calculated.
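Formula (1) can be sketched in a few lines; the unit labels and average durations below are made-up illustrations, not data from the patent.

```python
# Speech rate of each voice unit: r_u = d_u / m_u  (formula (1)).
# avg_duration maps a unit label to its average duration m_u in the corpus;
# the labels and numbers are hypothetical, chosen only for illustration.
avg_duration = {"zh": 8.0, "ong1": 12.0, "g": 6.0, "uo2": 14.0}  # in frames

def speech_rates(units):
    """units: list of (label, observed duration d_u) from the decoder."""
    return [d / avg_duration[label] for label, d in units]

candidate = [("zh", 10), ("ong1", 12), ("g", 9), ("uo2", 7)]
print(speech_rates(candidate))  # [1.25, 1.0, 1.5, 0.5]
```

A rate above 1 means the unit was spoken more slowly than average in this notation, since r_u is the observed duration relative to the corpus average.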
In one embodiment, the voice-unit speech-rate difference value is defined as the difference between the maximum and minimum speech rates of all the voice units in a candidate, i.e. the range of the speech rates. Assuming the candidate contains N voice units, the difference value can be calculated according to the following formula:
s_d = max(r_1, r_2, ..., r_N) - min(r_1, r_2, ..., r_N),
where s_d denotes the voice-unit speech-rate difference value. In this case, the maximum and minimum are selected from the calculated speech rates of all the voice units, and their difference is computed.
In another embodiment, the voice-unit speech-rate difference value is defined as the variance of the speech rates of all the voice units in the candidate, i.e.
s_d = var(r_1, r_2, ..., r_N).
In this case, the variance of all the speech rates is calculated according to the variance formula.
In another embodiment, the voice-unit speech-rate difference value is defined as the standard deviation of the speech rates of all the voice units in the candidate, i.e.
s_d = stdv(r_1, r_2, ..., r_N).
In this case, the standard deviation of all the speech rates is calculated according to the standard-deviation formula.
In yet another embodiment, the voice-unit speech-rate difference value is defined as the coefficient of variation of the speech rates of all the voice units in the candidate, i.e. the ratio of the standard deviation of the speech rates to their mean, as shown in the following formula:
s_d = stdv(r_1, r_2, ..., r_N) / mean(r_1, r_2, ..., r_N)
In this case, the standard deviation and the mean of all the voice-unit speech rates are calculated separately, and their ratio is computed.
Although several methods of calculating the voice-unit speech-rate difference value have been described above, those of ordinary skill in the art should understand that other methods may also be used, as long as they capture the overall spread of the voice-unit speech rates.
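The four difference measures above (range, variance, standard deviation, coefficient of variation) can be sketched with the standard library as follows. The patent does not specify population versus sample statistics; population formulas are assumed here.

```python
import statistics

def rate_range(rates):      # max - min (first embodiment)
    return max(rates) - min(rates)

def rate_variance(rates):   # population variance (second embodiment)
    return statistics.pvariance(rates)

def rate_stdev(rates):      # population standard deviation (third embodiment)
    return statistics.pstdev(rates)

def rate_cv(rates):         # coefficient of variation (fourth embodiment)
    return statistics.pstdev(rates) / statistics.mean(rates)

rates = [1.25, 1.0, 1.5, 0.5]  # illustrative per-unit speech rates
print(rate_range(rates))  # 1.0
```

Any of the four can serve as s_d; all are monotone indicators of how unevenly the candidate's voice units were spoken.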
Thus, step S110 yields each candidate's voice-unit speech-rate difference value. Then, at step S115, each candidate's combined score is calculated from the candidate's calculated speech-rate difference value and acoustic score.
For the combined score, note that the best recognition result should have an acoustic score that is as high as possible and a speech-rate difference value that is as low as possible. Therefore, when the combined score is calculated from the speech-rate difference value and the acoustic score, the difference value is usually inverted before being combined with the acoustic score. Several embodiments of the combined-score calculation are given below. Of course, those of ordinary skill in the art should understand that methods other than those described below may also be used to calculate the combined score.
In one embodiment, for each candidate, the reciprocal of the voice-unit speech-rate difference value is first calculated, the reciprocal is then weighted by a predetermined weight coefficient, and the weighted reciprocal is added to the acoustic score to obtain the candidate's combined score.
In another embodiment, the negative of the voice-unit speech-rate difference value is first calculated, the negative is then weighted by a predetermined weight coefficient, and the weighted negative is added to the acoustic score to obtain the candidate's combined score.
In yet another embodiment, the reciprocal of the voice-unit speech-rate difference value is first calculated, the reciprocal is then weighted by a predetermined weight coefficient, and the weighted reciprocal is multiplied by the acoustic score to obtain the candidate's combined score.
In the above embodiments of the combined-score calculation, the weight coefficient can be adjusted according to the recognition task.
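The three score-combination variants above can be sketched as follows; the score values and the weight are illustrative, and the patent leaves the weight coefficient to per-task tuning.

```python
def combined_additive_reciprocal(acoustic, s_d, w=1.0):
    """Combined score = acoustic + w * (1 / s_d)  (first variant)."""
    return acoustic + w * (1.0 / s_d)

def combined_additive_negative(acoustic, s_d, w=1.0):
    """Combined score = acoustic + w * (-s_d)  (second variant)."""
    return acoustic - w * s_d

def combined_multiplicative(acoustic, s_d, w=1.0):
    """Combined score = acoustic * (w / s_d)  (third variant)."""
    return acoustic * (w / s_d)

# Illustrative: acoustic score 10.0, speech-rate difference 0.5, weight 2.0
print(combined_additive_negative(10.0, 0.5, w=2.0))  # 9.0
```

All three reward a high acoustic score and penalize a large speech-rate spread; the reciprocal variants additionally assume s_d is nonzero.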
Finally, at step S120, the candidate with the highest combined score is selected, according to the candidates' combined scores, as the final recognition result of the input speech.
As can be seen from the above description, the speech recognition method of this embodiment takes into account the influence of speech rate on duration and can therefore improve recognition performance, while avoiding duration modeling. Moreover, the method only needs to store the average duration of each voice unit in advance, so its memory consumption is small, and the calculation of the speech-rate difference value is simple, so its computing time is short. The method is applicable to any speech recognition system, in particular small-vocabulary systems.
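Putting steps S110 through S120 together, rescoring an N-best list can be sketched as follows. The unit inventory, average durations, candidate contents, and weight are all illustrative; the range measure and the additive-negative combination are one of the several variants the embodiments allow.

```python
# Illustrative average durations m_u (in frames) for a toy unit inventory.
AVG = {"a": 8.0, "b": 8.0, "c": 16.0}

def rescore(candidates, w=1.0):
    """candidates: list of (acoustic_score, [(unit, duration), ...]).
    Uses the range of per-unit speech rates as s_d and the additive-negative
    combination acoustic - w * s_d; returns the best candidate and its score."""
    best, best_score = None, float("-inf")
    for acoustic, units in candidates:
        rates = [d / AVG[u] for u, d in units]   # formula (1) per unit
        s_d = max(rates) - min(rates)            # step S110 (range variant)
        score = acoustic - w * s_d               # step S115
        if score > best_score:                   # step S120
            best, best_score = (acoustic, units), score
    return best, best_score

nbest = [
    (12.0, [("a", 16), ("b", 4), ("c", 16)]),  # rates 2.0, 0.5, 1.0 -> s_d 1.5
    (11.5, [("a", 10), ("b", 8), ("c", 16)]),  # rates 1.25, 1.0, 1.0 -> s_d 0.25
]
best, score = rescore(nbest)
print(score)  # 11.25
```

Note how the second candidate wins despite its lower acoustic score, because its voice units are spoken at a more uniform rate — exactly the within-sentence stability assumption the embodiment relies on.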
Under the same inventive concept, Fig. 2 shows a schematic block diagram of a speech recognition system 200 based on differences in voice-unit speech rate according to a first embodiment of the present invention. The embodiment is described in detail below with reference to the drawing; explanation of parts identical to the preceding embodiment is omitted as appropriate.
As shown in Fig. 2, the speech recognition system 200 of this embodiment comprises: a pre-processing module 201, which pre-processes the input speech; a feature extraction module 202, which extracts the acoustic features of the speech; a decoding module 203, which decodes the speech based on a pre-trained acoustic model and the extracted acoustic features to obtain a plurality of candidate recognition results of the speech; a voice-unit speech-rate difference calculation module 204, which calculates, for each of the plurality of candidates, the candidate's speech-rate difference value based on the durations of the voice units it contains; a combined-score calculation module 205, which calculates, for each of the plurality of candidates, the candidate's combined score based on the calculated speech-rate difference value and the acoustic score; and a selection module 206, which selects, from the plurality of candidates, the candidate with the highest combined score as the final recognition result of the input speech.
In this embodiment, input speech is first pre-processed by the pre-processing module 201, and the feature extraction module 202 then extracts its acoustic features. The extracted features, together with the pre-trained acoustic model, are supplied to the decoding module 203, which decodes the speech according to the search strategy to obtain a plurality of candidates, each having an acoustic score and the durations of the voice units it contains. As mentioned above, a voice unit may be any one of a state, a phoneme, a syllable, a word, or a phrase.
After the decoding module 203 outputs the plurality of candidates, the speech-rate difference calculation module 204 calculates, for each candidate, the speech-rate difference value based on the durations of the contained voice units.
In this embodiment, within the speech-rate difference calculation module 204, a speech-rate calculation unit 2041 first calculates the speech rate of each voice unit in each candidate. As mentioned above, the speech rate is defined as the ratio of the voice unit's duration (i.e. the actual duration obtained by the decoding module 203) to the average duration of the corresponding voice unit in the speech corpus. A range calculation unit 2042 then calculates the difference between the maximum and minimum of the speech rates of all the voice units as the candidate's voice-unit speech-rate difference value.
Then, in the combined-score calculation module 205, each candidate's combined score is calculated from the candidate's speech-rate difference value and acoustic score. In this embodiment, a reciprocal calculation unit 2051 first calculates the reciprocal of the candidate's speech-rate difference value; a weighting unit 2052 then weights the calculated reciprocal by a predetermined weight coefficient; finally, a summing unit 2053 adds the weighted reciprocal to the acoustic score to form the candidate's combined score.
Alternatively, the negative may be used instead of the reciprocal when calculating a candidate's combined score. That is, in the combined-score calculation module 205, a negation unit first calculates the negative of the candidate's speech-rate difference value, a weighting unit then weights the calculated negative by a predetermined weight coefficient, and a summing unit adds the weighted negative to the acoustic score to form the candidate's combined score.
Further alternatively, the combined-score calculation module 205 may comprise: a reciprocal calculation unit, which calculates the reciprocal of the candidate's speech-rate difference value; a weighting unit, which weights the calculated reciprocal by a predetermined weight coefficient; and a multiplication unit, which multiplies the weighted reciprocal by the acoustic score to form the candidate's combined score.
In the above combined-score calculation module 205, the weight coefficient can be adjusted according to the recognition task.
Finally, all the candidates and their combined scores are supplied to the selection module 206, which selects, according to the combined scores, the candidate with the highest combined score from the plurality of candidates as the final recognition result of the speech.
Fig. 3 shows a schematic block diagram of a speech recognition system 300 based on differences in voice-unit speech rate according to a second embodiment of the present invention, in which parts identical to the preceding embodiment are given the same reference numerals and their explanation is omitted as appropriate. The embodiment is described in detail below with reference to the drawing.
The structure of the speech recognition system 300 of this embodiment is basically the same as that of the speech recognition system 200 shown in Fig. 2; the difference lies in the structure of the voice-unit speech-rate difference calculation module 304.
In the speech-rate difference calculation module 304 of this embodiment, a speech-rate calculation unit 3041 first calculates the speech rate of each voice unit in each candidate. A variance calculation unit 3042 then calculates the variance of the speech rates of all the voice units of each candidate as the candidate's voice-unit speech-rate difference value.
Similarly, the speech recognition system 400 according to a third embodiment of the present invention, illustrated in Fig. 4, differs from the speech recognition systems 200 and 300 shown in Figs. 2 and 3 only in the structure of the voice-unit speech-rate difference calculation module 404.
In the speech-rate difference calculation module 404 of this embodiment, a speech-rate calculation unit 4041 first calculates the speech rate of each voice unit in each candidate. A standard-deviation calculation unit 4042 then calculates the standard deviation of the speech rates of all the voice units of each candidate as the candidate's voice-unit speech-rate difference value.
Similarly, the speech recognition system 500 according to a fourth embodiment of the present invention, illustrated in Fig. 5, differs from the speech recognition systems 200, 300, and 400 shown in Figs. 2, 3, and 4 only in the structure of the voice-unit speech-rate difference calculation module 504.
In the speech-rate difference calculation module 504 of this embodiment, a speech-rate calculation unit 5041 first calculates the speech rate of each voice unit in each candidate. A standard-deviation calculation unit 5042 and a mean calculation unit 5043 then calculate, respectively, the standard deviation and the mean of the speech rates of all the voice units of each candidate, and a ratio calculation unit 5044 calculates the ratio of the standard deviation to the mean as the candidate's voice-unit speech-rate difference value.
It should be understood that the speech recognition systems 200, 300, 400, and 500 of the above embodiments and their components may be implemented with dedicated circuits or chips, or by a computer (processor) executing corresponding programs. In operation, the speech recognition systems of the above embodiments can carry out the speech recognition method based on differences in voice-unit speech rate shown in Fig. 1.
Although the speech recognition method and speech recognition system based on differences in voice-unit speech rate of the embodiments of the present invention have been described in detail above through several exemplary embodiments, these embodiments are not exhaustive, and those skilled in the art may make various changes and modifications within the spirit and scope of the present invention. The present invention is therefore not limited to these embodiments; its scope is defined solely by the appended claims.

Claims (10)

1. A speech recognition method based on differences in voice-unit speech rate, comprising:
pre-processing input speech;
extracting acoustic features of the speech;
decoding the speech based on a pre-trained acoustic model and the extracted acoustic features to obtain a plurality of candidate recognition results of the speech, wherein each of the plurality of candidates has an acoustic score and the durations of the voice units it contains;
for each of the plurality of candidates,
calculating the candidate's voice-unit speech-rate difference value based on the durations of the contained voice units; and
calculating the candidate's combined score based on the calculated speech-rate difference value and the acoustic score; and
selecting, from the plurality of candidates, the candidate with the highest combined score as the final recognition result of the speech.
2. The speech recognition method according to claim 1, wherein the step of calculating the candidate's voice-unit speech-rate difference value comprises:
for each voice unit in the candidate, calculating the speech rate of the voice unit, wherein the speech rate is the ratio of the duration of the voice unit to the average duration of the corresponding voice unit in the speech corpus; and
calculating the difference between the maximum and minimum of the speech rates of all the voice units as the candidate's voice-unit speech-rate difference value.
3. The speech recognition method according to claim 1, wherein the step of calculating the voice unit speed difference value of the candidate recognition result comprises:
for each voice unit in the candidate recognition result, calculating the speech rate of the voice unit, wherein the speech rate is the ratio of the segment length of the voice unit to the average segment length of the corresponding voice unit in a speech corpus; and
calculating the variance of the speech rates of all the voice units as the voice unit speed difference value of the candidate recognition result.
4. The speech recognition method according to claim 1, wherein the step of calculating the voice unit speed difference value of the candidate recognition result comprises:
for each voice unit in the candidate recognition result, calculating the speech rate of the voice unit, wherein the speech rate is the ratio of the segment length of the voice unit to the average segment length of the corresponding voice unit in a speech corpus; and
calculating the standard deviation of the speech rates of all the voice units as the voice unit speed difference value of the candidate recognition result.
5. The speech recognition method according to claim 1, wherein the step of calculating the voice unit speed difference value of the candidate recognition result comprises:
for each voice unit in the candidate recognition result, calculating the speech rate of the voice unit, wherein the speech rate is the ratio of the segment length of the voice unit to the average segment length of the corresponding voice unit in a speech corpus;
calculating the standard deviation and the mean of the speech rates of all the voice units; and
calculating the ratio of the standard deviation to the mean as the voice unit speed difference value of the candidate recognition result.
6. The speech recognition method according to claim 1, wherein the step of calculating the comprehensive score of the candidate recognition result comprises:
calculating the reciprocal of the voice unit speed difference value of the candidate recognition result;
weighting the reciprocal; and
adding the weighted reciprocal to the acoustic score to obtain the comprehensive score of the candidate recognition result.
7. The speech recognition method according to claim 1, wherein the step of calculating the comprehensive score of the candidate recognition result comprises:
calculating the negative of the voice unit speed difference value of the candidate recognition result;
weighting the negative; and
adding the weighted negative to the acoustic score to obtain the comprehensive score of the candidate recognition result.
8. The speech recognition method according to claim 1, wherein the step of calculating the comprehensive score of the candidate recognition result comprises:
calculating the reciprocal of the voice unit speed difference value of the candidate recognition result;
weighting the reciprocal; and
multiplying the weighted reciprocal by the acoustic score to obtain the comprehensive score of the candidate recognition result.
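Claims 6 through 8 give three ways of combining the difference value with the acoustic score. A sketch, with an illustrative default weight of 0.5 (the claims leave the weighting unspecified, so the value is an assumption):

```python
def score_reciprocal_add(acoustic_score, diff, weight=0.5):
    # Claim 6: weighted reciprocal of the difference value added to
    # the acoustic score.
    return acoustic_score + weight * (1.0 / diff)

def score_negative_add(acoustic_score, diff, weight=0.5):
    # Claim 7: weighted negative of the difference value added to the
    # acoustic score.
    return acoustic_score + weight * (-diff)

def score_reciprocal_mul(acoustic_score, diff, weight=0.5):
    # Claim 8: acoustic score multiplied by the weighted reciprocal of
    # the difference value.
    return acoustic_score * weight * (1.0 / diff)
```

In all three variants a smaller difference value (more uniform speech rate) yields a higher comprehensive score, so the final selection step of claim 1 is unchanged.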
9. The speech recognition method according to claim 1, wherein the voice unit is any one of a state, a phoneme, a syllable, a word, or a phrase.
10. A speech recognition system based on the speed difference of voice units, comprising:
a preprocessing module for performing preprocessing on input speech;
a feature extraction module for extracting acoustic features of the speech;
a decoding module for decoding the speech based on a pre-trained acoustic model and the extracted acoustic features to obtain a plurality of candidate recognition results of the speech, wherein each of the plurality of candidate recognition results has an acoustic score and the segment lengths of the voice units it contains;
a voice unit speed difference value calculation module for calculating, for each of the plurality of candidate recognition results, the voice unit speed difference value of the candidate recognition result based on the segment lengths of the voice units it contains;
a comprehensive score calculation module for calculating, for each of the plurality of candidate recognition results, a comprehensive score of the candidate recognition result based on the calculated voice unit speed difference value and the acoustic score; and
a selection module for selecting, from the plurality of candidate recognition results, the candidate recognition result with the highest comprehensive score as the final recognition result of the speech.
CN2009101728759A 2009-09-07 2009-09-07 Speech recognition method based on speed difference of voice unit and system thereof Expired - Fee Related CN102013253B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2009101728759A CN102013253B (en) 2009-09-07 2009-09-07 Speech recognition method based on speed difference of voice unit and system thereof


Publications (2)

Publication Number Publication Date
CN102013253A true CN102013253A (en) 2011-04-13
CN102013253B CN102013253B (en) 2012-06-06

Family

ID=43843398

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2009101728759A Expired - Fee Related CN102013253B (en) 2009-09-07 2009-09-07 Speech recognition method based on speed difference of voice unit and system thereof

Country Status (1)

Country Link
CN (1) CN102013253B (en)

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103137126A (en) * 2011-11-30 2013-06-05 北京德信互动网络技术有限公司 Intelligent electronic device based on voice control and voice control method
CN103137127A (en) * 2011-11-30 2013-06-05 北京德信互动网络技术有限公司 Intelligent electronic device based on voice control and voice control method
CN103137125A (en) * 2011-11-30 2013-06-05 北京德信互动网络技术有限公司 Intelligent electronic device based on voice control and voice control method
CN104021786A (en) * 2014-05-15 2014-09-03 北京中科汇联信息技术有限公司 Speech recognition method and speech recognition device
CN104424290A (en) * 2013-09-02 2015-03-18 佳能株式会社 Voice based question-answering system and method for interactive voice system
CN104751847A (en) * 2015-03-31 2015-07-01 刘畅 Data acquisition method and system based on overprint recognition
CN104823235A (en) * 2013-11-29 2015-08-05 三菱电机株式会社 Speech recognition device
CN105989839A (en) * 2015-06-03 2016-10-05 乐视致新电子科技(天津)有限公司 Speech recognition method and speech recognition device
WO2018014537A1 (en) * 2016-07-22 2018-01-25 百度在线网络技术(北京)有限公司 Voice recognition method and apparatus
CN108428446A (en) * 2018-03-06 2018-08-21 北京百度网讯科技有限公司 Audio recognition method and device
CN109065051A (en) * 2018-09-30 2018-12-21 珠海格力电器股份有限公司 A kind of voice recognition processing method and device
CN109102810A (en) * 2017-06-21 2018-12-28 北京搜狗科技发展有限公司 Method for recognizing sound-groove and device
WO2021134546A1 (en) * 2019-12-31 2021-07-08 李庆远 Input method for increasing speech recognition rate
WO2021134549A1 (en) * 2019-12-31 2021-07-08 李庆远 Human merging and training of multiple artificial intelligence outputs
CN113782014A (en) * 2021-09-26 2021-12-10 联想(北京)有限公司 Voice recognition method and device

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI627626B (en) * 2017-04-27 2018-06-21 醫療財團法人徐元智先生醫藥基金會亞東紀念醫院 Voice rehabilitation and therapy system and method thereof

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1221937C (en) * 2002-12-31 2005-10-05 北京天朗语音科技有限公司 Voice identification system of voice speed adaption
CN1835076B (en) * 2006-04-07 2010-05-12 安徽中科大讯飞信息科技有限公司 Speech evaluating method of integrally operating speech identification, phonetics knowledge and Chinese dialect analysis

Cited By (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103137127A (en) * 2011-11-30 2013-06-05 北京德信互动网络技术有限公司 Intelligent electronic device based on voice control and voice control method
CN103137125A (en) * 2011-11-30 2013-06-05 北京德信互动网络技术有限公司 Intelligent electronic device based on voice control and voice control method
CN103137126A (en) * 2011-11-30 2013-06-05 北京德信互动网络技术有限公司 Intelligent electronic device based on voice control and voice control method
CN104424290A (en) * 2013-09-02 2015-03-18 佳能株式会社 Voice based question-answering system and method for interactive voice system
CN104823235A (en) * 2013-11-29 2015-08-05 三菱电机株式会社 Speech recognition device
CN104823235B (en) * 2013-11-29 2017-07-14 三菱电机株式会社 Voice recognition device
CN104021786A (en) * 2014-05-15 2014-09-03 北京中科汇联信息技术有限公司 Speech recognition method and speech recognition device
CN104021786B (en) * 2014-05-15 2017-05-24 北京中科汇联信息技术有限公司 Speech recognition method and speech recognition device
CN104751847A (en) * 2015-03-31 2015-07-01 刘畅 Data acquisition method and system based on overprint recognition
CN105989839B (en) * 2015-06-03 2019-12-13 乐融致新电子科技(天津)有限公司 Speech recognition method and device
CN105989839A (en) * 2015-06-03 2016-10-05 乐视致新电子科技(天津)有限公司 Speech recognition method and speech recognition device
WO2018014537A1 (en) * 2016-07-22 2018-01-25 百度在线网络技术(北京)有限公司 Voice recognition method and apparatus
CN109102810A (en) * 2017-06-21 2018-12-28 北京搜狗科技发展有限公司 Method for recognizing sound-groove and device
CN109102810B (en) * 2017-06-21 2021-10-15 北京搜狗科技发展有限公司 Voiceprint recognition method and device
CN108428446A (en) * 2018-03-06 2018-08-21 北京百度网讯科技有限公司 Audio recognition method and device
US10978047B2 (en) 2018-03-06 2021-04-13 Beijing Baidu Netcom Science And Technology Co., Ltd. Method and apparatus for recognizing speech
CN109065051A (en) * 2018-09-30 2018-12-21 珠海格力电器股份有限公司 A kind of voice recognition processing method and device
CN109065051B (en) * 2018-09-30 2021-04-09 珠海格力电器股份有限公司 Voice recognition processing method and device
WO2021134546A1 (en) * 2019-12-31 2021-07-08 李庆远 Input method for increasing speech recognition rate
WO2021134549A1 (en) * 2019-12-31 2021-07-08 李庆远 Human merging and training of multiple artificial intelligence outputs
CN113782014A (en) * 2021-09-26 2021-12-10 联想(北京)有限公司 Voice recognition method and device
CN113782014B (en) * 2021-09-26 2024-03-26 联想(北京)有限公司 Speech recognition method and device

Also Published As

Publication number Publication date
CN102013253B (en) 2012-06-06

Similar Documents

Publication Publication Date Title
CN102013253B (en) Speech recognition method based on speed difference of voice unit and system thereof
CN110706690B (en) Speech recognition method and device thereof
US11996097B2 (en) Multilingual wakeword detection
CN109545243B (en) Pronunciation quality evaluation method, pronunciation quality evaluation device, electronic equipment and storage medium
Chang et al. Large vocabulary Mandarin speech recognition with different approaches in modeling tones.
Wester Pronunciation modeling for ASR–knowledge-based and data-derived methods
WO2020029404A1 (en) Speech processing method and device, computer device and readable storage medium
US20110077943A1 (en) System for generating language model, method of generating language model, and program for language model generation
US20220262352A1 (en) Improving custom keyword spotting system accuracy with text-to-speech-based data augmentation
CN112750446B (en) Voice conversion method, device and system and storage medium
Mouaz et al. Speech recognition of moroccan dialect using hidden Markov models
CN107093422B (en) Voice recognition method and voice recognition system
CN112750445B (en) Voice conversion method, device and system and storage medium
Mistry et al. Overview: Speech recognition technology, mel-frequency cepstral coefficients (mfcc), artificial neural network (ann)
CN111968622A (en) Attention mechanism-based voice recognition method, system and device
JP3660512B2 (en) Voice recognition method, apparatus and program recording medium
Sinha et al. Empirical analysis of linguistic and paralinguistic information for automatic dialect classification
Yousfi et al. Holy Qur'an speech recognition system Imaalah checking rule for warsh recitation
Singhal et al. Automatic speech recognition for connected words using DTW/HMM for English/Hindi languages
Sinha et al. Continuous density hidden markov model for hindi speech recognition
Ma et al. Language identification with deep bottleneck features
CN111785302A (en) Speaker separation method and device and electronic equipment
Jalalvand et al. A classifier combination approach for Farsi accents recognition
Tripathi et al. Robust vowel region detection method for multimode speech
Nouza Strategies for developing a real-time continuous speech recognition system for czech language

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20120606

Termination date: 20160907
