CN101887725A - Phoneme confusion network-based phoneme posterior probability calculation method - Google Patents

Phoneme confusion network-based phoneme posterior probability calculation method

Info

Publication number
CN101887725A
Authority
CN
China
Prior art keywords
phoneme
posterior probability
confusion network
network
voice
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2010101648742A
Other languages
Chinese (zh)
Inventor
葛凤培
颜永红
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Acoustics CAS
Original Assignee
Institute of Acoustics CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Acoustics CAS filed Critical Institute of Acoustics CAS
Priority to CN2010101648742A priority Critical patent/CN101887725A/en
Publication of CN101887725A publication Critical patent/CN101887725A/en
Pending legal-status Critical Current

Landscapes

  • Machine Translation (AREA)

Abstract

The invention provides a method for calculating phoneme posterior probability based on a phoneme confusion network, comprising the following steps: pre-processing the input speech, including framing; extracting speech features from each frame; decoding with an all-syllable loop network state graph, an acoustic model, and the speech feature vectors to obtain the boundary of each phoneme on the optimal path; building, for each phoneme segment, the corresponding phoneme confusion network and computing the acoustic likelihood of the speech on every path of the network; and computing the numerator of the phoneme posterior probability from the acoustic likelihood on the path corresponding to the learning text, applying time normalization to the acoustic likelihoods on all paths of the confusion network, and accumulating them as the denominator, so that a more accurate phoneme posterior probability is obtained. The method uses this improved phoneme posterior probability algorithm as the basis for evaluating phoneme pronunciation quality, and it greatly improves the accuracy of pronunciation quality evaluation without affecting computation speed.

Description

A phoneme posterior probability calculation method based on a phoneme confusion network
Technical field
The invention belongs to the technical field of pronunciation quality assessment. Specifically, it relates to a confidence calculation method for pronunciation quality evaluation systems.
Background
When a pronunciation quality evaluation system is used under natural conditions rather than in an ideal laboratory environment, its performance degrades substantially. Real spoken language is mixed with many non-speech sounds, such as abnormal pauses, coughs, and a great deal of environmental noise, all of which make it difficult for the system to reach its original assessment accuracy. In addition, if the words spoken by the user fall outside the system's predefined domain, assessment errors become more likely. In short, users of a commercial pronunciation quality evaluation system expect pronunciation quality to be assessed as accurately as possible and also require a fairly fast assessment speed, and confidence evaluation is a key measure for addressing these difficulties.
A confidence evaluation method performs hypothesis testing on the target pronunciation of the pronunciation quality evaluation system within a specific time interval and judges the correctness of the speech segment under assessment against a threshold trained in advance, thereby improving the accuracy and robustness of the system.
At present, a widely used approach is to take the posterior probability of the target text (the traditional Goodness of Pronunciation, or GOP, algorithm) as the confidence for pronunciation evaluation. Fig. 1 is a schematic diagram of this existing confidence calculation method. The input speech is first decoded once by an all-syllable network recognizer, which yields the phoneme boundaries of the input speech. Within each phoneme segment, forced alignment with the target phoneme is then performed to obtain the acoustic likelihood corresponding to the target text. Using the acoustic likelihood on the best candidate path of the all-syllable network recognition result, the phoneme posterior probability of the target text given the speech under assessment is finally computed as the confidence score.
This algorithm is a simplification of the theoretical phoneme posterior probability algorithm. First, to reduce the cost of computing the denominator, it assumes that the result of the summation is approximately equal to the result of taking the maximum. When the user mispronounces a phoneme as another phoneme in the phoneme set, this assumption approximates the true posterior probability well; but when the user's pronunciation differs from every standard pronunciation in the phoneme set, the maximum differs considerably from the sum, and the assumption seriously reduces the accuracy of the confidence. Second, to make posterior probability values comparable across different speech segments, the GOP algorithm applies segment-length normalization to the posterior probability. In theory, however, the acoustic likelihood is the accumulation, over the speech frames, of state transition probabilities and observation probabilities, so the duration directly affects the magnitude of the acoustic likelihood and indirectly passes this effect on to the phoneme posterior probability. It is therefore more reasonable to apply the time normalization to the acoustic likelihood itself. These two defects make the traditional GOP algorithm inaccurate; in particular, when the user is a second-language learner its performance becomes unacceptable, which greatly hinders the online use and practical deployment of pronunciation quality evaluation systems.
Summary of the invention
The object of the invention is to overcome the deficiencies of the prior art and, weighing both computation speed and robustness, to provide a phoneme posterior probability algorithm based on a phoneme confusion network for pronunciation quality evaluation systems. The method uses the phoneme confusion network to calculate the phoneme posterior probability and takes that probability as the confidence for pronunciation quality assessment.
To achieve this object, the phoneme posterior probability algorithm based on the phoneme confusion network in the pronunciation quality evaluation system provided by the invention comprises the following steps:
1) input the speech to be recognized into the speech recognition system;
2) pre-process the input speech, the pre-processing including framing;
3) extract speech features using the perceptual linear prediction (PLP) or Mel-frequency cepstral coefficient (MFCC) feature extraction method;
4) decode the feature vector sequence using the all-syllable loop network state graph and the acoustic model to obtain the optimal path, and record the boundary of each phoneme on the optimal path;
5) in each phoneme segment, build the corresponding phoneme confusion network according to the context of the optimal-path recognition result obtained in step 4) and the target learning text;
6) using the phoneme boundaries obtained in step 4) and the phoneme confusion network built in step 5), together with the acoustic model and the feature vector sequence of the speech segment, perform forced alignment between model states and speech features on every path of the confusion network to obtain the acoustic likelihood of the speech segment on that path;
7) apply segment-length normalization to the acoustic likelihood obtained in step 6), namely
$$p_{\mathrm{nor}}\big((x_1,\dots,x_t)\mid(s_1,\dots,s_t)\big) = p\big((x_1,\dots,x_t)\mid(s_1,\dots,s_t)\big)^{1/T}$$
where $p((x_1,\dots,x_t)\mid(s_1,\dots,s_t))$ is the acoustic likelihood before normalization, $p_{\mathrm{nor}}((x_1,\dots,x_t)\mid(s_1,\dots,s_t))$ is the acoustic likelihood after normalization, and $T$ is the duration of the phoneme segment;
8) calculate the phoneme posterior probability based on the phoneme confusion network:
$$p(ph) = \frac{p_{\mathrm{nor}}\big((x_1,\dots,x_t)\mid(s_1,\dots,s_t)_{\mathrm{ref}}\big)}{\sum_{k\in CN} p_{\mathrm{nor}}\big((x_1,\dots,x_t)\mid(s_1,\dots,s_t)_k\big)}$$
where $(s_1,\dots,s_t)_{\mathrm{ref}}$ is the state sequence obtained from the learning text and $CN$ is the confusion network containing multiple parallel phoneme paths;
9) use the phoneme posterior probability based on the phoneme confusion network as the confidence score of this phoneme in the pronunciation quality evaluation system.
In the above technical solution, the pre-processing of the input speech in step 2) comprises digitization, pre-emphasis (high-frequency boosting), framing, and windowing.
In the above technical solution, extracting speech features in step 3) comprises: calculating the PLP or MFCC coefficients, calculating the energy feature, and calculating the difference coefficients.
In the above technical solution, the all-syllable loop network decoding in step 4) uses the Viterbi decoding method.
In the above technical solution, building the phoneme confusion network in step 5) exploits the acoustic similarity between phonemes and comprises: determining the central phoneme and the number of parallel paths, expanding the central phoneme into triphones according to the contexts of the learning text and the recognition result, and building the parallel phoneme confusion network.
In the above technical solution, step 7) applies time normalization by the phoneme segment length to the acoustic likelihood on every path.
In the above technical solution, step 8) calculates the denominator of the phoneme posterior probability from the phoneme confusion network.
The advantage of the invention is that the phoneme confusion network is built as the basis for calculating the denominator of the phoneme posterior probability, and time normalization is applied to the acoustic likelihood, which substantially improves the accuracy of the pronunciation quality assessment confidence. The invention provides an improved confidence calculation algorithm for pronunciation quality evaluation systems under the premise that the computation cost increases only slightly: a confusion network is built according to the acoustic similarity between phonemes, and the acoustic likelihoods on all paths of the confusion network are summed to obtain a more accurate denominator of the phoneme posterior probability; in addition, segment-length normalization is applied to the acoustic likelihood to eliminate differences caused by the speed of phoneme pronunciation. The phoneme posterior probability calculated in this way is substantially more accurate, which effectively improves the accuracy of pronunciation evaluation.
Description of drawings
Fig. 1 is a schematic diagram of the prior-art confidence calculation method;
Fig. 2 is a flowchart of an embodiment of the phoneme posterior probability algorithm based on the phoneme confusion network of the invention;
Fig. 3 is a schematic diagram of building the all-syllable network state graph in the phoneme posterior probability algorithm based on the phoneme confusion network of the invention;
Fig. 4 is a flowchart of building the confusion network for an initial in the phoneme posterior probability algorithm based on the phoneme confusion network of the invention;
Fig. 5 is a flowchart of building the confusion network for a final in the phoneme posterior probability algorithm based on the phoneme confusion network of the invention;
Fig. 6 is a schematic diagram of state-graph-based forced alignment in the phoneme posterior probability algorithm based on the phoneme confusion network of the invention.
Embodiment
The phoneme posterior probability algorithm based on the phoneme confusion network of the invention is described further below with reference to the drawings and a specific embodiment.
Fig. 2 is a flowchart of an embodiment of the phoneme posterior probability algorithm based on the phoneme confusion network of the invention. As shown in Fig. 2, the phoneme posterior probability algorithm based on the phoneme confusion network in the pronunciation quality evaluation system provided by the invention comprises the following steps:
1) Input the speech to be recognized into the speech recognition system.
2) Pre-process the input speech; the pre-processing mainly consists of framing.
In this embodiment, pre-processing follows this procedure:
2-1) digitize the speech signal at a 16 kHz (or 8 kHz) sampling rate;
2-2) boost high frequencies by pre-emphasis, using the pre-emphasis filter
$$H(z) = 1 - \alpha z^{-1},$$
where $\alpha = 0.98$;
2-3) divide the data into frames: the frame length is 25 ms and the overlap between adjacent frames is 15 ms; both can be adjusted as needed;
2-4) apply a window to each frame, using the commonly used Hamming window, whose standard form is
$$w(n) = 0.54 - 0.46\cos\!\left(\frac{2\pi n}{N-1}\right),$$
where $0 \le n \le N-1$ and $N$ is the number of samples in a frame.
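The pre-processing steps 2-2) to 2-4) can be sketched in a few lines of Python. This is a minimal illustration, not the patent's implementation: the function name `preprocess` and its parameters are hypothetical, the signal is assumed to be already digitized as a list of floats, and the window is the textbook Hamming form given above.

```python
import math

def preprocess(signal, sample_rate=16000, alpha=0.98,
               frame_ms=25.0, overlap_ms=15.0):
    """Pre-emphasize, frame, and Hamming-window a digitized speech signal.

    Returns a list of frames, each a list of windowed samples.
    """
    frame_len = round(sample_rate * frame_ms / 1000.0)
    # Hop size = frame length minus the inter-frame overlap (10 ms here).
    hop = frame_len - round(sample_rate * overlap_ms / 1000.0)
    if len(signal) < frame_len:
        return []
    # Pre-emphasis H(z) = 1 - alpha * z^-1, i.e. y[n] = x[n] - alpha * x[n-1].
    emphasized = [signal[0]] + [signal[n] - alpha * signal[n - 1]
                                for n in range(1, len(signal))]
    # Standard Hamming window over one frame.
    window = [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(emphasized) - frame_len + 1, hop):
        frames.append([emphasized[start + n] * window[n]
                       for n in range(frame_len)])
    return frames
```

With the default values a frame holds 400 samples and consecutive frames start 160 samples apart; trailing samples that do not fill a whole frame are dropped in this sketch.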
3) Extract speech features. The invention can use either the PLP (perceptual linear prediction) or the MFCC (Mel-frequency cepstral coefficient) feature extraction method. The procedure is as follows:
3-1) calculate the PLP or MFCC coefficients c(m) of each speech frame, 1 ≤ m ≤ N_c, where N_c is the number of cepstral coefficients, N_c = 12;
3-2) calculate the energy feature of each speech frame;
3-3) calculate the first-order and second-order differences of the energy and cepstral features. The difference cepstral coefficients are calculated with a regression formula (the formula image is not reproduced in this text),
in which μ is a normalization factor, τ is an integer, and 2T+1 is the number of speech frames used to calculate the difference cepstral coefficients, with T = 2 and μ = 0.375;
3-4) generate a 39-dimensional feature vector for each speech frame.
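Steps 3-3) and 3-4) can be illustrated with the sketch below. Because the regression formula itself is missing from the text, the code assumes the common form Δc_t = μ · Σ_{τ=-T..T} τ · c_{t+τ}; that form, the edge handling (repeating the first and last frame), and the names `deltas` and `stack_39` are assumptions made for illustration. Only T = 2, μ = 0.375, and the 39-dimensional result come from the description. The static vector is assumed to hold 12 cepstral coefficients plus one energy term.

```python
def deltas(features, T=2, mu=0.375):
    """Regression-based difference coefficients (assumed form, see lead-in).

    features: list of frames, each a list of floats.
    """
    num_frames = len(features)
    dim = len(features[0]) if features else 0
    out = []
    for t in range(num_frames):
        row = []
        for m in range(dim):
            acc = 0.0
            for tau in range(-T, T + 1):
                # Assumption: clamp the index so edge frames are repeated.
                k = min(max(t + tau, 0), num_frames - 1)
                acc += tau * features[k][m]
            row.append(mu * acc)
        out.append(row)
    return out

def stack_39(static):
    """Concatenate 13 static, 13 first-order, and 13 second-order features."""
    d1 = deltas(static)
    d2 = deltas(d1)
    return [s + a + b for s, a, b in zip(static, d1, d2)]
```

The second-order differences are obtained here by applying the same regression to the first-order differences, which is one common choice rather than something the description states.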
4) Decode the feature vector sequence using the all-syllable loop network state graph and the acoustic model to obtain the optimal path, and record the boundary of each phoneme on the optimal path.
The state graph used in this step is built as follows:
Fig. 3 is a schematic diagram of building the state graph in the phoneme posterior probability algorithm based on the phoneme confusion network of the invention. As shown in Fig. 3, a search space over all syllables, i.e. a loopable network with all syllables in parallel, is first built according to the all-syllable grammar. The recognizer searches this network for the optimal path (the path with the maximum acoustic likelihood) for the input speech and outputs it as the recognition result. When the decoding state graph is built, the word network is expanded into a phoneme network using dictionary information. Each node consists of a phoneme, each phoneme is then replaced by the corresponding hidden Markov model (HMM) in the acoustic model, and each HMM consists of several states. The final search space thus becomes a state graph in which any path represents a candidate syllable sequence, and the optimal path is obtained as the recognition result by comparing the likelihoods of different paths.
The acoustic model used in this embodiment is gender-dependent: the male model contains 4665 states and the female model contains 4015 states, and each state is described by a mixture of 16 Gaussians.
In this embodiment, decoding uses the conventional Viterbi search strategy.
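The Viterbi search and the recording of phoneme boundaries can be sketched on a toy state graph as follows. This is a generic log-domain Viterbi, not the patent's decoder: the dense transition matrix, the function names `viterbi` and `boundaries`, and the state-to-phoneme lookup are illustrative assumptions, and a real all-syllable loop network would be far larger and use pruning.

```python
def viterbi(log_init, log_trans, log_obs):
    """Return (best log score, best state path).

    log_init[j]     : log probability of starting in state j
    log_trans[i][j] : log probability of moving from state i to state j
    log_obs[t][j]   : log likelihood of frame t under state j
    """
    num_states = len(log_init)
    score = [log_init[j] + log_obs[0][j] for j in range(num_states)]
    backptr = []
    for t in range(1, len(log_obs)):
        new_score, ptr = [], []
        for j in range(num_states):
            best_i = max(range(num_states),
                         key=lambda i: score[i] + log_trans[i][j])
            new_score.append(score[best_i] + log_trans[best_i][j] + log_obs[t][j])
            ptr.append(best_i)
        score = new_score
        backptr.append(ptr)
    last = max(range(num_states), key=lambda j: score[j])
    path = [last]
    for ptr in reversed(backptr):
        path.append(ptr[path[-1]])
    path.reverse()
    return score[last], path

def boundaries(path, state_to_phone):
    """Collapse a state path into (phoneme, start_frame, end_frame_exclusive)."""
    segments = []
    start = 0
    for t in range(1, len(path) + 1):
        if t == len(path) or state_to_phone[path[t]] != state_to_phone[path[start]]:
            segments.append((state_to_phone[path[start]], start, t))
            start = t
    return segments
```

The `boundaries` helper merges consecutive frames that map to the same phoneme label, so two adjacent occurrences of the same phoneme would be fused; a full decoder would track phoneme instances instead of labels.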
5) In each phoneme segment, build the corresponding phoneme confusion network according to the context of the optimal-path recognition result obtained in step 4) and the target learning text.
Because common HMM acoustic models use context-dependent triphones as the basic modeling unit, phonemes must also be converted into triphones when the confusion network is built. For each phoneme speech segment, both the context from the all-syllable network recognition result and the context from the learning text are used as the basis for triphone expansion. The main rules for building the expanded phoneme confusion network are as follows:
When the recognition result is an initial, a parallel network is built in this speech segment from the triphones expanded from all initials, and the statistics are computed on it. There are 27 Mandarin initials, and when they are expanded into triphones the contexts of both the recognition result and the learning text are considered. Following the initial-final structure of the Chinese syllable, these contexts are all finals. Because tone and phoneme pronunciation are relatively independent, these finals are expanded over all tones, of which there are 5. The resulting confusion network has 5 (5 tones of the left context) × 5 (5 tones of the right context) × 2 (two kinds of context: recognition result and learning text) × 27 (27 initials) = 1350 parallel paths. Fig. 4 illustrates this with "z" as an example: the learning-text contexts of this phoneme are "a4" and "uo2", and the recognition-result contexts are "an3" and "ui2". The first initial, "aa", is taken as the central phoneme first; triphone expansion with the learning-text context gives the 5×5 triphones from "a1-aa+uo1" to "a5-aa+uo5", and expansion with the recognition-result context gives another 5×5 triphones. The other initials are handled in the same way when they serve as the central phoneme, which gives 5×5×2×27 triphones in total; these triphones are connected in parallel to form the confusion network.
When the recognition result is a final, a parallel network is built in this speech segment from the triphones expanded from all finals, and the statistics are computed on it. There are 184 Chinese finals, and when they are expanded into triphones the contexts of both the recognition result and the learning text are considered. Following the initial-final structure of the Chinese syllable, these contexts are all initials. The resulting confusion network has 2 (two kinds of context: recognition result and learning text) × 184 (184 finals) = 368 parallel paths. Fig. 5 illustrates this with "a4" as an example: the learning-text contexts of this phoneme are "d" and "z", and the recognition-result contexts are "t" and "zh". The first final, "a1", is taken as the central phoneme first; expansion with the learning-text context gives "d-a1+z", and expansion with the recognition-result context gives "t-a1+zh". The other finals are handled in the same way when they serve as the central phoneme, which gives 2×184 triphones in total; these triphones are connected in parallel to form the final confusion network.
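The two expansion rules can be sketched as simple enumerations. The function names and the placeholder phoneme inventories in the usage note are hypothetical; the triphone notation `left-center+right` and the tone digits 1 to 5 follow the examples in the description. The sketch only lists path labels and does not build HMMs.

```python
def strip_tone(final):
    """Remove a trailing tone digit, e.g. 'uo2' -> 'uo'."""
    return final.rstrip("12345")

def build_initial_network(initials, ref_ctx, rec_ctx, tones="12345"):
    """Parallel triphone paths for a segment recognized as an initial.

    ref_ctx, rec_ctx: (left_final, right_final) taken from the learning
    text and from the recognition result, e.g. ("a4", "uo2").
    """
    paths = []
    for center in initials:
        for left, right in (ref_ctx, rec_ctx):
            left_base, right_base = strip_tone(left), strip_tone(right)
            # Expand both context finals over all five tones.
            for left_tone in tones:
                for right_tone in tones:
                    paths.append(f"{left_base}{left_tone}-{center}+{right_base}{right_tone}")
    return paths

def build_final_network(finals, ref_ctx, rec_ctx):
    """Parallel triphone paths for a segment recognized as a final.

    ref_ctx, rec_ctx: (left_initial, right_initial), e.g. ("d", "z").
    """
    paths = []
    for center in finals:
        for left, right in (ref_ctx, rec_ctx):
            paths.append(f"{left}-{center}+{right}")
    return paths
```

For example, calling `build_initial_network` with 27 initials, the learning-text context `("a4", "uo2")`, and the recognition-result context `("an3", "ui2")` yields 5 × 5 × 2 × 27 labels, and `build_final_network` with 184 finals yields 2 × 184. If both contexts coincide, this sketch keeps the duplicated paths, matching the path counts given in the description.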
6) Using the phoneme boundaries obtained in step 4) and the phoneme confusion network built in step 5), together with the acoustic model and the feature vector sequence of the speech segment, perform forced alignment between acoustic states and speech frames on every path of the confusion network to obtain the state index corresponding to each speech frame and the acoustic likelihood of the speech segment on that path:
$$P(X \mid S) = \prod_{t=0}^{T} p(x_t \mid s_t)$$
Its negative logarithm is:
$$-\ln \prod_{t=0}^{T} p(x_t \mid s_t) = d(x_t, s_t) = \sum_{t=0}^{T} \frac{1}{2}\left[(x_t - \mu_t)^{\top}\Sigma_t^{-1}(x_t - \mu_t) + n\ln(2\pi) + \ln\lvert\Sigma_t\rvert\right]$$
where $x_t$ is the input speech feature of frame $t$; $s_t$ is the HMM state corresponding to the speech feature of frame $t$, modeled as the normal distribution $N(\mu_t, \Sigma_t)$; $\mu_t$ and $\Sigma_t$ are the mean vector and covariance matrix of state $s_t$, whose values are taken from the acoustic model; and $n$ is the dimension of the feature vector $x_t$, i.e. the dimension of $\mu_t$ and $\Sigma_t$.
This forced alignment is itself a simple decoding process in which the candidates are all state sequences of the same phoneme, and the state sequence with the maximum acoustic likelihood is decoded as the optimal path. Fig. 6 is a schematic diagram of forced alignment based on the state graph. In the figure, dashed lines represent candidate state sequences and the solid black line represents the decoded optimal path, i.e. the optimal state sequence. As shown in Fig. 6, the state sequence that maximizes the likelihood P(X|S) of the observation sequence (in this embodiment, the feature vectors) is taken as the optimal state sequence.
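A small sketch of the per-frame Gaussian term and of forced alignment on one path is given below. It makes several simplifying assumptions that are not in the description: each state is a single Gaussian with diagonal covariance (the embodiment uses 16-component mixtures), transition probabilities are ignored as in the negative-log formula above, and the path is a strict left-to-right chain in which every state is visited at least once. The names `gaussian_nll` and `forced_align` are hypothetical.

```python
import math

def gaussian_nll(x, mean, var):
    """0.5 * [(x-mu)' Sigma^-1 (x-mu) + n ln(2 pi) + ln|Sigma|], diagonal Sigma."""
    n = len(x)
    quad = sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mean, var))
    log_det = sum(math.log(vi) for vi in var)
    return 0.5 * (quad + n * math.log(2.0 * math.pi) + log_det)

def forced_align(frames, states):
    """Align frames to a left-to-right chain of states.

    frames: list of feature vectors; states: list of (mean, var) pairs.
    Returns (segment log likelihood, state index for each frame).
    """
    num_frames, num_states = len(frames), len(states)
    if num_frames < num_states:
        raise ValueError("segment is shorter than the number of states")
    neg_inf = float("-inf")
    ll = [[-gaussian_nll(x, m, v) for (m, v) in states] for x in frames]
    score = [[neg_inf] * num_states for _ in range(num_frames)]
    back = [[0] * num_states for _ in range(num_frames)]
    score[0][0] = ll[0][0]
    for t in range(1, num_frames):
        for j in range(num_states):
            stay = score[t - 1][j]
            move = score[t - 1][j - 1] if j > 0 else neg_inf
            # Keep the better predecessor: stay in state j or advance from j-1.
            if move > stay:
                score[t][j], back[t][j] = move + ll[t][j], j - 1
            else:
                score[t][j], back[t][j] = stay + ll[t][j], j
    path = [num_states - 1]
    for t in range(num_frames - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return score[num_frames - 1][num_states - 1], path
```

Running `forced_align` once per path of the confusion network gives the per-path log likelihoods that the normalization and posterior steps consume.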
7) Apply segment-length normalization to the acoustic likelihood obtained in step 6), namely
$$p_{\mathrm{nor}}\big((x_1,\dots,x_t)\mid(s_1,\dots,s_t)\big) = p\big((x_1,\dots,x_t)\mid(s_1,\dots,s_t)\big)^{1/T}$$
where $p((x_1,\dots,x_t)\mid(s_1,\dots,s_t))$ is the acoustic likelihood before normalization, $p_{\mathrm{nor}}((x_1,\dots,x_t)\mid(s_1,\dots,s_t))$ is the acoustic likelihood after normalization, and $T$ is the duration of the phoneme segment.
8) Calculate the phoneme posterior probability based on the phoneme confusion network:
$$p(ph) = \frac{p_{\mathrm{nor}}\big((x_1,\dots,x_t)\mid(s_1,\dots,s_t)_{\mathrm{ref}}\big)}{\sum_{k\in CN} p_{\mathrm{nor}}\big((x_1,\dots,x_t)\mid(s_1,\dots,s_t)_k\big)}$$
where $(s_1,\dots,s_t)_{\mathrm{ref}}$ is the state sequence obtained from the learning text and $CN$ is the confusion network containing multiple parallel phoneme paths.
9) Use the phoneme posterior probability based on the phoneme confusion network as the confidence score of this phoneme in the pronunciation quality evaluation system.
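Steps 7) and 8) are easiest to compute in the log domain, where raising a likelihood to the power 1/T becomes a division of the log likelihood by T. The sketch below assumes that log likelihoods are available for every path of the confusion network and that the reference (learning-text) path is among them; the function names are hypothetical, and the log-sum-exp trick is an implementation choice for numerical stability rather than part of the description.

```python
import math

def normalized_log_likelihood(log_likelihood, num_frames):
    """log p_nor = (1/T) * log p, equivalent to p_nor = p ** (1/T)."""
    return log_likelihood / num_frames

def phoneme_posterior(ref_log_likelihood, path_log_likelihoods, num_frames):
    """Posterior of the reference path over all confusion network paths.

    path_log_likelihoods should include the reference path itself.
    """
    ref = normalized_log_likelihood(ref_log_likelihood, num_frames)
    norm = [normalized_log_likelihood(v, num_frames) for v in path_log_likelihoods]
    # Log-sum-exp: subtract the peak before exponentiating to avoid underflow.
    peak = max(norm)
    log_denominator = peak + math.log(sum(math.exp(v - peak) for v in norm))
    return math.exp(ref - log_denominator)
```

Because the denominator sums over all paths instead of taking only the best one, the score drops when many competing paths fit the speech about as well as the reference does, which is the case the background section says the max-based approximation handles poorly.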
The confidence score of a phoneme measures the pronunciation quality of that phoneme. The performance of a confidence algorithm is evaluated by comparison with expert assessment: the same speech data are assessed by both the machine and experts, the expert result is taken as the standard, a machine result that agrees with it is counted as correct and otherwise as an error, and from this a scoring accuracy is computed. Comparing scoring accuracies shows how the performance of different confidence algorithms changes. A mapping is needed from the phoneme confidence score to the machine assessment result, and threshold classification is used for this purpose. First, a development data set is used to train a confidence threshold for each phoneme on the principle of maximizing scoring accuracy; then, during testing, a pronunciation of a given phoneme is considered accurate when its confidence score is above that phoneme's threshold and is otherwise considered defective.
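The per-phoneme threshold training can be sketched as an exhaustive search over candidate thresholds on the development set. The description states only the criterion (maximum scoring accuracy); the candidate set, the tie-breaking toward the lowest threshold, and the function names below are assumptions made for illustration.

```python
def accuracy(scores, labels, threshold):
    """Fraction of samples where (score > threshold) matches the expert label."""
    hits = sum((s > threshold) == bool(y) for s, y in zip(scores, labels))
    return hits / len(scores)

def train_threshold(scores, labels):
    """Pick the threshold that maximizes agreement with expert labels."""
    # Candidates: accept everything, or accept scores strictly above each
    # observed value. Ties are resolved toward the lowest candidate.
    candidates = [float("-inf")] + sorted(set(scores))
    return max(candidates, key=lambda th: accuracy(scores, labels, th))

def train_per_phoneme(samples):
    """samples: iterable of (phoneme, confidence score, expert label)."""
    grouped = {}
    for phoneme, score, label in samples:
        scores, labels = grouped.setdefault(phoneme, ([], []))
        scores.append(score)
        labels.append(label)
    return {ph: train_threshold(sc, lb) for ph, (sc, lb) in grouped.items()}
```

At test time a pronunciation would be judged accurate when its confidence score exceeds the threshold stored for its phoneme, and `accuracy` applied to held-out data gives the scoring accuracy.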
Test experiment:
The phoneme posterior probability algorithm based on the phoneme confusion network of the invention was tested on three data sets recorded on site at the Hong Kong Putonghua proficiency test. The test task was to evaluate the phoneme scoring accuracy of the pronunciation quality evaluation system, and the test set consisted of speech data from 182 female and 107 male speakers. The target speech read aloud by each speaker consisted of 50 single characters and 25 two-character words specified in advance, and the target content differed among the three data sets. The speakers were all graduates in Hong Kong, whose Mandarin proficiency was generally poor. All speech data had phoneme-level scores from linguistics experts, which served as the reference for evaluating the accuracy of the pronunciation quality evaluation system. The confidence score is used to distinguish good from poor pronunciation: a pronunciation is considered accurate when its confidence score is above a preset threshold and is otherwise considered defective. The threshold is obtained by training: 60% of each data set is drawn at random as a development set for training the threshold, and the remaining 40% serves as the test set. The goal is to improve phoneme scoring accuracy, i.e. to bring the machine assessment as close as possible to the expert assessment.
Confidence was computed with two different algorithms. The first, shown in Fig. 1, is referred to as the traditional GOP algorithm system; the second, the phoneme posterior probability algorithm based on the phoneme confusion network of the invention shown in Fig. 2, is referred to as the improved algorithm system.
Table 1 compares the performance of the phoneme posterior probability algorithm based on the phoneme confusion network of the invention with that of the traditional prior-art GOP algorithm.
Table 1:
| System | Initial accuracy | Final accuracy |
| --- | --- | --- |
| Traditional GOP algorithm system | 0.877 | 0.885 |
| Phoneme posterior probability algorithm system based on the phoneme confusion network | 0.918 | 0.922 |
As the table shows, the phoneme posterior probability algorithm based on the phoneme confusion network used in the invention performs better than the traditional GOP algorithm. The scoring accuracy of the improved algorithm shows a relative improvement of 33.3% for initials and 28.7% for finals.
In addition, the phoneme posterior probability algorithm based on the phoneme confusion network does not noticeably increase the amount of computation; the real-time test results are shown in Table 2. The table shows that the improved algorithm does not impose a serious computational burden.
Table 2:
| | Traditional GOP algorithm system | Phoneme posterior probability algorithm system based on the phoneme confusion network |
| --- | --- | --- |
| Real-time factor | 1.021 | 1.030 |

Claims (6)

1. A phoneme posterior probability algorithm based on a phoneme confusion network, characterized in that it comprises the following steps:
1) input the speech to be recognized;
2) pre-process the input speech, the pre-processing including framing;
3) extract speech features to obtain the feature vector sequence of the speech to be recognized;
4) decode the feature vector sequence using the all-syllable loop network state graph and the acoustic model to obtain the optimal path as the recognition result, and record the boundary of each phoneme on the optimal path;
5) in each speech segment, build the corresponding phoneme confusion network according to the recognition result obtained in step 4) and the target learning text;
6) using the phoneme boundaries obtained in step 4) and the phoneme confusion network built in step 5), together with the acoustic model and the feature vector sequence of the phoneme segment, perform forced alignment between model states and speech features on every path of the confusion network to obtain the acoustic likelihood of the speech segment on that path;
7) apply segment-length normalization to the acoustic likelihood obtained in step 6), namely
$$p_{\mathrm{nor}}\big((x_1,\dots,x_t)\mid(s_1,\dots,s_t)\big) = p\big((x_1,\dots,x_t)\mid(s_1,\dots,s_t)\big)^{1/T}$$
where $p((x_1,\dots,x_t)\mid(s_1,\dots,s_t))$ is the acoustic likelihood before normalization, $p_{\mathrm{nor}}((x_1,\dots,x_t)\mid(s_1,\dots,s_t))$ is the acoustic likelihood after normalization, and $T$ is the number of speech frames in the phoneme segment;
8) calculate the phoneme posterior probability based on the phoneme confusion network:
$$p(ph) = \frac{p_{\mathrm{nor}}\big((x_1,\dots,x_t)\mid(s_1,\dots,s_t)_{\mathrm{ref}}\big)}{\sum_{k\in CN} p_{\mathrm{nor}}\big((x_1,\dots,x_t)\mid(s_1,\dots,s_t)_k\big)}$$
where $(s_1,\dots,s_t)_{\mathrm{ref}}$ is the state sequence obtained from the learning text and $CN$ is the confusion network containing multiple parallel phoneme paths.
2. the phoneme posterior probability algorithm based on the phoneme confusion network according to claim 1 is characterized in that, full syllable recirculating network decode procedure adopts the viterbi coding/decoding method in the described step 4).
3. the phoneme posterior probability algorithm based on the phoneme confusion network according to claim 1 is characterized in that, builds the phoneme confusion network in the described step 5) and comprises three sub-steps, and idiographic flow is as follows:
3-1) determine central phoneme and path in parallel bar number;
3-2) context according to learning text and recognition result carries out the three-tone expansion to central phoneme;
3-3) build phoneme confusion network in parallel.
4. The phoneme posterior probability algorithm based on the phoneme confusion network according to claim 3, characterized in that, in step 3-2), when the central phoneme is an initial, the context finals of the learning text and of the recognition result are tone-expanded, the tone-expanded context finals of the learning text and of the recognition result are used respectively as context phonemes, every initial is used in turn as the central phoneme to form a plurality of triphones, and these triphones are built into a parallel network;
when the central phoneme is a final, the context initials of the learning text and of the recognition result are used respectively as context phonemes, every tone-expanded final is used in turn as the central phoneme to form a plurality of triphones, and these triphones are built into a parallel network.
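The construction in claims 3 and 4 can be sketched as follows. This is one possible reading of the claim text, under several stated assumptions: tone expansion is modeled as appending a tone digit 1 to 5 to a final, triphones are written in the `left-center+right` notation, each context is a (left, right) pair taken from either the learning text or the recognition result, and the phoneme inventories in the example are tiny hypothetical subsets.

```python
from itertools import product

TONES = "12345"  # assumption: four lexical tones plus a neutral tone

def with_tones(final):
    """Tone-expand a toneless final, e.g. 'a' -> ['a1', ..., 'a5']."""
    return [final + t for t in TONES]

def build_parallel_paths(center_is_initial, context_pairs, initials, finals):
    """List the parallel triphone paths for one phoneme segment.

    context_pairs: (left, right) context phonemes, e.g. one pair from the
                   learning text and one from the recognition result.
    """
    paths = []
    for left, right in context_pairs:
        if center_is_initial:
            # Contexts are finals: tone-expand them; every initial is a candidate.
            lefts, rights, centers = with_tones(left), with_tones(right), initials
        else:
            # Contexts are initials; every tone-expanded final is a candidate.
            lefts, rights = [left], [right]
            centers = [v for f in finals for v in with_tones(f)]
        for l, c, r in product(lefts, centers, rights):
            paths.append(f"{l}-{c}+{r}")
    return list(dict.fromkeys(paths))  # drop duplicates, keep order

# Hypothetical inventories and contexts.
initials, finals = ["b", "p"], ["a", "i"]
# Central phoneme is a final; learning text and recognition result agree.
paths_final = build_parallel_paths(False, [("b", "p"), ("b", "p")], initials, finals)
# Central phoneme is an initial between the finals 'a' and 'i'.
paths_initial = build_parallel_paths(True, [("a", "i")], initials, finals)
```

When the learning text and the recognition result give the same context, the duplicate triphones collapse, so the network only grows where the two disagree.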
5. The phoneme posterior probability algorithm based on the phoneme confusion network according to claim 1, characterized in that, in step 7), the acoustic likelihood on each path is normalized with respect to the duration of the phoneme segment.
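A common way to normalize a likelihood by segment duration is to divide the log-likelihood by the number of frames T. The sketch below assumes that form; the exact normalization formula of step 7) is not restated in this claim, so the function `normalize_by_duration` and its formula are illustrative assumptions.

```python
def normalize_by_duration(log_likelihood, num_frames):
    """Per-frame log-likelihood of a path over a phoneme segment of T frames.

    Assumption: normalization divides the log-likelihood by T, which is
    equivalent to taking the T-th root of the likelihood itself.
    """
    if num_frames <= 0:
        raise ValueError("a phoneme segment must contain at least one frame")
    return log_likelihood / num_frames

# Two hypothetical segments of different length with the same per-frame fit.
long_segment = normalize_by_duration(-120.0, 30)
short_segment = normalize_by_duration(-40.0, 10)
```

Under this assumption, segments of different lengths yield scores on the same scale, so a long phoneme is not penalized simply for spanning more frames.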
6. The phoneme posterior probability algorithm based on the phoneme confusion network according to claim 1, characterized in that, in step 8), the denominator of the phoneme posterior probability is computed over the phoneme confusion network, and the phoneme posterior probability is then obtained from it.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2010101648742A CN101887725A (en) 2010-04-30 2010-04-30 Phoneme confusion network-based phoneme posterior probability calculation method

Publications (1)

Publication Number Publication Date
CN101887725A true CN101887725A (en) 2010-11-17

Family

ID=43073611

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2010101648742A Pending CN101887725A (en) 2010-04-30 2010-04-30 Phoneme confusion network-based phoneme posterior probability calculation method

Country Status (1)

Country Link
CN (1) CN101887725A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5956679A (en) * 1996-12-03 1999-09-21 Canon Kabushiki Kaisha Speech processing apparatus and method using a noise-adaptive PMC model
CN1773606A (en) * 2004-11-12 2006-05-17 中国科学院声学研究所 Voice decoding method based on mixed network
CN101315733A (en) * 2008-07-17 2008-12-03 安徽科大讯飞信息科技股份有限公司 Self-adapting method aiming at computer language learning system pronunciation evaluation
CN101447184A (en) * 2007-11-28 2009-06-03 中国科学院声学研究所 Chinese-English bilingual speech recognition method based on phoneme confusion
CN101464896A (en) * 2009-01-23 2009-06-24 安徽科大讯飞信息科技股份有限公司 Voice fuzzy retrieval method and apparatus
CN101645271A (en) * 2008-12-23 2010-02-10 中国科学院声学研究所 Rapid confidence-calculation method in pronunciation quality evaluation system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
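Acta Acustica (《声学学报》) 20100331 Ge Fengpei (葛凤培) et al. Experimental study on Chinese pronunciation quality evaluation (汉语发音质量评估的实验研究) Vol. 35, No. 2 2 *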

Cited By (41)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102063900A (en) * 2010-11-26 2011-05-18 北京交通大学 Speech recognition method and system for overcoming confusing pronunciation
CN102163428A (en) * 2011-01-19 2011-08-24 无敌科技(西安)有限公司 Method for judging Chinese pronunciation
CN102737638B (en) * 2012-06-30 2015-06-03 北京百度网讯科技有限公司 Voice decoding method and device
CN102737638A (en) * 2012-06-30 2012-10-17 北京百度网讯科技有限公司 Voice decoding method and device
CN103474062A (en) * 2012-08-06 2013-12-25 苏州沃通信息科技有限公司 Voice identification method
CN103680500A (en) * 2012-08-29 2014-03-26 北京百度网讯科技有限公司 Speech recognition method and device
CN103680500B (en) * 2012-08-29 2018-10-16 北京百度网讯科技有限公司 A kind of method and apparatus of speech recognition
CN103186658A (en) * 2012-12-24 2013-07-03 中国科学院声学研究所 Method and device for reference grammar generation for automatic grading of spoken English test
CN103186658B (en) * 2012-12-24 2016-05-25 中国科学院声学研究所 Reference grammer for Oral English Exam automatic scoring generates method and apparatus
CN104157285A (en) * 2013-05-14 2014-11-19 腾讯科技(深圳)有限公司 Voice recognition method and device, and electronic equipment
WO2014183373A1 (en) * 2013-05-14 2014-11-20 Tencent Technology (Shenzhen) Company Limited Systems and methods for voice identification
CN104157285B (en) * 2013-05-14 2016-01-20 腾讯科技(深圳)有限公司 Audio recognition method, device and electronic equipment
US9558741B2 (en) 2013-05-14 2017-01-31 Tencent Technology (Shenzhen) Company Limited Systems and methods for speech recognition
CN104142974B (en) * 2014-01-20 2016-02-24 腾讯科技(深圳)有限公司 A kind of voice document querying method and device
CN104142974A (en) * 2014-01-20 2014-11-12 腾讯科技(深圳)有限公司 Voice file querying method and device
US10453477B2 (en) 2014-01-20 2019-10-22 Tencent Technology (Shenzhen) Company Limited Method and computer system for performing audio search on a social networking platform
US9818432B2 (en) 2014-01-20 2017-11-14 Tencent Technology (Shenzhen) Company Limited Method and computer system for performing audio search on a social networking platform
CN105981099A (en) * 2014-02-06 2016-09-28 三菱电机株式会社 Speech search device and speech search method
CN103985391A (en) * 2014-04-16 2014-08-13 柳超 Phonetic-level low power consumption spoken language evaluation and defect diagnosis method without standard pronunciation
CN106856095A (en) * 2015-12-09 2017-06-16 中国科学院声学研究所 The voice quality evaluating system that a kind of phonetic is combined into syllables
CN106205603A (en) * 2016-08-29 2016-12-07 北京语言大学 A kind of tone appraisal procedure
CN106205603B (en) * 2016-08-29 2019-06-07 北京语言大学 A kind of tone appraisal procedure
CN106504741A (en) * 2016-09-18 2017-03-15 广东顺德中山大学卡内基梅隆大学国际联合研究院 A kind of phonetics transfer method based on deep neural network phoneme information
CN109863554A (en) * 2016-10-27 2019-06-07 香港中文大学 Acoustics font model and acoustics font phonemic model for area of computer aided pronunciation training and speech processes
CN109863554B (en) * 2016-10-27 2022-12-02 香港中文大学 Acoustic font model and acoustic font phoneme model for computer-aided pronunciation training and speech processing
CN108615525A (en) * 2016-12-09 2018-10-02 ***通信有限公司研究院 A kind of audio recognition method and device
CN106782508A (en) * 2016-12-20 2017-05-31 美的集团股份有限公司 The cutting method of speech audio and the cutting device of speech audio
CN106782513B (en) * 2017-01-25 2019-08-23 上海交通大学 Speech recognition realization method and system based on confidence level
CN106782513A (en) * 2017-01-25 2017-05-31 上海交通大学 Speech recognition realization method and system based on confidence level
CN107492373A (en) * 2017-10-11 2017-12-19 河南理工大学 The Tone recognition method of feature based fusion
CN107492373B (en) * 2017-10-11 2020-11-27 河南理工大学 Tone recognition method based on feature fusion
CN110808050B (en) * 2018-08-03 2024-04-30 蔚来(安徽)控股有限公司 Speech recognition method and intelligent device
CN110808050A (en) * 2018-08-03 2020-02-18 蔚来汽车有限公司 Voice recognition method and intelligent equipment
CN109377981A (en) * 2018-11-22 2019-02-22 四川长虹电器股份有限公司 The method and device of phoneme alignment
CN109377981B (en) * 2018-11-22 2021-07-23 四川长虹电器股份有限公司 Phoneme alignment method and device
CN109712643A (en) * 2019-03-13 2019-05-03 北京精鸿软件科技有限公司 The method and apparatus of Speech Assessment
CN110176249A (en) * 2019-04-03 2019-08-27 苏州驰声信息科技有限公司 A kind of appraisal procedure and device of spoken language pronunciation
CN112259089A (en) * 2019-07-04 2021-01-22 阿里巴巴集团控股有限公司 Voice recognition method and device
CN110490428A (en) * 2019-07-26 2019-11-22 合肥讯飞数码科技有限公司 Job of air traffic control method for evaluating quality and relevant apparatus
CN111128238A (en) * 2019-12-31 2020-05-08 云知声智能科技股份有限公司 Mandarin assessment method and device
CN113744718A (en) * 2020-05-27 2021-12-03 海尔优家智能科技(北京)有限公司 Voice text output method and device, storage medium and electronic device

Similar Documents

Publication Publication Date Title
CN101887725A (en) Phoneme confusion network-based phoneme posterior probability calculation method
CN101645271B (en) Rapid confidence-calculation method in pronunciation quality evaluation system
CN100411011C (en) Pronunciation quality evaluating method for language learning machine
CN105845134B (en) Spoken language evaluation method and system for freely reading question types
Arora et al. Automatic speech recognition: a review
CN101930735B (en) Speech emotion recognition equipment and speech emotion recognition method
CN104575490A (en) Spoken language pronunciation detecting and evaluating method based on deep neural network posterior probability algorithm
CN101436403B (en) Method and system for recognizing tone
CN106782603B (en) Intelligent voice evaluation method and system
CN104240706B (en) It is a kind of that the method for distinguishing speek person that similarity corrects score is matched based on GMM Token
CN101840699A (en) Voice quality evaluation method based on pronunciation model
Ghai et al. Analysis of automatic speech recognition systems for indo-aryan languages: Punjabi a case study
CN104078039A (en) Voice recognition system of domestic service robot on basis of hidden Markov model
Fukuda et al. Detecting breathing sounds in realistic Japanese telephone conversations and its application to automatic speech recognition
Mary et al. Searching speech databases: features, techniques and evaluation measures
Luo et al. Automatic pronunciation evaluation of language learners' utterances generated through shadowing.
US20220199071A1 (en) Systems and Methods for Speech Validation
CN112767961B (en) Accent correction method based on cloud computing
CN104240699A (en) Simple and effective phrase speech recognition method
Li et al. Improving mandarin tone mispronunciation detection for non-native learners with soft-target tone labels and blstm-based deep models
CN113705671A (en) Speaker identification method and system based on text related information perception
Li et al. English sentence pronunciation evaluation using rhythm and intonation
Rocha et al. Voice segmentation system based on energy estimation
Kadir et al. Bangla speech sentence recognition using hidden Markov models
Baghai-Ravary et al. Precision of phoneme boundaries derived using Hidden Markov Models

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Open date: 20101117