CN101887725A

CN101887725A - Phoneme confusion network-based phoneme posterior probability calculation method

Info

Publication number: CN101887725A
Application number: CN2010101648742A
Authority: CN
Inventors: 葛凤培; 颜永红
Original assignee: Institute of Acoustics CAS
Current assignee: Institute of Acoustics CAS
Priority date: 2010-04-30
Filing date: 2010-04-30
Publication date: 2010-11-17

Abstract

The invention provides a phoneme posterior probability calculation method based on a phoneme confusion network, comprising the following steps: preprocessing the input speech, including framing; extracting speech features from each frame; decoding with a full-syllable loop network state graph, an acoustic model and the speech feature vectors to obtain the phoneme segmentation points on the optimal path; building, for each phoneme segment, a corresponding phoneme confusion network and calculating the acoustic likelihood of the speech on each path of the network; and calculating the numerator of the phoneme posterior probability from the acoustic likelihood on the path corresponding to the learning text, applying time normalization to the acoustic likelihoods on all paths of the confusion network, and accumulating them as the denominator, so as to obtain a more accurate phoneme posterior probability. The method uses this improved confusion-network-based phoneme posterior probability as the basis for evaluating phoneme pronunciation quality, and greatly improves evaluation accuracy without affecting calculation speed.

Description

A phoneme posterior probability calculation method based on a phoneme confusion network

Technical field

The invention belongs to the technical field of pronunciation quality assessment. Specifically, it relates to a confidence calculation method for a pronunciation quality evaluation system.

Background technology

When a pronunciation quality evaluation system is used under natural conditions rather than in an ideal laboratory environment, its performance degrades substantially. Real spoken language is mixed with many non-speech sounds, such as improper pauses, coughs and environmental noise, all of which make it difficult for the system to reach its original assessment accuracy. In addition, assessment errors are more likely if the words spoken by the user lie outside the predefined domain of the system. In short, users of a commercial pronunciation quality evaluation system expect pronunciation quality to be assessed as accurately as possible while also requiring relatively fast assessment, and confidence evaluation is a key measure for addressing these difficulties.

A confidence evaluation method performs a hypothesis test on the target speech of the pronunciation quality evaluation system within a specific time interval and uses thresholds trained in advance to judge the accuracy of the speech segment being assessed, thereby improving the accuracy and robustness of the system.

At present, a widely used approach takes the posterior probability of the target text (the traditional Goodness of Pronunciation, GOP, algorithm) as the confidence for pronunciation evaluation. Fig. 1 is a schematic diagram of the existing confidence calculation method. The input speech is first decoded by a full-syllable network recognizer, which yields the phoneme segmentation points of the input speech. Forced alignment with the target phoneme is then performed within each phoneme segment to obtain the acoustic likelihood corresponding to the target text. Finally, using the acoustic likelihood on the best candidate path of the full-syllable network recognition result, the phoneme posterior probability of the target text given the speech under assessment is calculated and used as the confidence score.

This algorithm is a simplification of the theoretical phoneme posterior probability algorithm. First, to reduce the cost of computing the denominator, it assumes that the result of summation is approximately equal to the result of maximization. When the user mispronounces a phoneme as another phoneme in the phoneme set, this assumption approximates the true posterior probability well; but when the user's pronunciation differs from every standard pronunciation in the phoneme set, the maximum differs considerably from the sum, and the assumption seriously reduces the accuracy of the confidence. Second, to make posterior probability values comparable across speech segments, the GOP algorithm applies segment-length normalization to the posterior probability. In theory, however, the acoustic likelihood is the product of the state transition probabilities and the observation probabilities accumulated over the speech frames, so the time length directly affects the acoustic likelihood and only indirectly affects the phoneme posterior probability; it is therefore more reasonable to apply time normalization to the acoustic likelihood itself. These two defects make the accuracy of the traditional GOP algorithm very low. In particular, when the users are second-language learners its performance becomes unacceptable, which greatly hinders the online use and practical adoption of pronunciation quality evaluation systems.
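The gap between the summation and maximization denominators can be illustrated numerically. The sketch below is only an illustration: the likelihood values are hypothetical and the function names are not taken from the patent.

```python
def posterior_sum(target, candidates):
    """Posterior with the full sum over all candidate likelihoods in the denominator."""
    return target / sum(candidates)


def posterior_max(target, candidates):
    """Posterior with the GOP-style approximation: the sum is replaced by the maximum."""
    return target / max(candidates)


# Case 1: the speech clearly matches one phoneme, so one likelihood dominates.
clear = [0.90, 0.01, 0.01, 0.01]
# Case 2: the speech matches no standard phoneme well, so the likelihoods are flat.
unclear = [0.05, 0.04, 0.04, 0.04]

for name, cands in (("clear", clear), ("unclear", unclear)):
    target = cands[0]  # the first entry plays the role of the target phoneme
    print(name, round(posterior_sum(target, cands), 3), round(posterior_max(target, cands), 3))
```

In the first case the two denominators are close, so the approximation is harmless. In the second case the maximization variant still reports a posterior of 1, while the summation variant reports a much lower value, which is the behaviour the paragraph above criticizes.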

Summary of the invention

The objective of the invention is to overcome the deficiencies of the prior art. Taking both computing speed and robustness into account, it provides a phoneme posterior probability algorithm based on a phoneme confusion network for use in a pronunciation quality evaluation system. The method uses the phoneme confusion network to calculate the phoneme posterior probability and takes that probability as the confidence for pronunciation quality assessment.

To achieve the above objective, the phoneme posterior probability algorithm based on the phoneme confusion network provided by the invention for a pronunciation quality evaluation system comprises the following steps:

1) Input the speech to be recognized into the speech recognition system;

2) Preprocess the input speech; the preprocessing includes framing;

3) Extract speech features using the perceptual linear prediction (PLP) feature extraction method or the Mel-frequency cepstral coefficient (MFCC) feature extraction method;

4) Decode the feature vector sequence using the state graph of the full-syllable loop network and the acoustic model to obtain the optimal path, and record each phoneme segmentation point on the optimal path;

5) According to the context of the optimal-path recognition result obtained in step 4) and the target learning text, build the corresponding phoneme confusion network in each phoneme segment;

6) According to the phoneme segmentation points obtained in step 4) and the phoneme confusion network built in step 5), and using the acoustic model and the feature vector sequence of the speech segment, perform forced alignment between model states and speech features on every path of the confusion network to obtain the acoustic likelihood of the speech segment on that path;

7) Apply segment-length normalization to the acoustic likelihood obtained in step 6), namely

p_{nor} ((x_{1}, \ldots, x_{t}) | (s_{1}, \ldots, s_{t})) = p ((x_{1}, \ldots, x_{t}) | (s_{1}, \ldots, s_{t}))^{1 / T},

where p ((x_{1}, \ldots, x_{t}) | (s_{1}, \ldots, s_{t})) is the acoustic likelihood before normalization, p_{nor} ((x_{1}, \ldots, x_{t}) | (s_{1}, \ldots, s_{t})) is the acoustic likelihood after normalization, and T is the time length of the phoneme segment;

8) Calculate the phoneme posterior probability based on the phoneme confusion network:

p (ph) = \frac{p_{nor} ((x_{1}, \ldots, x_{t}) | {(s_{1}, \ldots, s_{t})}_{ref})}{\sum_{k \in CN} p_{nor} ((x_{1}, \ldots, x_{t}) | {(s_{1}, \ldots, s_{t})}_{k})},

where {(s_{1}, \ldots, s_{t})}_{ref} is the state sequence obtained from the learning text, and CN is the confusion network consisting of multiple parallel phoneme paths;

9) Use the phoneme posterior probability based on the phoneme confusion network as the confidence score of the phoneme in the pronunciation quality evaluation system.
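Steps 7) and 8) can be sketched in the log domain, which is how acoustic likelihoods are usually stored in practice. This is a minimal sketch under stated assumptions, not the patented implementation: the function names and the numeric log-likelihoods are hypothetical, every path is assumed to cover the same T frames of one phoneme segment, and the reference path is assumed to be one of the paths of the confusion network.

```python
import math


def normalized_log_likelihood(log_likelihood, num_frames):
    """Segment-length normalization: p_nor = p ** (1 / T), so log p_nor = log p / T."""
    return log_likelihood / num_frames


def phoneme_posterior(ref_log_likelihood, path_log_likelihoods, num_frames):
    """Posterior of the reference path over all confusion-network paths.

    path_log_likelihoods is assumed to include the reference path itself.
    """
    ref = normalized_log_likelihood(ref_log_likelihood, num_frames)
    norm = [normalized_log_likelihood(ll, num_frames) for ll in path_log_likelihoods]
    # Log-sum-exp keeps the denominator numerically stable.
    peak = max(norm)
    log_denominator = peak + math.log(sum(math.exp(v - peak) for v in norm))
    return math.exp(ref - log_denominator)


# Hypothetical log-likelihoods of one 20-frame segment on three parallel paths.
paths = [-1200.0, -1230.0, -1260.0]
score = phoneme_posterior(paths[0], paths, num_frames=20)
print(round(score, 4))
```

Dividing the log-likelihood by T before summing is what distinguishes this from normalizing the posterior afterwards: without it, a 30-unit gap in raw log-likelihood would make the competing paths vanish from the denominator.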

In the above technical solution, the preprocessing of the input speech in step 2) comprises digitization, pre-emphasis to boost high frequencies, framing and windowing.

In the above technical solution, extracting speech features in step 3) comprises calculating the PLP or MFCC coefficients, calculating the energy feature and calculating the difference coefficients.

In the above technical solution, the full-syllable loop network decoding in step 4) uses the Viterbi decoding method.

In the above technical solution, the phoneme confusion network in step 5) is built using the acoustic similarity between phonemes, comprising: determining the central phonemes and the number of parallel paths, expanding the central phonemes into triphones according to the contexts of the learning text and the recognition result, and building the parallel phoneme confusion network.

In the above technical solution, step 7) applies time normalization by the length of the phoneme speech segment to the acoustic likelihood on every path.

In the above technical solution, the phoneme posterior probability in step 8) uses a denominator calculated over the phoneme confusion network.

The advantage of the invention is that it builds a phoneme confusion network as the basis for calculating the denominator of the phoneme posterior probability and applies time normalization to the acoustic likelihood, thereby substantially improving the accuracy of the pronunciation quality assessment confidence. The invention improves the confidence calculation for pronunciation quality evaluation systems while keeping the increase in computation small. The confusion network is built according to the acoustic similarity between phonemes, and the acoustic likelihoods on all of its paths are summed to obtain a more accurate denominator for the phoneme posterior probability. In addition, segment-length normalization of the acoustic likelihood removes the differences caused by the speed of phoneme pronunciation. The resulting phoneme posterior probability is substantially more accurate, which effectively improves the accuracy of pronunciation evaluation.

Description of drawings

Fig. 1 is a schematic diagram of the confidence calculation method of the prior art;

Fig. 2 is a flowchart of an embodiment of the phoneme posterior probability algorithm based on the phoneme confusion network of the invention;

Fig. 3 is a schematic diagram of the construction of the full-syllable network state graph in the phoneme posterior probability algorithm based on the phoneme confusion network of the invention;

Fig. 4 is a flowchart of confusion network construction for initials in the phoneme posterior probability algorithm based on the phoneme confusion network of the invention;

Fig. 5 is a flowchart of confusion network construction for finals in the phoneme posterior probability algorithm based on the phoneme confusion network of the invention;

Fig. 6 is a schematic diagram of state-graph-based forced alignment in the phoneme posterior probability algorithm based on the phoneme confusion network of the invention.

Embodiment

The phoneme posterior probability algorithm based on the phoneme confusion network of the invention is described in further detail below with reference to the drawings and specific embodiments.

Fig. 2 is a flowchart of an embodiment of the phoneme posterior probability algorithm based on the phoneme confusion network of the invention. As shown in Fig. 2, the algorithm provided by the invention for a pronunciation quality evaluation system comprises the following steps:

1) Input the speech to be recognized into the speech recognition system.

2) Preprocess the input speech; the preprocessing mainly consists of framing.

In this embodiment, the preprocessing proceeds as follows:

2-1) Digitize the speech signal at a sampling rate of 16 kHz (or 8 kHz);

2-2) Boost the high frequencies by pre-emphasis:

The pre-emphasis filter is H(z) = 1 - αz^{-1}, where α = 0.98.

2-3) Split the data into frames: the frame length is 25 ms and the inter-frame overlap is 15 ms; both can be adjusted as needed;

2-4) Windowing:

The window function is the commonly used Hamming window function:

where 0 ≤ n ≤ N-1.
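The preprocessing flow of steps 2-2) to 2-4) can be sketched as follows. This is a sketch under assumptions rather than the embodiment's code: the window formula is not reproduced in the text above, so the standard Hamming form w(n) = 0.54 - 0.46 cos(2πn/(N-1)) is assumed, and the function names and the test tone are hypothetical.

```python
import math


def pre_emphasize(samples, alpha=0.98):
    """Apply H(z) = 1 - alpha * z^-1, i.e. y[n] = x[n] - alpha * x[n-1]."""
    if not samples:
        return []
    return [samples[0]] + [samples[n] - alpha * samples[n - 1] for n in range(1, len(samples))]


def hamming(length):
    """Standard Hamming window; the 0.54 / 0.46 coefficients are an assumption."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1)) for n in range(length)]


def frame_signal(samples, sample_rate=16000, frame_ms=25, overlap_ms=15):
    """Split samples into windowed frames; trailing samples that do not fill a frame are dropped."""
    frame_len = sample_rate * frame_ms // 1000                # 400 samples at 16 kHz
    shift = sample_rate * (frame_ms - overlap_ms) // 1000     # 160 samples at 16 kHz
    window = hamming(frame_len)
    frames = []
    for start in range(0, len(samples) - frame_len + 1, shift):
        chunk = samples[start:start + frame_len]
        frames.append([s * w for s, w in zip(chunk, window)])
    return frames


# One second of a hypothetical 440 Hz tone sampled at 16 kHz.
signal = [math.sin(2 * math.pi * 440 * n / 16000) for n in range(16000)]
frames = frame_signal(pre_emphasize(signal))
print(len(frames), len(frames[0]))
```

With a 25 ms frame and a 15 ms overlap, the frame shift is 10 ms, which is the usual frame rate for speech front ends.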

3) Extract speech features. The invention may use the PLP (perceptual linear prediction) or MFCC (Mel-frequency cepstral coefficient) feature extraction method. The procedure is as follows:

3-1) Calculate the PLP or MFCC coefficients c(m) of each speech frame, 1 ≤ m ≤ N_c, where N_c is the number of cepstral coefficients and N_c = 12;

3-2) Calculate the energy feature of each speech frame;

3-3) Calculate the first-order and second-order differences of the energy feature and the cepstral features. The difference cepstral coefficients are calculated with the following regression formula:

where μ is a normalization factor, τ is an integer, and 2T+1 is the number of speech frames used to calculate the difference cepstral coefficients, with T = 2 and μ = 0.375;

3-4) For each speech frame, generate a 39-dimensional feature vector.
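The 39 dimensions follow from 12 cepstral coefficients plus 1 energy term, each with first-order and second-order differences: (12 + 1) × 3 = 39. The sketch below illustrates this. The regression formula itself is not reproduced in the text above, so the common form Δc_t(m) = μ Σ_{τ=-T}^{T} τ c_{t+τ}(m) is assumed because it matches the symbols μ, τ and T that the text defines. The edge handling and the function names are likewise assumptions.

```python
def delta(features, half_window=2, mu=0.375):
    """Regression-based difference: d_t = mu * sum over tau in [-T, T] of tau * c_{t+tau}.

    features is a list of per-frame vectors. Frames beyond the edges are
    replaced by the nearest edge frame (an assumption; the text does not say).
    """
    last = len(features) - 1
    out = []
    for t in range(len(features)):
        row = [0.0] * len(features[t])
        for tau in range(-half_window, half_window + 1):
            src = features[min(max(t + tau, 0), last)]
            for m, value in enumerate(src):
                row[m] += mu * tau * value
        out.append(row)
    return out


def full_feature_vectors(static):
    """Concatenate static features with first-order and second-order differences."""
    first = delta(static)
    second = delta(first)
    return [s + d1 + d2 for s, d1, d2 in zip(static, first, second)]


# 10 hypothetical frames of 12 cepstral coefficients plus 1 energy term.
static = [[float(t + m) for m in range(13)] for t in range(10)]
vectors = full_feature_vectors(static)
print(len(vectors[0]))
```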

4) Decode the feature vector sequence using the state graph of the full-syllable loop network and the acoustic model to obtain the optimal path, and record each phoneme segmentation point on the optimal path.

The state graph used in this step is constructed as follows:

Fig. 3 is a schematic diagram of the construction of the state graph in the phoneme posterior probability algorithm based on the phoneme confusion network of the invention. As shown in Fig. 3, a search space covering all syllables is first built according to the full-syllable grammar, namely a loop network in which all syllables are connected in parallel. The recognizer searches this network for the optimal path corresponding to the input speech (the path with the maximum acoustic likelihood) and outputs it as the recognition result. When the decoding state graph is built, the word network is expanded into a phoneme network using dictionary information. Each node consists of a phoneme, and each phoneme is then replaced by the corresponding hidden Markov model (HMM) in the acoustic model, each HMM being composed of several states. The final search space thus becomes a state graph in which any path represents a candidate syllable sequence, and the optimal path is obtained as the recognition result by comparing the likelihoods of the different paths.

The acoustic model used in this embodiment is gender-dependent: the male model contains 4665 states and the female model contains 4015 states, and each state is described by a mixture of 16 Gaussians.

In this embodiment, the conventional Viterbi search strategy is used for decoding.
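The Viterbi search over a state graph can be sketched on a toy model. This is an illustration only: a real decoder works on the expanded full-syllable state graph with thousands of HMM states, whereas the two-state model, its log probabilities and the function name below are hypothetical.

```python
def viterbi(log_init, log_trans, log_emit):
    """Return (best_log_score, best_state_path).

    log_init[s]: log probability of starting in state s.
    log_trans[p][s]: log probability of moving from state p to state s.
    log_emit[t][s]: log likelihood of frame t in state s.
    """
    num_states = len(log_init)
    score = [log_init[s] + log_emit[0][s] for s in range(num_states)]
    back = []
    for t in range(1, len(log_emit)):
        new_score, pointers = [], []
        for s in range(num_states):
            # Best predecessor of state s at frame t.
            best_prev = max(range(num_states), key=lambda p: score[p] + log_trans[p][s])
            pointers.append(best_prev)
            new_score.append(score[best_prev] + log_trans[best_prev][s] + log_emit[t][s])
        score = new_score
        back.append(pointers)
    # Backtrace from the best final state.
    state = max(range(num_states), key=lambda s: score[s])
    best = score[state]
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return best, path[::-1]


# Toy example: each state stands for one phoneme, three frames of speech.
log_init = [0.0, -5.0]
log_trans = [[-1.0, -1.0], [-1.0, -1.0]]
log_emit = [[-1.0, -3.0], [-3.0, -1.0], [-3.0, -1.0]]
best, path = viterbi(log_init, log_trans, log_emit)
# Segmentation points are the frames at which the phoneme label changes.
cut_points = [t for t in range(1, len(path)) if path[t] != path[t - 1]]
print(best, path, cut_points)
```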

5) Build the corresponding phoneme confusion network in each phoneme segment, according to the context of the optimal-path recognition result obtained in step 4) and the target learning text.

Because HMM acoustic models commonly use context-dependent triphones as the basic modelling unit, phonemes must also be converted into triphones when the confusion network is built. For each phoneme speech segment, the context from the full-syllable network recognition result and the context from the learning text are both used as the basis for the triphone expansion of the phoneme confusion network. The main rules for this expanded phoneme confusion network are as follows:

When the recognition result is an initial, a parallel network is built in this speech segment from the triphones expanded from all initials, and the statistics are calculated on it. Mandarin has 27 initials. When these initials are expanded into triphones, the contexts of both the recognition result and the learning text are considered. Because a Chinese syllable consists of an initial followed by a final, these contexts are all finals. Since tone and phoneme pronunciation are relatively independent, each context final is expanded over all tones, of which there are 5. The resulting confusion network therefore has 5 (5 tones of the preceding context) × 5 (5 tones of the following context) × 2 (two kinds of context, from the recognition result and from the learning text) × 27 (27 initials) parallel paths. Fig. 4 illustrates this with "z" as an example: the learning-text contexts of this phoneme are "a4" and "uo2", and the recognition-result contexts are "an3" and "ui2". First, the first initial "aa" is taken as the central phoneme. Expanding it with the learning-text context gives 5×5 triphones, from "a1-aa+uo1" to "a5-aa+uo5", and expanding it with the recognition-result context gives another 5×5 triphones. The other initials are treated in the same way as central phonemes, giving 5×5×2×27 triphones in total, which are connected in parallel to form the confusion network.

When the recognition result is a final, a parallel network is built in this speech segment from the triphones expanded from all finals, and the statistics are calculated on it. Mandarin has 184 finals. When these finals are expanded into triphones, the contexts of both the recognition result and the learning text are considered. Because a Chinese syllable consists of an initial followed by a final, these contexts are all initials. The resulting confusion network therefore has 2 (two kinds of context, from the recognition result and from the learning text) × 184 (184 finals) parallel paths. Fig. 5 illustrates this with "a4" as an example: the learning-text contexts of this phoneme are "d" and "z", and the recognition-result contexts are "t" and "zh". First, the first final "a1" is taken as the central phoneme: the learning-text context expands it to "d-a1+z" and the recognition-result context expands it to "t-a1+zh". The other finals are treated in the same way as central phonemes, giving 2×184 triphones in total, which are connected in parallel to form the final confusion network.
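The two expansion rules above can be sketched as simple enumeration. This is a hedged illustration: the phoneme inventories are placeholders (only their sizes, 27 and 184, come from the text), the function names are hypothetical, and the "left-centre+right" label format follows the examples "a1-aa+uo1" and "d-a1+z".

```python
TONES = "12345"


def strip_tone(final):
    """Remove a trailing tone digit, e.g. 'uo2' -> 'uo'."""
    return final[:-1] if final[-1] in TONES else final


def initial_network(initials, contexts):
    """Triphone labels for a segment recognized as an initial.

    contexts is a list of (left_final, right_final) pairs, one from the
    learning text and one from the recognition result. Each context final
    is expanded over all 5 tones.
    """
    paths = []
    for left, right in contexts:
        for centre in initials:
            for lt in TONES:
                for rt in TONES:
                    paths.append(f"{strip_tone(left)}{lt}-{centre}+{strip_tone(right)}{rt}")
    return paths


def final_network(toned_finals, contexts):
    """Triphone labels for a segment recognized as a final; contexts are initial pairs."""
    return [f"{left}-{centre}+{right}" for left, right in contexts for centre in toned_finals]


# Placeholder inventories: only the sizes (27 and 184) come from the text.
initials = [f"I{i:02d}" for i in range(27)]
toned_finals = [f"F{i:03d}" for i in range(184)]

net_initial = initial_network(initials, [("a4", "uo2"), ("an3", "ui2")])
net_final = final_network(toned_finals, [("d", "z"), ("t", "zh")])
print(len(net_initial), len(net_final))
```

If the recognition result and the learning text happen to share the same context, this enumeration produces duplicate paths; whether duplicates are merged is not stated in the text.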

6) According to the phoneme segmentation points obtained in step 4) and the phoneme confusion network built in step 5), and using the acoustic model and the feature vector sequence of the speech segment, perform forced alignment between acoustic states and speech frames on every path of the confusion network to obtain the state index corresponding to each speech frame and the acoustic likelihood of the speech segment on that path.

The negative logarithm of this likelihood is:

- \ln \prod_{t = 0}^{T} p (x_{t} | s_{t}) = d (x_{t}, s_{t}) = \sum_{t = 0}^{T} \frac{1}{2} [(x_{t} - \mu_{t})^{\top} \Sigma_{t}^{- 1} (x_{t} - \mu_{t}) + n \ln (2 \pi) + \ln (| \Sigma_{t} |)]

where x_{t} is the feature vector of the t-th input frame; s_{t} is the hidden Markov model state corresponding to the t-th frame, modelled as a normal distribution N(\mu_{t}, \Sigma_{t}), in which \mu_{t} and \Sigma_{t} are the mean vector and covariance matrix of state s_{t}, with values taken from the acoustic model; and n is the dimension of the feature vector x_{t}, and hence of \mu_{t} and \Sigma_{t}.
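The per-frame distance in the formula above can be sketched for the special case of a diagonal covariance matrix. The diagonal restriction is an assumption made here to keep the example free of matrix inversion, and the function names are hypothetical.

```python
import math


def neg_log_gaussian(x, mean, var):
    """-ln N(x; mean, diag(var)) = 0.5 * [sum((x - m)^2 / v) + n * ln(2 * pi) + sum(ln v)]."""
    n = len(x)
    quad = sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mean, var))
    log_det = sum(math.log(vi) for vi in var)  # ln|Sigma| for a diagonal Sigma
    return 0.5 * (quad + n * math.log(2 * math.pi) + log_det)


def path_distance(frames, aligned_states):
    """Accumulate the per-frame distance along one aligned state sequence.

    aligned_states[t] is the (mean, var) pair of the state assigned to frame t.
    """
    return sum(neg_log_gaussian(x, m, v) for x, (m, v) in zip(frames, aligned_states))


# Two hypothetical 2-dimensional frames aligned to the same unit-variance state.
state = ([0.0, 0.0], [1.0, 1.0])
distance = path_distance([[0.0, 0.0], [1.0, 0.0]], [state, state])
print(round(distance, 4))
```

A smaller accumulated distance corresponds to a larger acoustic likelihood, so maximizing the likelihood over state sequences is the same as minimizing this distance.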

This forced alignment is itself a simple decoding process. The candidates are all state sequences of the same phoneme, and the state sequence with the maximum acoustic likelihood is decoded as the optimal path. Fig. 6 is a schematic diagram of state-graph-based forced alignment. In the figure, dotted lines represent candidate state sequences, and the solid black line represents the decoded optimal path, that is, the optimal state sequence. As shown in Fig. 6, the state sequence that maximizes the likelihood P(X|S) of the observation sequence (in this embodiment, the feature vectors) is taken to be the optimal state sequence.
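Forced alignment within one phoneme segment can be sketched as dynamic programming over a left-to-right state sequence. The sketch below is a simplification under stated assumptions: transition probabilities are ignored, every state must cover at least one frame, and the function name and scores are hypothetical.

```python
def forced_align(frame_scores):
    """Align T frames to S left-to-right states, maximizing the total log score.

    frame_scores[t][s] is the log likelihood of frame t under state s.
    States appear in order and each covers at least one frame.
    Transition probabilities are ignored here (a simplifying assumption).
    Returns (best_log_score, state_index_per_frame).
    """
    num_frames, num_states = len(frame_scores), len(frame_scores[0])
    if num_frames < num_states:
        raise ValueError("need at least one frame per state")
    neg_inf = float("-inf")
    best = [[neg_inf] * num_states for _ in range(num_frames)]
    advanced = [[False] * num_states for _ in range(num_frames)]
    best[0][0] = frame_scores[0][0]
    for t in range(1, num_frames):
        for s in range(num_states):
            stay = best[t - 1][s]
            advance = best[t - 1][s - 1] if s > 0 else neg_inf
            if advance > stay:
                advanced[t][s] = True  # frame t is the first frame of state s
            prev = max(stay, advance)
            if prev > neg_inf:
                best[t][s] = prev + frame_scores[t][s]
    # Backtrace from the last state at the last frame.
    s = num_states - 1
    path = [s]
    for t in range(num_frames - 1, 0, -1):
        if advanced[t][s]:
            s -= 1
        path.append(s)
    return best[-1][-1], path[::-1]


# Four hypothetical frames, two states.
scores = [[-1.0, -5.0], [-2.0, -1.0], [-6.0, -1.0], [-7.0, -2.0]]
total, alignment = forced_align(scores)
print(total, alignment)
```

Running this once per path of the confusion network, over the same frames of the segment, yields the per-path likelihoods that enter the posterior probability.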

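7) Apply segment-length normalization to the acoustic likelihood obtained in step 6):

p_{nor} ((x_{1}, \ldots, x_{t}) | (s_{1}, \ldots, s_{t})) = p ((x_{1}, \ldots, x_{t}) | (s_{1}, \ldots, s_{t}))^{1 / T},

8) Calculate the phoneme posterior probability based on the phoneme confusion network:

p (ph) = \frac{p_{nor} ((x_{1}, \ldots, x_{t}) | {(s_{1}, \ldots, s_{t})}_{ref})}{\sum_{k \in CN} p_{nor} ((x_{1}, \ldots, x_{t}) | {(s_{1}, \ldots, s_{t})}_{k})},

where {(s_{1}, \ldots, s_{t})}_{ref} is the state sequence obtained from the learning text, and CN is the confusion network consisting of multiple parallel phoneme paths.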

The confidence score of a phoneme is used to measure the pronunciation quality of that phoneme. The performance of a confidence algorithm is evaluated by comparison with expert assessment: the same speech data are scored both by the machine and by experts, with the expert result taken as the standard. The machine assessment is considered correct when it agrees with the expert result and wrong otherwise, and a scoring accuracy is computed from these counts. Comparing scoring accuracies shows how the performance of different confidence algorithms changes. Because the confidence score of a phoneme must be mapped to a machine assessment result, a threshold classification method is used here. First, a development data set is used to train a confidence threshold for each phoneme according to the principle of maximum scoring accuracy. During testing, for a given phoneme, the pronunciation is considered accurate when its confidence score is higher than the threshold of that phoneme, and defective otherwise.
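Threshold training by maximum scoring accuracy can be sketched as an exhaustive search over candidate thresholds. The candidate grid (midpoints between adjacent scores), the data and the function names are assumptions made for illustration; the text only states the selection principle.

```python
def scoring_accuracy(scores, expert_labels, threshold):
    """Fraction of items where 'score > threshold' agrees with the expert label."""
    hits = sum((score > threshold) == label for score, label in zip(scores, expert_labels))
    return hits / len(scores)


def train_threshold(scores, expert_labels):
    """Pick the threshold with the highest scoring accuracy on the development set.

    Candidates are the midpoints between adjacent sorted scores, plus one
    value below the minimum and one above the maximum.
    """
    ordered = sorted(set(scores))
    candidates = [ordered[0] - 1.0, ordered[-1] + 1.0]
    candidates += [(a + b) / 2 for a, b in zip(ordered, ordered[1:])]
    return max(candidates, key=lambda th: scoring_accuracy(scores, expert_labels, th))


def train_all(dev_set):
    """dev_set maps phoneme -> (confidence_scores, expert_labels)."""
    return {ph: train_threshold(s, y) for ph, (s, y) in dev_set.items()}


# Hypothetical development data: True means the expert judged the phoneme correct.
dev = {"z": ([0.91, 0.85, 0.40, 0.35, 0.70], [True, True, False, False, True])}
thresholds = train_all(dev)
print(thresholds)
```

At test time, a phoneme whose confidence score exceeds its trained threshold would be marked as accurately pronounced, matching the rule stated above.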

Testing experiment:

The phoneme posterior probability algorithm based on the phoneme confusion network of the invention was tested on three data sets recorded on site at the Hong Kong Mandarin proficiency test. The test task was to evaluate the phoneme scoring accuracy of the pronunciation quality evaluation system, and the test data consisted of speech from 182 female and 107 male speakers. The target speech read by each speaker consisted of 50 single characters and 25 two-character words specified in advance, and the target content differed across the three data sets. The speakers were all graduates in Hong Kong, whose Mandarin proficiency was generally poor. All speech data had phoneme-level scores from linguistics experts, which served as the basis for evaluating the accuracy of the pronunciation quality evaluation system. The confidence score is used to distinguish good pronunciation from poor: a pronunciation is considered accurate when its confidence score is higher than a preset threshold, and defective otherwise. The threshold was obtained by training: 60% of each data set was drawn at random as a development set for training the thresholds, and the remaining 40% was used as the test set. The goal was to improve phoneme scoring accuracy, that is, to make the machine assessment agree with the expert assessment as closely as possible.

Two different algorithms were used to calculate the confidence. The first, shown in Fig. 1, is referred to as the traditional GOP algorithm system. The second, shown in Fig. 2, is the phoneme posterior probability algorithm based on the phoneme confusion network of the invention and is referred to as the improved algorithm system.

Table 1 compares the performance of the phoneme posterior probability algorithm based on the phoneme confusion network of the invention with that of the traditional GOP algorithm of the prior art.

Table 1:

System	Initial accuracy	Final accuracy
Traditional GOP algorithm system	0.877	0.885
Phoneme posterior probability algorithm system based on the phoneme confusion network	0.918	0.922

As the table shows, the phoneme posterior probability algorithm based on the phoneme confusion network used in the invention outperforms the traditional GOP algorithm. The scoring accuracy of the improved algorithm shows a relative improvement of 33.3% for initials and 28.7% for finals.

In addition, the phoneme posterior probability algorithm based on the phoneme confusion network does not noticeably increase the amount of computation. The real-time test results are shown in Table 2, from which it can be seen that the improved algorithm does not introduce a serious computational burden.

Table 2:

	Traditional GOP algorithm system	Phoneme posterior probability algorithm system based on the phoneme confusion network
Real-time rate	1.021	1.030

Claims

1. A phoneme posterior probability algorithm based on a phoneme confusion network, characterized in that it comprises the following steps:

1) Input the speech to be recognized;

2) Preprocess the input speech, the preprocessing including framing;

3) Extract speech features to obtain the feature vector sequence of the speech to be recognized;

4) Decode the feature vector sequence using the state graph of the full-syllable loop network and the acoustic model to obtain the optimal path as the recognition result, and record each phoneme segmentation point on the optimal path;

5) According to the recognition result obtained in step 4) and the target learning text, build the corresponding phoneme confusion network in each speech segment;

6) According to the phoneme segmentation points obtained in step 4) and the phoneme confusion network built in step 5), and using the acoustic model and the feature vector sequence of the phoneme segment, perform forced alignment between model states and speech features on every path of the confusion network to obtain the acoustic likelihood of the speech segment on that path;

7) Apply segment-length normalization to the acoustic likelihood obtained in step 6), namely

p_{nor} ((x_{1}, \ldots, x_{t}) | (s_{1}, \ldots, s_{t})) = p ((x_{1}, \ldots, x_{t}) | (s_{1}, \ldots, s_{t}))^{1 / T},

where p ((x_{1}, \ldots, x_{t}) | (s_{1}, \ldots, s_{t})) is the acoustic likelihood before normalization, p_{nor} ((x_{1}, \ldots, x_{t}) | (s_{1}, \ldots, s_{t})) is the acoustic likelihood after normalization, and T is the number of speech frames in the phoneme segment;

8) Calculate the phoneme posterior probability based on the phoneme confusion network:

p (ph) = \frac{p_{nor} ((x_{1}, \ldots, x_{t}) | {(s_{1}, \ldots, s_{t})}_{ref})}{\sum_{k \in CN} p_{nor} ((x_{1}, \ldots, x_{t}) | {(s_{1}, \ldots, s_{t})}_{k})},

where {(s_{1}, \ldots, s_{t})}_{ref} is the state sequence obtained from the learning text, and CN is the confusion network consisting of multiple parallel phoneme paths.

2. The phoneme posterior probability algorithm based on a phoneme confusion network according to claim 1, characterized in that the full-syllable loop network decoding in step 4) uses the Viterbi decoding method.

3. The phoneme posterior probability algorithm based on a phoneme confusion network according to claim 1, characterized in that building the phoneme confusion network in step 5) comprises three sub-steps, as follows:

3-1) Determine the central phonemes and the number of parallel paths;

3-2) Expand the central phonemes into triphones according to the contexts of the learning text and the recognition result;

3-3) Build the parallel phoneme confusion network.

4. The phoneme posterior probability algorithm based on a phoneme confusion network according to claim 3, characterized in that, in step 3-2), when the central phoneme is an initial, the context finals of the learning text and of the recognition result are expanded over the tones, the tone-expanded context finals of the learning text and of the recognition result are each used as context phonemes, all initials are each used as the central phoneme to form a plurality of triphones, and these triphones are built into a parallel network;

when the central phoneme is a final, the context initials of the learning text and of the recognition result are each used as context phonemes, all tone-expanded finals are each used as the central phoneme to form a plurality of triphones, and these triphones are built into a parallel network.

5. The phoneme posterior probability algorithm based on a phoneme confusion network according to claim 1, characterized in that step 7) applies time normalization by phoneme segment length to the acoustic likelihood on every path.

6. The phoneme posterior probability algorithm based on a phoneme confusion network according to claim 1, characterized in that, in step 8), the denominator of the phoneme posterior probability is calculated over the phoneme confusion network, and the phoneme posterior probability is then obtained from it.