CN104766611A - Objective task distribution estimation method and system and acoustic model self-adaptive method and system - Google Patents


Info

Publication number
CN104766611A
Authority
CN
China
Prior art keywords
data
distribution
goal task
low confidence
self
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201410007278.1A
Other languages
Chinese (zh)
Inventor
贺志阳
吕萍
吴及
胡国平
胡郁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
iFlytek Co Ltd
Original Assignee
iFlytek Co Ltd

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a target task distribution estimation method and system and an acoustic model adaptation method and system. The target task distribution estimation method comprises the following steps: obtaining the distribution of the target task over a candidate speech recognition result data set, to serve as the coverage distribution of the target task; obtaining, from the candidate speech recognition result data set, the speech recognition results whose top-choice (1-best) recognition result has a confidence less than or equal to a first confidence threshold, to form a first low-confidence data set; obtaining the distribution of the target task over the first low-confidence data set, to serve as the confusion distribution of the target task; and fusing the coverage distribution and the confusion distribution to obtain the target task distribution. Because the method and system estimate the target task distribution from the candidate speech recognition result data set rather than from manually annotated data, they are timely and save labor cost. Because the fused distribution includes the confusion distribution, which is obtained from speech recognition results that were recognized relatively poorly, the performance of the overall speech recognition system can be effectively improved.

Description

Target task distribution estimation method and system, and acoustic model adaptation method and system
Technical field
The present invention relates to the field of speech recognition, and in particular to a task-oriented acoustic model adaptation method and system.
Background
Since the 1990s, researchers have proposed speaker adaptation techniques for the acoustic model in speech recognition systems, such as maximum likelihood linear regression (MLLR) and maximum a posteriori (MAP) estimation. With these techniques, only a small amount of speaker data needs to be collected to adjust the original acoustic model (a speaker-independent model trained in advance on a large amount of general data), so that the adjusted acoustic model is closer to the speaker's characteristics and recognition accuracy improves. In recent years, with the rapid development of the mobile Internet and cloud computing, speech recognition has become more widespread and its application environments more complex, and task-oriented acoustic model adaptation has become a new research focus. Task-oriented acoustic model adaptation (where "task" means a speech recognition task) adjusts the acoustic model parameters for a specific recognition application, so that the adjusted parameters better match the speech recognition task and yield better recognition performance. Traditional speaker adaptation techniques, which target a specific speaker, cannot meet the needs of such applications.
A traditional acoustic model adaptation method for a speech recognition task comprises the following steps:
Step A: Count the frequency of occurrence of the basic speech units in the specific speech recognition task and use it as the target task distribution. The basic speech units are generally basic recognition units such as syllables or phonemes. The statistics are usually computed either on task-related manually annotated training data (that is, the result of manually transcribing speech from the task) or on task-related speech recognition result data (that is, the result of the system recognizing speech from the task); the frequency of occurrence of each basic speech unit in that data is taken as the target task distribution.
Step B: According to the target task distribution, select adaptation data from the task-related manually annotated training data or from the task-related speech recognition result data, so that the distribution of the adaptation data is consistent with the target task distribution.
In step B, the adaptation data is selected by a greedy algorithm based on the KL distance (Kullback-Leibler divergence). The specific steps are as follows:
Step B1: Take the task-related manually annotated training data or the task-related speech recognition result data as the candidate data set, initialize the selected data set to the empty set, and set the amount of data to be selected.
Step B2: Examine each item in the candidate data set in turn. To examine the current item, temporarily add it to the selected data set, compute the KL distance between the distribution of the selected data set and the target task distribution, and then restore the selected data set.
Step B3: Choose the item that, in step B2, gave the smallest KL distance between the new selected data set and the target task distribution as the item selected in this round, add it to the selected data set, and delete it from the candidate data set.
Step B4: Determine whether the amount of data in the selected data set has reached the set amount. If so, end the selection; otherwise, return to step B2.
Step C: Manually annotate and correct the adaptation data selected in step B. If the candidate data set comes from task-related speech recognition result data, the selected adaptation data must be manually annotated and corrected to ensure its correctness; if the candidate data set comes from task-related manually annotated training data, step C is omitted.
Step D: Use the selected adaptation data to adapt the parameters of the original acoustic model, obtaining an optimized acoustic model.
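The greedy selection in steps B1 to B4 can be sketched as follows. This is a minimal illustration, not the patent's implementation: each data item is assumed to be a list of speech units, the function names are hypothetical, the KL direction (target relative to selected) and the small `eps` smoothing that keeps the divergence finite are both assumptions, since the text does not specify them.

```python
import math
from collections import Counter

def unit_distribution(items, vocab, eps=1e-9):
    """Relative frequency of each speech unit over a list of unit sequences."""
    counts = Counter(u for item in items for u in item)
    total = sum(counts[v] for v in vocab)
    # eps smoothing (an assumption) avoids division by zero for unseen units
    return {v: (counts[v] + eps) / (total + eps * len(vocab)) for v in vocab}

def kl_distance(p, q):
    """KL(p || q) for two distributions over the same vocabulary."""
    return sum(p[v] * math.log(p[v] / q[v]) for v in p if p[v] > 0)

def greedy_select(candidates, target, budget):
    """Steps B1-B4: grow the selected set one item at a time."""
    pool = list(candidates)          # B1: candidate data set
    selected = []                    # B1: selected set starts empty
    vocab = list(target)
    while pool and len(selected) < budget:   # B4: stop at the set amount
        # B2: score every candidate by the KL distance after a trial add
        best = min(
            range(len(pool)),
            key=lambda i: kl_distance(
                target, unit_distribution(selected + [pool[i]], vocab)),
        )
        # B3: keep the best item and remove it from the candidate set
        selected.append(pool.pop(best))
    return selected

candidates = [["a", "a"], ["b"], ["a", "b"], ["b", "b"]]
target = {"a": 0.5, "b": 0.5}
picked = greedy_select(candidates, target, budget=2)
```

Each round rescans the whole candidate pool, so the cost grows with the product of pool size and budget; that matches the procedure as described and is acceptable for a sketch.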
As can be seen, the traditional acoustic model adaptation method for speech recognition tasks selects adaptation data mainly on the principle of matching the speech unit distribution. This approach is simple and direct to implement and has achieved some success, but it still has the following defects in practical applications:
1. There is some uncertainty in computing the target task distribution. The traditional method only requires the adaptation data to have the same speech unit coverage as the specific task, whereas improving the performance of a speech recognition system requires focusing on the speech units that the original system recognizes poorly. Giving reasonable emphasis to data containing poorly recognized speech units, while keeping the data balanced, is therefore of practical significance for improving system performance.
2. Coverage analysis based on task-related manually annotated training data is problematic. On the one hand, estimating an accurate data distribution requires a relatively large amount of manually annotated data; on the other hand, manually annotated data is often not very recent, so a distribution estimated from it can hardly reflect the actual data distribution in the current system.
3. A coverage distribution based on speech recognition result data is timely, but because speech recognition results often contain errors, the distribution estimated from them is inaccurate.
4. The selected adaptation data does not necessarily meet the application's needs. If the traditional task adaptation method selects data from manually annotated data, the amount of annotated data becomes a problem: if it is insufficient, it is difficult to make the distribution of the selected adaptation data as close as possible to the estimated distribution, and reaching that goal requires a large amount of manually annotated data as selection candidates, which consumes considerable annotation labor. If the traditional task adaptation method selects data from speech recognition result data, then because the recognition results contain errors, the distribution of the selected data set is likely to differ considerably from the estimated target distribution.
Based on the above analysis, traditional task-oriented acoustic model adaptation may produce poor task adaptation results. The present application therefore proposes a new task-oriented adaptation method, with a new criterion and method for estimating the target task distribution. By estimating the speech unit distribution more accurately and effectively and by selecting data efficiently, it improves the recognition performance of the adapted recognition system.
Summary of the invention
One object of the present invention is to overcome the deficiencies of the prior art and provide a more accurate and effective target task distribution estimation method.
To achieve this object, the technical solution adopted by the present invention is a target task distribution estimation method, comprising:
Obtaining the distribution of the target task over a candidate speech recognition result data set, as the coverage distribution of the target task;
Obtaining, from the candidate speech recognition result data set, the speech recognition results whose top-choice (1-best) recognition result has a confidence less than or equal to a first confidence threshold, forming a first low-confidence data set;
Obtaining the distribution of the target task over the first low-confidence data set, as the confusion distribution of the target task;
Fusing the coverage distribution and the confusion distribution of the target task to obtain the target task distribution.
Preferably, obtaining the distribution of the target task over the candidate speech recognition result data set comprises:
Decomposing the target task into speech units;
Computing the frequency of occurrence of each speech unit in the speech recognition results of the candidate speech recognition result data set, as the first frequency of occurrence of that speech unit;
Taking the first frequencies of occurrence of all speech units in the target task as the distribution of the target task over the candidate speech recognition result data set.
Preferably, obtaining the distribution of the target task over the first low-confidence data set comprises:
Computing the frequency of occurrence of each speech unit in the speech recognition results of the first low-confidence data set, as the second frequency of occurrence of that speech unit;
Taking the second frequencies of occurrence of all speech units in the target task as the distribution of the target task over the first low-confidence data set.
Preferably, fusing the coverage distribution and the confusion distribution of the target task to obtain the target task distribution comprises:
Linearly weighting the first frequency of occurrence and the second frequency of occurrence of each speech unit to obtain the fused frequency of occurrence of that speech unit;
Taking the fused frequencies of occurrence of all speech units in the target task as the target task distribution.
A second object of the present invention is to provide, based on the above target task distribution estimation method, a more accurate and effective acoustic model adaptation method.
The technical solution adopted by the present invention is an acoustic model adaptation method, comprising:
Obtaining the target task distribution using any of the above target task distribution estimation methods;
Selecting adaptation data from the candidate speech recognition result data, so that the distribution of the adaptation data is as close as possible to the target task distribution;
Using the adaptation data to adapt the parameters of the current acoustic model, obtaining an optimized acoustic model.
Preferably, selecting adaptation data from the candidate speech recognition result data comprises:
Obtaining, from the candidate speech recognition result data set, the speech recognition results whose top-choice (1-best) recognition result has a confidence less than or equal to a second confidence threshold, forming a second low-confidence data set;
Selecting low-confidence data from the second low-confidence data set, so that the distribution of the low-confidence data is as close as possible to the target task distribution;
Manually annotating the low-confidence data, so that the manually annotated low-confidence data becomes one part of the adaptation data;
Selecting, as a supplement, another part of the adaptation data from the top-choice speech recognition result data set, so that the distribution of the adaptation data is as close as possible to the target task distribution.
Preferably, selecting low-confidence data from the second low-confidence data set, so that the distribution of the low-confidence data is as close as possible to the target task distribution, comprises:
Obtaining the distribution of the low-confidence data by treating the low-confidence data as the target task.
Preferably, selecting low-confidence data from the second low-confidence data set, so that the distribution of the low-confidence data is as close as possible to the target task distribution, further comprises:
Selecting the low-confidence data by a greedy algorithm based on the KL distance, wherein the initial selected data set is the empty set and the candidate data set is the second low-confidence data set.
Preferably, selecting, as a supplement, another part of the adaptation data from the top-choice speech recognition result data set, so that the distribution of the adaptation data is as close as possible to the target task distribution, comprises:
Obtaining the distribution of the adaptation data over the top-choice speech recognition result data set, as the distribution of the adaptation data.
Preferably, selecting, as a supplement, another part of the adaptation data from the top-choice speech recognition result data set, so that the distribution of the adaptation data is as close as possible to the target task distribution, further comprises:
Selecting the other part of the adaptation data by a greedy algorithm based on the KL distance, wherein the initial selected data set consists of the manually annotated low-confidence data and the candidate data set is the top-choice speech recognition result data set.
A third object of the present invention is to provide a more accurate and effective target task distribution estimation system.
The technical solution adopted by the present invention is a target task distribution estimation system, comprising:
A coverage distribution acquisition module, configured to obtain the distribution of the target task over a candidate speech recognition result data set, as the coverage distribution of the target task;
A first low-confidence data set acquisition module, configured to obtain, from the candidate speech recognition result data set, the speech recognition results whose top-choice (1-best) recognition result has a confidence less than or equal to a first confidence threshold, forming a first low-confidence data set;
A confusion distribution acquisition module, configured to obtain the distribution of the target task over the first low-confidence data set, as the confusion distribution of the target task; and
A target task distribution acquisition module, configured to fuse the coverage distribution and the confusion distribution of the target task to obtain the target task distribution.
Preferably, the coverage distribution acquisition module comprises:
A decomposition unit, configured to decompose the target task into speech units;
A first occurrence frequency computation unit, configured to compute the frequency of occurrence of each speech unit in the speech recognition results of the candidate speech recognition result data set, as the first frequency of occurrence of that speech unit; and
A coverage distribution statistics unit, configured to take the first frequencies of occurrence of all speech units in the target task as the distribution of the target task over the candidate speech recognition result data set.
Preferably, the confusion distribution acquisition module comprises:
The decomposition unit;
A second occurrence frequency computation unit, configured to compute the frequency of occurrence of each speech unit in the speech recognition results of the first low-confidence data set, as the second frequency of occurrence of that speech unit; and
A confusion distribution statistics unit, configured to take the second frequencies of occurrence of all speech units in the target task as the distribution of the target task over the first low-confidence data set.
Preferably, the target task distribution acquisition module comprises:
A fusion unit, configured to linearly weight the first frequency of occurrence and the second frequency of occurrence of each speech unit to obtain the fused frequency of occurrence of that speech unit;
A target task distribution statistics unit, configured to take the fused frequencies of occurrence of all speech units in the target task as the target task distribution.
A fourth object of the present invention is to provide a more accurate and effective acoustic model adaptation system.
The technical solution adopted by the present invention is an acoustic model adaptation system, comprising:
Any of the above target task distribution estimation systems, configured to obtain the target task distribution;
An adaptation data selection module, configured to select adaptation data from the candidate speech recognition result data, so that the distribution of the adaptation data is as close as possible to the target task distribution; and
An acoustic model optimization module, configured to use the adaptation data to adapt the parameters of the current acoustic model, obtaining an optimized acoustic model.
Preferably, the adaptation data selection module comprises:
A second low-confidence data set acquisition unit, configured to obtain, from the candidate speech recognition result data set, the speech recognition results whose top-choice (1-best) recognition result has a confidence less than or equal to a second confidence threshold, forming a second low-confidence data set;
A low-confidence data selection unit, configured to select low-confidence data from the second low-confidence data set, so that the distribution of the low-confidence data is as close as possible to the target task distribution;
A manual annotation unit, configured to have the low-confidence data manually annotated, so that the manually annotated low-confidence data becomes one part of the adaptation data;
A high-confidence data selection unit, configured to select, as a supplement, another part of the adaptation data from the top-choice speech recognition result data set, so that the distribution of the adaptation data is as close as possible to the target task distribution.
Preferably, the low-confidence data selection unit is configured to input the low-confidence data to the target task distribution estimation system as the target task, to obtain the distribution of the low-confidence data.
Preferably, the low-confidence data selection unit is configured to select the low-confidence data by a greedy algorithm based on the KL distance, wherein the initial selected data set is the empty set and the candidate data set is the second low-confidence data set.
Preferably, the high-confidence data selection unit is configured to obtain the distribution of the adaptation data over the top-choice speech recognition result data set, as the distribution of the adaptation data.
Preferably, the high-confidence data selection unit is configured to select the other part of the adaptation data by a greedy algorithm based on the KL distance, wherein the initial selected data set consists of the manually annotated low-confidence data and the candidate data set is the top-choice speech recognition result data set.
The beneficial effects of the present invention are as follows. First, the target task distribution estimation and acoustic model adaptation methods and systems of the present invention estimate the target task distribution from the candidate speech recognition result data set rather than from manually annotated training data, so they are timely and save labor cost. Second, the target task distribution estimation method and system fuse in the confusion distribution of the target task, which is obtained from speech recognition results that were recognized poorly, and this can effectively improve the performance of the overall speech recognition system. Finally, the acoustic model adaptation method and system select adaptation data efficiently by selecting low-confidence data for manual annotation and supplementing it with high-confidence data.
Brief description of the drawings
Fig. 1 is a flowchart of an embodiment of the target task distribution estimation method according to the present invention;
Fig. 2 is a flowchart of an embodiment of adaptation data selection in the acoustic model adaptation method according to the present invention;
Fig. 3 is a flowchart of an embodiment of selecting low-confidence data in Fig. 2;
Fig. 4 is a flowchart of an embodiment of selecting high-confidence data as a supplement in Fig. 2;
Fig. 5 is a block diagram of an implementation structure of the target task distribution estimation system according to the present invention;
Fig. 6 is a block diagram of an implementation structure of the acoustic model adaptation system according to the present invention.
Detailed description
Embodiments of the present invention are described in detail below. Examples of the embodiments are shown in the accompanying drawings, in which the same or similar reference numerals denote, throughout, the same or similar elements or elements having the same or similar functions. The embodiments described below with reference to the drawings are exemplary; they serve only to explain the present invention and are not to be construed as limiting it.
As shown in Fig. 1, the target task distribution estimation method of the present invention comprises:
Step S1: Obtain the distribution of the target task over the candidate speech recognition result data set, as the coverage distribution of the target task. For a specific target task, the speech recognition system may produce multiple speech recognition results, from which it selects the one with the highest confidence for output. The result selected for output is called the 1-best speech recognition result, and the set of all 1-best speech recognition results is called the top-choice speech recognition result data set; the prior art obtains the target task distribution from this top-choice data set. The candidate speech recognition result data set, by contrast, is the data set formed by the N-best speech recognition results stored in the speech recognition system. The N-best results for a given target task are generally all the speech recognition results that the system obtains; they may also be all recognition results whose confidence is higher than a set threshold, or the top N results when sorted by confidence from high to low.
Step S21: Obtain, from the candidate speech recognition result data set, the speech recognition results whose top-choice (1-best) recognition result has a confidence less than or equal to the first confidence threshold, forming the first low-confidence data set. The first confidence threshold can be chosen according to the practical application (that is, the specific speech recognition task): if the overall confidence of the recognition results is low, a smaller first confidence threshold can be chosen; if it is high, a larger one can be chosen. Typically, the first confidence threshold is chosen in the range 0.5 to 0.8.
Step S22: Obtain the distribution of the target task over the first low-confidence data set, as the confusion distribution of the target task.
Step S3: Fuse the coverage distribution and the confusion distribution of the target task to obtain the target task distribution.
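As an illustration of step S21, the sketch below forms a low-confidence data set from per-utterance N-best lists. The `Hypothesis` type and the function names are hypothetical. It assumes that every candidate result of an utterance is kept whenever that utterance's 1-best confidence is at or below the threshold; the text can be read this way, but it does not spell out the data layout.

```python
from dataclasses import dataclass

@dataclass
class Hypothesis:
    units: list        # speech units in this recognition result
    confidence: float  # recognizer confidence for this result

def one_best(nbest):
    """The top-choice (1-best) result is the highest-confidence hypothesis."""
    return max(nbest, key=lambda h: h.confidence)

def low_confidence_set(utterances, threshold):
    """Step S21 (assumed reading): keep all candidates of an utterance
    whose 1-best confidence is less than or equal to the threshold."""
    return [
        h
        for nbest in utterances
        if one_best(nbest).confidence <= threshold
        for h in nbest
    ]

utterances = [
    [Hypothesis(["a"], 0.9), Hypothesis(["b"], 0.4)],   # 1-best confidence 0.9
    [Hypothesis(["c"], 0.6), Hypothesis(["d"], 0.3)],   # 1-best confidence 0.6
]
first_low_conf = low_confidence_set(utterances, threshold=0.7)
```

With a threshold of 0.7, taken from the 0.5 to 0.8 range mentioned in step S21, only the second utterance contributes its candidates.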
Obtaining the distribution of the target task over the candidate speech recognition result data set in step S1 may specifically comprise:
Step S11: Decompose the target task into speech units. A speech unit may be a syllable, a phoneme, or a word.
Step S12: Compute the frequency of occurrence of each speech unit in the speech recognition results of the candidate speech recognition result data set, as the first frequency of occurrence of that speech unit. The first frequency of occurrence $P_{Occ}^{TD}(w_j)$ of the j-th speech unit $w_j$ may be computed as:
$$P_{Occ}^{TD}(w_j) = \frac{1}{|U|} \sum_{i=1}^{|U|} PP(w_j \mid u_i)$$
where $u_i$ is the i-th speech recognition result in the candidate speech recognition result data set, $|U|$ is the number of speech recognition results in that data set, and $PP(w_j \mid u_i)$ is the posterior probability that speech unit $w_j$ occurs in the i-th speech recognition result.
Step S13: Take the first frequencies of occurrence of all speech units in the target task as the distribution of the target task over the candidate speech recognition result data set; that is, the set of first frequencies of occurrence $P_{Occ}^{TD}(w_j)$ serves as the coverage distribution of the target task.
Of course, the target task distribution estimation method of the present invention is also applicable when the distribution of the target task over the candidate speech recognition result data set is obtained by other probabilistic or statistical methods.
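Steps S12 and S13 can be sketched as below. The sketch assumes that each recognition result is represented as a mapping from speech unit to the posterior probability PP(w_j | u_i), and that the first frequency of occurrence is the average of those posteriors over all |U| results; both the representation and the averaging form are assumptions made for illustration, and the function name is hypothetical.

```python
def coverage_distribution(results, vocab):
    """First frequency of occurrence per speech unit:
    the mean posterior PP(w | u_i) over all candidate results u_i."""
    n = len(results)
    if n == 0:
        raise ValueError("candidate result set is empty")
    # A unit missing from a result is treated as posterior 0.0
    return {w: sum(r.get(w, 0.0) for r in results) / n for w in vocab}

# Two candidate results with per-unit posteriors (made-up values)
results = [
    {"a": 1.0, "b": 0.5},
    {"a": 0.0, "b": 0.5},
]
p_occ = coverage_distribution(results, vocab=["a", "b"])
```

The values are per-unit averages and are not renormalized to sum to one, which follows the definitions of the symbols as given.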
Similarly, obtaining the distribution of the target task over the first low-confidence data set in step S22 may specifically comprise:
Step S221: Compute the frequency of occurrence of each speech unit in the speech recognition results of the first low-confidence data set, as the second frequency of occurrence of that speech unit.
Step S222: Take the second frequencies of occurrence of all speech units in the target task as the distribution of the target task over the first low-confidence data set.
Steps S221 and S222 can be expressed by the following formula:
$$P_{Con}^{TD}(w_j) = \frac{\sum_{i=1}^{|U|} PP(w_j \mid u_i) \times \delta\big(CM(u_i) \le TH_c\big)}{\sum_{i=1}^{|U|} \delta\big(CM(u_i) \le TH_c\big)}$$
where $P_{Con}^{TD}(w_j)$ is the second frequency of occurrence of the j-th speech unit $w_j$; $\delta(CM(u_i) \le TH_c)$ is an indicator function whose value is 1 when $CM(u_i) \le TH_c$ is true and 0 otherwise; $CM(u_i)$ is the confidence of the top-choice recognition result corresponding to the i-th speech recognition result; and $TH_c$ is the first confidence threshold. In other words, the indicator function $\delta(CM(u_i) \le TH_c)$ selects the first low-confidence data set during the computation.
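The formula for the second frequency of occurrence can be sketched as follows, under the assumption that each result is a mapping from speech unit to posterior probability and that `confidences[i]` holds CM(u_i), the confidence of the 1-best result associated with the i-th result. The function name is hypothetical, and returning all zeros when no result passes the threshold is a choice made here, since the formula is undefined in that case.

```python
def confusion_distribution(results, confidences, vocab, threshold):
    """Second frequency of occurrence per speech unit: mean posterior
    over the results whose 1-best confidence is <= threshold."""
    # The indicator function delta(CM(u_i) <= TH_c) acts as a filter
    kept = [r for r, cm in zip(results, confidences) if cm <= threshold]
    if not kept:
        # Assumption: an empty low-confidence set yields an all-zero result
        return {w: 0.0 for w in vocab}
    return {w: sum(r.get(w, 0.0) for r in kept) / len(kept) for w in vocab}

results = [{"a": 1.0}, {"b": 1.0}, {"a": 0.5, "b": 0.5}]
confidences = [0.9, 0.4, 0.6]          # CM(u_i), made-up values
p_con = confusion_distribution(results, confidences, ["a", "b"], threshold=0.7)
```

Only the second and third results pass the 0.7 threshold here, so unit "b", which dominates the poorly recognized results, receives the larger weight.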
On the basis of the above embodiment, fusing the coverage distribution and the confusion distribution of the target task in step S3 to obtain the target task distribution specifically comprises:
Step S31: Linearly weight the first frequency of occurrence $P_{Occ}^{TD}(w_j)$ and the second frequency of occurrence $P_{Con}^{TD}(w_j)$ of speech unit $w_j$ to obtain its fused frequency of occurrence $P^{TD}(w_j)$:
$$P^{TD}(w_j) = \alpha \times P_{Occ}^{TD}(w_j) + (1 - \alpha) \times P_{Con}^{TD}(w_j)$$
where $\alpha$ is a weighting coefficient in the range (0, 1). In applications that emphasize the coverage distribution, $\alpha$ is chosen in [0.5, 1); in applications that emphasize the confusion distribution, $\alpha$ is chosen in (0, 0.5].
Step S32: Take the fused frequencies of occurrence $P^{TD}(w_j)$ of all speech units in the target task as the target task distribution.
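The linear weighting of steps S31 and S32 is small enough to show directly. In this sketch the function name and the dictionary representation of the two distributions are assumptions; the open interval check on alpha follows the stated range (0, 1).

```python
def fuse_distributions(p_occ, p_con, alpha):
    """Fused frequency per unit: alpha * coverage + (1 - alpha) * confusion."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    return {
        w: alpha * p_occ[w] + (1.0 - alpha) * p_con.get(w, 0.0)
        for w in p_occ
    }

p_occ = {"a": 0.5, "b": 0.5}      # coverage distribution (made-up values)
p_con = {"a": 0.25, "b": 0.75}    # confusion distribution (made-up values)
p_td = fuse_distributions(p_occ, p_con, alpha=0.5)
```

Choosing alpha closer to 1 keeps the result near the coverage distribution, while choosing it closer to 0 shifts weight toward units that appear in poorly recognized results.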
Building on the above target task distribution estimation method, the acoustic model adaptation method of the present invention comprises:
Step S1: Obtain the distribution of the target task over the candidate speech recognition result data set, as the coverage distribution of the target task.
Step S21: Obtain, from the candidate speech recognition result data set, the speech recognition results whose top-choice (1-best) recognition result has a confidence less than or equal to the first confidence threshold, forming the first low-confidence data set.
Step S22: Obtain the distribution of the target task over the first low-confidence data set, as the confusion distribution of the target task.
Step S3: Fuse the coverage distribution and the confusion distribution of the target task to obtain the target task distribution.
Step S4: Select adaptation data from the candidate speech recognition result data, so that the distribution of the adaptation data is as close as possible to the target task distribution.
Step S5: Use the adaptation data to adapt the parameters of the current acoustic model, obtaining an optimized acoustic model.
The present invention also provides an efficient adaptation data selection method for step S4 above. As shown in Fig. 2, selecting adaptation data from the candidate speech recognition result data in step S4 comprises:
Step S41: from the candidate speech recognition result data set, obtain the speech recognition results whose top-ranked recognition result has a confidence less than or equal to a second confidence threshold, forming a second low-confidence data set. Here, the selection principle and value range of the second confidence threshold are normally the same as those of the first confidence threshold, but the two thresholds are not linked: the value of either threshold does not constrain the value of the other.
Step S42: select low-confidence data from the second low-confidence data set such that the distribution of the low-confidence data is closest to the target task distribution obtained in step S3.
Step S43: manually label the low-confidence data, so that the manually labeled low-confidence data becomes one part of the adaptation data.
Step S44: supplement the selection with another part of the adaptation data taken from the top-ranked speech recognition result data set, such that the distribution of the adaptation data is closest to the target task distribution obtained in step S3. Because these top-ranked speech recognition results have higher confidence, they can be used directly without manual labeling.
It can thus be seen that the present invention uses the manually labeled low-confidence data obtained in step S43 together with the other part of the adaptation data obtained in step S44 (i.e. the high-confidence data) for adaptive training of the acoustic model. This approach requires less manually labeled data, and it brings the distribution of the selected adaptation data as close as possible to the target task distribution. The absolute amounts of selected low-confidence data and high-confidence data can be determined according to the specific speech recognition task and the available labeling manpower, and the ratio between the two is usually kept between 1:10 and 1:20.
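The data flow of steps S41 to S44 can be illustrated with a small sketch. Everything named here is hypothetical (`Result`, `split_by_confidence`, `high_confidence_budget`), and treating every result above the second threshold as the high-confidence pool is an assumption of this sketch rather than something the text states.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Result:
    text: str          # top-ranked recognition result
    confidence: float  # confidence of that result


def split_by_confidence(
    results: Sequence[Result], th2: float
) -> Tuple[List[Result], List[Result]]:
    """Step S41: results at or below the second threshold form the low pool."""
    low = [r for r in results if r.confidence <= th2]
    # Assumption: the remaining results serve as the high-confidence pool.
    high = [r for r in results if r.confidence > th2]
    return low, high


def high_confidence_budget(n_low_labeled: int, high_per_low: int = 10) -> int:
    """Size of the high-confidence part for a low:high ratio of 1:high_per_low."""
    if n_low_labeled < 0 or high_per_low <= 0:
        raise ValueError("amounts must be non-negative and the ratio positive")
    return n_low_labeled * high_per_low
```

With 100 manually labeled low-confidence results and a ratio of 1:10, for example, the high-confidence budget would be 1000 results; the labeling budget itself depends on the available manpower.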
To improve the accuracy and validity of the distribution obtained for the low-confidence data, selecting low-confidence data from the second low-confidence data set in step S42, such that its distribution is closest to the target task distribution, may specifically be done by treating the low-confidence data as a target task in order to obtain its distribution. That is, the distribution of the low-confidence data is obtained with the same method used for the target task distribution, by fusing a coverage distribution and a confusion distribution.
In addition, the selection of low-confidence data from the second low-confidence data set in step S42, such that its distribution is closest to the target task distribution, may be carried out with a conventional greedy algorithm based on the KL distance. The specific steps, shown in Fig. 3, comprise:
Step S421: set the selected data set to the empty set, then perform step S422.
Step S422: judge whether the amount of data in the selected data set has reached the preset amount of low-confidence data; if not, perform step S423; if so, perform step S427.
Step S423: judge whether all speech recognition results in the candidate data set have been traversed; if not, perform step S424; if so, perform step S426.
Step S424: take the next speech recognition result from the candidate data set and put it into the selected data set, then perform step S425.
Step S425: calculate and record the KL distance between the distribution of the selected data set and the target task distribution, then restore the selected data set to its previous state, and perform step S423.
Step S426: choose the speech recognition result that yields the smallest KL distance as the data selected in this round, put it into the selected data set, then perform step S422.
Step S427: end the data selection and output the selected data set; the data in the selected data set is the low-confidence data that has been picked out.
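Steps S421 to S427 can be sketched as follows. This is a minimal illustration under several assumptions that the text does not fix: each recognition result is represented as a list of speech units, the distribution of a data set is its smoothed relative unit count, the KL distance is taken from the target distribution to the selected-set distribution, and a selected result is not offered again. The names `kl_distance` and `greedy_select` are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def kl_distance(target: Dict[str, float], counts: Counter, eps: float = 1e-6) -> float:
    """KL(target || smoothed empirical distribution of `counts`)."""
    total = sum(counts.values())
    vocab = len(target)
    kl = 0.0
    for unit, p in target.items():
        if p <= 0.0:
            continue
        # Additive smoothing keeps q positive for units not yet covered.
        q = (counts.get(unit, 0) + eps) / (total + eps * vocab)
        kl += p * math.log(p / q)
    return kl


def greedy_select(
    candidates: Sequence[List[str]], target: Dict[str, float], budget: int
) -> List[int]:
    """Return indices of the selected candidates, in selection order."""
    selected: List[int] = []                      # S421: start from the empty set
    counts: Counter = Counter()
    remaining = set(range(len(candidates)))
    while len(selected) < budget and remaining:   # S422: stop at the preset amount
        best_idx, best_kl = -1, math.inf
        for idx in sorted(remaining):             # S423/S424: try each candidate
            trial = counts + Counter(candidates[idx])
            d = kl_distance(target, trial)        # S425: record distance, then undo
            if d < best_kl:
                best_idx, best_kl = idx, d
        selected.append(best_idx)                 # S426: keep the best candidate
        counts.update(candidates[best_idx])
        remaining.remove(best_idx)
    return selected                               # S427: output the selected set
```

Building a trial count instead of mutating and restoring the selected set has the same effect as the "put in, measure, restore" sequence of steps S424 and S425. Each round scans all remaining candidates, so the cost grows with the product of the budget and the candidate pool size.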
In step S44 of the present invention, when supplementing the selection with another part of the adaptation data from the top-ranked speech recognition result data set such that the distribution of the adaptation data is closest to the target task distribution, the distribution of the adaptation data may be obtained as the distribution corresponding to the top-ranked speech recognition result data set; that is, the distribution of the adaptation data is obtained in the conventional manner.
Similarly, the supplementary selection in step S44 may also be carried out with a greedy algorithm based on the KL distance. The specific steps, shown in Fig. 4, comprise:
Step S441: set the selected data set to consist of the manually labeled low-confidence data obtained in step S43, and set the candidate data set to the top-ranked speech recognition result data set, then perform step S442.
Step S442: judge whether the amount of data in the selected data set has reached the preset amount of high-confidence data; if not, perform step S443; if so, perform step S447.
Step S443: judge whether all speech recognition results in the candidate data set have been traversed; if not, perform step S444; if so, perform step S446.
Step S444: take the next speech recognition result from the candidate data set and put it into the selected data set, then perform step S445.
Step S445: calculate and record the KL distance between the distribution of the selected data set and the target task distribution, then restore the selected data set to its previous state, and perform step S443.
Step S446: choose the speech recognition result that yields the smallest KL distance as the data selected in this round, put it into the selected data set, then perform step S442.
Step S447: end the data selection and output the selected data set; the data in the selected data set is the adaptation data that has been picked out.
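The greedy rule of steps S441 to S447 can be written compactly as below. This is a hedged formalization: the symbols $S_k$ (selected set after round $k$), $C$ (candidate data set), $L$ (manually labeled low-confidence data) and $P_{S}$ (distribution of a data set $S$) are introduced here for illustration only, and the direction of the KL distance (target distribution first) is an assumption, since the text names a KL distance without stating its arguments' order.

```latex
S_0 = L, \qquad
u_k^{*} = \arg\min_{u \in C \setminus S_{k-1}}
          D_{KL}\!\left(P_{TD} \,\middle\|\, P_{S_{k-1} \cup \{u\}}\right), \qquad
S_k = S_{k-1} \cup \{u_k^{*}\},

D_{KL}(P \,\|\, Q) = \sum_{j} P(w_j) \log \frac{P(w_j)}{Q(w_j)}
```

The rounds repeat until the stopping test of step S442 is met. Compared with the low-confidence selection, the only structural difference is the initial set: it starts from $L$ rather than from the empty set, so the high-confidence data is chosen to complement what the labeled data already covers.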
Corresponding to the above target task distribution estimation method, as shown in Fig. 5, the target task distribution estimation system of the present invention comprises a coverage distribution acquisition module 1, a first low-confidence data set acquisition module 2, a confusion distribution acquisition module 3 and a target task distribution acquisition module 4. The coverage distribution acquisition module 1 is configured to obtain the distribution of the target task corresponding to the candidate speech recognition result data set, as the coverage distribution of the target task. The first low-confidence data set acquisition module 2 is configured to obtain, from the candidate speech recognition result data set, the speech recognition results whose top-ranked recognition result has a confidence less than or equal to the first confidence threshold, forming the first low-confidence data set. The confusion distribution acquisition module 3 is configured to obtain the distribution of the target task corresponding to the first low-confidence data set, as the confusion distribution of the target task. The target task distribution acquisition module 4 is configured to fuse the coverage distribution and the confusion distribution of the target task to obtain the target task distribution.
The coverage distribution acquisition module may further comprise a decomposition unit, a first occurrence frequency calculation unit and a coverage distribution statistics unit. The decomposition unit is configured to decompose the target task into individual speech units. The first occurrence frequency calculation unit is configured to calculate the occurrence frequency of a speech unit in each speech recognition result of the candidate speech recognition result data set, as the first occurrence frequency of the speech unit. The coverage distribution statistics unit is configured to obtain the first occurrence frequencies of all speech units in the target task, as the distribution of the target task corresponding to the candidate speech recognition result data set.
Similarly, the confusion distribution acquisition module may further comprise the above decomposition unit, a second occurrence frequency calculation unit and a confusion distribution statistics unit. The second occurrence frequency calculation unit is configured to calculate the occurrence frequency of a speech unit in each speech recognition result of the first low-confidence data set, as the second occurrence frequency of the speech unit. The confusion distribution statistics unit is configured to obtain the second occurrence frequencies of all speech units in the target task, as the distribution of the target task corresponding to the first low-confidence data set.
Based on the specific structure of the coverage distribution acquisition module and the confusion distribution acquisition module, the target task distribution acquisition module may further comprise a fusion unit and a target task distribution statistics unit. The fusion unit is configured to perform linear weighting of the first occurrence frequency and the second occurrence frequency of a speech unit, obtaining the fused occurrence frequency of the speech unit. The target task distribution statistics unit is configured to obtain the fused occurrence frequencies of all speech units in the target task and output them as the target task distribution.
On the basis of the above target task distribution estimation system, as shown in Fig. 6, the acoustic model adaptation system of the present invention comprises a coverage distribution acquisition module 1, a first low-confidence data set acquisition module 2, a confusion distribution acquisition module 3, a target task distribution acquisition module 4, an adaptation data selection module 5 and an acoustic model optimization module 6. The coverage distribution acquisition module 1 is configured to obtain the distribution of the target task corresponding to the candidate speech recognition result data set, as the coverage distribution of the target task. The first low-confidence data set acquisition module 2 is configured to obtain, from the candidate speech recognition result data set, the speech recognition results whose top-ranked recognition result has a confidence less than or equal to the first confidence threshold, forming the first low-confidence data set. The confusion distribution acquisition module 3 is configured to obtain the distribution of the target task corresponding to the first low-confidence data set, as the confusion distribution of the target task. The target task distribution acquisition module 4 is configured to fuse the coverage distribution and the confusion distribution of the target task to obtain the target task distribution. The adaptation data selection module 5 is configured to select adaptation data from the candidate speech recognition result data such that the distribution of the adaptation data is closest to the target task distribution. The acoustic model optimization module 6 is configured to use the adaptation data to adaptively adjust the model parameters of the current acoustic model, obtaining an optimized acoustic model.
The adaptation data selection module may further comprise a second low-confidence data set acquisition unit, a low-confidence data selection module, a manual labeling unit and a high-confidence data selection module. The second low-confidence data set acquisition unit is configured to obtain, from the candidate speech recognition result data set, the speech recognition results whose top-ranked recognition result has a confidence less than or equal to the second confidence threshold, forming the second low-confidence data set. The low-confidence data selection module is configured to select low-confidence data from the second low-confidence data set such that the distribution of the low-confidence data is closest to the target task distribution. The manual labeling unit is configured to have the low-confidence data manually labeled, so that the manually labeled low-confidence data becomes one part of the adaptation data. The high-confidence data selection module is configured to supplement the selection with another part of the adaptation data from the top-ranked speech recognition result data set, such that the distribution of the adaptation data is closest to the target task distribution.
The above low-confidence data selection module may be configured to input the low-confidence data into the target task distribution estimation system as a target task, in order to obtain the distribution of the low-confidence data. Further, the low-confidence data selection module may be configured to select the low-confidence data with a greedy algorithm based on the KL distance, where the initial selected data set is the empty set and the candidate data set is the second low-confidence data set.
The above high-confidence data selection module may be configured to obtain the distribution of the adaptation data as the distribution corresponding to the top-ranked speech recognition result data set. Further, the high-confidence data selection module may be configured to select the other part of the adaptation data with a greedy algorithm based on the KL distance, where the initial selected data set consists of the manually labeled low-confidence data and the candidate data set is the top-ranked speech recognition result data set.
The structure, features and effects of the present invention have been described in detail above with reference to the embodiments shown in the drawings. The foregoing are only preferred embodiments of the present invention, and the scope of implementation is not limited to what the drawings show. Any change made according to the concept of the present invention, or any modification into an equivalent embodiment, that does not go beyond the spirit covered by the specification and the drawings shall fall within the protection scope of the present invention.

Claims (20)

1. A target task distribution estimation method, characterized by comprising:
Obtaining the distribution of said target task corresponding to a candidate speech recognition result data set, as the coverage distribution of said target task;
Obtaining, from said candidate speech recognition result data set, the speech recognition results whose top-ranked recognition result has a confidence less than or equal to a first confidence threshold, to form a first low-confidence data set;
Obtaining the distribution of said target task corresponding to said first low-confidence data set, as the confusion distribution of said target task;
Fusing the coverage distribution and the confusion distribution of said target task to obtain the distribution of said target task.
2. The target task distribution estimation method according to claim 1, characterized in that said obtaining the distribution of said target task corresponding to the candidate speech recognition result data set comprises:
Decomposing said target task into individual speech units;
Calculating the occurrence frequency of said speech units in each speech recognition result of said candidate speech recognition result data set, as the first occurrence frequency of said speech units;
Obtaining said first occurrence frequencies of all speech units in said target task, as the distribution of said target task corresponding to the candidate speech recognition result data set.
3. The target task distribution estimation method according to claim 2, characterized in that said obtaining the distribution of said target task corresponding to said first low-confidence data set comprises:
Calculating the occurrence frequency of said speech units in each speech recognition result of said first low-confidence data set, as the second occurrence frequency of said speech units;
Obtaining said second occurrence frequencies of all speech units in said target task, as the distribution of said target task corresponding to said first low-confidence data set.
4. The target task distribution estimation method according to claim 3, characterized in that said fusing the coverage distribution and the confusion distribution of said target task to obtain the distribution of said target task comprises:
Performing linear weighting of the first occurrence frequency and the second occurrence frequency of said speech units, to obtain the fused occurrence frequency of said speech units;
Obtaining the fused occurrence frequencies of all speech units in said target task as said target task distribution.
5. An acoustic model adaptation method, characterized by comprising:
Obtaining a target task distribution with the target task distribution estimation method according to any one of claims 1 to 4;
Selecting adaptation data from the candidate speech recognition result data such that the distribution of the adaptation data is closest to said target task distribution;
Using said adaptation data to adaptively adjust the model parameters of the current acoustic model, to obtain an optimized acoustic model.
6. The acoustic model adaptation method according to claim 5, characterized in that said selecting adaptation data from the candidate speech recognition result data comprises:
Obtaining, from said candidate speech recognition result data set, the speech recognition results whose top-ranked recognition result has a confidence less than or equal to a second confidence threshold, to form a second low-confidence data set;
Selecting low-confidence data from said second low-confidence data set such that the distribution of the low-confidence data is closest to said target task distribution;
Manually labeling said low-confidence data, so that the manually labeled low-confidence data becomes one part of the adaptation data;
Supplementing the selection with another part of the adaptation data from the top-ranked speech recognition result data set, such that the distribution of said adaptation data is closest to said target task distribution.
7. The acoustic model adaptation method according to claim 6, characterized in that selecting low-confidence data from said second low-confidence data set such that the distribution of the low-confidence data is closest to said target task distribution comprises:
Treating said low-confidence data as said target task to obtain the distribution of said low-confidence data.
8. The acoustic model adaptation method according to claim 7, characterized in that selecting low-confidence data from said second low-confidence data set such that the distribution of the low-confidence data is closest to said target task distribution further comprises:
Selecting said low-confidence data with a greedy algorithm based on the KL distance, where the initial selected data set is the empty set and the candidate data set is the second low-confidence data set.
9. The acoustic model adaptation method according to claim 6, characterized in that supplementing the selection with another part of the adaptation data from the top-ranked speech recognition result data set, such that the distribution of said adaptation data is closest to said target task distribution, comprises:
Obtaining the distribution of said adaptation data as the distribution corresponding to said top-ranked speech recognition result data set.
10. The acoustic model adaptation method according to claim 9, characterized in that supplementing the selection with another part of the adaptation data from the top-ranked speech recognition result data set, such that the distribution of said adaptation data is closest to said target task distribution, further comprises:
Selecting said other part of the adaptation data with a greedy algorithm based on the KL distance, where the initial selected data set consists of said manually labeled low-confidence data and the candidate data set is said top-ranked speech recognition result data set.
11. A target task distribution estimation system, characterized by comprising:
A coverage distribution acquisition module, configured to obtain the distribution of said target task corresponding to a candidate speech recognition result data set, as the coverage distribution of said target task;
A first low-confidence data set acquisition module, configured to obtain, from said candidate speech recognition result data set, the speech recognition results whose top-ranked recognition result has a confidence less than or equal to a first confidence threshold, to form a first low-confidence data set;
A confusion distribution acquisition module, configured to obtain the distribution of said target task corresponding to said first low-confidence data set, as the confusion distribution of said target task; and,
A target task distribution acquisition module, configured to fuse the coverage distribution and the confusion distribution of said target task to obtain the distribution of said target task.
12. The target task distribution estimation system according to claim 11, characterized in that said coverage distribution acquisition module comprises:
A decomposition unit, configured to decompose said target task into individual speech units;
A first occurrence frequency calculation unit, configured to calculate the occurrence frequency of said speech units in each speech recognition result of said candidate speech recognition result data set, as the first occurrence frequency of said speech units; and,
A coverage distribution statistics unit, configured to obtain said first occurrence frequencies of all speech units in said target task, as the distribution of said target task corresponding to the candidate speech recognition result data set.
13. The target task distribution estimation system according to claim 12, characterized in that said confusion distribution acquisition module comprises:
Said decomposition unit;
A second occurrence frequency calculation unit, configured to calculate the occurrence frequency of said speech units in each speech recognition result of said first low-confidence data set, as the second occurrence frequency of said speech units; and,
A confusion distribution statistics unit, configured to obtain said second occurrence frequencies of all speech units in said target task, as the distribution of said target task corresponding to said first low-confidence data set.
14. The target task distribution estimation system according to claim 13, characterized in that said target task distribution acquisition module comprises:
A fusion unit, configured to perform linear weighting of the first occurrence frequency and the second occurrence frequency of said speech units, to obtain the fused occurrence frequency of said speech units;
A target task distribution statistics unit, configured to obtain the fused occurrence frequencies of all speech units in said target task as said target task distribution.
15. An acoustic model adaptation system, characterized by comprising:
The target task distribution estimation system according to any one of claims 11 to 14, configured to obtain a target task distribution;
An adaptation data selection module, configured to select adaptation data from the candidate speech recognition result data such that the distribution of the adaptation data is closest to said target task distribution; and,
An acoustic model optimization module, configured to use said adaptation data to adaptively adjust the model parameters of the current acoustic model, to obtain an optimized acoustic model.
16. The acoustic model adaptation system according to claim 15, characterized in that said adaptation data selection module comprises:
A second low-confidence data set acquisition unit, configured to obtain, from said candidate speech recognition result data set, the speech recognition results whose top-ranked recognition result has a confidence less than or equal to a second confidence threshold, to form a second low-confidence data set;
A low-confidence data selection module, configured to select low-confidence data from said second low-confidence data set such that the distribution of the low-confidence data is closest to said target task distribution;
A manual labeling unit, configured to have said low-confidence data manually labeled, so that the manually labeled low-confidence data becomes one part of the adaptation data;
A high-confidence data selection module, configured to supplement the selection with another part of the adaptation data from the top-ranked speech recognition result data set, such that the distribution of said adaptation data is closest to said target task distribution.
17. The acoustic model adaptation system according to claim 16, characterized in that said low-confidence data selection module is configured to input said low-confidence data into said target task distribution estimation system as said target task, to obtain the distribution of said low-confidence data.
18. The acoustic model adaptation system according to claim 17, characterized in that said low-confidence data selection module is configured to select said low-confidence data with a greedy algorithm based on the KL distance, where the initial selected data set is the empty set and the candidate data set is the second low-confidence data set.
19. The acoustic model adaptation system according to claim 16, characterized in that said high-confidence data selection module is configured to obtain the distribution of said adaptation data as the distribution corresponding to said top-ranked speech recognition result data set.
20. The acoustic model adaptation system according to claim 19, characterized in that said high-confidence data selection module is configured to select said other part of the adaptation data with a greedy algorithm based on the KL distance, where the initial selected data set consists of said manually labeled low-confidence data and the candidate data set is said top-ranked speech recognition result data set.
CN201410007278.1A 2014-01-07 2014-01-07 Objective task distribution estimation method and system and acoustic model self-adaptive method and system Pending CN104766611A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410007278.1A CN104766611A (en) 2014-01-07 2014-01-07 Objective task distribution estimation method and system and acoustic model self-adaptive method and system

Publications (1)

Publication Number Publication Date
CN104766611A true CN104766611A (en) 2015-07-08

Family

ID=53648394

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410007278.1A Pending CN104766611A (en) 2014-01-07 2014-01-07 Objective task distribution estimation method and system and acoustic model self-adaptive method and system

Country Status (1)

Country Link
CN (1) CN104766611A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108735199A (en) * 2018-04-17 2018-11-02 北京声智科技有限公司 A kind of adaptive training method and system of acoustic model

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101178896A (en) * 2007-12-06 2008-05-14 安徽科大讯飞信息科技股份有限公司 Unit selection voice synthetic method based on acoustics statistical model
CN101315733A (en) * 2008-07-17 2008-12-03 安徽科大讯飞信息科技股份有限公司 Self-adapting method aiming at computer language learning system pronunciation evaluation
CN101464896A (en) * 2009-01-23 2009-06-24 安徽科大讯飞信息科技股份有限公司 Voice fuzzy retrieval method and apparatus
KR20110010233A (en) * 2009-07-24 2011-02-01 고려대학교 산학협력단 Apparatus and method for speaker adaptation by evolutional learning, and speech recognition system using thereof
US20130054224A1 (en) * 2011-08-30 2013-02-28 Dublin City University Method and system for enhancing text alignment between a source language and a target language during statistical machine translation

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
JI WU ET AL.: "《An Active Learning Approach to Task Adaptation》", 《12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION》 *
XIAODONG CUI ET AL.: "《Efficient adaptation text design based on the Kullback-Leibler measure》", 《ACOUSTICS,SPEECH,AND SIGNAL PROCESSING(ICASSP 2002)》 *
Z.H.HE ET AL.: "《A Combined Task Analysis Method for Data Selection in Mandarin Isolated Word Recognition System》", 《INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING(ISCSLP 2008)》 *
贺志阳等: "《基于任务分析的自适应数据挑选》", 《第十届全国人机语音通讯学术会议暨国际语音语言处理研讨会》 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108735199A (en) * 2018-04-17 2018-11-02 北京声智科技有限公司 A kind of adaptive training method and system of acoustic model
CN108735199B (en) * 2018-04-17 2021-05-28 北京声智科技有限公司 Self-adaptive training method and system of acoustic model

Similar Documents

Publication Publication Date Title
CN110443288B (en) Trajectory similarity calculation method based on sequencing learning
EP1679694B1 (en) Confidence score for a spoken dialog system
EP2801091B1 (en) Method, apparatus and computer program product for joint use of speech and text-based features for sentiment detection
CN103674012B (en) Speech customization method and its device, audio recognition method and its device
JP4245617B2 (en) Feature amount correction apparatus, feature amount correction method, and feature amount correction program
CN103295575B (en) Audio recognition method and client
CN106653031A (en) Voice wake-up method and voice interaction device
CN106157953A (en) continuous speech recognition method and system
EP1482469A3 (en) System, method and device for language education through a voice portal server
CN101710490A (en) Method and device for compensating noise for voice assessment
WO2008089362B1 (en) Point of reference directions
CN109916423A (en) Intelligent navigation equipment and its route planning method and automatic driving vehicle
CN105225665A (en) Audio recognition method and speech recognition equipment
CN108197669B (en) Feature training method and device of convolutional neural network
CN104599002B (en) Method and equipment for predicting order value
CN103177721A (en) Voice recognition method and system
CN109616105A (en) Noisy speech recognition method based on transfer learning
CN104485108A (en) Noise and speaker combined compensation method based on multi-speaker model
CN102467542A (en) Method and device for acquiring user similarity as well as user recommendation method and system
CN110807358A (en) Big data positioning and checking system based on peripheral information
US20230047666A1 (en) Multimodal speech recognition method and system, and computer-readable storage medium
WO2009102526A4 (en) Methods for the identification of bubble point pressure
CN101447183A (en) Processing method of high-performance confidence level applied to speech recognition system
CN104766611A (en) Objective task distribution estimation method and system and acoustic model self-adaptive method and system
CN107121661B (en) Positioning method, device and system and server

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
EXSB Decision made by SIPO to initiate substantive examination
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: No. 666 Wangjiang Road, High-tech Development Zone, Hefei, Anhui Province 230088

Applicant after: iFlytek Co., Ltd.

Address before: No. 666 Wangjiang Road, High-tech Development Zone, Hefei, Anhui Province 230088

Applicant before: Anhui USTC iFLYTEK Co., Ltd.

COR Change of bibliographic data
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20150708