CN104978971A - Oral evaluation method and system - Google Patents
- Publication number: CN104978971A
- Application number: CN201410139305.0A
- Authority: CN (China)
- Legal status: Granted (status assumed by Google Patents; not a legal conclusion)
Abstract
The invention relates to the field of voice signal processing and discloses an oral evaluation method and system. The method comprises the following steps: receiving speech data to be evaluated; scoring the speech data with a first system to obtain a first scoring result; outputting the first scoring result if it satisfies a first condition; otherwise, scoring the speech data with a second system to obtain a second scoring result; if the second scoring result satisfies a second condition, fusing the first and second scoring results into a first fused scoring result and outputting it; otherwise, scoring the speech data with a third system to obtain a third scoring result, fusing the first, second, and third scoring results into a second fused scoring result, and outputting the second fused scoring result. The method greatly improves system operating efficiency while guaranteeing evaluation precision.
Description
Technical field
The present invention relates to the field of voice signal processing, and in particular to an oral evaluation method and system.
Background technology
As an important medium of interpersonal communication, spoken language occupies an extremely important place in daily life. With socioeconomic development and the trend of globalization, ever higher demands are placed on the efficiency of language learning and on the objectivity, fairness, and scalability of language assessment. Traditional manual evaluation of oral proficiency confines teachers and students in instructional time and space, and is further constrained by gaps and imbalances in teaching staff, venues, and funding. Manual evaluation also cannot avoid the personal bias of individual raters, so grading standards are not uniform and sometimes fail to reflect a candidate's true level. Large-scale oral tests, moreover, require substantial human, material, and financial resources, which limits regular, large-scale assessment. For these reasons, the industry has successively developed a number of language-teaching and evaluation systems.
Because manual scoring under heavy workloads is neither stable nor always fair, automatic spoken-language assessment has become the trend. Conventional approaches build different teaching and scoring systems on speech recognition technology; since each scoring system has its own strengths in different application environments, existing oral evaluation systems often adopt a multi-system fusion approach to reduce the risk of machine scoring errors. In such methods, multiple different scoring systems evaluate the data independently, and the resulting scores are then fused, for example by selecting the highest score or averaging, to produce the final score. By combining multiple systems in this way, legacy systems effectively reduce the risk of under-scoring high-quality speech, but they aggravate computational resource consumption and evaluation latency: in a fusion scheme using N systems, total evaluation time is N times that of a single system. Especially in environments with limited computing resources, evaluation efficiency urgently needs improvement.
Summary of the invention
The embodiments of the present invention provide an oral evaluation method and system that improve system operating efficiency while guaranteeing evaluation performance.
To this end, the invention provides the following technical solution:
An oral evaluation method, comprising:
receiving speech data to be evaluated;
scoring said speech data with a first system to obtain a first scoring result;
if said first scoring result satisfies a first condition, outputting said first scoring result;
otherwise, scoring said speech data with a second system to obtain a second scoring result;
if said second scoring result satisfies a second condition, fusing said first scoring result and said second scoring result to obtain a first fused scoring result, and outputting said first fused scoring result;
otherwise, scoring said speech data with a third system to obtain a third scoring result;
fusing said first, second, and third scoring results to obtain a second fused scoring result, and outputting said second fused scoring result.
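The cascade above can be sketched as a small driver function. This is a minimal illustration only: the three scorers, the condition predicates, and the fusion rule are hypothetical stand-ins passed in as parameters, since the patent fixes none of them.

```python
def evaluate(speech, score1, score2, score3,
             first_condition, second_condition, fuse):
    """Run the scorers one at a time, stopping as soon as a result qualifies."""
    r1 = score1(speech)
    if first_condition(r1):
        return r1                      # first scoring result output directly
    r2 = score2(speech)
    if second_condition(r1, r2):
        return fuse([r1, r2])          # first fused scoring result
    r3 = score3(speech)
    return fuse([r1, r2, r3])          # second fused scoring result


# Example: the first condition (score above 80) holds, so systems 2 and 3
# never run and the first result is output unchanged.
result = evaluate("utterance.wav",
                  score1=lambda s: 85.0,
                  score2=lambda s: 70.0,
                  score3=lambda s: 75.0,
                  first_condition=lambda r: r > 80,
                  second_condition=lambda a, b: abs(a - b) < 4,
                  fuse=max)
print(result)  # 85.0
```

Because later systems run only when earlier results fail their conditions, the common case costs a single system's evaluation time rather than N times that, which is the efficiency gain the method claims.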
Preferably, the method further comprises:
presetting said first, second, and third systems; or
selecting said first, second, and third systems in real time.
Preferably, presetting said first, second, and third systems comprises:
selecting the system with the best performance as the first system, according to recognition confidence and evaluation results on a held-out test set;
selecting a system complementary to said first system as the second system.
Preferably, selecting said first, second, and third systems in real time comprises:
selecting the first system in real time according to the system operating environment or characteristics of the speech data;
selecting a system complementary to said first system as the second system.
Preferably, the method further comprises:
determining said first condition and said second condition according to the application scenario and its requirements.
Preferably, said first condition is that said first scoring result is higher than a set score value; or said first condition is that the recognition confidence of said first system is higher than a set threshold.
Preferably, said second condition is that the difference between said first scoring result and said second scoring result is less than a set difference.
An oral evaluation system, comprising:
a receiving module, for receiving speech data to be evaluated;
a first system module, for scoring said speech data to obtain a first scoring result;
a judging module, for judging whether said first scoring result satisfies a first condition; if so, sending said first scoring result to an output module so that the output module outputs it; otherwise, notifying a second system module to score said speech data;
said second system module, for scoring said speech data to obtain a second scoring result;
said judging module, further for judging whether said second scoring result satisfies a second condition; if so, notifying a first fusion module to fuse said first and second scoring results; otherwise, notifying a third system module to score said speech data;
the first fusion module, for fusing said first and second scoring results and sending the resulting first fused scoring result to the output module, so that the output module outputs said first fused scoring result;
said third system module, for scoring said speech data to obtain a third scoring result;
a second fusion module, for fusing said first, second, and third scoring results and sending the resulting second fused scoring result to said output module, so that the output module outputs said second fused scoring result;
said output module, for outputting the first scoring result, the first fused scoring result, or the second fused scoring result.
Preferably, the system further comprises:
a setting module, for presetting said first, second, and third systems; or
a selecting module, for selecting said first, second, and third systems in real time.
Preferably, said setting module is specifically configured to select the system with the best performance as the first system, according to recognition confidence and evaluation results on a held-out test set, and to select a system complementary to said first system as the second system.
Preferably, said selecting module is specifically configured to select the first system in real time according to the system operating environment or characteristics of the speech data, and to select a system complementary to said first system as the second system.
Preferably, the system further comprises:
a condition determination module, for determining said first condition and said second condition according to the application scenario and its requirements.
The oral evaluation method and system provided by the embodiments of the present invention build on multi-system fusion by adding an arbitration step over scoring results: when a scoring result does not meet the requirement, a further evaluation system is selected to score the data again, and the multiple scoring results obtained are fused into the final result. Evaluation efficiency is thus greatly improved while evaluation performance is guaranteed.
Brief description of the drawings
To explain the embodiments of the present application or the prior-art solutions more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention; those of ordinary skill in the art can derive further drawings from them.
Fig. 1 is a flow chart of the oral evaluation method of an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of the oral evaluation system of an embodiment of the present invention.
Detailed description of the embodiments
To help those skilled in the art better understand the solutions of the embodiments of the present invention, the embodiments are described in further detail below with reference to the drawings.
To address the low evaluation efficiency of prior-art multi-system fusion oral evaluation systems, the embodiments of the present invention provide an oral evaluation method and system. First, a first system scores the speech data to be evaluated, yielding a first scoring result. If the first scoring result satisfies a first condition, it is output; otherwise a second system scores the speech data, yielding a second scoring result. If the second scoring result satisfies a second condition, the first and second scoring results are fused into a first fused scoring result and output as the final result; otherwise a third system scores the speech data, yielding a third scoring result. Finally, the first, second, and third scoring results are fused into a second fused scoring result and output as the final result.
As shown in Fig. 1, the oral evaluation method of an embodiment of the present invention comprises the following steps:
Step 101: receive speech data to be evaluated.
Step 102: score said speech data with the first system to obtain a first scoring result.
Existing multi-system fusion evaluation systems fuse the scoring results of all single systems, so the order in which the single systems run does not matter to them. The present method, by contrast, aims to improve operating efficiency by producing an evaluation result as quickly as possible, so the order in which the single systems run matters greatly: the first system should be able to solve most evaluation cases accurately on its own.
Specifically, the first system can be selected in either of two ways:
1) Presetting: the first system is fixed before the system runs and is not changed at run time. Among the preset evaluation systems, any one may serve as the first system, or the system with the best or most robust performance may be selected as the first system according to recognition confidence and evaluation results on a held-out test set.
2) Real-time selection: the system expected to perform best is selected as the first system according to the current operating environment or characteristics of the speech data to be evaluated. For example, a DNN (deep neural network) has strong modeling capacity for its training data, and a DNN recognition system trained with added low-SNR data has some noise robustness. If the current operating environment is poor, or the SNR of the data to be evaluated is low, a DNN-based recognition system may be selected in real time as the first system; otherwise, a system based on BN (bottleneck) features or on GMM-HMM (Gaussian mixture model-hidden Markov model) recognition may be selected as the first system.
It should be noted that because presetting and real-time selection choose the first system on different grounds, they suit different scenarios. For application scenarios where the characteristics of the evaluation data or the current operating conditions are known, real-time selection yields a final result faster while preserving evaluation quality; for application scenarios where evaluation results on a held-out test set are already available, presetting the first system (for example a bottleneck-feature recognition system) is more advantageous.
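The real-time selection rule sketched above can be illustrated as a tiny selector. The 15 dB cut-off and the returned system labels are illustrative assumptions, not values from the patent; only the preference (DNN under noise, GMM-HMM otherwise) comes from the text.

```python
def select_first_system(snr_db, noisy_cutoff_db=15.0):
    """Pick the recogniser expected to perform best on this input (hypothetical rule)."""
    if snr_db < noisy_cutoff_db:
        return "DNN"        # noise-robust choice for poor conditions / low SNR
    return "GMM-HMM"        # conventional choice for clean speech
```

In a fuller implementation the decision could also weigh other run-time features of the speech data, as the text allows, rather than SNR alone.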
Step 103: judge whether the first scoring result satisfies the first condition.
The first condition may be determined according to the application scenario and its requirements; for example, it may be that the first scoring result is higher than a set score value, or that the recognition confidence of the first system is higher than a set threshold.
Checking whether the first scoring result satisfies the first condition effectively avoids under-scoring high-quality speech, thereby improving evaluation precision.
In this embodiment, to reduce the risk of under-scoring high-quality speech, the first condition may be whether the first scoring result exceeds 80 points (on a 100-point scale); to improve the efficiency of multi-system fusion, the first condition may simply be whether the recognition confidence of the first system exceeds a set threshold.
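The two alternative first conditions of this embodiment can be written as predicates. The 80-point cut-off comes from the text above; the 0.9 confidence threshold is an illustrative assumption.

```python
def first_condition_score(score, cutoff=80.0):
    """Accept a high score outright, avoiding under-scoring good speech."""
    return score > cutoff

def first_condition_confidence(confidence, threshold=0.9):
    """Cheaper check: trust the first system when its recogniser is confident."""
    return confidence > threshold
```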
Further, if the first scoring result satisfies the first condition, step 109 is executed and said first scoring result is output; otherwise, step 104 is executed.
Step 104: score said speech data with the second system to obtain a second scoring result.
Different speech recognition systems decode the speech data using different acoustic features, such as acoustic models based on PLP (perceptual linear prediction) features; different acoustic models, such as DNN-based neural-network acoustic models; or even different decoding procedures, such as Viterbi search.
Because different speech recognition systems have different decoding strengths, their recognition results are often complementary to some degree; the second system should therefore complement the first system so as to improve evaluation accuracy.
The second system, like the first, can be chosen by presetting or by real-time selection; refer to the selection of the first system for details, not repeated here.
Step 105: judge whether said second scoring result satisfies the second condition.
The second condition may likewise be determined according to the application scenario and its requirements; for example, it may be that the difference between said first and second scoring results is less than a set difference.
Checking whether the second scoring result satisfies the second condition reduces, to some extent, abnormal scores caused by a faulty individual evaluation system or an anomalous test-taker, thereby improving evaluation precision.
In this embodiment, to reduce abnormal scores caused by a faulty individual evaluation system or an anomalous test-taker, the second condition may be set so that the difference between said first and second scoring results is less than a set difference (for example 4%).
It should be noted that, in practice, a speech evaluation system suited to the examination type can also be loaded according to that type. By comparing the results of the same evaluation system across different test-takers, system faults can be detected in advance and the abnormal scores they cause removed, further improving operating efficiency and evaluation precision.
Further, if the second scoring result satisfies the second condition, step 106 is executed; otherwise, step 107 is executed.
Step 106: fuse said first and second scoring results to obtain a first fused scoring result, then execute step 109 and output the first fused scoring result.
In this embodiment, to effectively reduce abnormal results caused by under-scoring high-quality speech, various fusion modes may be adopted, for example taking the higher of the first and second scoring results as the first fused scoring result; other fusion modes may of course be used, and the embodiments of the present invention are not limited in this respect.
In practice, for data already scored by a strongly complementary first system (for example bottleneck-based) and second system (for example GMM-HMM-based), if their scoring results differ little (for example by no more than 4%), there is no need to run the third system: said first and second scoring results are fused directly into the first fused scoring result, which is output as the final result.
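The two-way arbitration just described can be sketched as follows. When the first two results agree within the tolerance (4% in the embodiment), they are fused and the third system is skipped; returning the higher score is one of the fusion modes the text mentions, and `None` here signals that the third system must still run.

```python
def fuse_if_close(r1, r2, tolerance=4.0):
    """Fuse two scoring results when they agree, else defer to the third system."""
    if abs(r1 - r2) < tolerance:
        return max(r1, r2)   # first fused scoring result
    return None              # disagreement: fall through to the third system
```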
Step 107: score said speech data with the third system to obtain a third scoring result.
Said third system is to some extent complementary to said first and second systems (in completeness, accuracy, fluency, and so on).
Because different speech evaluation systems use different recognition algorithms or acoustic models, their recognition results often differ, the corresponding evaluation scores differ accordingly, and the scoring results are complementary to some degree.
The third system, like the first, can be chosen by presetting or by real-time selection; refer to the selection of the first system for details, not repeated here.
Step 108: fuse said first, second, and third scoring results to obtain a second fused scoring result.
To effectively reduce abnormal results caused by under-scoring high-quality speech, the highest of the first, second, and third scoring results may be taken as the second fused scoring result.
It should be noted that, in practice, different fusion methods may be chosen for different application scenarios. For a formal examination, to reduce the risk of under-scoring high-quality speech, the fusion method may take the maximum of the first, second, and third scoring results. For other scenarios, such as mock examinations or machine-assisted manual grading, the mean of the three scoring results may be taken as the second fused scoring result, so as to estimate a candidate's overall level conservatively.
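The scenario-dependent fusion above reduces to a one-branch function. The scenario labels are illustrative; the maximum-for-formal-exams and mean-for-mock-exams choices come directly from the text.

```python
def fuse_three(results, scenario="formal"):
    """Fuse three scoring results according to the application scenario."""
    if scenario == "formal":
        return max(results)             # avoid under-scoring good speakers
    return sum(results) / len(results)  # conservative overall estimate otherwise
```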
Step 109: output the scoring result.
The oral evaluation method provided by this embodiment builds on multi-system fusion by adding an arbitration step over scoring results: when a scoring result does not meet the requirement, a further evaluation system is selected to score again, and the multiple scoring results obtained are fused into the final result. This not only guarantees the accuracy of the evaluation result but also greatly improves evaluation efficiency.
Correspondingly, an embodiment of the present invention also provides an oral evaluation system; Fig. 2 is a schematic structural diagram of this system.
In this embodiment, the system comprises: a receiving module 201, a first system module 202, a judging module 203, a second system module 204, a first fusion module 205, a third system module 206, a second fusion module 207, and an output module 208. Specifically:
The receiving module 201 receives speech data to be evaluated.
The first system module 202 scores said speech data to obtain a first scoring result.
The judging module 203 judges whether said first scoring result satisfies the first condition; if so, it sends said first scoring result to the output module 208 so that the output module 208 outputs it; otherwise, it notifies the second system module 204 to score said speech data.
The first condition may be determined according to the application scenario and its requirements; for example, it may be that said first scoring result is higher than a set score value, or that the recognition confidence of said first system is higher than a set threshold.
By judging whether the first scoring result satisfies the first condition, the judging module 203 effectively avoids under-scoring high-quality speech, thereby improving evaluation precision.
In this embodiment, to reduce the risk of under-scoring high-quality speech, the first condition may be whether the first scoring result exceeds 80 points (on a 100-point scale); to improve the efficiency of multi-system fusion, the first condition may simply be whether the recognition confidence of the first system module exceeds a set threshold.
Further, if the first scoring result satisfies the first condition, the judging module 203 notifies the output module 208 to output said first scoring result; otherwise, it notifies the second system module 204 to evaluate said speech data.
The second system module 204 scores said speech data to obtain a second scoring result.
Different speech recognition systems decode the speech data using different acoustic features, such as acoustic models based on PLP (perceptual linear prediction) features; different acoustic models, such as DNN-based neural-network acoustic models; or even different decoding procedures, such as Viterbi search.
Further, the judging module 203 also judges whether said second scoring result satisfies the second condition; if so, it notifies the first fusion module 205 to fuse said first and second scoring results; otherwise, it notifies the third system module 206 to score said speech data.
Likewise, the second condition may be determined according to the application scenario and its requirements; for example, it may be that the difference between said first and second scoring results is less than a set difference.
The first fusion module 205 fuses said first and second scoring results and sends the resulting first fused scoring result to the output module 208, so that the output module 208 outputs it.
The first fusion module 205 may fuse the first and second scoring results in various ways, for example taking the higher of the two as the first fused scoring result, or taking a weighted mean. In practice, for data already scored by a strongly complementary first system module 202 (for example a bottleneck-based recognition system) and second system module 204 (for example a GMM-HMM-based recognition system), if their scoring results differ little (for example by no more than 4%), the third system module 206 need not score said speech data; the first fusion module 205 fuses said first and second scoring results directly into the first fused scoring result and sends it to the output module 208 as the final result.
The third system module 206 scores said speech data to obtain a third scoring result.
Said third system module 206 is to some extent complementary to said first system module 202 and second system module 204 (in completeness, accuracy, fluency, and so on). When said second scoring result does not satisfy the second condition, the third system module evaluates the data.
It should be noted that, because different speech evaluation systems use different recognition algorithms or acoustic models, their recognition results often differ, the corresponding evaluation scores differ accordingly, and the scoring results are complementary to some degree.
The second fusion module 207 fuses said first, second, and third scoring results and sends the resulting second fused scoring result to the output module 208, so that the output module 208 outputs it.
Likewise, the second fusion module 207 may fuse the three scoring results in various ways: for example, to effectively reduce abnormal results caused by under-scoring high-quality speech, it may take the highest of the three scoring results as the second fused scoring result, or take their weighted mean. The fusion method may of course vary with the application scenario: for a formal examination, to reduce the risk of under-scoring high-quality speech, the second fusion module 207 may take the maximum of the three scoring results; for other scenarios, such as mock examinations or machine-assisted manual grading, it may take the mean of the three, so as to estimate a candidate's overall level conservatively.
The output module 208 outputs the first scoring result, the first fused scoring result, or the second fused scoring result.
As mentioned above, the first and second conditions may be determined according to the application scenario and its requirements. Accordingly, for convenience of application, the evaluation system of the embodiment of the present invention may further be provided with a condition determination module (not shown) for determining said first and second conditions according to the application scenario and its requirements, so that the appropriate conditions can easily be set under different operating environments.
In addition, in practice, the first system module 202, second system module 204, and third system module 206 each load a different type of oral evaluation system, referred to for convenience as the first, second, and third systems respectively. The first, second, and third systems may be determined according to the operating environment and the characteristics of the speech to be evaluated. To this end, the oral evaluation system of the embodiment of the present invention may further comprise a setting module or a selecting module (not shown). Specifically:
The setting module presets said first, second, and third systems, for example selecting the system with the best performance as the first system according to recognition confidence and evaluation results on a held-out test set, and selecting a system complementary to said first system as the second system.
The selecting module selects said first, second, and third systems in real time, for example selecting the first system according to the system operating environment or the characteristics of the speech data, and selecting a system complementary to said first system as the second system.
It should also be noted that, in practice, multiple single evaluation systems of different types may be preset in the oral evaluation system of the embodiment of the present invention, and the user may select, through the above setting module, which system each system module loads. Of course, following the idea of the embodiments of the present invention, the scoring results of more than three single systems may also be fused to obtain a more accurate result; the embodiments of the present invention are not limited in this respect.
The oral evaluation system that the embodiment of the present invention provides, merge on the basis of oral evaluation in multisystem, add appraisal result arbitration functions, when appraisal result does not meet the demands, other evaluating system is selected again to mark further, and the multiple appraisal result obtained is merged, obtain final appraisal result.Not only ensure that the accuracy of evaluation result, and drastically increase evaluation and test efficiency.
The embodiments in this specification are described in a progressive manner, and identical or similar parts of the embodiments may refer to one another. Since the system embodiment is substantially similar to the method embodiment, its description is relatively brief, and the relevant parts may refer to the description of the method embodiment. The system embodiment described above is merely illustrative: the modules described as separate components may or may not be physically separate, and the components shown as modules may or may not be physical units, i.e. they may be located in one place or distributed over multiple network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment, which those of ordinary skill in the art can understand and implement without creative effort.
The components of embodiments of the present invention may be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will understand that, in practice, a microprocessor or a digital signal processor (DSP) may be used to realize some or all of the functions of some or all of the components of the oral evaluation system according to the embodiment of the present invention. The present invention may also be embodied as a device or apparatus program (for example, a computer program or a computer program product) for performing part or all of the method described herein.
The embodiments of the present invention have been described in detail above. Specific embodiments are used herein to expound the invention, and the description of the above embodiments is only intended to help understand the method and apparatus of the present invention. Meanwhile, for those of ordinary skill in the art, the specific implementations and application scope may be varied according to the idea of the present invention. In summary, the contents of this specification should not be construed as limiting the present invention.
Claims (12)
1. An oral evaluation method, characterized in that it comprises:
receiving speech data to be evaluated;
scoring the speech data with a first system to obtain a first evaluation result;
if the first evaluation result satisfies a first condition, outputting the first evaluation result;
otherwise, scoring the speech data with a second system to obtain a second evaluation result;
if the second evaluation result satisfies a second condition, fusing the first evaluation result and the second evaluation result to obtain a first fused evaluation result, and outputting the first fused evaluation result;
otherwise, scoring the speech data with a third system to obtain a third evaluation result; and
fusing the first evaluation result, the second evaluation result and the third evaluation result to obtain a second fused evaluation result, and outputting the second fused evaluation result.
2. The method according to claim 1, characterized in that the method further comprises:
presetting the first system, the second system and the third system; or
selecting the first system, the second system and the third system in real time.
3. The method according to claim 2, characterized in that presetting the first system, the second system and the third system comprises:
selecting, according to the recognition confidence and the evaluation results on a development test set, the best-performing system as the first system; and
selecting a system complementary to the first system as the second system.
4. The method according to claim 2, characterized in that selecting the first system, the second system and the third system in real time comprises:
selecting the first system in real time according to the system operating environment or the characteristics of the speech data; and
selecting a system complementary to the first system as the second system.
5. The method according to any one of claims 1 to 4, characterized in that the method further comprises:
determining the first condition and the second condition according to different application scenarios and requirements.
6. The method according to claim 5, characterized in that the first condition is that the first evaluation result is higher than a set score value, or that the recognition confidence of the first system is higher than a set threshold.
7. The method according to claim 5, characterized in that the second condition is that the difference between the first evaluation result and the second evaluation result is smaller than a set difference.
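As a non-limiting illustration of claims 6 and 7, the two conditions could be written as simple predicates; the concrete threshold values below are placeholders that, per claim 5, would be tuned to the application scenario and requirements.

```python
def first_condition(score1: float, confidence1: float,
                    min_score: float = 80.0, min_conf: float = 0.85) -> bool:
    # Claim 6 offers two alternatives: the first evaluation result exceeds a
    # set score value, or the first system's recognition confidence exceeds
    # a set threshold. Both threshold values here are illustrative.
    return score1 > min_score or confidence1 > min_conf

def second_condition(score1: float, score2: float,
                     max_diff: float = 10.0) -> bool:
    # Claim 7: the difference between the first and second evaluation
    # results is smaller than a set difference.
    return abs(score1 - score2) < max_diff
```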
8. An oral evaluation system, characterized in that it comprises:
a receiving module, configured to receive speech data to be evaluated;
a first system module, configured to score the speech data to obtain a first evaluation result;
a judging module, configured to judge whether the first evaluation result satisfies a first condition, and if so, to send the first evaluation result to an output module so that the output module outputs the first evaluation result, and otherwise to notify a second system module to score the speech data;
the second system module, configured to score the speech data to obtain a second evaluation result;
the judging module being further configured to judge whether the second evaluation result satisfies a second condition, and if so, to notify a first fusion module to fuse the first evaluation result and the second evaluation result, and otherwise to notify a third system module to score the speech data;
the first fusion module, configured to fuse the first evaluation result and the second evaluation result and to send the resulting first fused evaluation result to the output module so that the output module outputs the first fused evaluation result;
the third system module, configured to score the speech data to obtain a third evaluation result;
a second fusion module, configured to fuse the first evaluation result, the second evaluation result and the third evaluation result and to send the resulting second fused evaluation result to the output module so that the output module outputs the second fused evaluation result; and
the output module, configured to output the first evaluation result, the first fused evaluation result or the second fused evaluation result.
9. The system according to claim 8, characterized in that the system further comprises:
a setting module, configured to preset the first system, the second system and the third system; or
a selection module, configured to select the first system, the second system and the third system in real time.
10. The system according to claim 9, characterized in that:
the setting module is specifically configured to select, according to the recognition confidence and the evaluation results on a development test set, the best-performing system as the first system, and to select a system complementary to the first system as the second system.
11. The system according to claim 9, characterized in that:
the selection module is specifically configured to select the first system in real time according to the system operating environment or the characteristics of the speech data, and to select a system complementary to the first system as the second system.
12. The system according to any one of claims 8 to 11, characterized in that the system further comprises:
a condition determining module, configured to determine the first condition and the second condition according to different application scenarios and requirements.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410139305.0A CN104978971B (en) | 2014-04-08 | 2014-04-08 | A kind of method and system for evaluating spoken language |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104978971A true CN104978971A (en) | 2015-10-14 |
CN104978971B CN104978971B (en) | 2019-04-05 |
Family
ID=54275425
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410139305.0A Active CN104978971B (en) | 2014-04-08 | 2014-04-08 | A kind of method and system for evaluating spoken language |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104978971B (en) |
Patent Citations (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4443857A (en) * | 1980-11-07 | 1984-04-17 | Thomson-Csf | Process for detecting the melody frequency in a speech signal and a device for implementing same |
CN1240316A (en) * | 1998-03-30 | 2000-01-05 | 日本电气株式会社 | Portable terminal equipment for controlling receiving/transmitting phonetic electric level |
US6760453B1 (en) * | 1998-03-30 | 2004-07-06 | Nec Corporation | Portable terminal device for controlling received voice level and transmitted voice level |
CN1411310A (en) * | 2001-10-05 | 2003-04-16 | 株式会社东芝 | Mobile terminal apparatus and system selecting method |
US20060009220A1 (en) * | 2001-10-05 | 2006-01-12 | Kabushiki Kaisha Toshiba | Mobile terminal apparatus and system selecting method |
CN1622122A (en) * | 2003-11-28 | 2005-06-01 | 佳能株式会社 | Method, device and storage medium for character recognition |
CN101266792A (en) * | 2007-03-16 | 2008-09-17 | 富士通株式会社 | Speech recognition system and method for speech recognition |
US20090074195A1 (en) * | 2007-09-13 | 2009-03-19 | John Cornell | Distributed intelligibility testing system |
CN101706937A (en) * | 2009-12-01 | 2010-05-12 | 中国建设银行股份有限公司 | Method and system for monitoring electronic bank risks |
CN101901355A (en) * | 2010-06-29 | 2010-12-01 | 北京捷通华声语音技术有限公司 | Character recognition method and device based on maximum entropy |
CN102135985A (en) * | 2011-01-28 | 2011-07-27 | 百度在线网络技术(北京)有限公司 | Method and system for searching by calling search result of third-party search engine |
CN102664016A (en) * | 2012-04-23 | 2012-09-12 | 安徽科大讯飞信息科技股份有限公司 | Singing evaluation method and system |
CN102708865A (en) * | 2012-04-25 | 2012-10-03 | 北京车音网科技有限公司 | Method, device and system for voice recognition |
CN103247291A (en) * | 2013-05-07 | 2013-08-14 | 华为终端有限公司 | Updating method, device, and system of voice recognition device |
CN103559894A (en) * | 2013-11-08 | 2014-02-05 | 安徽科大讯飞信息科技股份有限公司 | Method and system for evaluating spoken language |
CN103559892A (en) * | 2013-11-08 | 2014-02-05 | 安徽科大讯飞信息科技股份有限公司 | Method and system for evaluating spoken language |
CN103698489A (en) * | 2013-12-30 | 2014-04-02 | 力合科技(湖南)股份有限公司 | Verification method and device of test data |
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105741831B (en) * | 2016-01-27 | 2019-07-16 | 广东外语外贸大学 | A kind of oral evaluation method and system based on syntactic analysis |
CN105741831A (en) * | 2016-01-27 | 2016-07-06 | 广东外语外贸大学 | Spoken language evaluation method based on grammatical analysis and spoken language evaluation system |
US11726844B2 (en) | 2017-06-26 | 2023-08-15 | Shanghai Cambricon Information Technology Co., Ltd | Data sharing system and data sharing method therefor |
CN109214616A (en) * | 2017-06-29 | 2019-01-15 | 上海寒武纪信息科技有限公司 | A kind of information processing unit, system and method |
US11537843B2 (en) | 2017-06-29 | 2022-12-27 | Shanghai Cambricon Information Technology Co., Ltd | Data sharing system and data sharing method therefor |
US11656910B2 (en) | 2017-08-21 | 2023-05-23 | Shanghai Cambricon Information Technology Co., Ltd | Data sharing system and data sharing method therefor |
US11687467B2 (en) | 2018-04-28 | 2023-06-27 | Shanghai Cambricon Information Technology Co., Ltd | Data sharing system and data sharing method therefor |
CN109273023A (en) * | 2018-09-20 | 2019-01-25 | 科大讯飞股份有限公司 | A kind of data evaluating method, device, equipment and readable storage medium storing program for executing |
CN109273023B (en) * | 2018-09-20 | 2022-05-17 | 科大讯飞股份有限公司 | Data evaluation method, device and equipment and readable storage medium |
CN109740625A (en) * | 2018-11-22 | 2019-05-10 | 深圳市三诺数字科技有限公司 | A kind of safe driving method of discrimination, system and vehicle based on Fusion Features |
CN111128238A (en) * | 2019-12-31 | 2020-05-08 | 云知声智能科技股份有限公司 | Mandarin assessment method and device |
CN114329040A (en) * | 2021-10-28 | 2022-04-12 | 腾讯科技(深圳)有限公司 | Audio data processing method, device, storage medium, equipment and program product |
CN114329040B (en) * | 2021-10-28 | 2024-02-20 | 腾讯科技(深圳)有限公司 | Audio data processing method, device, storage medium, equipment and program product |
CN115798519A (en) * | 2023-02-10 | 2023-03-14 | 山东山大鸥玛软件股份有限公司 | English multi-question spoken language pronunciation assessment method and system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
CB02 | Change of applicant information |
Address after: No. 666 Wangjiang Road, High-tech Development Zone, Hefei City, Anhui Province, 230088. Applicant after: iFLYTEK Co., Ltd. Address before: No. 666 Wangjiang Road, High-tech Development Zone, Hefei City, Anhui Province, 230088. Applicant before: Anhui USTC iFLYTEK Co., Ltd.
COR | Change of bibliographic data | ||
GR01 | Patent grant | ||