CN106340297A

CN106340297A - Speech recognition method and system based on cloud computing and confidence calculation

Info

Publication number: CN106340297A
Application number: CN201610840519.XA
Authority: CN
Inventors: 李志�; 田宗贵
Original assignee: Guangdong University of Technology
Current assignee: Guangdong University of Technology
Priority date: 2016-09-21
Filing date: 2016-09-21
Publication date: 2017-01-18

Abstract

The invention discloses a speech recognition method based on cloud computing and confidence calculation, relating to the technical field of speech recognition. The method comprises the following steps: (S1) a local speech recognition system and a cloud speech recognition system each receive a speech signal; (S2) the local speech recognition system obtains a local speech recognition result, and the cloud speech recognition system obtains a cloud speech recognition result; (S31) confidence evaluation is carried out on the local speech recognition result to obtain its confidence; (S32) confidence evaluation is carried out on the cloud speech recognition result to obtain its confidence; (S4) the confidence of the local speech recognition result is compared with the confidence of the cloud speech recognition result, and the result with the higher confidence is output. The invention also discloses a speech recognition system based on cloud computing and confidence calculation. By combining cloud and local speech recognition, the method improves the quality of speech recognition.

Description

Speech recognition method and system based on cloud computing and confidence calculation

Technical field

The present invention relates to the technical field of speech recognition, and in particular to a speech recognition method and system based on cloud computing and confidence calculation.

Background

With the progress of science and technology, speech recognition has matured and is gradually becoming a key technology for the human-machine interface in information technology. A variety of speech recognition algorithms have markedly improved both recognition rate and recognition efficiency, and in recent years speech recognition has come into common use in many fields. However, traditional speech recognition mostly relies on local speech recognition software, so the recognition algorithm inside the software cannot be changed, and different speech recognition algorithms inevitably perform differently in different speech input environments. For example, a complex noise environment contains noise from many sources, and in such an environment the recognition rate of a speech recognition system that otherwise works well may be strongly affected. If the software uses template training, the mismatch between the features of the training samples and those of the input samples causes recognition performance to drop sharply. The shortcoming of existing speech recognition systems is therefore that their performance degrades sharply as the environment changes; their adaptability and applicability are low, and they cannot meet speech recognition needs across multiple situations. Giving a speech recognition system broad adaptability and applicability is thus particularly important.

Chinese patent application CN201310163915.X discloses an update method, device and system for a speech recognition apparatus, comprising: receiving a speech input signal; performing speech recognition on the speech input signal with a local speech recognition apparatus to obtain a local speech recognition result; selecting the optimal result from the local speech recognition result and a cloud speech recognition result as the final speech recognition result, where the cloud speech recognition result is obtained by a cloud speech recognition apparatus recognizing the speech input signal at the same time as the local apparatus does; determining, from the obtained user feedback information and the final speech recognition result, whether the reliability of the local speech recognition result meets the requirement; and, when it does not, updating the local speech recognition apparatus with the cloud speech recognition apparatus. Although the technical solution of that application uses a cloud speech recognition apparatus, the improvement in recognition is not obvious, and the reliability of the result must be determined from user feedback. This requires the user to select results, makes the user's operation more cumbersome and detracts from the user experience.

Summary of the invention

In view of the deficiencies of the prior art, the object of the present invention is to provide a speech recognition method and system based on cloud computing and confidence calculation, which combine cloud-computing speech recognition with local speech recognition so that a speech recognition device or system can effectively adapt to multiple speech input environments and the quality of speech recognition is improved.

To achieve the above object, the present invention adopts the following technical solution:

A speech recognition method based on cloud computing and confidence calculation comprises the following steps:

S1: a local speech recognition system and a cloud speech recognition system each receive the speech signal;

S2: the local speech recognition system obtains a local speech recognition result, and the cloud speech recognition system obtains a cloud speech recognition result;

S31: confidence evaluation is carried out on the local speech recognition result to obtain the confidence of the local speech recognition result;

S32: confidence evaluation is carried out on the cloud speech recognition result to obtain the confidence of the cloud speech recognition result;

S4: the confidence of the local speech recognition result is compared with the confidence of the cloud speech recognition result, and the speech recognition result with the higher confidence is output.
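By way of illustration only, steps S1 to S4 can be sketched in Python as below. This is not the patented implementation: the recognizers and the confidence scorer are hypothetical callables supplied by the caller, the two systems are run concurrently with a thread pool, and ties are resolved in favour of the local result, which the text does not specify.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Sequence

# Hypothetical interfaces: a recognizer turns a signal into text, and a
# scorer returns a confidence for a (signal, text) pair.
Signal = Sequence[float]
Recognizer = Callable[[Signal], str]
Scorer = Callable[[Signal, str], float]


@dataclass
class Result:
    text: str
    confidence: float


def recognize(signal: Signal, local: Recognizer, cloud: Recognizer,
              score: Scorer) -> Result:
    # S1/S2: both systems receive the same signal and decode it concurrently.
    with ThreadPoolExecutor(max_workers=2) as pool:
        local_future = pool.submit(local, signal)
        cloud_future = pool.submit(cloud, signal)
        local_text = local_future.result()
        cloud_text = cloud_future.result()
    # S31/S32: evaluate the confidence of each result.
    local_result = Result(local_text, score(signal, local_text))
    cloud_result = Result(cloud_text, score(signal, cloud_text))
    # S4: output the higher-confidence result (ties go to the local result).
    if local_result.confidence >= cloud_result.confidence:
        return local_result
    return cloud_result


best = recognize([0.0, 0.1], lambda s: "hello", lambda s: "hallo",
                 lambda s, t: 0.9 if t == "hello" else 0.4)
print(best)  # Result(text='hello', confidence=0.9)
```

Running the two systems concurrently reflects the text's statement that both receive the signal; the sketch waits for both results before comparing, so in this form the slower system determines the latency.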

Further, the cloud speech recognition system is provided with different speech recognition models; in step S2 the cloud speech recognition system obtains different candidate cloud speech recognition results based on the different speech recognition models, and step S32 comprises:

S321: confidence evaluation is carried out on the different candidate cloud speech recognition results to obtain the confidence of each candidate cloud speech recognition result;

S322: the confidences of the different candidate cloud speech recognition results are compared, and the candidate with the highest confidence is output as the cloud speech recognition result.
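Steps S321 and S322 amount to an arg-max over the candidate results. A minimal Python sketch, assuming each cloud model is a hypothetical callable that returns text and that one shared scorer supplies the confidence:

```python
from typing import Callable, Dict, Sequence, Tuple

Signal = Sequence[float]


def best_cloud_result(signal: Signal,
                      models: Dict[str, Callable[[Signal], str]],
                      score: Callable[[Signal, str], float]) -> Tuple[str, str, float]:
    """Return (model name, text, confidence) of the best candidate."""
    if not models:
        raise ValueError("at least one cloud speech recognition model is required")
    # S321: evaluate the confidence of every candidate cloud result.
    candidates = []
    for name, model in models.items():
        text = model(signal)
        candidates.append((name, text, score(signal, text)))
    # S322: keep the candidate with the highest confidence.
    return max(candidates, key=lambda candidate: candidate[2])
```

Returning the model name as well as the text is a design choice of the sketch; it makes it visible which environment-specific model won.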

Further, the different speech recognition models include speech recognition models built on different speech recognition algorithms and speech recognition models built on combinations of different speech recognition algorithms; the different speech recognition models correspond to different speech input environments.

Further, before step S2, step S20 is carried out first:

S20: the local speech recognition system and the cloud speech recognition system each perform noise reduction on the received speech signal.

Further, in step S20 the cloud speech recognition system performs noise reduction on the speech signal with different speech noise reduction models, each built for a different speech input environment. These noise reduction models correspond one-to-one with the different speech recognition models, and the cloud speech recognition system sends each denoised speech signal to the speech recognition model corresponding to the same speech input environment.
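The one-to-one pairing between noise reduction models and recognition models can be pictured as one pipeline per speech input environment. The sketch below is illustrative only: the `EnvironmentPipeline` class, the environment names and the `noise_gate` function (a toy threshold gate standing in for a real noise reduction model) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Signal = List[float]


@dataclass
class EnvironmentPipeline:
    denoise: Callable[[Signal], Signal]   # noise reduction model for one environment
    recognize: Callable[[Signal], str]    # recognition model for the same environment


def noise_gate(threshold: float) -> Callable[[Signal], Signal]:
    """Toy stand-in for a noise reduction model: zero samples below a threshold."""
    return lambda s: [x if abs(x) >= threshold else 0.0 for x in s]


def run_cloud_pipelines(signal: Signal,
                        pipelines: Dict[str, EnvironmentPipeline]) -> Dict[str, str]:
    results = {}
    for environment, pipeline in pipelines.items():
        # Each denoised copy goes only to the model for the same environment.
        denoised = pipeline.denoise(list(signal))
        results[environment] = pipeline.recognize(denoised)
    return results
```

Bundling each denoiser with its recognizer in one object makes the one-to-one correspondence structural, so a denoised signal cannot be routed to a model for a different environment.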

A speech recognition system based on cloud computing and confidence calculation comprises:

Local speech recognition system, for receiving the speech signal and obtaining a local speech recognition result;

Cloud speech recognition system, for receiving the speech signal and obtaining a cloud speech recognition result;

Confidence evaluation module, which uses a confidence algorithm to evaluate the confidence of the local speech recognition result and of the cloud speech recognition result;

Data processing module, which compares the confidence of the local speech recognition result with the confidence of the cloud speech recognition result and outputs the speech recognition result with the higher confidence.

Further, the cloud speech recognition system includes different cloud speech recognition submodules:

The different cloud speech recognition submodules in the cloud speech recognition system contain different speech recognition models, and each cloud speech recognition submodule receives the speech signal and obtains a candidate cloud speech recognition result;

Confidence evaluation module, which uses a confidence algorithm to evaluate the confidence of the local speech recognition result and of the candidate cloud speech recognition results;

Data processing module, which compares the confidences of the candidate cloud speech recognition results output by the different cloud speech recognition submodules and takes the candidate with the highest confidence as the cloud speech recognition result; it then compares the confidence of the local speech recognition result with the confidence of the cloud speech recognition result and outputs the result with the higher confidence.

Further, the system also includes a local speech noise reduction module and a cloud speech noise reduction module. The local speech noise reduction module performs noise reduction on the speech signal and then sends the denoised signal to the local speech recognition system; the cloud speech noise reduction module performs noise reduction on the speech signal and then sends the denoised signal to the cloud speech recognition system.

Further, the cloud speech noise reduction module includes different cloud speech noise reduction submodules, which contain different speech noise reduction models built for different speech input environments; these noise reduction models correspond one-to-one with the different speech recognition models.

The beneficial effects of the present invention are as follows. The cloud speech recognition system and the local speech recognition system recognize speech synchronously, and the cloud system contains multiple speech recognition models corresponding to different input environments; the preferred result is output from among the various recognition results, so that the speech recognition device or system can effectively adapt to multiple speech input environments and the quality of speech recognition is effectively improved. A confidence algorithm is used to evaluate the various speech recognition results, which improves the reliability of the output. The confidence estimation incorporates information that is not fully exploited in traditional speech recognition systems, thereby reducing the entropy of the speech recognition system, judging more accurately whether a recognition result is correct, and improving the overall performance of speech recognition.

Brief description of the drawings

Fig. 1 is a flow chart of the speech recognition method based on cloud computing and confidence calculation according to the present invention.

Detailed description of the embodiments

The present invention is further described below with reference to the accompanying drawing and specific embodiments:

Embodiment 1

As shown in Fig. 1, a speech recognition method based on cloud computing and confidence calculation comprises the following steps:

S20: the local speech recognition system and the cloud speech recognition system each perform noise reduction on the received speech signal, where the cloud speech recognition system uses different speech noise reduction models, each built for a different speech input environment;

S2: the local speech recognition system obtains a local speech recognition result, and the cloud speech recognition system obtains different cloud speech recognition results based on different speech recognition models. The different speech recognition models include models built on different speech recognition algorithms and models built on combinations of different speech recognition algorithms, and they correspond to different speech input environments. The noise reduction models correspond one-to-one with the recognition models, and each noise reduction model sends its denoised speech signal to the corresponding speech recognition model;

S322: the confidences of the different candidate cloud speech recognition results are compared, and the candidate with the highest confidence is output as the cloud speech recognition result;

S4: if the confidence of the local speech recognition result is below a set value, the cloud speech recognition result is output directly; if the confidence of the local speech recognition result reaches the set value, the confidence of the local result is compared with that of the cloud result, and the result with the higher confidence is output.
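The threshold rule of step S4 in this embodiment can be written as a small decision function. The set value is a parameter here because the text does not give a number, and ties are resolved in favour of the local result as an assumption:

```python
def final_output(local_text: str, local_conf: float,
                 cloud_text: str, cloud_conf: float,
                 threshold: float) -> str:
    """Step S4 of Embodiment 1 with a caller-chosen set value `threshold`."""
    # Local confidence below the set value: output the cloud result directly.
    if local_conf < threshold:
        return cloud_text
    # Otherwise compare the two confidences (ties go to the local result).
    return local_text if local_conf >= cloud_conf else cloud_text
```

Note that when the local confidence is below the set value, the cloud result is output even if its own confidence is lower; this follows the wording of step S4.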

Embodiment 2

Local speech noise reduction module, for performing noise reduction on the speech signal and then sending the denoised speech signal to the local speech recognition system;

Cloud speech noise reduction module, which includes different cloud speech noise reduction submodules containing different speech noise reduction models; these models are built for different speech input environments and correspond one-to-one with the different speech recognition models. The module performs noise reduction on the speech signal and then sends the denoised signal to the cloud speech recognition system;

Local speech recognition system, for receiving the speech signal from the local speech noise reduction module and obtaining a local speech recognition result;

Cloud speech recognition system, which includes different cloud speech recognition submodules containing different speech recognition models. These include models built on different speech recognition algorithms and models built on combinations of different speech recognition algorithms, and they correspond to different speech input environments. The noise reduction models correspond one-to-one with the recognition models, and each speech recognition submodule receives the speech signal from its corresponding noise reduction model and obtains a candidate cloud speech recognition result;

Embodiment 3

Based on the speech recognition method based on cloud computing and confidence calculation of Embodiment 1 or the speech recognition system based on cloud computing and confidence calculation of Embodiment 2, in this embodiment the different speech recognition algorithms include a template matching algorithm, a probabilistic model algorithm and an artificial neural network algorithm, wherein:

Template matching algorithm: in the training stage, feature vectors that fully describe the characteristics of the speech signal are extracted to form a feature vector sequence, which is optimized to obtain a set of feature vectors representing the sequence; this set serves as the template. In use, the feature vectors of the speech to be recognized are extracted to form its feature vector sequence, which is compared with the feature vector sequences of the templates, and the speech signal corresponding to the template with the highest matching degree is taken as the recognition result of the template matching algorithm;
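The text does not say how the two feature vector sequences are compared. Dynamic time warping (DTW) is a common choice for sequences of different lengths, so the sketch below uses it as a stand-in, with Euclidean frame distance and the smallest total distance taken as the highest matching degree. The template labels are hypothetical.

```python
import math
from typing import Dict, Sequence

Vector = Sequence[float]


def dtw_distance(a: Sequence[Vector], b: Sequence[Vector]) -> float:
    """Dynamic time warping distance between two feature vector sequences."""
    n, m = len(a), len(b)
    d = [[math.inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(a[i - 1], b[j - 1])  # Euclidean frame distance
            # Extend the cheapest of the three allowed predecessor paths.
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def match_template(features: Sequence[Vector],
                   templates: Dict[str, Sequence[Vector]]) -> str:
    """Return the label of the template with the highest matching degree."""
    # Highest matching degree is taken to mean smallest DTW distance.
    return min(templates, key=lambda label: dtw_distance(features, templates[label]))
```

DTW tolerates differences in speaking rate because one template frame may align with several input frames; `math.dist` requires Python 3.8 or later.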

Probabilistic model algorithm: in the training stage, feature vectors that fully describe the characteristics of the speech signal are extracted, and a mathematical model is formed from the distribution of these feature vectors in feature space. In use, the feature vectors of the speech to be recognized are extracted, their distribution in feature space is compared with that of each mathematical model to compute a similarity, and the speech signal corresponding to the mathematical model with the highest similarity is taken as the recognition result of the probabilistic model algorithm.
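As an illustration of a probabilistic model, the sketch below assumes one diagonal-covariance Gaussian per word and uses the average per-frame log-likelihood as the similarity. The text does not specify the model family, and the variance floor is an arbitrary assumption that keeps the density finite.

```python
import math
from typing import Dict, List, Sequence, Tuple

Model = Tuple[List[float], List[float]]  # (means, variances) per dimension


def fit_diag_gaussian(vectors: Sequence[Sequence[float]], floor: float = 1e-3) -> Model:
    """Training stage: estimate per-dimension mean and variance."""
    n, dim = len(vectors), len(vectors[0])
    mean = [sum(v[k] for v in vectors) / n for k in range(dim)]
    var = [max(sum((v[k] - mean[k]) ** 2 for v in vectors) / n, floor)
           for k in range(dim)]
    return mean, var


def log_likelihood(x: Sequence[float], model: Model) -> float:
    """Log density of one feature vector under a diagonal Gaussian."""
    mean, var = model
    return sum(-0.5 * (math.log(2 * math.pi * s) + (xi - m) ** 2 / s)
               for xi, m, s in zip(x, mean, var))


def classify(features: Sequence[Sequence[float]], models: Dict[str, Model]) -> str:
    """Use stage: pick the model with the highest average log-likelihood."""
    def similarity(label: str) -> float:
        return sum(log_likelihood(x, models[label]) for x in features) / len(features)
    return max(models, key=similarity)
```

Working with log-likelihoods rather than raw densities avoids numerical underflow when many frames are combined.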

Embodiment 4

Based on the speech recognition method based on cloud computing and confidence calculation of Embodiment 1 or the speech recognition system based on cloud computing and confidence calculation of Embodiment 2, in this embodiment the information used to build the confidence estimation model includes: 1) Viterbi decoding information and the traces of the hidden Markov model (HMM): state alignment information, state duration (segment length) and likelihood; 2) modeling of the alternative hypothesis H₁ and of the anti-word model; 3) an online garbage model formed from competing candidate results; 4) explicit models built for mispronunciations and out-of-vocabulary words, or filler models; 5) word lattice density. The confidence estimation model can combine this speech information either by rule-based combination or by statistical-model-based combination. Rule-based combination applies different information sources at different recognition stages to estimate confidence separately, and its emphasis lies in summarizing experience and forming and tuning rules. Statistical models include linear models and generalized linear models:

Define the odds of an event A as odds(A) = P(A) / (1 - P(A)), and let X denote the set of all information.

q(c_i = 1 | X) is an estimate of the true probability p(c_i = 1 | X); the linear model of confidence is then:

\log [\mathrm{odds}(c_{i} = 1 \mid X)] = \log \frac{q(c_{i} = 1 \mid X)}{1 - q(c_{i} = 1 \mid X)} = \sum_{i} t_{i} x_{i},

where x_i is a component of X, i.e. x_i ∈ X, and c_i is the confidence label: c_i = 0 (recognition error), c_i = 1 (recognition correct). The generalized linear model of confidence is:

\log [\mathrm{odds}(c_{i} = 1 \mid X)] = \log \frac{q(c_{i} = 1 \mid X)}{1 - q(c_{i} = 1 \mid X)} = \sum_{i} g_{i}(x_{i}),
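Inverting the log-odds gives the confidence itself, q = 1 / (1 + exp(-z)), where z is Σ t_i x_i for the linear model or Σ g_i(x_i) for the generalized linear model. A minimal sketch; the weights t_i and functions g_i are assumed to be given (for example, fitted by logistic regression on results labelled correct or incorrect, which the text does not specify):

```python
import math
from typing import Callable, Sequence


def _sigmoid(z: float) -> float:
    """Inverse of the log-odds, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def confidence_linear(x: Sequence[float], t: Sequence[float]) -> float:
    """q(c_i = 1 | X) when log-odds = sum_i t_i * x_i."""
    return _sigmoid(sum(ti * xi for ti, xi in zip(t, x)))


def confidence_glm(x: Sequence[float],
                   g: Sequence[Callable[[float], float]]) -> float:
    """q(c_i = 1 | X) when log-odds = sum_i g_i(x_i)."""
    return _sigmoid(sum(gi(xi) for gi, xi in zip(g, x)))
```

The linear model is the special case of the generalized one in which every g_i(x_i) is t_i x_i.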

In addition, define W_j(t) as the best score of the first t observations (t frames) reaching state j during the search, and γ_i(o_t) as the confidence score of state i at frame t:

W_{j}(t) = \max_{i} \{ W_{i}(t - 1)\, \gamma_{i}(o_{t}) \},

\log \gamma_{i}(o_{t}) = \sum_{k = 1}^{3} \log v_{ik}(o_{t}),

where log v_ik (k = 1, 2, 3) represent the likelihood, segment length and likelihood ratio information respectively:

\log v_{i2}(o_{t}) = k_{2} \log w(d),

\log v_{i3}(o_{t}) = k_{3} \log w(cm),

where a_ij and b_j(o_t) are the transition probability and output probability of the speech recognition model respectively, the k coefficients are weights for the different kinds of feature information, and w(cm) is the likelihood ratio information, computed as follows.

Let LLR be the log-likelihood ratio between the current speech recognition model (subscript c) and a corresponding anti-model (subscript a); then:

\log w(cm) = \log \frac{1}{1 + \exp\{-t(LLR + u)\}},

where t is a positive constant and u is a constant, so the value of w(cm) always lies between 0 and 1. If the likelihood of the current speech recognition model is higher than that of the anti-model, LLR > 0 and w(cm) approaches 1; otherwise it approaches 0. The constants t and u control the steepness and the position of the function, and their values are determined by experiment.
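The frame-level combination can be sketched as below: log w(cm) is computed in a numerically stable way and the three log v_ik terms are summed into log γ_i(o_t). Because the expression for log v_i1 (the likelihood term involving a_ij and b_j(o_t)) is not reproduced in the text, it is taken as an input; the function and parameter names are hypothetical.

```python
import math


def log_w_cm(llr: float, t: float, u: float) -> float:
    """log w(cm) = log 1 / (1 + exp(-t (LLR + u))), evaluated stably."""
    if t <= 0:
        raise ValueError("t must be a positive constant")
    z = t * (llr + u)
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def log_gamma(log_v1: float, w_d: float, llr: float,
              k2: float, k3: float, t: float, u: float) -> float:
    """log gamma_i(o_t) as the sum of the three log v_ik(o_t) terms."""
    log_v2 = k2 * math.log(w_d)         # segment length (duration) term
    log_v3 = k3 * log_w_cm(llr, t, u)   # likelihood ratio term
    return log_v1 + log_v2 + log_v3     # log_v1: likelihood term, supplied by caller
```

Splitting on the sign of z keeps `math.exp` from overflowing for large |LLR|, and since w(cm) lies in (0, 1), `log_w_cm` always returns a value no greater than zero.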

The above method can be used to compute confidence estimates at different levels, such as phoneme, syllable, word and sentence.

It will be apparent to those skilled in the art that various other corresponding changes and modifications can be made according to the technical solutions and concepts described above, and all such changes and modifications shall fall within the protection scope of the claims of the present invention.

Claims

1. A speech recognition method based on cloud computing and confidence calculation, characterized by comprising the following steps:

S2: the local speech recognition system obtains a local speech recognition result, and the cloud speech recognition system obtains a cloud speech recognition result;

S4: the confidence of the local speech recognition result is compared with the confidence of the cloud speech recognition result, and the speech recognition result with the higher confidence is output.

2. The speech recognition method based on cloud computing and confidence calculation according to claim 1, characterized in that the cloud speech recognition system is provided with different speech recognition models; in step S2 the cloud speech recognition system obtains different candidate cloud speech recognition results based on the different speech recognition models, and step S32 comprises:

S321: confidence evaluation is carried out on the different candidate cloud speech recognition results to obtain the confidence of each candidate cloud speech recognition result;

3. The speech recognition method based on cloud computing and confidence calculation according to claim 2, characterized in that the different speech recognition models include speech recognition models built on different speech recognition algorithms and speech recognition models built on combinations of different speech recognition algorithms, and the different speech recognition models correspond to different speech input environments.

4. The speech recognition method based on cloud computing and confidence calculation according to claim 3, characterized in that step S20 is carried out before step S2:

S20: the local speech recognition system and the cloud speech recognition system each perform noise reduction on the received speech signal.

5. The speech recognition method based on cloud computing and confidence calculation according to claim 4, characterized in that in step S20 the cloud speech recognition system performs noise reduction on the speech signal with different speech noise reduction models, which are built for different speech input environments and correspond one-to-one with the different speech recognition models, and the cloud speech recognition system sends the denoised speech signal to the speech recognition model corresponding to the same speech input environment.

6. A speech recognition system based on cloud computing and confidence calculation, characterized by comprising:

A confidence evaluation module, which uses a confidence algorithm to evaluate the confidence of the local speech recognition result and of the cloud speech recognition result;

A data processing module, which compares the confidence of the local speech recognition result with the confidence of the cloud speech recognition result and outputs the speech recognition result with the higher confidence.

7. The speech recognition system based on cloud computing and confidence calculation according to claim 6, characterized in that the cloud speech recognition system includes different cloud speech recognition submodules:

The different cloud speech recognition submodules in the cloud speech recognition system contain different speech recognition models, and each cloud speech recognition submodule receives the speech signal and obtains a candidate cloud speech recognition result;

The confidence evaluation module uses a confidence algorithm to evaluate the confidence of the local speech recognition result and of the candidate cloud speech recognition results;

The data processing module compares the confidences of the candidate cloud speech recognition results output by the different cloud speech recognition submodules and takes the candidate with the highest confidence as the cloud speech recognition result; it then compares the confidence of the local speech recognition result with that of the cloud speech recognition result and outputs the result with the higher confidence.

8. The speech recognition system based on cloud computing and confidence calculation according to claim 7, characterized in that the different speech recognition models include speech recognition models built on different speech recognition algorithms and speech recognition models built on combinations of different speech recognition algorithms, and the different speech recognition models correspond to different speech input environments.

9. The speech recognition system based on cloud computing and confidence calculation according to claim 8, characterized by further comprising a local speech noise reduction module and a cloud speech noise reduction module; the local speech noise reduction module performs noise reduction on the speech signal and then sends the denoised signal to the local speech recognition system, and the cloud speech noise reduction module performs noise reduction on the speech signal and then sends the denoised signal to the cloud speech recognition system.

10. The speech recognition system based on cloud computing and confidence calculation according to claim 9, characterized in that the cloud speech noise reduction module includes different cloud speech noise reduction submodules, which contain different speech noise reduction models built for different speech input environments; these speech noise reduction models correspond one-to-one with the different speech recognition models.