CN1801326A

CN1801326A - Method for adaptively improving speech recognition rate by means of gain

Info

Publication number: CN1801326A
Application number: CNA2004101046579A
Authority: CN
Inventors: 徐波; 谢传泉; 张东泉; 普剑涛; 张亮; 张建
Original assignee: Institute of Automation of Chinese Academy of Science
Current assignee: Institute of Automation of Chinese Academy of Science
Priority date: 2004-12-31
Filing date: 2004-12-31
Publication date: 2006-07-12
Anticipated expiration: 2024-12-31
Also published as: CN100369113C

Abstract

The invention relates to a method for improving the speech recognition rate through gain adaptation. The method improves the recognition rate by assessing noise, adjusting the recording gain, and adjusting the endpoint detection parameters. It comprises the following steps: (1) assess the background noise; (2) adjust the recording gain according to the background noise assessed in step 1; (3) perform endpoint detection and speech recognition on the basis of steps 1 and 2.

Description

Method for improving the speech recognition rate using gain adaptation

Technical field

The present invention relates to the field of automatic speech recognition, and in particular to a method for improving the speech recognition rate using gain adaptation.

Background art

Speech recognition technology has developed enormously in recent years and has been widely deployed in embedded devices such as mobile phones, where voice dialing in particular is gradually becoming an indispensable feature of high-end smartphones. Voice dialing is convenient and fast: the user need not remember cumbersome telephone numbers or page through the screen looking for a contact, which greatly improves communication efficiency.

However, current speech recognition methods and products all perform poorly in noisy environments. The main problems are as follows. First, false triggering: loud background sound is easily mistaken for the start of speech. Second, recognition rate: the signal-to-noise ratio is relatively low, so the recognition rate is low. Third, end-of-speech decision: loud background sound is easily mistaken for continued speech, so the utterance may not be judged finished for a long time. Fourth, channel saturation: the channel saturates easily when the surroundings are loud, which distorts the recorded data and degrades recognition.

Summary of the invention

The object of the invention is to propose a new assessment and processing method for noisy environments, so that speech recognition remains usable in a variety of noisy environments while maintaining a relatively high recognition rate. For improving the speech recognition rate of a mobile phone in a noisy environment, the essential feature of the invention is that the background noise is analyzed and assessed, the recording gain of the phone is adjusted accordingly, and the parameters of the endpoint detection method are adjusted, so as to suppress channel saturation, increase the signal-to-noise ratio, and thereby improve the speech recognition rate and the task completion rate.

The key technical point of the invention is that, in a noisy environment, the user will consciously hold the phone close to the mouth or speak louder. The microphone gain is therefore set according to the current background noise assessment, and the endpoint detection parameters are adjusted at the same time. If the environment is very noisy, the gain is reduced; otherwise the gain is raised. This suppresses noise as far as possible and improves the signal-to-noise ratio.

The method for improving the speech recognition rate using gain adaptation is characterized in that the recognition rate is improved by assessing noise, adjusting the recording gain, and adjusting the endpoint detection parameters.

Different gain values are set according to the assessed background, as follows: in each noisy environment, the recording gain is varied, and the background sound assessment value under every gain is measured and recorded using the method of claim 2; the gain value that minimizes the background noise assessment value is then selected. This yields a table mapping background type to gain value, and the gain is adjusted according to this table.

The method consists mainly of the following consecutive stages: background sound assessment, recording gain adjustment, and endpoint detection and recognition. Details are as follows:

1. Background sound assessment method

This method assesses the current background noise. The speech is divided into K segments using non-overlapping rectangular windows of length N = 960 (0.125 second), and the sub-band energy E_i (i = 0, 1, ..., K) of each segment is computed. The background noise estimate when the i-th speech block is processed is then B = [E_i + (ω - 1)B] / ω, where ω is a positive integer; here ω = 10. The range of computed B values is divided into equal intervals, and the background environment is classified into the following types:

Quiet environment: e.g. an office

Conventional environment: e.g. outdoors

Noisy environment: e.g. on a bus
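
The assessment step can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the text does not define the sub-band used for E_i, so a plain sum of squares over the block stands in for it, and the bounds `b_min` and `b_max` of the B range are hypothetical calibration values that would have to be measured on the target handset.

```python
from typing import List, Sequence

BLOCK_LEN = 960  # N in the text: samples per non-overlapping rectangular window
OMEGA = 10       # smoothing constant ω in the text


def block_energies(samples: Sequence[float], block_len: int = BLOCK_LEN) -> List[float]:
    """Split samples into non-overlapping blocks and return each block's energy.

    Assumption: the sum of squared samples stands in for the sub-band
    energy E_i, whose band limits the text does not specify.
    Trailing samples that do not fill a whole block are ignored.
    """
    n_blocks = len(samples) // block_len
    return [
        sum(x * x for x in samples[i * block_len:(i + 1) * block_len])
        for i in range(n_blocks)
    ]


def update_background(b: float, e_i: float, omega: int = OMEGA) -> float:
    """Apply one step of B = [E_i + (ω - 1) * B] / ω."""
    return (e_i + (omega - 1) * b) / omega


def classify_environment(b: float, b_min: float, b_max: float) -> str:
    """Map B onto three equal-width intervals of [b_min, b_max].

    Assumption: b_min and b_max are hypothetical calibration bounds;
    values outside the range are clamped to the nearest class.
    """
    labels = ["quiet", "conventional", "noisy"]
    if b_max <= b_min:
        raise ValueError("b_max must exceed b_min")
    width = (b_max - b_min) / len(labels)
    index = int((b - b_min) // width)
    return labels[min(max(index, 0), len(labels) - 1)]
```

With ω = 10 each new block contributes one tenth of the updated estimate, so B follows slow changes in the background while a single loud block moves it only slightly.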

2. Recording gain adjustment method

In a noisy background users habitually speak loudly, whereas in a quiet environment they usually speak softly. On the user's side, this maintains a certain signal-to-noise ratio and hence the recognition rate. However, the following problems arise: in a noisy environment, raising the voice even slightly easily saturates the channel and distorts the data, which lowers the recognition rate. In addition, the energy of the background sound fluctuates widely, which makes it difficult to determine the start and end of speech.

The purpose of recording gain adjustment is to keep the recorded background noise at a stable level in any environment, thereby solving the above problems. The main steps are as follows. First, the phone must allow the recording gain to be set. For each noisy environment identified in step 1, the recording gain is varied, and the background sound assessment value under every gain is measured and recorded using the same method. The gain value that minimizes the background noise assessment value is then selected. This yields a table mapping background type to gain value, and the gain is adjusted according to this table. The table differs between phone models and must be obtained by testing in this way. The following are measured data on one phone (whose gain adjustment range is 1 to 30):

Background type	Gain value
Quiet environment	17
Conventional environment	4
Noisy environment	1
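
The calibration procedure and the resulting lookup can be sketched as follows. The function `measure_background` is a hypothetical callback that records in the given environment at the given gain and returns the background assessment value B; it stands in for the handset-specific measurement that the text requires. The dictionary reproduces the measured values quoted above for one handset.

```python
from typing import Callable, Dict, Iterable


def calibrate_gain_table(
    environments: Iterable[str],
    gains: Iterable[int],
    measure_background: Callable[[str, int], float],
) -> Dict[str, int]:
    """For each environment, keep the gain whose measured background value is smallest.

    Assumption: measure_background(env, gain) is a hypothetical helper that
    returns the background assessment value B recorded at that gain.
    """
    gain_list = list(gains)
    return {
        env: min(gain_list, key=lambda g: measure_background(env, g))
        for env in environments
    }


# Measured table quoted in the text for one handset (gain range 1 to 30).
MEASURED_GAIN_TABLE: Dict[str, int] = {"quiet": 17, "conventional": 4, "noisy": 1}


def select_gain(environment: str, table: Dict[str, int] = MEASURED_GAIN_TABLE) -> int:
    """Look up the recording gain for an assessed background type."""
    return table[environment]
```

Because the table depends on the handset model, a deployment would run the calibration once per model and ship only the resulting table.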

3. Endpoint detection method

This endpoint detection method uses different detection methods for the start point and the end point of speech.

Previous studies show that accurately determining the end point of speech is difficult; in particular, weakened word endings are easily lost, which causes recognition errors. The end point is therefore determined by the speech recognition process itself: once the system has detected speech, if the optimal path has reached the trailing-silence model and has remained there for 0.375 second continuously, the speech is judged to have ended. Tests show that the end-point detection accuracy of this method is close to 100%, better than that of any other end-point detection method.
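
The end-point rule can be illustrated with the sketch below. It assumes, hypothetically, that the recognizer reports once per 0.125 second block whether its optimal path currently lies in the trailing-silence model; the text does not specify the decoder's reporting granularity, so this interface is an assumption made only for illustration.

```python
from typing import Iterable, Optional

BLOCK_SECONDS = 0.125      # block duration given in the text
TAIL_HOLD_SECONDS = 0.375  # required continuous stay in the trailing-silence model


def find_speech_end(
    in_trailing_silence: Iterable[bool],
    hold_blocks: int = round(TAIL_HOLD_SECONDS / BLOCK_SECONDS),
) -> Optional[int]:
    """Return the index of the block at which speech is judged finished, or None.

    Assumption: each flag says whether the decoder's optimal path was in the
    trailing-silence model during that block (a hypothetical interface).
    """
    run = 0
    for i, flag in enumerate(in_trailing_silence):
        run = run + 1 if flag else 0  # any non-silence block resets the count
        if run >= hold_blocks:
            return i
    return None
```

With the values in the text the hold time corresponds to three consecutive blocks, so a brief pause shorter than that does not end the utterance.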

Detection of the speech start point uses a method based on sub-band energy. The decision threshold is T = λB, where λ is a constant, here λ = 14, and B is the assessed background noise value. Decision rule: let E be the energy of the current speech block, and first update the background noise value B. When E ≤ T, the block is background noise, and the two most recent speech blocks (0.25 second) are buffered. When E > T, speech has been detected; because 0.25 second of audio has been buffered, the speech start point is moved back by 0.25 second to ensure that no speech is lost.
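
A minimal sketch of the start-point rule follows. One deviation from the literal wording is made and should be read as an assumption: the threshold is computed from the value of B before the current block is folded in, and B is updated only after a block has been judged to be background. If E were folded into B first with ω = 10 and λ = 14, then T = 1.4E + 12.6B would always be at least E for non-negative energies, and the test E > T could never succeed. The starting value `initial_b` is likewise hypothetical, since the text does not say how B is initialized.

```python
from collections import deque
from typing import Deque, Iterable, Optional

LAMBDA = 14          # threshold multiplier λ in the text
OMEGA = 10           # smoothing constant ω of the background estimate
PRE_ROLL_BLOCKS = 2  # two 0.125 s blocks = 0.25 s kept before the trigger


def detect_speech_start(energies: Iterable[float], initial_b: float) -> Optional[int]:
    """Return the index of the first block to keep as speech, or None.

    Assumptions: initial_b is a hypothetical starting background estimate,
    and T = λB uses the estimate from the blocks seen before the current one.
    """
    b = initial_b
    pre_roll: Deque[int] = deque(maxlen=PRE_ROLL_BLOCKS)
    for i, e in enumerate(energies):
        threshold = LAMBDA * b
        if e > threshold:
            # Speech detected: move the start point back over the buffered blocks.
            return pre_roll[0] if pre_roll else i
        # Background block: remember it and fold its energy into B.
        pre_roll.append(i)
        b = (e + (OMEGA - 1) * b) / OMEGA
    return None
```

The bounded deque keeps only the two most recent background blocks, which is what allows the start point to be moved back by 0.25 second without storing the whole recording.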

The advantage of the invention is that it effectively solves the series of problems encountered when using speech recognition in noisy environments, improving the recognition rate and the task completion rate.

Example

Batches of data were recorded in front of a television, in the subway, and on the road, respectively. Fig. 1 shows the task completion rate comparison data (percentages) before and after using this method.

The technique is referred to by the following proper nouns: "noise adaptive" and "SEA (Smart Environment Adaptation)". It effectively improves the speech recognition rate in a variety of noisy environments.

Exploiting users' speaking habits, the recording gain is reduced in noisy environments to suppress channel saturation.

Description of drawings

Fig. 1 is a chart of the task completion rate comparison data (percentages) before and after using the method of the invention.

Fig. 2 is a flow chart of the method of the invention for improving the speech recognition rate using gain adaptation.

Embodiment

Fig. 1 shows the task completion rate comparison data (percentages) before and after using this method.

The comparison covers batches of data recorded in front of a television, in the subway, and on the road, respectively. The speech recognition rate is effectively improved in all of these noisy environments.

The steps of the method of Fig. 2 for improving the speech recognition rate using gain adaptation are as follows:

Step S1: assess the background noise;

Step S2: adjust the recording gain according to the background noise type assessed in step S1;

Step S3: perform endpoint detection and speech recognition on the basis of steps S1 and S2.

Claims

1. A method for improving the speech recognition rate using gain adaptation, characterized in that the speech recognition rate is improved by assessing noise, adjusting the recording gain, and adjusting the endpoint detection parameters.

2. The background sound assessment method according to claim 1, characterized in that the background environment is classified according to historical noise and current noise; specifically, the speech is divided into K segments using non-overlapping rectangular windows of length N = 960 (0.125 second), and the sub-band energy E_i (i = 0, 1, ..., K) of each segment is computed; the background noise estimate when the i-th speech block is processed is B = [E_i + (ω - 1)B] / ω, where ω is a positive integer, here ω = 10; and the background noise is classified into three types: quiet environment, e.g. an office; conventional environment, e.g. outdoors; and noisy environment, e.g. on a bus.

3. The recording gain adjustment method according to claim 1, characterized in that different gain values are set according to the assessed background, as follows: in each noisy environment, the recording gain is varied, and the background sound assessment value under every gain is measured and recorded using the method of claim 2; the gain value that minimizes the background noise assessment value is then selected, yielding a table mapping background type to gain value, and the gain is adjusted according to this table.

4. The endpoint detection method according to claim 1, characterized in that the start point and the end point are detected by different methods whose parameters are adjusted according to the background assessment. The end point of speech is determined by the speech recognition process: once the system has detected speech, if the optimal path has reached the trailing-silence model and has remained there for 0.375 second continuously, the speech is judged to have ended. Detection of the speech start point uses a method based on sub-band energy, with decision threshold T = λB, where λ is a constant, here λ = 14, and B is the assessed background noise value.

5. The method for improving the speech recognition rate using gain adaptation according to claim 1, the specific steps of which are as follows:

Step S1: assess the background noise;