CN1801326A - Method for adaptively improving speech recognition rate by means of gain - Google Patents


Info

Publication number
CN1801326A
Authority
CN
China
Prior art keywords
gain
noise
background
voice
environment
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CNA2004101046579A
Other languages
Chinese (zh)
Other versions
CN100369113C (en)
Inventor
徐波
谢传泉
张东泉
普剑涛
张亮
张建
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Automation of Chinese Academy of Science
Original Assignee
Institute of Automation of Chinese Academy of Science
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Automation of Chinese Academy of Science
Priority to CNB2004101046579A
Publication of CN1801326A
Application granted
Publication of CN100369113C
Current legal status: Active

Landscapes

  • Document Processing Apparatus (AREA)

Abstract

The invention relates to a method for improving the speech recognition rate through gain adaptation. The method improves the speech recognition rate by assessing noise, adjusting the recording gain, and adjusting the endpoint detection parameters. It comprises the following steps: (1) assessing the background noise; (2) adjusting the recording gain according to the background noise assessed in step 1; (3) performing endpoint detection and speech recognition on the basis of steps 1 and 2.

Description

Method for improving the speech recognition rate by means of gain adaptation
Technical field
The present invention relates to the field of automatic speech recognition, and in particular to a method for improving the speech recognition rate by means of gain adaptation.
Background art
Speech recognition technology has developed enormously in recent years and has become widespread in embedded applications such as mobile phones, where voice dialling in particular is gradually becoming an indispensable function of high-end smartphones. Voice dialling is convenient and fast: there is no need to remember cumbersome telephone numbers or to page through the screen looking for a contact, which greatly improves communication efficiency.
However, current speech recognition methods and products all perform poorly in noisy environments. The main problems are as follows. First, false triggering: loud background sound is easily mistaken for the start of speech. Second, recognition rate: the signal-to-noise ratio is relatively low, so the recognition rate is poor. Third, end-of-speech decision: loud background sound is easily mistaken for continued speech, so the utterance fails to terminate for a long time. Fourth, channel saturation: when the surroundings are loud, the channel saturates easily, which distorts the recorded data and impairs speech recognition.
Summary of the invention
The object of the invention is to propose a new assessment and processing method for noisy environments, so that speech recognition remains usable in a variety of noisy environments with a comparatively high recognition rate. The method improves the speech recognition rate of a mobile phone in noisy environments. Its essential feature is that, by analysing and assessing the background noise in a noisy environment, it adjusts the phone's recording gain and the endpoint detection parameters accordingly, so as to suppress channel saturation, increase the signal-to-noise ratio, and in turn improve the speech recognition rate and the task completion rate.
The key technical point of the invention is that, in a noisy environment, users consciously hold the phone close to the mouth when speaking, or raise their voice. The microphone gain is therefore set according to the current background noise assessment, and the endpoint detection parameters are adjusted at the same time. If the environment is very noisy, the gain is reduced; otherwise it is raised. This suppresses noise as far as possible and improves the signal-to-noise ratio.
The method for improving the speech recognition rate by means of gain adaptation is characterised in that the speech recognition rate is improved by assessing noise, adjusting the recording gain, and adjusting the endpoint detection parameters.
Different gain values are set according to the assessed background, as follows: in each noisy environment, the recording gain is varied and the background sound estimate at every gain setting is measured and recorded using the method given in claim 2; the gain value that minimises the background noise estimate is then selected. This yields a table mapping background types to gain values, and the gain is adjusted according to this table.
The method consists mainly of the following consecutive stages: background sound assessment, recording gain adjustment, endpoint detection, and recognition. They are described in detail below:
1. Background sound assessment method
This method assesses the noise of the current background environment. The speech signal is divided into K blocks by non-overlapping rectangular windows of length N = 960 samples (0.125 second), and the sub-band energy $E_i$ ($i = 0, 1, \ldots, K$) of each block is computed. When the i-th speech block is processed, the background noise estimate is updated as $B = [E_i + (\omega - 1)B]/\omega$, where $\omega$ is a positive integer; here $\omega = 10$. The range of computed B values is divided into equal-width intervals, which classify the background environment into the following types:
Quiet environment: e.g. an office
Conventional environment: e.g. outdoors
Noisy environment: e.g. on a bus
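The assessment step above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: a plain sum of squares stands in for the sub-band energy $E_i$, and the calibration bounds `b_min` and `b_max` used for the equal-width classification are hypothetical, since the text gives no numeric boundaries. The update rule itself is an exponential smoothing of the block energies with weight $1/\omega = 0.1$ on the newest block.

```python
from typing import List

BLOCK_LEN = 960   # N in the text: samples per non-overlapping rectangular window
OMEGA = 10        # smoothing constant omega from the text


def block_energies(samples: List[float], block_len: int = BLOCK_LEN) -> List[float]:
    """Split samples into non-overlapping blocks and return each block's energy.

    Assumption: a plain sum of squares stands in for the sub-band energy E_i.
    A trailing partial block is dropped.
    """
    n_blocks = len(samples) // block_len
    return [
        sum(x * x for x in samples[k * block_len:(k + 1) * block_len])
        for k in range(n_blocks)
    ]


def update_background(b: float, e: float, omega: int = OMEGA) -> float:
    """One step of B = [E_i + (omega - 1) * B] / omega."""
    return (e + (omega - 1) * b) / omega


def classify_background(b: float, b_min: float, b_max: float) -> str:
    """Map B to one of three types using equal-width bands over [b_min, b_max].

    b_min and b_max are hypothetical calibration bounds, not values from the text.
    """
    step = (b_max - b_min) / 3
    if b < b_min + step:
        return "quiet"
    if b < b_min + 2 * step:
        return "conventional"
    return "noisy"
```

The initial value of B is not specified in the text; a caller would have to choose one, for example the energy of the first block.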
2. Recording gain adjustment method
In a noisy background, users habitually raise their voice, whereas in a quiet environment they usually speak softly. This behaviour keeps the signal-to-noise ratio at a certain level at the user's location, and thereby protects the recognition rate. Two problems remain, however. First, in a noisy environment even a slight increase in speaking volume easily saturates the channel and distorts the data, which lowers the recognition rate. Second, the energy of the background sound fluctuates widely, which makes it hard to determine where speech starts and ends.
The purpose of recording gain adjustment is to keep the recorded background noise at a stable level in any environment, thereby solving the problems above. The main steps are as follows. First, the handset in question must allow its recording gain to be set. For each noisy environment type identified in step 1, the recording gain is varied, and the background sound estimate is measured and recorded at every gain setting using the same assessment method. The gain value that minimises the background noise estimate is then selected. This yields a table mapping background types to gain values, and the gain is adjusted according to this table. The table differs between handset models and must be obtained by running this test on each one. The following data were measured on one handset (whose gain range is 1 to 30):
Background type             Gain value
Quiet environment           17
Conventional environment    4
Noisy environment           1
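The calibration and lookup described above might look like the following Python sketch. The gain values in `GAIN_TABLE` are the ones measured in the table above; the function names and the shape of the `measurements` argument are hypothetical choices made for illustration.

```python
from typing import Dict, Mapping

# Gain table measured on one handset in the text (gain range 1 to 30).
GAIN_TABLE: Dict[str, int] = {"quiet": 17, "conventional": 4, "noisy": 1}


def calibrate_gain(background_by_gain: Mapping[int, float]) -> int:
    """Return the gain setting whose measured background estimate B is smallest."""
    return min(background_by_gain, key=lambda g: background_by_gain[g])


def build_gain_table(measurements: Mapping[str, Mapping[int, float]]) -> Dict[str, int]:
    """Build the background-type -> gain table from per-environment measurements.

    measurements[env][gain] is the background estimate recorded in environment
    `env` at recording gain `gain` (a hypothetical data layout).
    """
    return {env: calibrate_gain(by_gain) for env, by_gain in measurements.items()}


def gain_for(environment: str, table: Mapping[str, int] = GAIN_TABLE) -> int:
    """Look up the recording gain to apply for an assessed background type."""
    return table[environment]
```

Because the table depends on the handset, `build_gain_table` would be run once per model during testing, while `gain_for` is the only call needed at recognition time.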
3. Endpoint detection method
This endpoint detection method uses different detection methods for the start point and the end point of speech.
Previous studies show that accurately determining the end point of speech is difficult; weakened utterance endings in particular are easily lost, which causes recognition errors. We therefore let the speech recognition process itself decide the end point: once the system has detected speech, and the optimal path has reached the trailing silence model and stayed there for 0.375 second continuously, the speech is judged to have finished. Tests show that this method detects the end point with close to 100% accuracy, better than any other end-point detection method.
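The end-point rule above reduces to counting how long the recogniser's best path has stayed in the trailing silence model. The sketch below assumes, purely for illustration, that the decoder exposes one boolean per 0.125-second block; the text does not say at what granularity the recogniser reports its optimal path, and `in_trailing_silence` is a hypothetical input.

```python
from typing import Iterable, Optional

BLOCK_SECONDS = 0.125       # block duration from the text
TAIL_HOLD_SECONDS = 0.375   # required continuous stay in the trailing silence model
TAIL_HOLD_BLOCKS = round(TAIL_HOLD_SECONDS / BLOCK_SECONDS)  # 3 blocks


def find_speech_end(in_trailing_silence: Iterable[bool],
                    hold_blocks: int = TAIL_HOLD_BLOCKS) -> Optional[int]:
    """Return the index of the block at which speech is judged finished, or None.

    in_trailing_silence[i] is True when, at block i, the recogniser's optimal
    path ends in the trailing silence model (hypothetical decoder output).
    """
    run = 0
    for i, flag in enumerate(in_trailing_silence):
        run = run + 1 if flag else 0   # any non-silence block resets the count
        if run >= hold_blocks:
            return i
    return None
```

For example, `find_speech_end([False, True, False, True, True, True])` returns `5`: the single silence block at index 1 is reset by the speech block that follows, and the run of three starting at index 3 completes at index 5.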
The start point of speech is detected with a method based on sub-band energy. The decision threshold is $T = \lambda B$, where $\lambda$ is a constant, here $\lambda = 14$, and B is the assessed background noise value. Decision procedure: let E be the energy of the current speech block. First, the background noise value B is updated. When $E \le T$, the block is background noise, and the two most recent blocks (0.25 second) are saved. When $E > T$, speech has been detected; because 0.25 second of audio has been saved, the start point is moved back by 0.25 second so that no speech is lost.
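A minimal Python sketch of the start-point decision follows. It makes one assumption about ordering that should be read carefully: if the current block's energy E were folded into B before the comparison, then with $\omega = 10$ and $\lambda = 14$ the threshold would be $T = 1.4E + 12.6B$, which is never below E for non-negative energies, so the test $E > T$ could not fire. The sketch therefore compares E against the threshold from the previous B and updates B only on blocks judged to be noise. The starting value `b_init` is hypothetical and must be positive.

```python
from collections import deque
from typing import Deque, Iterable, Optional

LAMBDA = 14           # threshold factor lambda from the text
OMEGA = 10            # smoothing constant omega from the text
PRE_ROLL_BLOCKS = 2   # two 0.125 s blocks = 0.25 s of saved audio


def detect_head(energies: Iterable[float], b_init: float,
                lam: float = LAMBDA, omega: int = OMEGA) -> Optional[int]:
    """Return the index of the block where speech is taken to start, or None.

    Assumption: E is compared against T = lam * B using the B from earlier
    blocks; B is updated only when the block is classified as noise.
    """
    b = b_init
    pre_roll: Deque[int] = deque(maxlen=PRE_ROLL_BLOCKS)  # indices of saved blocks
    for i, e in enumerate(energies):
        if e > lam * b:
            # Speech detected: back up to the oldest saved block (up to 0.25 s).
            return pre_roll[0] if pre_roll else i
        pre_roll.append(i)                     # keep the two most recent noise blocks
        b = (e + (omega - 1) * b) / omega      # B = [E + (omega - 1) * B] / omega
    return None
```

With `energies = [1.0, 1.0, 1.0, 1.0, 100.0]` and `b_init = 1.0`, B stays at 1.0, the fifth block exceeds T = 14, and the function returns `2`, two blocks (0.25 second) before the block that crossed the threshold.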
The advantage of the invention is that it effectively solves the series of problems encountered when using speech recognition in noisy environments, and improves the recognition rate and the task completion rate.
Example
Batches of data were recorded in front of a television, in the subway, and on the road, respectively. Fig. 1 shows the task completion rate (in percent) before and after applying this method.
The method is referred to by the following terms: "noise adaptive" and "SEA (Smart Environment Adaptation)". It effectively improves the speech recognition rate in a variety of noisy environments.
It exploits the user's speaking habits: lowering the recording gain in a noisy environment suppresses channel saturation.
Description of drawings
Fig. 1 is a chart of the task completion rate (in percent) before and after applying the method of the invention.
Fig. 2 is a flow chart of the method of the invention for improving the speech recognition rate by means of gain adaptation.
Embodiment
Fig. 1 shows the task completion rate (in percent) before and after applying this method.
It compares batches of data recorded in front of a television, in the subway, and on the road, respectively. The method effectively improves the speech recognition rate in all of these noisy environments.
Fig. 2 shows the method for improving the speech recognition rate by means of gain adaptation, whose steps are as follows:
Step S1: assess the background noise;
Step S2: adjust the recording gain according to the background noise type assessed in step S1;
Step S3: perform endpoint detection and speech recognition on the basis of steps S1 and S2.
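Steps S1 to S3 can be tied together in a short Python sketch. Everything beyond the constants $\omega = 10$ and $\lambda = 14$ is an assumption made for illustration: the `Settings` record, seeding B with the first block's energy, the calibration `bounds`, and the form of the gain table are all hypothetical.

```python
from dataclasses import dataclass
from typing import Mapping, Sequence, Tuple


@dataclass
class Settings:
    environment: str       # assessed background type (S1)
    gain: int              # recording gain to apply (S2)
    head_threshold: float  # start-point threshold T = lambda * B for use in S3


def adapt(energies: Sequence[float],
          bounds: Tuple[float, float],
          gain_table: Mapping[str, int],
          omega: int = 10,
          lam: float = 14) -> Settings:
    """Derive gain and endpoint-detection settings from leading block energies."""
    # S1: smooth the block energies into a background estimate B.
    b = energies[0]  # assumption: seed B with the first block's energy
    for e in energies[1:]:
        b = (e + (omega - 1) * b) / omega
    lo, hi = bounds  # hypothetical calibration range for B
    step = (hi - lo) / 3
    if b < lo + step:
        env = "quiet"
    elif b < lo + 2 * step:
        env = "conventional"
    else:
        env = "noisy"
    # S2: look up the recording gain for the assessed background type.
    gain = gain_table[env]
    # S3: hand the endpoint detector its threshold, derived from the same B.
    return Settings(environment=env, gain=gain, head_threshold=lam * b)
```

Note that B here is measured before the gain change. Since changing the recording gain rescales the recorded energies, a real system would presumably re-estimate B after applying the new gain; the text does not spell this out.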

Claims (5)

1. A method for improving the speech recognition rate by means of gain adaptation, characterised in that the speech recognition rate is improved by assessing noise, adjusting the recording gain, and adjusting the endpoint detection parameters.
2. The background sound assessment method according to claim 1, characterised in that the background environment is classified according to historical noise and current noise. Specifically, the speech signal is divided into K blocks by non-overlapping rectangular windows of length N = 960 samples (0.125 second), and the sub-band energy $E_i$ ($i = 0, 1, \ldots, K$) of each block is computed; when the i-th speech block is processed, the background noise estimate is updated as $B = [E_i + (\omega - 1)B]/\omega$, where $\omega$ is a positive integer, here $\omega = 10$; and the background noise is divided into three types: quiet environment, e.g. an office; conventional environment, e.g. outdoors; noisy environment, e.g. on a bus.
3. The recording gain adjustment method according to claim 1, characterised in that different gain values are set according to the assessed background, as follows: in each noisy environment, the recording gain is varied and the background sound estimate at every gain setting is measured and recorded using the method given in claim 2; the gain value that minimises the background noise estimate is then selected; this yields a table mapping background types to gain values, and the gain is adjusted according to this table.
4. The endpoint detection method according to claim 1, characterised in that the start point and the end point are detected with different methods, whose parameters are adjusted according to the background assessment. The speech recognition process decides the end point of speech: once the system has detected speech, and the optimal path has reached the trailing silence model and stayed there for 0.375 second continuously, the speech is judged to have finished. The start point of speech is detected with a method based on sub-band energy; the decision threshold is $T = \lambda B$, where $\lambda$ is a constant, here $\lambda = 14$, and B is the assessed background noise value.
5. The method for improving the speech recognition rate by means of gain adaptation according to claim 1, whose specific steps are as follows:
Step S1: assess the background noise;
Step S2: adjust the recording gain according to the background noise type assessed in step S1;
Step S3: perform endpoint detection and speech recognition on the basis of steps S1 and S2.
CNB2004101046579A 2004-12-31 2004-12-31 Method for adaptively improving speech recognition rate by means of gain Active CN100369113C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNB2004101046579A CN100369113C (en) 2004-12-31 2004-12-31 Method for adaptively improving speech recognition rate by means of gain

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CNB2004101046579A CN100369113C (en) 2004-12-31 2004-12-31 Method for adaptively improving speech recognition rate by means of gain

Publications (2)

Publication Number Publication Date
CN1801326A (en) 2006-07-12
CN100369113C (en) 2008-02-13

Family

ID=36811273

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB2004101046579A Active CN100369113C (en) 2004-12-31 2004-12-31 Method for adaptively improving speech recognition rate by means of gain

Country Status (1)

Country Link
CN (1) CN100369113C (en)

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102394062A (en) * 2011-10-26 2012-03-28 华南理工大学 Method and system for automatically identifying voice recording equipment source
CN102857650A (en) * 2012-08-29 2013-01-02 苏州佳世达电通有限公司 Method for dynamically regulating voice
CN103002380A (en) * 2011-09-13 2013-03-27 索尼公司 Information processing apparatus and information processing method
CN103280215A (en) * 2013-05-28 2013-09-04 北京百度网讯科技有限公司 Audio frequency feature library establishing method and device
CN104064197A (en) * 2014-06-20 2014-09-24 哈尔滨工业大学深圳研究生院 Method for improving speech recognition robustness on basis of dynamic information among speech frames
TWI463859B (en) * 2008-01-18 2014-12-01 Chi Mei Comm Systems Inc Portable electronic device
CN104505095A (en) * 2014-12-22 2015-04-08 上海语知义信息技术有限公司 Voice control system and voice control method for alarm clock
CN104900237A (en) * 2015-04-24 2015-09-09 上海聚力传媒技术有限公司 Method, device and system for denoising audio information
CN105355197A (en) * 2015-10-30 2016-02-24 百度在线网络技术(北京)有限公司 Gain processing method and device for speech recognition system
CN107086043A (en) * 2014-03-12 2017-08-22 华为技术有限公司 The method and apparatus for detecting audio signal
CN109448705A (en) * 2018-10-17 2019-03-08 珠海格力电器股份有限公司 Voice segmentation method and device, computer device and readable storage medium
CN110867184A (en) * 2019-10-23 2020-03-06 张家港市祥隆五金厂 Voice intelligent terminal equipment
CN118230767A (en) * 2024-05-22 2024-06-21 深圳市创达电子有限公司 USB audio optimization method and system with self-adaptive sound environment

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030091180A1 (en) * 1998-12-23 2003-05-15 Patrik Sorqvist Adaptive signal gain controller, system, and method
CN1181466C (en) * 2001-12-17 2004-12-22 中国科学院自动化研究所 Speech sound signal terminal point detecting method based on sub belt energy and characteristic detecting technique
TWI245259B (en) * 2002-12-20 2005-12-11 Ibm Sensor based speech recognizer selection, adaptation and combination

Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI463859B (en) * 2008-01-18 2014-12-01 Chi Mei Comm Systems Inc Portable electronic device
CN103002380A (en) * 2011-09-13 2013-03-27 索尼公司 Information processing apparatus and information processing method
CN103002380B (en) * 2011-09-13 2017-08-15 索尼公司 Message processing device and information processing method
CN102394062A (en) * 2011-10-26 2012-03-28 华南理工大学 Method and system for automatically identifying voice recording equipment source
CN102857650A (en) * 2012-08-29 2013-01-02 苏州佳世达电通有限公司 Method for dynamically regulating voice
CN103280215A (en) * 2013-05-28 2013-09-04 北京百度网讯科技有限公司 Audio frequency feature library establishing method and device
CN103280215B (en) * 2013-05-28 2016-03-23 北京百度网讯科技有限公司 A kind of audio frequency feature library method for building up and device
US11417353B2 (en) 2014-03-12 2022-08-16 Huawei Technologies Co., Ltd. Method for detecting audio signal and apparatus
US10818313B2 (en) 2014-03-12 2020-10-27 Huawei Technologies Co., Ltd. Method for detecting audio signal and apparatus
CN107086043A (en) * 2014-03-12 2017-08-22 华为技术有限公司 The method and apparatus for detecting audio signal
CN104064197B (en) * 2014-06-20 2017-05-17 哈尔滨工业大学深圳研究生院 Method for improving speech recognition robustness on basis of dynamic information among speech frames
CN104064197A (en) * 2014-06-20 2014-09-24 哈尔滨工业大学深圳研究生院 Method for improving speech recognition robustness on basis of dynamic information among speech frames
CN104505095A (en) * 2014-12-22 2015-04-08 上海语知义信息技术有限公司 Voice control system and voice control method for alarm clock
CN104900237A (en) * 2015-04-24 2015-09-09 上海聚力传媒技术有限公司 Method, device and system for denoising audio information
CN105355197A (en) * 2015-10-30 2016-02-24 百度在线网络技术(北京)有限公司 Gain processing method and device for speech recognition system
CN105355197B (en) * 2015-10-30 2020-01-07 百度在线网络技术(北京)有限公司 Gain processing method and device for voice recognition system
CN109448705A (en) * 2018-10-17 2019-03-08 珠海格力电器股份有限公司 Voice segmentation method and device, computer device and readable storage medium
CN109448705B (en) * 2018-10-17 2021-01-29 珠海格力电器股份有限公司 Voice segmentation method and device, computer device and readable storage medium
CN110867184A (en) * 2019-10-23 2020-03-06 张家港市祥隆五金厂 Voice intelligent terminal equipment
CN118230767A (en) * 2024-05-22 2024-06-21 深圳市创达电子有限公司 USB audio optimization method and system with self-adaptive sound environment

Also Published As

Publication number Publication date
CN100369113C (en) 2008-02-13

Similar Documents

Publication Publication Date Title
US11657823B2 (en) Channel-compensated low-level features for speaker recognition
JP5089772B2 (en) Apparatus and method for detecting voice activity
US9524735B2 (en) Threshold adaptation in two-channel noise estimation and voice activity detection
Moattar et al. A simple but efficient real-time voice activity detection algorithm
CN103578470B (en) A kind of processing method and system of telephonograph data
CN1160698C (en) Endpointing of speech in noisy signal
CN1801326A (en) Method for adaptively improving speech recognition rate by means of gain
Bou-Ghazale et al. A robust endpoint detection of speech for noisy environments with application to automatic speech recognition
JPH09325790A (en) Method and device for processing voice
CN105306673A (en) Mobile terminal and automatic scene mode adjustment method thereof
CN1742322A (en) Noise reduction and audio-visual speech activity detection
EP3516652B1 (en) Channel-compensated low-level features for speaker recognition
CN111312291A (en) Signal-to-noise ratio detection method, system, mobile terminal and storage medium
CN113593599A (en) Method for removing noise signal in voice signal
WO2017128910A1 (en) Method, apparatus and electronic device for determining speech presence probability
CN110689905A (en) Voice activity detection system for video conference system
CN114566152B (en) Voice endpoint detection method based on deep learning
TW200811833A (en) Detection method for voice activity endpoint
CN112165558B (en) Method and device for detecting double-talk state, storage medium and terminal equipment
US20080228477A1 (en) Method and Device For Processing a Voice Signal For Robust Speech Recognition
CN112216285B (en) Multi-user session detection method, system, mobile terminal and storage medium
TWI756817B (en) Voice activity detection device and method
Ramírez et al. A new voice activity detector using subband order-statistics filters for robust speech recognition
Low An insight into the rise time of exponential smoothing for speech enhancement methods
Potamitis et al. Impulsive noise suppression using neural networks

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20060712

Assignee: Beijing Zidong Ruiyi Speech Technology Co., Ltd.

Assignor: Institute of Automation, Chinese Academy of Sciences

Contract record no.: 2015110000014

Denomination of invention: Method for adaptively improving speech recognition rate by means of gain

Granted publication date: 20080213

License type: Common License

Record date: 20150519

LICC Enforcement, change and cancellation of record of contracts on the licence for exploitation of a patent or utility model
EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20060712

Assignee: Taro Technology (Hangzhou) Co., Ltd.

Assignor: Beijing Zidong Ruiyi Speech Technology Co., Ltd.

Contract record no.: 2015110000050

Denomination of invention: Method for adaptively improving speech recognition rate by means of gain

Granted publication date: 20080213

License type: Common License

Record date: 20151130

LICC Enforcement, change and cancellation of record of contracts on the licence for exploitation of a patent or utility model