CN105575406A - Noise robustness detection method based on likelihood ratio test - Google Patents

Info

Publication number
CN105575406A
Authority
CN
China
Prior art keywords
noise
signal
speech
frame
likelihood ratio
Legal status
Pending
Application number
CN201610008285.2A
Other languages
Chinese (zh)
Inventor
李为
朱杰
包旭雷
Current Assignee
Shenzhen Yinjiami Technology Co Ltd
Original Assignee
Shenzhen Yinjiami Technology Co Ltd
Priority date
2016-01-07
Filing date
2016-01-07
Publication date
2016-05-11
2016-01-07 Application filed by Shenzhen Yinjiami Technology Co Ltd
2016-01-07 Priority to CN201610008285.2A
2016-05-11 Publication of CN105575406A
Status: Pending

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78 Detection of presence or absence of voice signals
    • G10L25/87 Detection of discrete points within a voice signal
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L21/0232 Processing in the frequency domain
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/18 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/21 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being power information

Abstract

The invention discloses a noise-robust detection method based on a likelihood ratio test. The method introduces improvements in three aspects, namely signal-to-noise ratio estimation, robust threshold setting, and hangover elimination, so that the proposed algorithm achieves better detection performance than prior algorithms in low signal-to-noise ratio environments, and especially in non-stationary noise. Compared with the multiple-observation likelihood ratio test algorithm based on harmonic features, the method achieves a similar speech boundary detection accuracy and a higher voice detection accuracy, which shows that it outperforms the traditional method. The method also performs similarly at signal-to-noise ratios of 15 dB and 25 dB, which shows that it is robust to noise. The method can serve as an effective front-end preprocessing step for speech recognition or voiceprint recognition systems in practical environments, and has good application value.

Description

Noise-robust detection method based on a likelihood ratio test
Technical field
The present invention relates to the fields of speech processing and signal processing, and in particular to a noise-robust detection method based on a likelihood ratio test.
Background art
Voice activity detection (VAD), or speech endpoint detection, is a crucial part of speech processing technology. It is used not only for speech/non-speech detection in speech enhancement, but also in processes such as feature extraction and speech dereverberation. Existing speech endpoint detection algorithms fall into three main classes: time-domain methods, frequency-domain methods, and statistical model-based methods.
In practical applications, high-precision endpoint detection is extremely important for subsequent speech enhancement, speech recognition, and voiceprint recognition. However, existing endpoint detection techniques still have shortcomings, especially in real channel environments. The spectral characteristics of the unvoiced and fricative components of speech closely resemble those of noise, while most existing endpoint detection algorithms distinguish speech from noise based on the syllabic characteristics of speech itself; as a result, the onset or ending of an utterance may be lost during endpoint detection, causing truncation. In addition, most algorithms cannot completely preserve all speech information, and their detection performance drops markedly as the signal-to-noise ratio (SNR) decreases.
Summary of the invention
The technical problem to be solved by the invention is to overcome the defects of the prior art by providing a noise-robust detection method based on a likelihood ratio test that improves three aspects, namely SNR estimation, robust threshold setting, and hangover elimination, so that the proposed algorithm achieves better detection performance than existing algorithms in low-SNR environments, and especially in non-stationary noise.
To solve the above technical problem, the invention adopts the following technical solution: a noise-robust detection method based on a likelihood ratio test, comprising the following steps:
S1. Apply Wiener filtering to the noisy speech signal for speech enhancement, in order to weaken the influence of the noise on the clean speech and to improve the stationarity of the residual noise after filtering. After Wiener-filter enhancement, the noisy speech signal $y(n)$ is the sum of the clean speech $x(n)$ and the interfering noise $d(n)$:
$$y(n) = x(n) + d(n)$$
where $n$ is the time-sample index; after Wiener filtering, the clean speech signal and the interfering noise are statistically independent and have zero mean;
S2. Apply the Fourier transform to the noisy speech. In the spectral domain, the spectral coefficient of the filtered noisy speech is the sum of the spectral coefficients of the clean speech and of the interfering noise:
$$H_0: Y(m,k) = D(m,k), \qquad H_1: Y(m,k) = X(m,k) + D(m,k) \qquad (1)$$
where $Y(m,k)$, $X(m,k)$ and $D(m,k)$ are the short-time Fourier coefficients of each frame, $m$ is the frame index, $k$ is the frequency-bin index within the frame, and $H_0$ and $H_1$ denote non-speech and speech frames respectively;
S3. Compute the likelihood ratio. When the clean speech and the noise both follow Gaussian distributions, the probability density functions of the observation $Y_k = Y(m,k)$ under $H_0$ and $H_1$ are
$$p(Y_k \mid H_0) = \frac{1}{\pi\lambda_D(k)}\exp\left(-\frac{|Y_k|^2}{\lambda_D(k)}\right), \qquad p(Y_k \mid H_1) = \frac{1}{\pi[\lambda_D(k)+\lambda_X(k)]}\exp\left(-\frac{|Y_k|^2}{\lambda_D(k)+\lambda_X(k)}\right) \qquad (2)$$
where $\lambda_X(k)$ is the power spectrum of the speech signal and $\lambda_D(k)$ is the power spectrum of the noise signal;
The likelihood ratio of the $k$-th frequency bin of this frame is:
$$\Lambda_k = \frac{p(Y_k \mid H_1)}{p(Y_k \mid H_0)} = \frac{1}{1+\xi_k}\exp\left(\frac{\gamma_k\xi_k}{1+\xi_k}\right) \qquad (3)$$
where $\xi_k = \lambda_X(k)/\lambda_D(k)$ and $\gamma_k = |Y_k|^2/\lambda_D(k)$ are the a priori SNR and the a posteriori SNR respectively, and in the decision-directed estimator they are related by:
$$\hat{\xi}(m,k) = \alpha\frac{|\hat{X}(m-1,k)|^2}{\lambda_D(m-1,k)} + (1-\alpha)\max\{\gamma(m,k)-1,\,0\} \qquad (4)$$
where $\alpha$ is the weighting factor, $\hat{X}(m-1,k)$ is the enhanced speech spectrum of the previous frame, and $\lambda_D(m-1,k)$ is the noise power spectrum of the previous frame;
S4. Perform noise estimation and set a threshold $\eta$. Comparing the likelihood ratio with $\eta$ determines whether the current frame belongs to a speech segment or a non-speech segment: when the likelihood ratio is greater than the threshold, the frame is initially classified as a speech frame, and when it is less than the threshold, the frame is classified as a non-speech frame. Specifically:
$$\frac{1}{K}\sum_{k=0}^{K-1}\log\Lambda_k \;\underset{H_0}{\overset{H_1}{\gtrless}}\; \eta \qquad (5)$$
where $K$ is the total number of frequency bins, and $H_0$ and $H_1$ denote non-speech and speech frames respectively;
S5. Determine the decision rule. The log-likelihood ratio of frame $m$ is:
$$\ell(m) = \frac{1}{K}\sum_{k=0}^{K-1}\log\Lambda(m,k)$$
Let $\ell_{m,M}$ denote the combined log-likelihood ratio of the $2M+1$ consecutive frames centred on frame $m$; the decision rule based on these $2M+1$ log-likelihood ratios is then:
$$\ell_{m,M} = \sum_{l=m-M}^{m+M}\ell(l) \;\underset{H_0}{\overset{H_1}{\gtrless}}\; \eta$$
For the log-likelihood ratio of the $k$-th frequency bin, $\ell_k = \log\left[p(Y_k \mid H_1)/p(Y_k \mid H_0)\right]$, substituting the probability densities of the observation under $H_0$ and $H_1$ gives:
$$\ell_k = \frac{\gamma_k\xi_k}{1+\xi_k} - \log(1+\xi_k)$$
The a priori SNR is obtained from the a posteriori SNR by maximum-likelihood estimation, $\hat{\xi}_k = \gamma_k - 1$, so that:
$$\ell_k = \gamma_k - \log\gamma_k - 1$$
Therefore, the value of the log-likelihood ratio depends on the noise power spectrum $\lambda_D(k)$;
S6. Hangover elimination. When the SNR is low, the noise power spectrum $\lambda_D(k)$ becomes large, and lowering the selected threshold $\eta$ reduces the probability of misjudging speech segments; conversely, the threshold $\eta$ is increased to match high-SNR signals;
The smoothed noisy-speech power spectrum $P(m,k)$ is obtained by smoothing the noisy power spectrum $|Y(m,k)|^2$ with a time- and frequency-dependent smoothing factor $\alpha_s(m,k)$:
$$P(m,k) = \alpha_s(m,k)P(m-1,k) + [1-\alpha_s(m,k)]\,|Y(m,k)|^2$$
Minimum-statistics noise estimation then yields the minimum noise power spectrum $P_{\min}(m,k)$ of each frame, and the threshold $\eta(m)$ related to the noise power spectrum is defined in terms of $P_{\min}(m,k)$,
where $\beta$ is a constant coefficient of this threshold.
Compared with the prior art, the above technical solution has the following technical effects: the proposed VAD algorithm achieves a speech boundary detection (SBR) accuracy similar to that of the harmonic-feature-based multiple-observation likelihood ratio test (MOLRT) algorithm, but a higher voice detection accuracy (VAcc); and it performs similarly at SNRs of 15 dB and 25 dB, which shows that it is robust to noise.
Brief description of the drawings
Fig. 1(a) is a schematic diagram of clean speech.
Fig. 1(b) shows the VAD result of Sohn's method.
Fig. 1(c) shows the VAD result of Tan's method.
Fig. 1(d) shows the VAD result of the method of the invention.
Fig. 2(a) compares segment-level performance at different SNRs.
Fig. 2(b) compares frame-level performance at different SNRs.
Fig. 2(c) shows the number of correctly detected speech frames at different SNRs.
Fig. 3 is a block diagram of the speech-enhancement-based endpoint detection framework of the invention.
Embodiment
The technical solution of the invention is described in further detail below with reference to the accompanying drawings:
As shown in Figure 3, the technical solution adopted by the invention is the noise-robust detection method based on a likelihood ratio test comprising steps S1 to S6 set out above; these steps are described in detail below.
Specifically, the noisy speech $y(n)$ of the invention is obtained by superimposing the clean speech $x(n)$ and the interfering noise $d(n)$:
$$y(n) = x(n) + d(n)$$
where $n$ is the time-sample index.
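Assuming that the clean speech and the interfering noise are statistically independent with zero mean, the Fourier transform of the noisy speech can be expressed as
$$H_0: Y(m,k) = D(m,k), \qquad H_1: Y(m,k) = X(m,k) + D(m,k) \qquad (1)$$
where $Y(m,k)$, $X(m,k)$ and $D(m,k)$ are the short-time Fourier coefficients of each frame, $m$ is the frame index, $k$ is the frequency-bin index within the frame, and $H_0$ and $H_1$ denote non-speech and speech frames respectively. Assuming that the probability densities of the clean speech and noise signals are both Gaussian, the probability density functions of the observation $Y_k = Y(m,k)$ under $H_0$ and $H_1$ are:
$$p(Y_k \mid H_0) = \frac{1}{\pi\lambda_D(k)}\exp\left(-\frac{|Y_k|^2}{\lambda_D(k)}\right), \qquad p(Y_k \mid H_1) = \frac{1}{\pi[\lambda_D(k)+\lambda_X(k)]}\exp\left(-\frac{|Y_k|^2}{\lambda_D(k)+\lambda_X(k)}\right) \qquad (2)$$
where $\lambda_X(k)$ and $\lambda_D(k)$ are the power spectra of the speech signal and the noise signal respectively. The likelihood ratio (LR) of the $k$-th frequency bin of this frame is then:
$$\Lambda_k = \frac{p(Y_k \mid H_1)}{p(Y_k \mid H_0)} = \frac{1}{1+\xi_k}\exp\left(\frac{\gamma_k\xi_k}{1+\xi_k}\right) \qquad (3)$$
where $\xi_k = \lambda_X(k)/\lambda_D(k)$ and $\gamma_k = |Y_k|^2/\lambda_D(k)$ denote the a priori SNR and the a posteriori SNR respectively, and in the decision-directed (DD) estimator they are related by:
$$\hat{\xi}(m,k) = \alpha\frac{|\hat{X}(m-1,k)|^2}{\lambda_D(m-1,k)} + (1-\alpha)\max\{\gamma(m,k)-1,\,0\} \qquad (4)$$
where $\alpha$ is the weighting factor, $\hat{X}(m-1,k)$ is the enhanced speech spectrum of the previous frame, and $\lambda_D(m-1,k)$ is the noise power spectrum of the previous frame.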
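Under the Gaussian model, the per-bin likelihood ratio (3) is simply the ratio of the two densities in (2), rewritten in terms of $\xi_k = \lambda_X(k)/\lambda_D(k)$ and $\gamma_k = |Y_k|^2/\lambda_D(k)$. The sketch below computes it both ways as a consistency check and adds the standard decision-directed update (4). The function names are illustrative, and the weighting $\alpha = 0.98$ is a commonly used value assumed here, not a parameter given in this document.

```python
import math


def gaussian_pdf(power, var):
    """Complex circular Gaussian density of a DFT bin with |Y|^2 = power and variance var."""
    return math.exp(-power / var) / (math.pi * var)


def likelihood_ratio(y_power, lam_x, lam_d):
    """Per-bin LR of eq. (3): exp(gamma*xi/(1+xi)) / (1+xi)."""
    xi = lam_x / lam_d        # a priori SNR
    gamma = y_power / lam_d   # a posteriori SNR
    return math.exp(gamma * xi / (1.0 + xi)) / (1.0 + xi)


def dd_xi(prev_clean_power, prev_noise_power, gamma, alpha=0.98):
    """Decision-directed a priori SNR estimate of eq. (4).

    prev_clean_power: |X_hat(m-1,k)|^2 from the previous frame
    prev_noise_power: lambda_D(m-1,k) from the previous frame
    alpha: weighting factor (0.98 is an assumed typical value)
    """
    return alpha * prev_clean_power / prev_noise_power + (1.0 - alpha) * max(gamma - 1.0, 0.0)
```

Evaluating `gaussian_pdf(p, lam_d + lam_x) / gaussian_pdf(p, lam_d)` gives the same value as `likelihood_ratio(p, lam_x, lam_d)`, which is a quick way to confirm the algebra behind (3).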
Suppose a threshold $\eta$ is set and compared with the LR to determine whether the current frame belongs to a speech segment or a non-speech segment:
$$\frac{1}{K}\sum_{k=0}^{K-1}\log\Lambda_k \;\underset{H_0}{\overset{H_1}{\gtrless}}\; \eta \qquad (5)$$
where $K$ is the total number of frequency bins. Equations (3) and (5) show that the LR is closely related to the a priori and a posteriori SNRs. When the a posteriori SNR is very large ($\gamma_k \gg 1$), the exponent $\gamma_k\xi_k/(1+\xi_k)$ grows accordingly and the LR also becomes very large; when the a posteriori SNR is small, the a priori SNR $\xi_k$ becomes the key parameter in computing the LR.
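The single-frame test (5) averages the per-bin log-likelihood ratios over the $K$ bins and compares the result with $\eta$. A minimal sketch, assuming the speech and noise power spectra $\lambda_X(k)$ and $\lambda_D(k)$ of the frame are already available; `frame_log_lr` and `is_speech` are illustrative names.

```python
import math


def frame_log_lr(y_power, lam_x, lam_d):
    """(1/K) * sum_k log(Lambda_k) over the K bins of one frame (left side of eq. (5))."""
    K = len(y_power)
    total = 0.0
    for p, lx, ld in zip(y_power, lam_x, lam_d):
        xi = lx / ld       # a priori SNR
        gamma = p / ld     # a posteriori SNR
        total += gamma * xi / (1.0 + xi) - math.log1p(xi)  # log of eq. (3)
    return total / K


def is_speech(y_power, lam_x, lam_d, eta):
    """Decide H1 (speech) if the averaged log LR exceeds eta, otherwise H0."""
    return frame_log_lr(y_power, lam_x, lam_d) > eta
```

A bin whose observed power is well above the noise level pushes the average up, while bins at the noise level contribute negative terms, which is what lets the threshold separate the two hypotheses.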
Figure 3 shows the flow diagram of the complete speech-enhancement-based endpoint detection system. From the above derivation, the log-likelihood ratio (LLR) of frame $m$ is:
$$\ell(m) = \frac{1}{K}\sum_{k=0}^{K-1}\log\Lambda(m,k) \qquad (6)$$
Let $\ell_{m,M}$ denote the combined LLR of the $2M+1$ consecutive frames centred on frame $m$; the decision rule based on these $2M+1$ LLRs is then:
$$\ell_{m,M} = \sum_{l=m-M}^{m+M}\ell(l) \;\underset{H_0}{\overset{H_1}{\gtrless}}\; \eta \qquad (7)$$
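The multiple-observation rule (7) amounts to a sliding sum of per-frame LLRs. A minimal sketch, assuming the per-frame LLRs $\ell(m)$ are already computed; the text does not say how frames within $M$ of the signal boundary are handled, so truncating the window there is an assumption.

```python
def mo_lrt_decisions(frame_llrs, M, eta):
    """Multiple-observation decision of eq. (7) for every frame.

    Sums the 2M+1 frame LLRs centred on frame m and compares the sum with eta.
    Near the edges only the available neighbours are summed (assumed handling).
    """
    n = len(frame_llrs)
    decisions = []
    for m in range(n):
        lo = max(0, m - M)
        hi = min(n, m + M + 1)
        decisions.append(sum(frame_llrs[lo:hi]) > eta)
    return decisions
```

With $2M+1 = 17$ as in the experiments, `M=8`. The window spreads a strong speech frame onto its neighbours, which is why the multiple-observation rule produces fewer isolated decisions than the single-frame test.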
For the log-likelihood ratio of the $k$-th frequency bin, $\ell_k = \log\left[p(Y_k \mid H_1)/p(Y_k \mid H_0)\right]$, substituting the probability densities of the observation under $H_0$ and $H_1$ gives:
$$\ell_k = \frac{\gamma_k\xi_k}{1+\xi_k} - \log(1+\xi_k) = \gamma_k - \log\gamma_k - 1 \qquad (8)$$
The second equality holds because the a priori SNR can be obtained from the a posteriori SNR by maximum-likelihood (ML) estimation:
$$\hat{\xi}_k = \gamma_k - 1$$
Therefore, the LLR can be regarded simply as a function of the a posteriori SNR $\gamma_k$; that is, the value of the LLR depends on the noise power spectrum $\lambda_D(k)$.
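A quick numerical check of equation (8): substituting the ML estimate $\hat{\xi}_k = \gamma_k - 1$ into the general per-bin log-likelihood ratio should reproduce $\gamma_k - \log\gamma_k - 1$. The sketch below (hypothetical function names) compares the two forms.

```python
import math


def llr_general(gamma, xi):
    """Per-bin log LR: gamma*xi/(1+xi) - log(1+xi), i.e. the log of eq. (3)."""
    return gamma * xi / (1.0 + xi) - math.log1p(xi)


def llr_ml(gamma):
    """Eq. (8) after substituting the ML estimate xi = gamma - 1 (requires gamma > 0)."""
    return gamma - math.log(gamma) - 1.0
```

Since $\log\gamma_k \le \gamma_k - 1$, the ML form satisfies $\gamma_k - \log\gamma_k - 1 \ge 0$ with equality only at $\gamma_k = 1$, so it depends on the noise power spectrum only through $\gamma_k$.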
On the other hand, when the SNR is very low, i.e. when the noise power spectrum $\lambda_D(k)$ becomes large, a smaller threshold $\eta$ is needed to reduce the probability of misjudging speech segments; conversely, a larger threshold is needed to match high-SNR signals. The analysis above shows that the LLR depends mainly on the accuracy of the noise power spectrum estimate. Linking the threshold to the minimum noise power spectrum of the current frame therefore not only makes the VAD algorithm more robust across SNR conditions, but also, because the estimated minimum noise power spectrum is smaller than $\lambda_D(k)$, leaves a margin that helps ensure that speech segments are estimated correctly.
Suppose the power spectrum $P(m,k)$ is obtained by smoothing the noisy power spectrum $|Y(m,k)|^2$, with a time- and frequency-dependent smoothing factor $\alpha_s(m,k)$; then:
$$P(m,k) = \alpha_s(m,k)P(m-1,k) + [1-\alpha_s(m,k)]\,|Y(m,k)|^2 \qquad (9)$$
The minimum-statistics noise estimation method proposed in the literature can then be used to obtain the minimum noise power spectrum $P_{\min}(m,k)$ of each frame.
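Equation (9) and the subsequent minimum tracking can be sketched as below. This is a deliberately simplified stand-in for minimum-statistics noise estimation: the smoothing factor is supplied as a caller-provided function $\alpha_s(m,k)$ (an assumed interface), and the minimum is taken over a plain sliding window of recent smoothed spectra, without the bias compensation and sub-window search of the full method.

```python
from collections import deque


def smooth_and_track_min(frames_power, alpha, window):
    """Recursive smoothing (eq. (9)) followed by a simple sliding-window minimum.

    frames_power: list of frames, each a list of |Y(m,k)|^2 values per bin
    alpha: function alpha(m, k) -> smoothing factor in [0, 1) (assumed interface)
    window: number of recent smoothed frames searched for the minimum (assumption)
    Returns P_min(m, k) for every frame as a list of lists.
    """
    K = len(frames_power[0])
    P = list(frames_power[0])        # initialise the smoothed spectrum with frame 0
    history = deque(maxlen=window)   # most recent smoothed spectra
    p_min_all = []
    for m, frame in enumerate(frames_power):
        if m > 0:
            P = [alpha(m, k) * P[k] + (1.0 - alpha(m, k)) * frame[k] for k in range(K)]
        history.append(P)
        p_min_all.append([min(h[k] for h in history) for k in range(K)])
    return p_min_all
```

Because the minimum lags behind short speech bursts, $P_{\min}(m,k)$ stays close to the noise floor during speech, which is the property the threshold in (12) relies on.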
The threshold $\eta(m)$ related to the noise power spectrum is defined as:
(12)
where $\beta$ is a constant coefficient of this threshold.
To verify the performance of the proposed VAD method, the experiments use a recorded clean corpus of non-broadcast speech comprising 2906 utterances at a sampling rate of fs = 8 kHz. The corpus is mixed with stationary and non-stationary noise to obtain noisy speech at different SNRs. The stationary noise was collected and recorded in real environments, while the non-stationary noise (car noise and babble noise) comes from http://www.freesound.com and http://spib.rice.edu/spib/data/signals/noise/babble.html respectively. A Hanning window of length 200 is used as both the analysis and the synthesis window, and the total number of frequency bins is K = 256. In the noise estimation, the smoothing factor is set to …, the a priori speech probabilities satisfy p(H1) = p(H0), the parameter in equation (10) is set to …, and the number of consecutive LLRs is 2M+1 = 17.
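The analysis settings above (fs = 8 kHz, a 200-sample Hanning window, K = 256 bins) can be turned into a short-time power-spectrum front end roughly as follows. The frame shift is not stated in the text, so the 100-sample hop (50 % overlap) is an assumption, as is treating K as a zero-padded DFT length; a plain O(K²) DFT keeps the sketch dependency-free, whereas a real system would use an FFT.

```python
import cmath
import math

FS = 8000       # sampling rate used in the experiments (8 kHz)
WIN_LEN = 200   # Hanning analysis window length in samples
K = 256         # number of frequency bins, taken here as the DFT length (assumption)
HOP = 100       # frame shift: NOT given in the source, 50 % overlap assumed


def hanning(n_len):
    """Symmetric Hanning window 0.5 - 0.5*cos(2*pi*n/(N-1))."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / (n_len - 1)) for n in range(n_len)]


def frame_power_spectra(signal):
    """Window each frame, zero-pad it to K samples and return |Y(m,k)|^2 for all frames."""
    win = hanning(WIN_LEN)
    spectra = []
    for start in range(0, len(signal) - WIN_LEN + 1, HOP):
        frame = [s * w for s, w in zip(signal[start:start + WIN_LEN], win)]
        frame += [0.0] * (K - WIN_LEN)  # zero-pad to K points
        spectrum = [sum(frame[n] * cmath.exp(-2j * math.pi * k * n / K) for n in range(K))
                    for k in range(K)]  # direct DFT, O(K^2) per frame
        spectra.append([abs(y) ** 2 for y in spectrum])
    return spectra
```

With these settings the bin spacing is 8000 / 256 = 31.25 Hz, so a 1 kHz tone peaks at bin 32.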
Although the receiver operating characteristic (ROC) curve is a common tool for evaluating VAD algorithms, it only assesses VAD performance at the frame level: it shows how many speech/non-speech frames are classified correctly, but says nothing about the correctness of speech/non-speech segments. For example, Sohn's VAD algorithm achieves a relatively good ROC curve, yet in practice the Sohn-based VAD method produces many fragments. Fig. 1 illustrates this with a noisy speech example.
Fig. 1(a) to Fig. 1(d) show that, in a low-SNR environment, Sohn's method cannot preserve the integrity of speech segments and produces many small fragments; Tan's method performs somewhat better in this respect. However, the large number of fragments prevents both methods from being used effectively for automatic speech recognition in noisy environments. Therefore, to verify the effectiveness of the VAD algorithm, the invention considers segment-level performance in addition to frame-level performance.
Fig. 2(a) to Fig. 2(c) show the VAD results under stationary noise at different SNRs. Fig. 2(c) shows that the number of correctly detected speech frames of the proposed algorithm is similar to that of Sohn's VAD method and higher than that of the VAD method proposed by Tan; Fig. 2(a) shows that in detecting speech/non-speech segments the proposed algorithm far outperforms the other two methods.
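The argument that frame-level scores miss fragmentation can be made concrete by converting per-frame decisions into speech segments and counting the short ones. The helpers below are hypothetical, and the minimum segment length that counts as a "fragment" is an assumed parameter, not a criterion taken from the experiments.

```python
def speech_segments(decisions):
    """Return (start, end) frame indices, end exclusive, of runs of speech (True) decisions."""
    segments = []
    start = None
    for i, d in enumerate(decisions):
        if d and start is None:
            start = i                      # a speech run begins
        elif not d and start is not None:
            segments.append((start, i))    # a speech run ends
            start = None
    if start is not None:
        segments.append((start, len(decisions)))
    return segments


def count_fragments(decisions, min_len):
    """Count speech segments shorter than min_len frames (hypothetical fragment criterion)."""
    return sum(1 for s, e in speech_segments(decisions) if e - s < min_len)
```

Two decision sequences with the same number of speech frames can differ greatly in `count_fragments`, which is exactly the difference that segment-level evaluation is meant to expose.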
The numbers of correctly detected speech frames under car noise are shown in the following table:
The performance comparison under different non-stationary noises is shown in the following table:
The two tables above give the performance of the different VAD algorithms under different non-stationary noises. The first table shows that Sohn's method detects the largest number of speech frames; however, as noted above, overemphasizing frame-level correctness does not show that a VAD algorithm is optimal. The second table supports the following conclusions:
The proposed VAD algorithm has a speech boundary detection (SBR) accuracy similar to that of the harmonic-feature-based MOLRT algorithm, but a higher voice detection accuracy (VAcc), which shows that the proposed endpoint detection method outperforms the traditional method.
The proposed VAD algorithm performs similarly at SNRs of 15 dB and 25 dB, which shows that it is robust to noise.
The embodiments of the invention have been described in detail above with reference to the accompanying drawings, but the invention is not limited to these embodiments; various changes can be made within the knowledge of a person of ordinary skill in the art without departing from the concept of the invention.

Claims (1)

1. A noise-robust detection method based on a likelihood ratio test, comprising the following steps:
S1. Apply Wiener filtering to the noisy speech signal for speech enhancement, in order to weaken the influence of the noise on the clean speech and to improve the stationarity of the residual noise after filtering. After Wiener-filter enhancement, the noisy speech signal $y(n)$ is the sum of the clean speech $x(n)$ and the interfering noise $d(n)$:
$$y(n) = x(n) + d(n)$$
where $n$ is the time-sample index; after Wiener filtering, the clean speech signal and the interfering noise are statistically independent and have zero mean;
S2. Apply the Fourier transform to the noisy speech. In the spectral domain, the spectral coefficient of the filtered noisy speech is the sum of the spectral coefficients of the clean speech and of the interfering noise:
$$H_0: Y(m,k) = D(m,k), \qquad H_1: Y(m,k) = X(m,k) + D(m,k) \qquad (1)$$
where $Y(m,k)$, $X(m,k)$ and $D(m,k)$ are the short-time Fourier coefficients of each frame, $m$ is the frame index, $k$ is the frequency-bin index within the frame, and $H_0$ and $H_1$ denote non-speech and speech frames respectively;
S3. Compute the likelihood ratio. When the clean speech and the noise both follow Gaussian distributions, the probability density functions of the observation $Y_k = Y(m,k)$ under $H_0$ and $H_1$ are
$$p(Y_k \mid H_0) = \frac{1}{\pi\lambda_D(k)}\exp\left(-\frac{|Y_k|^2}{\lambda_D(k)}\right), \qquad p(Y_k \mid H_1) = \frac{1}{\pi[\lambda_D(k)+\lambda_X(k)]}\exp\left(-\frac{|Y_k|^2}{\lambda_D(k)+\lambda_X(k)}\right) \qquad (2)$$
where $\lambda_X(k)$ is the power spectrum of the speech signal and $\lambda_D(k)$ is the power spectrum of the noise signal;
The likelihood ratio of the $k$-th frequency bin of this frame is:
$$\Lambda_k = \frac{p(Y_k \mid H_1)}{p(Y_k \mid H_0)} = \frac{1}{1+\xi_k}\exp\left(\frac{\gamma_k\xi_k}{1+\xi_k}\right) \qquad (3)$$
where $\xi_k = \lambda_X(k)/\lambda_D(k)$ and $\gamma_k = |Y_k|^2/\lambda_D(k)$ are the a priori SNR and the a posteriori SNR respectively, and in the decision-directed estimator they are related by:
$$\hat{\xi}(m,k) = \alpha\frac{|\hat{X}(m-1,k)|^2}{\lambda_D(m-1,k)} + (1-\alpha)\max\{\gamma(m,k)-1,\,0\} \qquad (4)$$
where $\alpha$ is the weighting factor, $\hat{X}(m-1,k)$ is the enhanced speech spectrum of the previous frame, and $\lambda_D(m-1,k)$ is the noise power spectrum of the previous frame;
S4. Perform noise estimation and set a threshold $\eta$. Comparing the likelihood ratio with $\eta$ determines whether the current frame belongs to a speech segment or a non-speech segment: when the likelihood ratio is greater than the threshold, the frame is initially classified as a speech frame, and when it is less than the threshold, the frame is classified as a non-speech frame. Specifically:
$$\frac{1}{K}\sum_{k=0}^{K-1}\log\Lambda_k \;\underset{H_0}{\overset{H_1}{\gtrless}}\; \eta \qquad (5)$$
where $K$ is the total number of frequency bins, and $H_0$ and $H_1$ denote non-speech and speech frames respectively;
S5. Determine the decision rule. The log-likelihood ratio of frame $m$ is:
$$\ell(m) = \frac{1}{K}\sum_{k=0}^{K-1}\log\Lambda(m,k)$$
Let $\ell_{m,M}$ denote the combined log-likelihood ratio of the $2M+1$ consecutive frames centred on frame $m$; the decision rule based on these $2M+1$ log-likelihood ratios is then:
$$\ell_{m,M} = \sum_{l=m-M}^{m+M}\ell(l) \;\underset{H_0}{\overset{H_1}{\gtrless}}\; \eta$$
For the log-likelihood ratio of the $k$-th frequency bin, $\ell_k = \log\left[p(Y_k \mid H_1)/p(Y_k \mid H_0)\right]$, substituting the probability densities of the observation under $H_0$ and $H_1$ gives:
$$\ell_k = \frac{\gamma_k\xi_k}{1+\xi_k} - \log(1+\xi_k)$$
The a priori SNR is obtained from the a posteriori SNR by maximum-likelihood estimation, $\hat{\xi}_k = \gamma_k - 1$, so that:
$$\ell_k = \gamma_k - \log\gamma_k - 1$$
Therefore, the value of the log-likelihood ratio depends on the noise power spectrum $\lambda_D(k)$;
S6. Hangover elimination. When the SNR is low, the noise power spectrum $\lambda_D(k)$ becomes large, and lowering the selected threshold $\eta$ reduces the probability of misjudging speech segments; conversely, the threshold $\eta$ is increased to match high-SNR signals;
The smoothed noisy-speech power spectrum $P(m,k)$ is obtained by smoothing the noisy power spectrum $|Y(m,k)|^2$ with a time- and frequency-dependent smoothing factor $\alpha_s(m,k)$:
$$P(m,k) = \alpha_s(m,k)P(m-1,k) + [1-\alpha_s(m,k)]\,|Y(m,k)|^2$$
Minimum-statistics noise estimation then yields the minimum noise power spectrum $P_{\min}(m,k)$ of each frame, and the threshold $\eta(m)$ related to the noise power spectrum is defined in terms of $P_{\min}(m,k)$,
where $\beta$ is a constant coefficient of this threshold.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610008285.2A CN105575406A (en) 2016-01-07 2016-01-07 Noise robustness detection method based on likelihood ratio test

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610008285.2A CN105575406A (en) 2016-01-07 2016-01-07 Noise robustness detection method based on likelihood ratio test

Publications (1)

Publication Number Publication Date
CN105575406A true CN105575406A (en) 2016-05-11

Family

ID=55885457

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610008285.2A Pending CN105575406A (en) 2016-01-07 2016-01-07 Noise robustness detection method based on likelihood ratio test

Country Status (1)

Country Link
CN (1) CN105575406A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105788607A (en) * 2016-05-20 2016-07-20 中国科学技术大学 Speech enhancement method applied to dual-microphone array
CN106356071A (en) * 2016-08-30 2017-01-25 广州市百果园网络科技有限公司 Noise detection method and device
CN107393550A (en) * 2017-07-14 2017-11-24 深圳永顺智信息科技有限公司 Method of speech processing and device
CN107484080A (en) * 2016-05-30 2017-12-15 奥迪康有限公司 The method of apparatus for processing audio and signal to noise ratio for estimation voice signal
CN109378002A (en) * 2018-10-11 2019-02-22 平安科技(深圳)有限公司 Method, apparatus, computer equipment and the storage medium of voice print verification
CN112908310A (en) * 2021-01-20 2021-06-04 宁波方太厨具有限公司 Voice instruction recognition method and system in intelligent electric appliance
US20210327448A1 (en) * 2018-12-18 2021-10-21 Tencent Technology (Shenzhen) Company Limited Speech noise reduction method and apparatus, computing device, and computer-readable storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101080765A (en) * 2005-05-09 2007-11-28 株式会社东芝 Voice activity detection apparatus and method
CN101308653A (en) * 2008-07-17 2008-11-19 安徽科大讯飞信息科技股份有限公司 End-point detecting method applied to speech identification system
CN103632677A (en) * 2013-11-27 2014-03-12 腾讯科技(成都)有限公司 Method and device for processing voice signal with noise, and server
CN103730124A (en) * 2013-12-31 2014-04-16 上海交通大学无锡研究院 Noise robustness endpoint detection method based on likelihood ratio test
CN104464728A (en) * 2014-11-26 2015-03-25 河海大学 Speech enhancement method based on Gaussian mixture model (GMM) noise estimation
CN105023572A (en) * 2014-04-16 2015-11-04 王景芳 Noised voice end point robustness detection method

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105788607A (en) * 2016-05-20 2016-07-20 中国科学技术大学 Speech enhancement method applied to dual-microphone array
CN107484080A (en) * 2016-05-30 2017-12-15 奥迪康有限公司 The method of apparatus for processing audio and signal to noise ratio for estimation voice signal
CN107484080B (en) * 2016-05-30 2021-07-16 奥迪康有限公司 Audio processing apparatus and method for estimating signal-to-noise ratio of sound signal
CN106356071A (en) * 2016-08-30 2017-01-25 广州市百果园网络科技有限公司 Noise detection method and device
CN106356071B (en) * 2016-08-30 2019-10-25 广州市百果园网络科技有限公司 A kind of noise detecting method and device
CN107393550A (en) * 2017-07-14 2017-11-24 深圳永顺智信息科技有限公司 Method of speech processing and device
CN107393550B (en) * 2017-07-14 2021-03-19 深圳永顺智信息科技有限公司 Voice processing method and device
CN109378002A (en) * 2018-10-11 2019-02-22 平安科技(深圳)有限公司 Method, apparatus, computer equipment and the storage medium of voice print verification
CN109378002B (en) * 2018-10-11 2024-05-07 平安科技(深圳)有限公司 Voiceprint verification method, voiceprint verification device, computer equipment and storage medium
US20210327448A1 (en) * 2018-12-18 2021-10-21 Tencent Technology (Shenzhen) Company Limited Speech noise reduction method and apparatus, computing device, and computer-readable storage medium
CN112908310A (en) * 2021-01-20 2021-06-04 宁波方太厨具有限公司 Voice instruction recognition method and system in intelligent electric appliance

Similar Documents

Publication Publication Date Title
CN105575406A (en) Noise robustness detection method based on likelihood ratio test
CN103730124A (en) Noise robustness endpoint detection method based on likelihood ratio test
CN109643552B (en) Robust noise estimation for speech enhancement in variable noise conditions
Moattar et al. A simple but efficient real-time voice activity detection algorithm
CN105023572A (en) Noised voice end point robustness detection method
US20080059163A1 (en) Method and apparatus for noise suppression, smoothing a speech spectrum, extracting speech features, speech recognition and training a speech model
CN103646649A (en) High-efficiency voice detecting method
CN106653062A (en) Spectrum-entropy improvement based speech endpoint detection method in low signal-to-noise ratio environment
KR20130024156A (en) Apparatus and method for eliminating noise
CN112133322A (en) Speech enhancement method based on noise classification optimization IMCRA algorithm
Moattar et al. A new approach for robust realtime voice activity detection using spectral pattern
CN105513614A (en) Voice activation detection method based on noise power spectrum density Gamma distribution statistical model
Lee et al. Dynamic noise embedding: Noise aware training and adaptation for speech enhancement
CN103745729A (en) Audio de-noising method and audio de-noising system
CN114023353A (en) Transformer fault classification method and system based on cluster analysis and similarity calculation
Fan et al. Speech noise estimation using enhanced minima controlled recursive averaging
CN113838476B (en) Noise estimation method and device for noisy speech
Tang et al. Speech Recognition in High Noise Environment.
CN113744725A (en) Training method of voice endpoint detection model and voice noise reduction method
Li et al. Sub-band based log-energy and its dynamic range stretching for robust in-car speech recognition
Yoon et al. Speech enhancement based on speech/noise-dominant decision
Stadtschnitzer et al. Reliable voice activity detection algorithms under adverse environments
Liu et al. Efficient voice activity detection algorithm based on sub-band temporal envelope and sub-band long-term signal variability
Górriz et al. Bispectrum estimators for voice activity detection and speech recognition
Baek et al. Mean normalization of power function based cepstral coefficients for robust speech recognition in noisy environment

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20160511