CN105575406A - Noise robustness detection method based on likelihood ratio test
- Publication number: CN105575406A
- Application number: CN201610008285.2A
- Authority: CN (China)
- Prior art keywords: noise, signal, speech, frame, likelihood ratio
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- G10L25/87: Detection of discrete points within a voice signal (under G10L25/78, Detection of presence or absence of voice signals)
- G10L21/0232: Noise filtering characterised by the method used for estimating noise, processing in the frequency domain (under G10L21/02, Speech enhancement, e.g. noise reduction or echo cancellation)
- G10L25/18: Speech or voice analysis techniques characterised by the type of extracted parameters, the extracted parameters being spectral information of each sub-band
- G10L25/21: Speech or voice analysis techniques characterised by the type of extracted parameters, the extracted parameters being power information
Abstract
The invention discloses a noise-robust detection method based on the likelihood ratio test. The method improves three aspects: estimation of the signal-to-noise ratio (SNR), robust setting of the threshold, and elimination of hangover (trailing) distortion, so that the proposed algorithm achieves better detection performance than existing algorithms at low SNR, and especially in non-stationary noise. The method attains a speech boundary detection accuracy similar to that of the harmonic-feature-based multiple-observation likelihood ratio test algorithm, together with a higher voice detection accuracy, which shows that it outperforms the conventional method. It also performs similarly at SNRs of 15 dB and 25 dB, which shows that it is robust to noise. The method can serve as an effective front-end preprocessing step for speech recognition or voiceprint recognition systems in practical environments and therefore has good application value.
Description
Technical field
The present invention relates to the field of speech processing and signal processing, and in particular to a noise-robust detection method based on the likelihood ratio test.
Background art
Speech endpoint detection, also called voice activity detection (VAD), is a key part of speech processing technology. It is used not only for speech/non-speech detection in speech enhancement, but also in processes such as feature extraction and dereverberation of speech signals. Existing speech endpoint detection algorithms fall into three main classes: time-domain methods, frequency-domain methods, and methods based on statistical models.
In practical applications, high-precision speech endpoint detection plays an important role in subsequent speech enhancement, speech recognition and voiceprint recognition. However, existing endpoint detection techniques still have shortcomings, especially in real channel environments. The spectral features of unvoiced and fricative components of speech closely resemble those of noise, and most existing endpoint detection algorithms distinguish speech from noise on the basis of the syllable characteristics of the speech itself. As a result, the initial or final sounds of an utterance may be lost during endpoint detection, which causes a truncation effect. Moreover, most algorithms cannot retain all of the speech information, and their detection performance degrades markedly as the SNR decreases.
Summary of the invention
The technical problem to be solved by the present invention is to overcome the defects of the prior art by providing a noise-robust detection method based on the likelihood ratio test. The method improves three aspects, namely the estimation of the SNR, the robust setting of the threshold, and the elimination of hangover, so that the proposed algorithm has better detection performance than existing algorithms at low SNR, and especially in non-stationary noise.
To solve the above technical problem, the present invention adopts the following technical solution: a noise-robust detection method based on the likelihood ratio test, comprising the following steps:

S1. Speech enhancement is applied to the noisy speech signal with a Wiener filter, to weaken the influence of the noise on the clean speech and to improve the stationarity of the noise remaining after filtering. The Wiener-filtered noisy speech signal $x(n)$ is the superposition of the clean speech $s(n)$ and the interference noise $d(n)$:

$$x(n) = s(n) + d(n)$$

where $n$ is the time-sample index. After Wiener filtering, the clean speech signal and the interference noise can be regarded as statistically independent and zero-mean.

S2. The Fourier transform is applied to the noisy speech. In the spectral domain, the spectral coefficient of the filtered noisy speech is the sum of the spectral coefficients of the clean speech and of the interference noise:

$$H_0:\ X(m,k) = D(m,k), \qquad H_1:\ X(m,k) = S(m,k) + D(m,k) \tag{1}$$

where $X(m,k)$, $S(m,k)$ and $D(m,k)$ are the short-time Fourier coefficients of each frame, $m$ is the frame index, $k$ is the frequency-band index within the frame, and $H_0$ and $H_1$ denote a non-speech frame and a speech frame respectively.

S3. The likelihood ratio is computed. When the clean speech and the noise both follow a Gaussian distribution, the probability density functions of the observed signal $X(m,k)$ under $H_0$ and $H_1$ are

$$p(X(m,k)\mid H_0) = \frac{1}{\pi\lambda_d(m,k)}\exp\left(-\frac{|X(m,k)|^2}{\lambda_d(m,k)}\right), \qquad p(X(m,k)\mid H_1) = \frac{1}{\pi\left(\lambda_s(m,k)+\lambda_d(m,k)\right)}\exp\left(-\frac{|X(m,k)|^2}{\lambda_s(m,k)+\lambda_d(m,k)}\right) \tag{2}$$

where $\lambda_s(m,k)$ is the power spectrum of the speech signal and $\lambda_d(m,k)$ is the power spectrum of the noise signal.

The likelihood ratio of the $k$-th band of this frame is:

$$\Lambda(m,k) = \frac{p(X(m,k)\mid H_1)}{p(X(m,k)\mid H_0)} = \frac{1}{1+\xi(m,k)}\exp\left(\frac{\gamma(m,k)\,\xi(m,k)}{1+\xi(m,k)}\right) \tag{3}$$

where $\xi(m,k) = \lambda_s(m,k)/\lambda_d(m,k)$ and $\gamma(m,k) = |X(m,k)|^2/\lambda_d(m,k)$ are the a priori SNR and the a posteriori SNR respectively. In the decision-directed estimator they satisfy the following relation:

$$\hat\xi(m,k) = \beta\,\frac{|\hat S(m-1,k)|^2}{\lambda_d(m-1,k)} + (1-\beta)\max\left(\gamma(m,k)-1,\ 0\right) \tag{4}$$

where $\lambda_d(m-1,k)$ is the noise power spectrum of the previous frame, $\hat S(m-1,k)$ is the speech spectrum estimate of the previous frame and $\beta$ is the weighting factor of the estimator.

S4. Noise estimation is performed and a threshold $\eta$ is set. The threshold $\eta$ is compared with the value of the likelihood ratio to determine whether the current frame belongs to a speech segment or a non-speech segment: when the likelihood ratio exceeds the threshold, the frame is preliminarily judged to be a speech frame, and when it is below the threshold, the frame is judged to be a non-speech frame:

$$\frac{1}{K}\sum_{k=1}^{K}\log\Lambda(m,k)\ \underset{H_0}{\overset{H_1}{\gtrless}}\ \eta \tag{5}$$

where $K$ is the total number of frequency bands, and $H_0$ and $H_1$ denote a non-speech frame and a speech frame respectively.

S5. The decision rule is determined. The log-likelihood ratio of frame $m$ is:

$$\ell(m) = \frac{1}{K}\sum_{k=1}^{K}\ell(m,k)$$

Let $\{\ell(m-M),\dots,\ell(m+M)\}$ denote the $2M+1$ consecutive frames centered on $\ell(m)$. The decision rule over these $2M+1$ log-likelihood ratios is:

$$\ell_M(m) = \sum_{j=m-M}^{m+M}\ell(j)\ \underset{H_0}{\overset{H_1}{\gtrless}}\ \eta$$

where $\ell(m,k) = \log\Lambda(m,k)$ is the log-likelihood ratio of the $k$-th band. Substituting the probabilities of the observed signal under $H_0$ and $H_1$ gives:

$$\ell(m,k) = \frac{\gamma(m,k)\,\xi(m,k)}{1+\xi(m,k)} - \log\left(1+\xi(m,k)\right)$$

The a priori SNR $\xi(m,k)$ is obtained from the a posteriori SNR by maximum-likelihood estimation, that is:

$$\hat\xi(m,k) = \gamma(m,k) - 1$$

Therefore, the value of the log-likelihood ratio depends on the noise energy spectrum $\lambda_d(m,k)$.

S6. Hangover is eliminated. When the SNR is low, the noise energy spectrum $\lambda_d(m,k)$ becomes large, and the chosen threshold $\eta$ is reduced to lower the probability of misjudging speech segments; conversely, the threshold $\eta$ is increased to match high-SNR signals.

The noisy speech power spectrum $P(m,k)$ is obtained by smoothing the power spectrum $|X(m,k)|^2$ of the noisy signal, with a smoothing factor $\alpha(m,k)$ that is a time-frequency dependent function:

$$P(m,k) = \alpha(m,k)\,P(m-1,k) + \left(1-\alpha(m,k)\right)|X(m,k)|^2$$

Applying noise estimation based on minimum statistics to $P(m,k)$ yields the minimum noise power spectrum $P_{\min}(m,k)$ of each frame. The threshold $\eta$ related to this noise energy spectrum is then defined by a formula that is not recoverable from the source text; the formula contains one constant coefficient of the threshold.
Compared with the prior art, the present invention, by adopting the above technical solution, has the following technical effects: the proposed VAD algorithm has an SBR accuracy similar to that of the harmonic-feature-based MOLRT algorithm but a considerably better VAcc; and it performs similarly at SNRs of 15 dB and 25 dB, which shows that the method is robust to noise.
Brief description of the drawings
Fig. 1(a) is a schematic diagram of the clean speech.
Fig. 1(b) is a schematic diagram of the VAD result of Sohn's method.
Fig. 1(c) is a schematic diagram of the VAD result of Tan's method.
Fig. 1(d) is a schematic diagram of the VAD result of the method of the invention.
Fig. 2(a) is the segment-level performance comparison at different SNRs.
Fig. 2(b) is the frame-level performance comparison at different SNRs.
Fig. 2(c) is the number of correctly detected speech frames at different SNRs.
Fig. 3 is a block diagram of the speech endpoint detection based on speech enhancement in the present invention.
Detailed description
The technical solution of the present invention is described in further detail below with reference to the accompanying drawings:
As shown in Fig. 3, the technical solution adopted by the present invention is a noise-robust detection method based on the likelihood ratio test, comprising steps S1 to S6 as set out in the Summary of the invention.
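Specifically, the noisy speech $x(n)$ in the present invention is the superposition of the clean speech $s(n)$ and the interference noise $d(n)$:

$$x(n) = s(n) + d(n)$$

where $n$ is the time-sample index.

Assuming that the clean speech and the interference noise are statistically independent and zero-mean, the Fourier transform of the noisy speech can be expressed as

$$H_0:\ X(m,k) = D(m,k), \qquad H_1:\ X(m,k) = S(m,k) + D(m,k) \tag{1}$$

where $X(m,k)$, $S(m,k)$ and $D(m,k)$ are the short-time Fourier coefficients of each frame, $m$ is the frame index, $k$ is the frequency-band index within the frame, and $H_0$ and $H_1$ denote a non-speech frame and a speech frame respectively. Assuming that the clean speech and the noise both follow a Gaussian distribution, the probability density functions of the observed signal $X(m,k)$ under $H_0$ and $H_1$ are:

$$p(X(m,k)\mid H_0) = \frac{1}{\pi\lambda_d(m,k)}\exp\left(-\frac{|X(m,k)|^2}{\lambda_d(m,k)}\right), \qquad p(X(m,k)\mid H_1) = \frac{1}{\pi\left(\lambda_s(m,k)+\lambda_d(m,k)\right)}\exp\left(-\frac{|X(m,k)|^2}{\lambda_s(m,k)+\lambda_d(m,k)}\right) \tag{2}$$

where $\lambda_s(m,k)$ and $\lambda_d(m,k)$ are the power spectra of the speech signal and the noise signal respectively. The likelihood ratio (LR) of the $k$-th band of this frame is then:

$$\Lambda(m,k) = \frac{p(X(m,k)\mid H_1)}{p(X(m,k)\mid H_0)} = \frac{1}{1+\xi(m,k)}\exp\left(\frac{\gamma(m,k)\,\xi(m,k)}{1+\xi(m,k)}\right) \tag{3}$$

where $\xi(m,k) = \lambda_s(m,k)/\lambda_d(m,k)$ and $\gamma(m,k) = |X(m,k)|^2/\lambda_d(m,k)$ are the a priori SNR and the a posteriori SNR respectively. In the decision-directed (DD) estimator they satisfy the following relation, in which $\lambda_d(m-1,k)$ is the noise power spectrum of the previous frame, $\hat S(m-1,k)$ is the speech spectrum estimate of the previous frame and $\beta$ is the weighting factor of the estimator:

$$\hat\xi(m,k) = \beta\,\frac{|\hat S(m-1,k)|^2}{\lambda_d(m-1,k)} + (1-\beta)\max\left(\gamma(m,k)-1,\ 0\right) \tag{4}$$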
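The per-band quantities in formulas (3) and (4) can be sketched in a few lines of Python. This is a minimal illustration under assumptions, not the patented implementation: the function names, the default weighting factor `beta=0.98` and the way the previous-frame speech power is passed in are hypothetical choices made for the example.

```python
import math


def posterior_snr(power, noise_psd):
    """A posteriori SNR: gamma = |X|^2 / lambda_d for one frequency band."""
    return power / noise_psd


def decision_directed_prior_snr(gamma, prev_speech_power, prev_noise_psd, beta=0.98):
    """A priori SNR in the decision-directed form of formula (4).

    beta=0.98 is an assumed, commonly used weighting factor; the source
    does not state the value.
    """
    return (beta * prev_speech_power / prev_noise_psd
            + (1.0 - beta) * max(gamma - 1.0, 0.0))


def likelihood_ratio(xi, gamma):
    """Per-band likelihood ratio of formula (3) under the Gaussian model."""
    return math.exp(gamma * xi / (1.0 + xi)) / (1.0 + xi)


# Hypothetical numbers for one band of one frame.
gamma = posterior_snr(power=4.0, noise_psd=1.0)
xi = decision_directed_prior_snr(gamma, prev_speech_power=2.5, prev_noise_psd=1.0)
lr = likelihood_ratio(xi, gamma)
```

With an a priori SNR of zero the ratio equals one, so the band carries no evidence for either hypothesis; this is a convenient sanity check for any implementation.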
Suppose a threshold $\eta$ is set and compared with the value of the LR to determine whether the current frame belongs to a speech segment or a non-speech segment, such that:

$$\frac{1}{K}\sum_{k=1}^{K}\log\Lambda(m,k)\ \underset{H_0}{\overset{H_1}{\gtrless}}\ \eta \tag{5}$$

where $K$ is the total number of frequency bands. Formula (5) shows that the value of the LR is closely related to the a priori SNR and the a posteriori SNR: when the a posteriori SNR is very large, i.e. $\gamma(m,k) \gg 1$, the value of the LR also becomes very large; when the a posteriori SNR is small, the a priori SNR becomes the key parameter in computing the LR.
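The frame-level test of formula (5) can be sketched as follows. This is an illustrative sketch only: it assumes the averaged log-likelihood form commonly used with this statistical model, and the function names and the threshold value in the usage lines are hypothetical.

```python
import math


def frame_llr(xi, gamma):
    """Mean over the K bands of log(Lambda) for one frame.

    xi and gamma are sequences of per-band a priori and a posteriori SNRs.
    log(Lambda) = gamma*xi/(1+xi) - log(1+xi), which follows from formula (3).
    """
    k = len(xi)
    return sum(g * x / (1.0 + x) - math.log1p(x) for x, g in zip(xi, gamma)) / k


def is_speech(xi, gamma, eta):
    """Preliminary decision: speech (H1) if the frame statistic exceeds eta."""
    return frame_llr(xi, gamma) > eta


# Hypothetical two-band frame and an assumed threshold of 0.2.
decision = is_speech(xi=[1.0, 3.0], gamma=[2.0, 4.0], eta=0.2)
```

Working in the log domain avoids overflow of the exponential in formula (3) when the a posteriori SNR is large.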
The block diagram of the whole speech endpoint detection system based on speech enhancement is shown in Fig. 3. From the above derivation, the log-likelihood ratio (LLR) of frame $m$ is:

$$\ell(m) = \frac{1}{K}\sum_{k=1}^{K}\ell(m,k) \tag{6}$$

Let $\{\ell(m-M),\dots,\ell(m+M)\}$ denote the $2M+1$ consecutive frames centered on $\ell(m)$. The decision rule over these $2M+1$ LLRs is:

$$\ell_M(m) = \sum_{j=m-M}^{m+M}\ell(j)\ \underset{H_0}{\overset{H_1}{\gtrless}}\ \eta \tag{7}$$

For the log-likelihood ratio $\ell(m,k) = \log\Lambda(m,k)$ of the $k$-th band, substituting the probabilities of the observed signal under $H_0$ and $H_1$ gives:

$$\ell(m,k) = \frac{\gamma(m,k)\,\xi(m,k)}{1+\xi(m,k)} - \log\left(1+\xi(m,k)\right) \tag{8}$$

Because the a priori SNR $\xi(m,k)$ can be obtained from the a posteriori SNR by the maximum-likelihood (ML) estimator

$$\hat\xi(m,k) = \gamma(m,k) - 1$$

the log-likelihood ratio can be regarded simply as a function of the a posteriori SNR $\gamma(m,k)$, namely $\ell(m,k) = \gamma(m,k) - \log\gamma(m,k) - 1$. In other words, the value of the LLR depends on the noise energy spectrum $\lambda_d(m,k)$.
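The two ideas in this passage, an LLR that depends only on the a posteriori SNR and a decision statistic summed over 2M+1 neighbouring frames, can be sketched as below. This is a hedged illustration: flooring the maximum-likelihood a priori SNR at zero and truncating the window at the ends of the signal are assumptions of the example, and the function names are hypothetical.

```python
import math


def llr_from_posterior_snr(gamma):
    """Per-band LLR when the a priori SNR is the ML estimate gamma - 1.

    The estimate is floored at 0 (an assumption), so bands with gamma <= 1
    contribute nothing. For gamma > 1 the result equals gamma - log(gamma) - 1.
    """
    xi = max(gamma - 1.0, 0.0)
    return gamma * xi / (1.0 + xi) - math.log1p(xi)


def multi_observation_llr(frame_llrs, m, M):
    """Sum of the frame LLRs in the window of 2M+1 frames centered on frame m.

    Near the start and end of the signal the window is truncated (assumption).
    """
    lo = max(0, m - M)
    hi = min(len(frame_llrs), m + M + 1)
    return sum(frame_llrs[lo:hi])


# Hypothetical frame LLRs; the source experiment uses 2M+1 = 17, i.e. M = 8.
llrs = [0.0, 0.0, 1.0, 2.0, 1.0, 0.0, 0.0]
stat = multi_observation_llr(llrs, m=3, M=1)
```

Summing over neighbouring frames smooths isolated dips in the per-frame statistic, which is what reduces clipping at the edges of speech segments.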
On the other hand, when the SNR is very low, i.e. when the noise energy spectrum $\lambda_d(m,k)$ becomes large, a smaller threshold $\eta$ is needed to reduce the probability of misjudging speech segments; conversely, a larger threshold $\eta$ is needed to match high-SNR signals. The analysis above shows that the LLR depends mainly on the accuracy of the noise energy spectrum. Linking the threshold to the minimum noise energy spectrum of the current frame therefore not only makes the VAD algorithm more robust across SNR environments; because the estimated minimum noise energy spectrum is smaller than $\lambda_d(m,k)$, it also provides a margin for correctly detecting speech segments.

Suppose the energy spectrum $P(m,k)$ is obtained by smoothing the power spectrum $|X(m,k)|^2$ of the noisy signal, with a smoothing factor $\alpha(m,k)$ that is a time-frequency dependent function:

$$P(m,k) = \alpha(m,k)\,P(m-1,k) + \left(1-\alpha(m,k)\right)|X(m,k)|^2 \tag{9}$$

The minimum-statistics noise estimation proposed in the literature can then be used to obtain the minimum noise power spectrum $P_{\min}(m,k)$ of each frame.
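The recursive smoothing of formula (9) followed by minimum tracking can be sketched for a single frequency band as follows. This is a deliberately simplified stand-in for minimum-statistics noise estimation: it uses a fixed smoothing factor instead of a time-frequency dependent one, takes a plain sliding-window minimum, and omits bias compensation. The function names, `alpha=0.85` and `window=5` are assumptions of the example.

```python
def smooth_power(prev_smoothed, power, alpha):
    """One step of P(m) = alpha * P(m-1) + (1 - alpha) * |X(m)|^2."""
    return alpha * prev_smoothed + (1.0 - alpha) * power


def track_minimum(powers, alpha=0.85, window=5):
    """Smooth a per-frame power sequence and track its running minimum.

    Returns (smoothed, minima), where minima[m] is the smallest smoothed
    value among the last `window` frames up to and including frame m.
    """
    smoothed = []
    minima = []
    p = powers[0]  # initialise the recursion with the first observation
    for x in powers:
        p = smooth_power(p, x, alpha)
        smoothed.append(p)
        minima.append(min(smoothed[-window:]))
    return smoothed, minima


# Hypothetical band powers: noise floor near 1.0 with a burst of speech.
smoothed, minima = track_minimum([1.0, 1.2, 9.0, 8.0, 1.1, 0.9])
```

Because the minimum over a window that spans a speech burst still reflects the noise floor, the tracked minimum stays low during speech, which is the property the threshold setting relies on.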
The threshold $\eta$ related to the noise energy spectrum is defined by formula (12), which is not recoverable from the source text; the formula contains one constant coefficient of the threshold.
The performance of the proposed VAD method was verified as follows. The experiment used a recorded, non-broadcast clean speech corpus of 2906 utterances with a sampling rate of fs = 8 kHz. The corpus was mixed with stationary and non-stationary noise to obtain noisy speech at different SNRs. The stationary noise was collected and recorded in a real environment; the non-stationary noise (car noise and babble noise) came from http://www.freesound.com and http://spib.rice.edu/spib/data/signals/noise/babble.html respectively. A Hanning window of length 200 was used as the analysis window, and the total number of frequency bands was K = 256. In the noise estimation, the smoothing factor was set to a value that is not preserved in the source text; the prior probabilities were set to $p(H_0) = p(H_1)$; the parameter of equation (10) is likewise not preserved; and the number of consecutive LLRs was 2M+1 = 17.
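Mixing clean speech with noise at a prescribed SNR, as done to build the test material, can be sketched as follows. This is an assumed procedure for illustration, since the source does not describe how the mixing was carried out: the noise is scaled so that the ratio of average speech power to average noise power equals the target SNR. The function name is hypothetical.

```python
import math


def mix_at_snr(speech, noise, snr_db):
    """Return speech + g * noise, with g chosen to reach the target SNR in dB.

    Powers are averaged over the whole signals; both sequences are assumed
    to have the same length and non-zero power.
    """
    speech_power = sum(s * s for s in speech) / len(speech)
    noise_power = sum(d * d for d in noise) / len(noise)
    gain = math.sqrt(speech_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return [s + gain * d for s, d in zip(speech, noise)]


# Hypothetical toy signals mixed at 15 dB, one of the SNRs used in the experiment.
noisy = mix_at_snr([1.0, -1.0, 1.0, -1.0], [0.5, 0.5, -0.5, -0.5], snr_db=15.0)
```

In practice the speech power is often measured over active speech only rather than over the whole file; that refinement is left out here.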
Although the receiver operating characteristic (ROC) curve is a common tool for verifying the performance of VAD algorithms, it can only assess VAD performance at the frame level: it shows how many speech/non-speech frames were classified correctly but says nothing about the detection of speech/non-speech segments. For example, the ROC curve of Sohn's VAD algorithm looks relatively good, yet in practice Sohn's method produces many fragments. This is illustrated with one noisy utterance in Fig. 1.
From Fig. 1(a) to Fig. 1(d) it can be seen that Sohn's method cannot preserve the integrity of speech segments at low SNR and produces many small fragments; Tan's method performs somewhat better in this respect. However, the large number of small fragments means that neither method can guarantee effective automatic speech recognition in noisy environments. Therefore, to verify the effectiveness of the VAD algorithm, the present invention considers not only frame-level performance but also segment-level performance.
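The difference between frame-level and segment-level behaviour can be made concrete with a small sketch. The helper names and the toy decision sequences are hypothetical; the metrics shown are plain frame accuracy and a count of detected speech runs, not the SBR or VAcc measures used in the evaluation.

```python
def speech_segments(decisions):
    """Return (start, end) pairs, end exclusive, for each run of speech frames."""
    segments = []
    start = None
    for i, d in enumerate(decisions):
        if d and start is None:
            start = i  # a speech run begins
        elif not d and start is not None:
            segments.append((start, i))  # the run ended at the previous frame
            start = None
    if start is not None:
        segments.append((start, len(decisions)))
    return segments


def frame_accuracy(decisions, reference):
    """Fraction of frames whose decision matches the reference label."""
    return sum(d == r for d, r in zip(decisions, reference)) / len(reference)


# One true speech segment versus a fragmented detection of it.
reference = [0, 1, 1, 1, 1, 1, 1, 0]
fragmented = [0, 1, 1, 0, 1, 0, 1, 0]
accuracy = frame_accuracy(fragmented, reference)
n_detected = len(speech_segments(fragmented))
n_true = len(speech_segments(reference))
```

In this toy case three quarters of the frames are labelled correctly, yet the single reference segment is split into three detected pieces, which is the kind of fragmentation a frame-level score hides.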
Fig. 2(a) to Fig. 2(c) show the VAD results for stationary noise at different SNRs. Fig. 2(c) shows that, in terms of the number of correctly detected speech frames, the proposed algorithm is similar to Sohn's VAD method and better than the VAD method proposed by Tan. Fig. 2(a) shows that, in detecting speech/non-speech segments, the proposed algorithm is far better than the other two methods.
The numbers of correctly detected speech frames under car noise are given in the following table:
The performance comparison under different non-stationary noises is given in the following table:
The two tables above give the performance of the different VAD algorithms under different non-stationary noises. The first table shows that Sohn's method gives the best result in terms of the number of detected speech frames; however, as noted above, frame-level correctness alone does not show that a VAD algorithm is optimal. The following conclusions can be drawn from the second table:
The VAD algorithm proposed by the present invention has an SBR accuracy similar to that of the harmonic-feature-based MOLRT algorithm but a considerably better VAcc, which shows that the endpoint detection method proposed in the present invention performs better than the conventional method.
The VAD algorithm proposed by the present invention performs similarly at SNRs of 15 dB and 25 dB, which shows that it is robust to noise.
The embodiments of the present invention have been explained in detail above with reference to the accompanying drawings. The present invention is not limited to these embodiments, however; within the knowledge of a person of ordinary skill in the art, various changes can be made without departing from the concept of the invention.
Claims (1)
1. A noise-robust detection method based on the likelihood ratio test, comprising the following steps:

S1. Speech enhancement is applied to the noisy speech signal with a Wiener filter, to weaken the influence of the noise on the clean speech and to improve the stationarity of the noise remaining after filtering. The Wiener-filtered noisy speech signal $x(n)$ is the superposition of the clean speech $s(n)$ and the interference noise $d(n)$:

$$x(n) = s(n) + d(n)$$

where $n$ is the time-sample index. After Wiener filtering, the clean speech signal and the interference noise can be regarded as statistically independent and zero-mean.

S2. The Fourier transform is applied to the noisy speech. In the spectral domain, the spectral coefficient of the filtered noisy speech is the sum of the spectral coefficients of the clean speech and of the interference noise:

$$H_0:\ X(m,k) = D(m,k), \qquad H_1:\ X(m,k) = S(m,k) + D(m,k) \tag{1}$$

where $X(m,k)$, $S(m,k)$ and $D(m,k)$ are the short-time Fourier coefficients of each frame, $m$ is the frame index, $k$ is the frequency-band index within the frame, and $H_0$ and $H_1$ denote a non-speech frame and a speech frame respectively.

S3. The likelihood ratio is computed. When the clean speech and the noise both follow a Gaussian distribution, the probability density functions of the observed signal $X(m,k)$ under $H_0$ and $H_1$ are

$$p(X(m,k)\mid H_0) = \frac{1}{\pi\lambda_d(m,k)}\exp\left(-\frac{|X(m,k)|^2}{\lambda_d(m,k)}\right), \qquad p(X(m,k)\mid H_1) = \frac{1}{\pi\left(\lambda_s(m,k)+\lambda_d(m,k)\right)}\exp\left(-\frac{|X(m,k)|^2}{\lambda_s(m,k)+\lambda_d(m,k)}\right) \tag{2}$$

where $\lambda_s(m,k)$ is the power spectrum of the speech signal and $\lambda_d(m,k)$ is the power spectrum of the noise signal.

The likelihood ratio of the $k$-th band of this frame is:

$$\Lambda(m,k) = \frac{p(X(m,k)\mid H_1)}{p(X(m,k)\mid H_0)} = \frac{1}{1+\xi(m,k)}\exp\left(\frac{\gamma(m,k)\,\xi(m,k)}{1+\xi(m,k)}\right) \tag{3}$$

where $\xi(m,k) = \lambda_s(m,k)/\lambda_d(m,k)$ and $\gamma(m,k) = |X(m,k)|^2/\lambda_d(m,k)$ are the a priori SNR and the a posteriori SNR respectively. In the decision-directed estimator they satisfy the following relation:

$$\hat\xi(m,k) = \beta\,\frac{|\hat S(m-1,k)|^2}{\lambda_d(m-1,k)} + (1-\beta)\max\left(\gamma(m,k)-1,\ 0\right) \tag{4}$$

where $\lambda_d(m-1,k)$ is the noise power spectrum of the previous frame, $\hat S(m-1,k)$ is the speech spectrum estimate of the previous frame and $\beta$ is the weighting factor of the estimator.

S4. Noise estimation is performed and a threshold $\eta$ is set. The threshold $\eta$ is compared with the value of the likelihood ratio to determine whether the current frame belongs to a speech segment or a non-speech segment: when the likelihood ratio exceeds the threshold, the frame is preliminarily judged to be a speech frame, and when it is below the threshold, the frame is judged to be a non-speech frame:

$$\frac{1}{K}\sum_{k=1}^{K}\log\Lambda(m,k)\ \underset{H_0}{\overset{H_1}{\gtrless}}\ \eta \tag{5}$$

where $K$ is the total number of frequency bands, and $H_0$ and $H_1$ denote a non-speech frame and a speech frame respectively.

S5. The decision rule is determined. The log-likelihood ratio of frame $m$ is:

$$\ell(m) = \frac{1}{K}\sum_{k=1}^{K}\ell(m,k)$$

Let $\{\ell(m-M),\dots,\ell(m+M)\}$ denote the $2M+1$ consecutive frames centered on $\ell(m)$. The decision rule over these $2M+1$ log-likelihood ratios is:

$$\ell_M(m) = \sum_{j=m-M}^{m+M}\ell(j)\ \underset{H_0}{\overset{H_1}{\gtrless}}\ \eta$$

where $\ell(m,k) = \log\Lambda(m,k)$ is the log-likelihood ratio of the $k$-th band. Substituting the probabilities of the observed signal under $H_0$ and $H_1$ gives:

$$\ell(m,k) = \frac{\gamma(m,k)\,\xi(m,k)}{1+\xi(m,k)} - \log\left(1+\xi(m,k)\right)$$

The a priori SNR $\xi(m,k)$ is obtained from the a posteriori SNR by maximum-likelihood estimation, that is:

$$\hat\xi(m,k) = \gamma(m,k) - 1$$

Therefore, the value of the log-likelihood ratio depends on the noise energy spectrum $\lambda_d(m,k)$.

S6. Hangover is eliminated. When the SNR is low, the noise energy spectrum $\lambda_d(m,k)$ becomes large, and the chosen threshold $\eta$ is reduced to lower the probability of misjudging speech segments; conversely, the threshold $\eta$ is increased to match high-SNR signals.

The noisy speech power spectrum $P(m,k)$ is obtained by smoothing the power spectrum $|X(m,k)|^2$ of the noisy signal, with a smoothing factor $\alpha(m,k)$ that is a time-frequency dependent function:

$$P(m,k) = \alpha(m,k)\,P(m-1,k) + \left(1-\alpha(m,k)\right)|X(m,k)|^2$$

Applying noise estimation based on minimum statistics to $P(m,k)$ yields the minimum noise power spectrum $P_{\min}(m,k)$ of each frame. The threshold $\eta$ related to this noise energy spectrum is then defined by a formula that is not recoverable from the source text; the formula contains one constant coefficient of the threshold.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610008285.2A CN105575406A (en) | 2016-01-07 | 2016-01-07 | Noise robustness detection method based on likelihood ratio test |
Publications (1)
Publication Number | Publication Date |
---|---|
CN105575406A true CN105575406A (en) | 2016-05-11 |
Family
ID=55885457
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610008285.2A Pending CN105575406A (en) | 2016-01-07 | 2016-01-07 | Noise robustness detection method based on likelihood ratio test |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105575406A (en) |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101080765A (en) * | 2005-05-09 | 2007-11-28 | 株式会社东芝 | Voice activity detection apparatus and method |
CN101308653A (en) * | 2008-07-17 | 2008-11-19 | 安徽科大讯飞信息科技股份有限公司 | End-point detecting method applied to speech identification system |
CN103632677A (en) * | 2013-11-27 | 2014-03-12 | 腾讯科技(成都)有限公司 | Method and device for processing voice signal with noise, and server |
CN103730124A (en) * | 2013-12-31 | 2014-04-16 | 上海交通大学无锡研究院 | Noise robustness endpoint detection method based on likelihood ratio test |
CN105023572A (en) * | 2014-04-16 | 2015-11-04 | 王景芳 | Noised voice end point robustness detection method |
CN104464728A (en) * | 2014-11-26 | 2015-03-25 | 河海大学 | Speech enhancement method based on Gaussian mixture model (GMM) noise estimation |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105788607A (en) * | 2016-05-20 | 2016-07-20 | 中国科学技术大学 | Speech enhancement method applied to dual-microphone array |
CN107484080A (en) * | 2016-05-30 | 2017-12-15 | 奥迪康有限公司 | The method of apparatus for processing audio and signal to noise ratio for estimation voice signal |
CN107484080B (en) * | 2016-05-30 | 2021-07-16 | 奥迪康有限公司 | Audio processing apparatus and method for estimating signal-to-noise ratio of sound signal |
CN106356071A (en) * | 2016-08-30 | 2017-01-25 | 广州市百果园网络科技有限公司 | Noise detection method and device |
CN106356071B (en) * | 2016-08-30 | 2019-10-25 | 广州市百果园网络科技有限公司 | A kind of noise detecting method and device |
CN107393550A (en) * | 2017-07-14 | 2017-11-24 | 深圳永顺智信息科技有限公司 | Method of speech processing and device |
CN107393550B (en) * | 2017-07-14 | 2021-03-19 | 深圳永顺智信息科技有限公司 | Voice processing method and device |
CN109378002A (en) * | 2018-10-11 | 2019-02-22 | 平安科技(深圳)有限公司 | Method, apparatus, computer equipment and the storage medium of voice print verification |
CN109378002B (en) * | 2018-10-11 | 2024-05-07 | 平安科技(深圳)有限公司 | Voiceprint verification method, voiceprint verification device, computer equipment and storage medium |
US20210327448A1 (en) * | 2018-12-18 | 2021-10-21 | Tencent Technology (Shenzhen) Company Limited | Speech noise reduction method and apparatus, computing device, and computer-readable storage medium |
CN112908310A (en) * | 2021-01-20 | 2021-06-04 | 宁波方太厨具有限公司 | Voice instruction recognition method and system in intelligent electric appliance |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20160511 |