CN103021421A - Multilevel screening detecting recognizing method for shots - Google Patents


Info

Publication number
CN103021421A
Authority
CN
China
Prior art keywords
frame
signal
short
template signal
mfcc
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2012105740037A
Other languages
Chinese (zh)
Inventor
张涛
苏春玲
陈志�
王晓晨
蔡晓
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University
Original Assignee
Tianjin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University filed Critical Tianjin University
Priority to CN2012105740037A priority Critical patent/CN103021421A/en
Publication of CN103021421A publication Critical patent/CN103021421A/en
Pending legal-status Critical Current

Landscapes

  • Complex Calculations (AREA)

Abstract

Disclosed is a multilevel screening detection and recognition method for gunshots. The method includes: selecting a template signal of a single gunshot and dividing it into frames; extracting the MFCC (Mel-frequency cepstral coefficient) feature coefficients of the template signal; selecting a signal under test and dividing it into frames; calculating the short-time energy and short-time average zero-crossing rate of the current frame of the signal under test and judging whether the frame is valid; when the number of consecutive valid frames equals 3/2 of the template signal's frame count, taking the leading 2/2 portion of the consecutive valid frames as a target segment and letting the remaining 1/2 portion participate in the judgment of the next frame; and extracting the MFCC feature coefficients of the frames in the target segment. If the matching distance between the MFCC feature coefficients of the template signal and those of the signal under test is smaller than a threshold obtained by training, the target segment is judged to be the target signal; otherwise it is not. The method combines time-domain feature parameters, MFCC, and DTW (dynamic time warping), balancing computational load against recognition rate.

Description

Multilevel screening detection and recognition method for gunshots
Technical field
The present invention relates to a gunshot detection and recognition method, and in particular to a multilevel screening detection and recognition method for gunshots.
Background
Sound is ubiquitous, and the detection and recognition of sound has always been an important topic in acoustic research. Sound detection and recognition systems fall into two categories: non-speech recognition systems and speech recognition systems. Speech detection and recognition has been studied in greater depth and has relatively mature systems and methods. Research on non-speech sounds can borrow algorithms and techniques from speech processing; both kinds of system basically consist of a feature parameter extraction algorithm and a pattern matching algorithm.
For feature parameter extraction, many feature parameters can be used for detection, and they can be classified into three domains: time domain, frequency domain, and homomorphic (cepstral) domain. Time-domain feature parameters include short-time energy, short-time average zero-crossing rate, the short-time autocorrelation function, and the average magnitude difference function. Their extraction algorithms are all simple, but their ability to discriminate between signals is limited; they are typically used for endpoint detection and speech framing. Frequency-domain feature parameters include the Fourier transform, the discrete cosine transform, and linear prediction analysis. Frequency-domain feature parameters bear some relation to human auditory perception, but they are suited to additive signals and handle complex multiplicatively combined signals poorly. Homomorphic feature parameters include linear prediction cepstral coefficients and Mel-frequency cepstral coefficients (MFCC). Nonlinear systems are very difficult to analyze directly, so homomorphic analysis is used to convert the nonlinear problem into a linear one.
For pattern matching and model training, the main techniques are dynamic time warping (DTW), hidden Markov models (HMM), and artificial neural networks. Of the three, DTW is an earlier pattern matching and model training technique. By applying dynamic programming, it solved the problem of comparing speech feature parameter sequences of unequal duration. Its algorithmic complexity is low and its recognition rate is good in certain applications, notably isolated-word speech recognition.
For detecting the sound of sudden events such as gunshots, the input signal resembles an isolated word in speech, and the system needs few matching templates. For this type of recognition, DTW and HMM achieve similar recognition results under the same environmental conditions, but HMM is considerably more complex: it requires a large amount of speech data in the training stage, and its model parameters are obtained only through repeated computation, whereas DTW training requires almost no additional computation. DTW is therefore well suited, in both algorithmic complexity and recognition rate, to recognizing sounds whose input signal is short, resembles an isolated word, and needs few templates.
Summary of the invention
The technical problem to be solved by the present invention is to provide a multilevel screening detection and recognition method for gunshots that can detect gunshots in public places quickly and accurately.
The technical solution adopted by the present invention is a multilevel screening detection and recognition method for gunshots, comprising the following steps:
1) select a sampling frequency in the range 8 kHz to 48 kHz, choose a template signal of a single gunshot corresponding to this sampling frequency, and divide the template signal into frames;
2) extract the MFCC feature coefficients of the template signal;
3) choose a signal under test whose sampling frequency is the same as that in step 1), and divide it into frames with the same number of samples per frame as the template signal in step 1);
4) compute the short-time energy and short-time average zero-crossing rate of the current frame of the signal under test. If either one satisfies its decision condition, take the current frame as a valid frame, save it, and go to step 5). If neither satisfies its condition but at least one of the three frames preceding the current frame does, smooth the current frame to a valid frame, save it, and go to step 5). If none of the three preceding frames satisfies the condition, the current frame is an invalid frame; go to step 6);
5) when the number of consecutive valid frames equals 3/2 of the template signal's frame count, take the leading 2/2 portion, which has the same number of frames as the template signal, as the target segment and go to step 7); the remaining 1/2 portion returns to step 4) and participates in the judgment of the next frame;
6) if the number of valid frames saved before this invalid frame satisfies 1/2 × (template frame count) < valid frame count < 3/2 × (template frame count), take these consecutive valid frames as the target segment and go to step 7); otherwise clear the saved data and return to step 4);
7) extract the MFCC feature coefficients of the frames in the target segment. If the matching distance between the MFCC feature coefficients of the template signal and those of the signal under test is less than a threshold obtained by training, the target segment is judged to be the target signal; otherwise it is judged not to be the target signal.
Each frame of the template signal in step 1) contains 256 to 1024 samples.
The decision condition in step 4) is that the short-time energy of the frame is greater than a set minimum short-time energy, and that the short-time average zero-crossing rate of the frame lies within a set range.
In step 5), returning the remaining 1/2 portion to step 4) to participate in the judgment of the next frame means treating the consecutive valid frames of this 1/2 portion as the frames preceding the current frame in step 4).
By screening in multiple levels and setting several decision thresholds, the method of the present invention combines time-domain feature parameters, cepstral feature parameters, and the DTW algorithm, balancing computational load against recognition rate. Its detection requires far less computation than the MFCC & DTW algorithm, and its detection accuracy is much higher than that of algorithms using only short-time energy combined with short-time average zero-crossing rate. The invention can be applied to gunshot alarm systems in public places; its low computational load makes it easy to implement on hardware platforms, and its good robustness ensures accurate and effective detection.
Brief description of the drawings
Fig. 1 is a schematic of partial detection results with the parameters adopted by the present invention;
Fig. 2 is a schematic of partial detection results when the miss rate increases and the false detection rate decreases;
Fig. 3 is a schematic of partial detection results when the miss rate decreases and the false detection rate increases.
In the figures, solid boxes denote manual annotation results and dashed boxes denote the algorithm's detection results.
Embodiment
The multilevel screening detection and recognition method for gunshots of the present invention is described in detail below with reference to an embodiment and the drawings.
The method is intended for gunshot detection in public places. Because gunshots occur rarely in public places, the acquired signal can be detected hierarchically: first-level detection uses short-time energy and short-time average zero-crossing rate, second-level detection is applied to the results that pass the first level, and third-level detection is applied to the results of the second level.
The multilevel screening detection and recognition algorithm for gunshots of the present invention comprises the following steps:
1) select a sampling frequency in the range 8 kHz to 48 kHz, choose a template signal of a single gunshot corresponding to this sampling frequency, and divide the template signal into frames; each frame of the template signal contains 256 to 1024 samples.
The template signal is chosen at sampling frequency fs (48 kHz) with 16-bit quantization, and a fixed number of samples (1024) is taken as one frame, so that the template signal is divided into multiple frames.
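The framing in step 1) can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `split_frames` is hypothetical, and the sketch assumes non-overlapping frames and drops an incomplete trailing frame, neither of which the text specifies.

```python
def split_frames(samples, frame_len=1024):
    """Split a sample sequence into consecutive frames of frame_len samples.

    Assumptions (not stated in the source): frames do not overlap, and a
    trailing partial frame is discarded.
    """
    n_frames = len(samples) // frame_len
    return [samples[i * frame_len:(i + 1) * frame_len] for i in range(n_frames)]


# Example: 2500 samples yield two full 1024-sample frames.
frames = split_frames(list(range(2500)), frame_len=1024)
```

With the experiment's settings (48 kHz, 1024 samples per frame), each frame covers roughly 21.3 ms of audio.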
2) extract the MFCC feature coefficients of the template signal;
The N-th order (N is usually 12) MFCC feature coefficients are obtained for each frame of the template signal. In the prior art, MFCC feature coefficients are computed by the methods given in Wang Bingxi, Qu Dan, Peng Xuan. Practical Speech Recognition Basis [M]. National Defense Industry Press, 2005, and Li Fuhai, Ma Jinwen, Huang Dezhi. MFCC and SVM Based on Recognition of Chinese Vowels [C] // CIS 2005, Part II, LNAI 3802. [s.l.]: [s.n.], 2005: 812-819. The extraction proceeds roughly as follows: first, a discrete Fourier transform is applied to the framed signal to obtain its spectrum. The squared magnitude of the spectrum then gives the energy spectrum. The energy spectrum is passed through a bank of triangular filters on the Mel scale, the log energy S(m) of each filter's output is computed, and a discrete cosine transform finally yields the MFCC feature coefficients.
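The extraction chain described above (DFT, energy spectrum, Mel-scale triangular filters, log energy S(m), DCT) can be sketched in plain Python. This is an illustration under assumptions, not the computation from the cited references: the 24-filter bank, the Mel formula 2595·log10(1 + f/700), the absence of pre-emphasis and windowing, and all function names are choices made for the sketch.

```python
import cmath
import math


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def power_spectrum(frame):
    """Squared DFT magnitude for bins 0..N/2 (naive O(N^2) DFT)."""
    n = len(frame)
    spec = []
    for k in range(n // 2 + 1):
        x = sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        spec.append(abs(x) ** 2)
    return spec


def mel_filterbank(n_filters, n_fft, fs):
    """Triangular filters whose centres are equally spaced on the Mel scale."""
    mel_max = hz_to_mel(fs / 2.0)
    edges_hz = [mel_to_hz(mel_max * i / (n_filters + 1)) for i in range(n_filters + 2)]
    edges = [f * n_fft / fs for f in edges_hz]  # fractional DFT bin positions
    bank = []
    for m in range(1, n_filters + 1):
        lo, mid, hi = edges[m - 1], edges[m], edges[m + 1]
        row = []
        for k in range(n_fft // 2 + 1):
            if lo < k <= mid:
                row.append((k - lo) / (mid - lo))   # rising edge
            elif mid < k < hi:
                row.append((hi - k) / (hi - mid))   # falling edge
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def mfcc(frame, fs, n_filters=24, n_coeffs=12):
    """Return n_coeffs MFCCs (orders 1..n_coeffs) for one frame."""
    spec = power_spectrum(frame)
    bank = mel_filterbank(n_filters, len(frame), fs)
    # Log energy S(m) of each filter output; the small offset avoids log(0).
    log_e = [math.log(sum(w * p for w, p in zip(row, spec)) + 1e-10) for row in bank]
    # DCT of the log energies.
    return [
        sum(log_e[m] * math.cos(math.pi * n * (m + 0.5) / n_filters)
            for m in range(n_filters))
        for n in range(1, n_coeffs + 1)
    ]


# Example: 12 coefficients for a short 1 kHz tone sampled at 8 kHz.
tone = [math.sin(2 * math.pi * 1000 * t / 8000) for t in range(64)]
coeffs = mfcc(tone, fs=8000)
```

The naive DFT keeps the sketch dependency-free but is slow for 1024-sample frames; a real implementation would use an FFT.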
3) choose a signal under test whose sampling frequency is the same as that in step 1), and divide it into frames with the same number of samples per frame as the template signal in step 1);
4) compute the short-time energy and short-time average zero-crossing rate of the current frame of the signal under test. If either one satisfies its decision condition, take the current frame as a valid frame, save it, and go to step 5). If neither satisfies its condition but at least one of the three frames preceding the current frame does, smooth the current frame to a valid frame, save it, and go to step 5). If none of the three preceding frames satisfies the condition, the current frame is an invalid frame; go to step 5);
The decision condition is that the short-time energy of the frame is greater than a set minimum short-time energy, and that the short-time average zero-crossing rate of the frame lies within a set range.
For example, let the short-time energy of a frame be energy and its short-time average zero-crossing rate be zcr_num; let the minimum short-time energy threshold be EN_MIN, and let the lower and upper thresholds of the short-time average zero-crossing rate be ZCR1 and ZCR2 respectively. When energy > EN_MIN or ZCR1 < zcr_num < ZCR2, the current frame is taken as a valid frame and saved. When neither condition is satisfied but at least one of the three frames preceding the current frame satisfies the condition, the current frame is also smoothed to a valid frame and saved.
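The first-level decision with smoothing can be sketched as below. The text does not define how energy and zcr_num are computed, so this sketch assumes the sum of squared samples and the count of sign changes within the frame; published thresholds such as EN_MIN=53 may therefore correspond to a different scaling. The function names and the `hangover` parameter are hypothetical.

```python
def short_time_energy(frame):
    # Assumed definition: sum of squared samples in the frame.
    return sum(x * x for x in frame)


def zero_crossing_count(frame):
    # Assumed definition: number of sign changes between adjacent samples.
    return sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))


def meets_condition(frame, en_min, zcr1, zcr2):
    energy = short_time_energy(frame)
    zcr_num = zero_crossing_count(frame)
    return energy > en_min or zcr1 < zcr_num < zcr2


def label_valid_frames(frames, en_min, zcr1, zcr2, hangover=3):
    """Mark each frame valid if it meets the condition itself, or if any of
    the previous `hangover` frames met it (the smoothing rule)."""
    raw = [meets_condition(f, en_min, zcr1, zcr2) for f in frames]
    return [ok or any(raw[max(0, i - hangover):i]) for i, ok in enumerate(raw)]


# Example: one loud frame keeps the next three quiet frames valid.
loud, quiet = [10, -10] * 4, [0] * 8
flags = label_valid_frames([quiet, loud] + [quiet] * 5, en_min=1, zcr1=100, zcr2=200)
```

The smoothing looks at the unsmoothed result of the preceding frames, so a single frame that meets the condition extends validity by at most three frames rather than indefinitely.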
5) when the number of consecutive valid frames equals 3/2 of the template signal's frame count, take the leading 2/2 portion, which has the same number of frames as the template signal, as the target segment and go to step 6); the remaining 1/2 portion returns to step 4) and participates in the judgment of the next frame. Returning the remaining 1/2 portion to step 4) means treating the consecutive valid frames of this 1/2 portion as the frames preceding the current frame in step 4).
If the number of valid frames saved before the invalid frame satisfies 1/2 × (template frame count) < valid frame count < 3/2 × (template frame count), take these consecutive valid frames as the target segment and go to step 6); otherwise clear the saved data and return to step 4);
For example, let the number of consecutive valid frames be fra_num, the template frame count be tem_num, and the minimum number of consecutive valid frames be FRA_MIN. When fra_num < FRA_MIN, the corresponding frames are judged invalid and the saved data is cleared. When fra_num reaches tem_num+FRA_MIN, the first tem_num frames are taken as a target segment for next-level detection, and the last FRA_MIN frames become the leading frames of the next segment. When FRA_MIN < fra_num < tem_num+FRA_MIN, the frames are passed directly to the next level of analysis as a target segment.
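The second-level segmentation rule can be sketched as a small state machine over per-frame valid/invalid flags. The function name `segment_targets` is hypothetical, and two behaviours are assumptions of the sketch rather than statements from the text: a run of exactly FRA_MIN frames is discarded, and a run still open at the end of the input is flushed under the same length rule.

```python
def segment_targets(valid_flags, tem_num, fra_min):
    """Return (start, end) frame index pairs, end exclusive, of target segments."""
    targets = []
    start = None  # index of the first frame of the current valid run
    for i, ok in enumerate(valid_flags):
        if ok:
            if start is None:
                start = i
            if i - start + 1 == tem_num + fra_min:
                # Run reached tem_num + FRA_MIN frames: emit the first tem_num
                # frames and keep the last FRA_MIN as the head of the next run.
                targets.append((start, start + tem_num))
                start += tem_num
        elif start is not None:
            fra_num = i - start
            if fra_min < fra_num < tem_num + fra_min:
                targets.append((start, i))
            start = None  # otherwise the saved frames are cleared
    # Assumption: apply the same length rule to a run open at end of input.
    if start is not None and fra_min < len(valid_flags) - start < tem_num + fra_min:
        targets.append((start, len(valid_flags)))
    return targets


# Example with tem_num=4, FRA_MIN=2: runs of 3, 2, and 7 valid frames.
flags = [True] * 3 + [False] + [True] * 2 + [False] + [True] * 7 + [False]
segments = segment_targets(flags, tem_num=4, fra_min=2)
```

In the example, the 3-frame run becomes one segment, the 2-frame run is cleared, and the 7-frame run is split into a 4-frame segment followed by a 3-frame segment.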
6) extract the MFCC feature coefficients of the frames in the target segment. If the matching distance between the MFCC feature coefficients of the template signal and those of the signal under test is less than a threshold obtained by training, the target segment is judged to be the target signal; otherwise it is judged not to be the target signal.
That is, let dist be the matching distance between the MFCC feature coefficients of the template signal and those of the candidate target segment detected at the second level, and let GUN_MAX be the threshold obtained by training. When dist < GUN_MAX, the segment is judged to be the target event, a gunshot; otherwise it is judged to be a non-target event.
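The third-level decision can be sketched with a textbook DTW accumulated-cost recursion as the matching distance. The patent names DTW but does not give its local distance, step pattern, or normalisation, so the Euclidean local distance and the unnormalised three-way recursion here are assumptions, as are the function names.

```python
import math


def dtw_distance(a, b):
    """Accumulated DTW cost between two sequences of feature vectors."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j]: minimal accumulated cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # Euclidean local distance
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def is_gunshot(template_mfcc, segment_mfcc, gun_max):
    """Judge the segment a target event when dist < GUN_MAX."""
    return dtw_distance(template_mfcc, segment_mfcc) < gun_max


# Example: a time-stretched copy of the template aligns with zero cost.
template = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
stretched = [[0.0, 0.0], [0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
far_away = [[10.0, 10.0]] * 3
```

Because DTW tolerates sequences of different lengths, segments between FRA_MIN and tem_num+FRA_MIN frames can be compared against the tem_num-frame template directly.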
Because a minimum number of consecutive valid frames is required, smoothing is applied when valid frames are judged at the first level (step 4)) in order to avoid missed detections and reduce the miss rate: a valid frame can smooth up to three invalid frames that immediately follow it into valid frames. This effectively preserves the length of the target segment, greatly reduces the miss rate, and raises the detection accuracy of the present invention.
In the experiment, a clean single gunshot is taken as the template. The template signal has 11 frames (tem_num=11), the sampling frequency is 48000 Hz, each sample is 16 bits, and each frame contains 1024 samples.
The signal under test is a continuous audio signal recorded in a complex environment containing music, speech, car braking, and other sounds. It has 1953 frames and was both manually annotated and processed by the program. The thresholds are set to EN_MIN=53, ZCR1=65, ZCR2=100, FRA_MIN=6, GUN_MAX=4525. A schematic of partial test results is shown in Fig. 1. Tallying the detection results gives a total of 87 missed frames, so the miss rate is
$\alpha = \frac{87}{1953} \times 100\% = 4.45\%$.
The total number of falsely detected frames is 237, so the false detection rate is $\beta = \frac{237}{1953} \times 100\% = 12.14\%$.
Different miss rates and false detection rates are obtained by setting different parameter thresholds. The two rates trade off against each other and cannot both be optimal at once, so the parameters best suited to the situation at hand must be chosen case by case. If EN_MIN=55, ZCR1=68, ZCR2=95, FRA_MIN=6, GUN_MAX=4520 are set, the miss rate α increases and the false detection rate β decreases. A schematic of partial detection results is shown in Fig. 2. Tallying the detection results gives a total of 203 missed frames, so the miss rate is α=10.39%; the total number of falsely detected frames is 152, so the false detection rate is β=7.78%. If EN_MIN=50, ZCR1=60, ZCR2=105, FRA_MIN=6, GUN_MAX=4530 are set, the miss rate α decreases and the false detection rate β increases. A schematic of partial detection results is shown in Fig. 3. Tallying the detection results gives a total of 82 missed frames, so the miss rate is α=4.20%; the total number of falsely detected frames is 268, so the false detection rate is β=13.72%.
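The reported rates follow from dividing each frame count by the 1953 frames of the signal under test. The short check below reproduces that arithmetic; the helper name `rate_percent` and the dictionary labels are illustrative only.

```python
TOTAL_FRAMES = 1953  # frames in the signal under test


def rate_percent(frame_count, total_frames=TOTAL_FRAMES):
    """Frame count as a percentage of all frames, rounded to two decimals."""
    return round(100.0 * frame_count / total_frames, 2)


# (missed frames, falsely detected frames) for the three threshold settings.
settings = {
    "Fig. 1": (87, 237),
    "Fig. 2": (203, 152),
    "Fig. 3": (82, 268),
}
rates = {name: (rate_percent(miss), rate_percent(false))
         for name, (miss, false) in settings.items()}
```

The three settings illustrate the trade-off: tightening the thresholds (Fig. 2) lowers the false detection rate at the cost of more missed frames, and loosening them (Fig. 3) does the reverse.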
The experiments above show that the present invention not only requires far less computation than the conventional MFCC & DTW algorithm, but also, through target segment detection at the first level (step 4)) and the second level (steps 5) and 6)), locates the start and end points of the gunshot well, which makes the matching result more accurate and the recognition rate higher. Because a gunshot is a danger signal, missed detections of this sound have a larger impact on safety; the experimental results show that the detection results of the present invention accordingly lean toward judging non-target signals as target signals. The invention is therefore easy to port to and implement on hardware such as DSP and ARM, and has a degree of robustness that ensures accurate and effective detection.

Claims (4)

1. A multilevel screening detection and recognition method for gunshots, characterized by comprising the following steps:
1) select a sampling frequency in the range 8 kHz to 48 kHz, choose a template signal of a single gunshot corresponding to this sampling frequency, and divide the template signal into frames;
2) extract the MFCC feature coefficients of the template signal;
3) choose a signal under test whose sampling frequency is the same as that in step 1), and divide it into frames with the same number of samples per frame as the template signal in step 1);
4) compute the short-time energy and short-time average zero-crossing rate of the current frame of the signal under test. If either one satisfies its decision condition, take the current frame as a valid frame, save it, and go to step 5). If neither satisfies its condition but at least one of the three frames preceding the current frame does, smooth the current frame to a valid frame, save it, and go to step 5). If none of the three preceding frames satisfies the condition, the current frame is an invalid frame; go to step 6);
5) when the number of consecutive valid frames equals 3/2 of the template signal's frame count, take the leading 2/2 portion, which has the same number of frames as the template signal, as the target segment and go to step 7); the remaining 1/2 portion returns to step 4) and participates in the judgment of the next frame;
6) if the number of valid frames saved before this invalid frame satisfies 1/2 × (template frame count) < valid frame count < 3/2 × (template frame count), take these consecutive valid frames as the target segment and go to step 7); otherwise clear the saved data and return to step 4);
7) extract the MFCC feature coefficients of the frames in the target segment. If the matching distance between the MFCC feature coefficients of the template signal and those of the signal under test is less than a threshold obtained by training, the target segment is judged to be the target signal; otherwise it is judged not to be the target signal.
2. The multilevel screening detection and recognition method for gunshots according to claim 1, characterized in that each frame of the template signal in step 1) contains 256 to 1024 samples.
3. The multilevel screening detection and recognition method for gunshots according to claim 1, characterized in that the decision condition in step 4) is that the short-time energy of the frame is greater than a set minimum short-time energy, and that the short-time average zero-crossing rate of the frame lies within a set range.
4. The multilevel screening detection and recognition method for gunshots according to claim 1, characterized in that, in step 5), returning the remaining 1/2 portion to step 4) to participate in the judgment of the next frame means treating the consecutive valid frames of this 1/2 portion as the frames preceding the current frame in step 4).
CN2012105740037A 2012-12-24 2012-12-24 Multilevel screening detecting recognizing method for shots Pending CN103021421A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2012105740037A CN103021421A (en) 2012-12-24 2012-12-24 Multilevel screening detecting recognizing method for shots

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2012105740037A CN103021421A (en) 2012-12-24 2012-12-24 Multilevel screening detecting recognizing method for shots

Publications (1)

Publication Number Publication Date
CN103021421A true CN103021421A (en) 2013-04-03

Family

ID=47969951

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2012105740037A Pending CN103021421A (en) 2012-12-24 2012-12-24 Multilevel screening detecting recognizing method for shots

Country Status (1)

Country Link
CN (1) CN103021421A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101064043A (en) * 2006-04-29 2007-10-31 上海优浪信息科技有限公司 Sound-groove gate inhibition system and uses thereof
CN101483416A (en) * 2009-01-20 2009-07-15 杭州火莲科技有限公司 Response balance processing method for voice
CN101834982A (en) * 2010-05-28 2010-09-15 上海交通大学 Hierarchical screening method of violent videos based on multiplex mode
US8160877B1 (en) * 2009-08-06 2012-04-17 Narus, Inc. Hierarchical real-time speaker recognition for biometric VoIP verification and targeting

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105424170A (en) * 2015-11-03 2016-03-23 中国人民解放军国防科学技术大学 Shot detection counting method and system
CN105424170B (en) * 2015-11-03 2018-07-06 中国人民解放军国防科学技术大学 A kind of shot detection method of counting and system
CN105679313A (en) * 2016-04-15 2016-06-15 福建新恒通智能科技有限公司 Audio recognition alarm system and method
CN106251861A (en) * 2016-08-05 2016-12-21 重庆大学 A kind of abnormal sound in public places detection method based on scene modeling
CN106251861B (en) * 2016-08-05 2019-04-23 重庆大学 A kind of abnormal sound in public places detection method based on scene modeling
CN107665712A (en) * 2017-09-06 2018-02-06 中国科学院声学研究所北海研究站 A kind of marine organisms recognition methods based on dynamic time warping
CN112133326A (en) * 2020-09-08 2020-12-25 东南大学 Gunshot data amplification and detection method based on antagonistic neural network
CN113488071A (en) * 2021-07-16 2021-10-08 河南牧原智能科技有限公司 Pig cough recognition method, device, equipment and readable storage medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20130403