CN106448659A - Speech endpoint detection method based on short-time energy and fractal dimensions - Google Patents
- Publication number
- CN106448659A (application number CN201611178115.5A)
- Authority
- CN
- China
- Prior art keywords
- frame
- threshold
- speech
- ratio
- voice signal
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/04—Segmentation; Word boundary detection
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/78—Detection of presence or absence of voice signals
- G10L25/87—Detection of discrete points within a voice signal
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Quality & Reliability (AREA)
- Telephonic Communication Services (AREA)
- Measurement Of Mechanical Vibrations Or Ultrasonic Waves (AREA)
Abstract
The invention discloses a speech endpoint detection method based on short-time energy and fractal dimension. The method includes: pre-processing a source speech signal to obtain individual speech frames; calculating the fractal dimension value of each frame using fractal dimension theory, calculating the short-time energy value of each frame, and obtaining the ratio of the short-time energy value to the fractal dimension value; judging whether the ratio of each frame is greater than or equal to a first threshold and, if so, taking that frame as a speech frame; and extracting the starting and ending endpoints of the source speech signal by searching in both directions from the speech frame. Fractal dimension theory is thus applied to endpoint detection: the ratio of each frame's short-time energy value to its fractal dimension value is compared with the first threshold to screen out speech frames, and the starting and ending endpoints are then extracted on both sides of each speech frame. As a result, the method can effectively extract endpoints from speech signals with a low signal-to-noise ratio.
Description
Technical field
The present invention relates to the technical field of speech recognition, and in particular to a speech endpoint detection method based on short-time energy and fractal dimension.
Background
Endpoint detection is a very important task in speech recognition. The endpoints are the start and end positions of speech within a segment of signal. Endpoint detection not only reduces the amount of data to be processed in speech recognition and saves processing time, but also excludes interference from silent or noise-only segments, improving the performance of the recognition system; in speech coding it also reduces the bit rate spent on noise and silent segments, improving coding efficiency. Locating the endpoints of a speech signal accurately is difficult, especially at low signal-to-noise ratios, where the noise energy may swamp the speaker's voice and considerably affect subsequent training and recognition. A more robust method is therefore particularly important.
The traditional endpoint detection method uses a double-threshold, two-parameter decision based on short-time energy and zero-crossing rate. First, a high threshold is chosen on the short-time energy envelope of the speech for a coarse decision: frames above this threshold are identified as speech, and the start and end points of the speech must lie on the envelope outside the region bounded by the high threshold. Then a low threshold is determined from the zero-crossing rate, and the search continues leftward from the start point and rightward from the end point of the coarse decision to find the true start and end of the speech segment. The zero-crossing rate is used for this second decision because a Chinese syllable consists of a final (the vowel part), which has relatively high average short-time energy, and an initial (the consonant part), which has higher frequency content and therefore a higher zero-crossing rate.
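For comparison, the traditional double-threshold scheme described above can be sketched as follows. This is a minimal illustration of the prior-art idea, not the patented method: the function names, the zero-crossing-rate normalization (fraction of adjacent sample pairs that change sign), and the threshold values in the usage lines are assumptions. Frames are assumed to be rows of a NumPy array.

```python
import numpy as np


def energy_and_zcr(frames):
    """Short-time energy and zero-crossing rate for each frame (row)."""
    y = np.asarray(frames, dtype=float)
    energy = np.sum(y ** 2, axis=1)
    signs = np.sign(y)
    signs[signs == 0] = 1  # count exact zeros as positive so they add no crossings
    # ZCR: fraction of adjacent sample pairs whose sign changes
    zcr = np.mean(np.diff(signs, axis=1) != 0, axis=1)
    return energy, zcr


def double_threshold(energy, zcr, e_high, zcr_low):
    """Coarse decision with a high energy threshold, then widen each segment
    outwards while the zero-crossing rate stays at or above zcr_low."""
    energy, zcr = np.asarray(energy), np.asarray(zcr)
    n = len(energy)
    segments, i, prev_end = [], 0, -1
    while i < n:
        if energy[i] < e_high:
            i += 1
            continue
        start = i
        while i < n and energy[i] >= e_high:  # coarse segment above the high threshold
            i += 1
        end = i - 1
        # Second decision: extend left without overlapping the previous segment
        while start > prev_end + 1 and zcr[start - 1] >= zcr_low:
            start -= 1
        while end < n - 1 and zcr[end + 1] >= zcr_low:  # extend right
            end += 1
        segments.append((start, end))
        prev_end, i = end, end + 1
    return segments


e, z = energy_and_zcr(np.array([[1.0, -1.0, 1.0, -1.0], [1.0, 1.0, 1.0, 1.0]]))
print(e, z)  # [4. 4.] [1. 0.]
print(double_threshold([0, 0, 5, 6, 0, 0], [0, 0.5, 0.2, 0.2, 0.5, 0], e_high=4, zcr_low=0.3))  # [(1, 4)]
```

The consonant-heavy frames just outside the high-energy region (indices 1 and 4 in the toy input) are recovered only by the zero-crossing criterion, which is the behaviour the text attributes to the second decision.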
Without noise interference, or at a high signal-to-noise ratio, the above method can accurately locate the start and end of the speaker's speech. When the noise is severe, however (for example, when the signal-to-noise ratio drops to 10 dB), it can no longer detect the endpoint positions accurately.
Hence, accurately detecting the endpoint positions of a speech signal at a low signal-to-noise ratio is a problem that those skilled in the art urgently need to solve.
Summary of the invention
The object of the present invention is to provide a speech endpoint detection method based on short-time energy and fractal dimension that accurately detects the endpoint positions of a speech signal at a low signal-to-noise ratio.
To solve the above technical problem, the present invention provides a speech endpoint detection method based on short-time energy and fractal dimension, comprising:
Pre-processing a source speech signal to obtain individual frames of the speech signal;
Calculating, using fractal dimension theory, the fractal dimension value of each frame, calculating the short-time energy value of each frame, and obtaining the ratio of the short-time energy value to the fractal dimension value;
Judging whether the ratio of each frame is greater than or equal to a first threshold and, if so, taking each frame whose ratio is greater than or equal to the first threshold as a speech frame;
Extracting, in the directions on both sides of the speech frame, the starting endpoint and the ending endpoint contained in the source speech signal.
Preferably, extracting the starting endpoint and the ending endpoint contained in the source speech signal in the directions on both sides of the speech frame specifically includes:
Judging in turn whether the ratio of each frame to the left of the speech frame is less than a second threshold; if not, continuing the judgment until a frame whose ratio is less than the second threshold is found, and taking that frame as the starting endpoint;
Judging in turn whether the ratio of each frame to the right of the speech frame is less than the second threshold; if not, continuing the judgment until a frame whose ratio is less than the second threshold is found, and taking that frame as the ending endpoint;
Wherein the second threshold is less than the first threshold.
Preferably, the first threshold is 1.6.
Preferably, the Second Threshold is 1.16.
Preferably, the pre-processing includes pre-emphasis, framing and windowing.
Preferably, a Hamming window function is used in the windowing.
Preferably, the fractal dimension is the correlation dimension, and the corresponding fractal dimension value is the correlation dimension value.
The speech endpoint detection method based on short-time energy and fractal dimension provided by the present invention includes: pre-processing the source speech signal to obtain individual frames; calculating the fractal dimension value of each frame using fractal dimension theory and the short-time energy value of each frame, to obtain the ratio of the short-time energy value to the fractal dimension value; judging whether the ratio of each frame is greater than or equal to the first threshold and, if so, taking that frame as a speech frame; and extracting the starting and ending endpoints contained in the source speech signal in the directions on both sides of the speech frame. The method applies fractal dimension theory to endpoint detection: the ratio of each frame's short-time energy value to its fractal dimension value is compared with the first threshold to screen out speech frames, and the starting and ending endpoints are then extracted on both sides of each speech frame.
The method can therefore effectively extract endpoints from speech signals with a low signal-to-noise ratio.
Description of the drawings
To illustrate the embodiments of the present invention more clearly, the drawings needed for the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention; those of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of the speech endpoint detection method based on short-time energy and fractal dimension provided by the present invention;
Fig. 2 is a schematic diagram of endpoint detection on a clean speech waveform after babble noise is added, according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of endpoint detection on a clean speech waveform after pink noise is added, according to an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.
The core of the present invention is to provide a speech endpoint detection method based on short-time energy and fractal dimension.
To help those skilled in the art better understand the solution of the present invention, the present invention is described in further detail below with reference to the accompanying drawings and specific embodiments.
Fig. 1 is a flowchart of the speech endpoint detection method based on short-time energy and fractal dimension provided by the present invention. As shown in Fig. 1, the method includes:
S10: Pre-process the source speech signal to obtain individual frames of the speech signal.
The duration of the source speech signal is not restricted here. Before the source speech signal is processed, it is first pre-processed. Pre-processing here mainly means dividing the signal into frames, but it is not limited to framing: in a specific implementation, pre-processing may include pre-emphasis, framing and windowing. Pre-emphasis boosts the high-frequency part and removes the effect of lip radiation, while also attenuating the low-frequency part; it also reduces the interference of the fundamental frequency with formant (resonance peak) detection, which helps in detecting formants. Windowing smooths the transitions between the frames produced by framing. As a preferred embodiment, a Hamming window function is used for windowing. It should be noted that other windowing methods may also be used; the Hamming window is not the only option.
In this embodiment, the duration of each frame can be set to 10 ms to 30 ms, but is not limited to this range.
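A minimal sketch of the pre-processing in S10, assuming NumPy. The pre-emphasis coefficient alpha = 0.97 is a common choice but an assumption here (the source gives no value), and keeping only complete frames is likewise assumed. The 200-sample frame length and 100-sample shift are taken from the experiment described later (25 ms and 12.5 ms at 8000 Hz). The function name `preprocess` is hypothetical.

```python
import numpy as np


def preprocess(signal, frame_len=200, hop=100, alpha=0.97):
    """Pre-emphasis, framing and Hamming windowing (step S10).

    alpha = 0.97 is an assumed pre-emphasis coefficient; frame_len and hop follow
    the 200/100-sample settings of the experiment described in this document.
    """
    x = np.asarray(signal, dtype=float)
    if len(x) < frame_len:
        raise ValueError("signal is shorter than one frame")
    # Pre-emphasis: y[n] = x[n] - alpha * x[n-1] boosts high frequencies
    emphasized = np.append(x[0], x[1:] - alpha * x[:-1])
    # Framing: only complete frames are kept (an assumption)
    n_frames = 1 + (len(emphasized) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = emphasized[idx]
    # Windowing: taper each frame with a Hamming window
    return frames * np.hamming(frame_len)


frames = preprocess(np.random.default_rng(0).standard_normal(8000))  # 1 s at 8000 Hz
print(frames.shape)  # (79, 200)
```

Each row of the result is one windowed frame, which is the per-frame input assumed by the later sketches.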
S11: Calculate the fractal dimension value of each frame using fractal dimension theory, calculate the short-time energy value of each frame, and obtain the ratio of the short-time energy value to the fractal dimension value.
Fractal theory describes objective things using fractal dimension and related mathematical methods; it comes closer to the real attributes and states of complex systems and better matches the diversity and complexity of objective things. Because speech signals have fractal properties, fractal theory can be applied to endpoint detection. In a specific implementation, several kinds of fractal dimension are available, such as the box-counting dimension, the information dimension and the correlation dimension. They differ in how the dimension is calculated. The correlation dimension reflects the distribution of the points in the set, so its computed results fluctuate relatively little. Therefore, as a preferred embodiment, the fractal dimension is the correlation dimension, and the corresponding fractal dimension value is the correlation dimension value.
The correlation dimension value is calculated as follows:
[Formula image not available]
where p is the probability that N_i points fall into a box of size δ, i is a positive integer, m is the sample index, N is the total number of frames, 1 ≤ m ≤ l, 1 ≤ i ≤ N, l is the length of each frame, x_i and x_j are the i-th and j-th reconstructed phase-space vectors, and H(d_{i,j}, p) is the Heaviside step function.
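Because the formula itself is missing from this copy, the following sketch uses the standard Grassberger-Procaccia estimate of the correlation dimension rather than the patent's exact expression. Each frame is delay-embedded into phase-space vectors x_i, the correlation sum C(r) is the fraction of vector pairs whose distance d_ij satisfies H(r - d_ij) = 1, and D2 is the slope of log C(r) against log r. The embedding dimension, the delay, the 5th-to-50th-percentile radius range and the function names are all assumptions chosen for illustration.

```python
import numpy as np


def delay_embed(frame, dim=3, tau=1):
    """Phase-space reconstruction: row k is (s[k], s[k+tau], ..., s[k+(dim-1)tau])."""
    s = np.asarray(frame, dtype=float)
    n_vectors = len(s) - (dim - 1) * tau
    if n_vectors < 2:
        raise ValueError("frame too short for this embedding")
    return np.stack([s[k * tau : k * tau + n_vectors] for k in range(dim)], axis=1)


def correlation_dimension(frame, dim=3, tau=1, n_radii=10):
    """Grassberger-Procaccia estimate of the correlation dimension D2 of one frame."""
    x = delay_embed(frame, dim, tau)
    # Euclidean distance d_ij = ||x_i - x_j|| for every pair i < j
    diff = x[:, None, :] - x[None, :, :]
    dist = np.sqrt(np.sum(diff ** 2, axis=-1))
    d = dist[np.triu_indices(len(x), k=1)]
    d = d[d > 0]
    if d.size == 0:
        return 0.0  # degenerate frame (e.g. digital silence): no scaling to measure
    lo, hi = np.percentile(d, [5, 50])
    if hi <= lo:
        return 0.0
    radii = np.logspace(np.log10(lo), np.log10(hi), n_radii)
    # Correlation sum C(r): fraction of pairs with H(r - d_ij) = 1, i.e. d_ij <= r
    c = np.array([np.mean(d <= r) for r in radii])
    # D2 is the slope of log C(r) against log r over the chosen range
    slope, _intercept = np.polyfit(np.log(radii), np.log(c), 1)
    return float(slope)


n = np.arange(200)
print(correlation_dimension(np.sin(2 * np.pi * 0.037 * n), tau=7))
```

A pure sinusoid traces a closed curve in phase space and gives an estimate near 1, whereas white noise in a 3-dimensional embedding gives a clearly larger value. Estimates from 200-sample frames are coarse; the source only claims that the correlation dimension fluctuates less than the alternatives.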
The short-time energy is calculated as follows:
[Formula image not available]
where y_i(m) is the energy value of the m-th sample point in the i-th frame.
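The short-time energy formula is also missing here. The sketch below assumes the conventional definition E_i = sum over m of y_i(m)^2, treating y_i(m) as the m-th sample value of frame i, and then forms the ratio used in S11. The small `eps` that guards against division by zero is an addition not found in the source, and the function names are hypothetical. Note that the absolute thresholds 1.6 and 1.16 given later only make sense for whatever amplitude scaling the authors used, which the source does not state.

```python
import numpy as np


def short_time_energy(frames):
    """E_i = sum_m y_i(m)^2 for each frame (row); conventional definition, assumed here."""
    y = np.asarray(frames, dtype=float)
    return np.sum(y ** 2, axis=1)


def energy_fd_ratio(energy, fractal_dims, eps=1e-12):
    """Ratio of short-time energy to fractal dimension per frame (step S11).

    eps is an added safeguard (not in the source) against a zero dimension estimate.
    """
    fd = np.asarray(fractal_dims, dtype=float)
    return np.asarray(energy, dtype=float) / np.maximum(fd, eps)


energy = short_time_energy([[1.0, 2.0], [0.0, 3.0]])  # [5.0, 9.0]
print(energy_fd_ratio(energy, [2.5, 1.5]))  # [2. 6.]
```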
S12: Judge whether the ratio of each frame is greater than or equal to the first threshold; if so, proceed to step S13. A frame whose ratio is greater than or equal to the first threshold is a speech frame.
S13: Extract, in the directions on both sides of the speech frame, the starting endpoint and the ending endpoint contained in the source speech signal.
It can be understood that each frame yields one ratio, and step S12 selects the frames whose ratio is greater than or equal to the first threshold. There may be several such frames, in which case the starting and ending endpoints corresponding to each of them must be determined.
In this embodiment, "both sides" refers to the left and right of the speech frame: the left side is the direction of the frames before the current speech frame, and the right side is the direction of the frames after it. For example, suppose a speech signal has 10 frames, the 1st through the 10th, and the 3rd and 8th frames are speech frames. Then the frame to the left of the 3rd frame is the 2nd frame, and the frame to its right is the 4th frame.
The speech endpoint detection method based on short-time energy and fractal dimension provided by this embodiment includes: pre-processing the source speech signal to obtain individual frames; calculating the fractal dimension value of each frame using fractal dimension theory and the short-time energy value of each frame, to obtain the ratio of the short-time energy value to the fractal dimension value; judging whether the ratio of each frame is greater than or equal to the first threshold and, if so, taking that frame as a speech frame; and extracting the starting and ending endpoints contained in the source speech signal in the directions on both sides of the speech frame. The method applies fractal dimension theory to endpoint detection: the ratio of each frame's short-time energy value to its fractal dimension value is compared with the first threshold to screen out speech frames, and the starting and ending endpoints are then extracted on both sides of each speech frame.
The method can therefore effectively extract endpoints from speech signals with a low signal-to-noise ratio.
As a preferred embodiment, extracting the starting endpoint and the ending endpoint contained in the source speech signal in the directions on both sides of the speech frame specifically includes:
Judging in turn whether the ratio of each frame to the left of the speech frame is less than a second threshold; if not, continuing the judgment until a frame whose ratio is less than the second threshold is found, and taking that frame as the starting endpoint;
Judging in turn whether the ratio of each frame to the right of the speech frame is less than the second threshold; if not, continuing the judgment until a frame whose ratio is less than the second threshold is found, and taking that frame as the ending endpoint;
Wherein the second threshold is less than the first threshold.
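The frame screening of S12 and the endpoint search of S13 with the second threshold can be sketched as below. This is a hedged reading of the text, not the authors' code: frame indices are 0-based, and when no frame below the second threshold is found on a side, the speech frame itself is returned as that endpoint. The source states this fallback for the starting endpoint; applying it symmetrically to the ending endpoint is an assumption. The function name `detect_endpoints` is hypothetical.

```python
import numpy as np


def detect_endpoints(ratios, first_threshold=1.6, second_threshold=1.16):
    """Return one (speech_frame, start, end) triple per frame with ratio >= first_threshold.

    Indices are 0-based. If no frame below second_threshold is found on a side,
    the speech frame itself is used as that endpoint.
    """
    r = np.asarray(ratios, dtype=float)
    if second_threshold >= first_threshold:
        raise ValueError("the second threshold must be less than the first threshold")
    results = []
    for s in np.flatnonzero(r >= first_threshold):  # S12: screen speech frames
        s = int(s)
        start = s
        for k in range(s - 1, -1, -1):  # S13: walk left frame by frame
            if r[k] < second_threshold:
                start = k
                break
        end = s
        for k in range(s + 1, len(r)):  # S13: walk right frame by frame
            if r[k] < second_threshold:
                end = k
                break
        results.append((s, start, end))
    return results


# Ten frames; 0-based indices 2 and 7 are the "3rd" and "8th" frames of the text.
ratios = [0.5, 1.3, 1.8, 1.4, 1.0, 0.9, 1.2, 2.0, 1.3, 0.4]
print(detect_endpoints(ratios))  # [(2, 0, 4), (7, 5, 9)]
```

The ratio values in the usage lines are made up for illustration. The source does not say how results from adjacent or overlapping speech frames should be merged, so the sketch reports each speech frame separately.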
Taking the above 10 frames as an example again, suppose the 3rd and 8th frames are speech frames. For the 3rd frame, the frames to its left are judged one by one. Because the judgment proceeds in order, the first frame checked is the one immediately to the left of the 3rd frame, i.e., the 2nd frame: if the ratio of the 2nd frame is less than the second threshold, the 2nd frame is the starting endpoint; otherwise the 1st frame is judged next. On the right of the 3rd frame, the first check is whether the ratio of the 4th frame is less than the second threshold; if not, the ratio of the 5th frame is checked, and so on until a frame whose ratio is less than the second threshold is found. It can be understood that if no frame with a ratio less than the second threshold is found, the speech frame itself is the starting endpoint.
The 8th frame is processed in the same way as the 3rd frame, so the details are not repeated in this embodiment.
As a preferred embodiment, the first threshold is 1.6 and the second threshold is 1.16.
It can be understood that the first and second thresholds should be set according to the specific situation; choosing 1.6 and 1.16 here is merely one specific embodiment.
To verify the reliability of the endpoint detection method provided by the present invention, a simulation experiment was carried out. The experiment ran on a 32-bit Windows 7 PC using MATLAB R2013a. Other fractal dimensions, such as the box-counting dimension or the information dimension, could of course also be used. Because different fractal dimensions lead to slightly different run times, the experiment used the correlation dimension; in the above environment the run time was about 5 to 10 minutes, slightly longer than that of traditional endpoint detection, but the noise robustness was good. The sampling frequency was set to 8000 Hz, the frame length to 200 samples, the frame shift to 100 samples, and the window function to a Hamming window.
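The experimental settings translate into frame durations by simple arithmetic: 200 samples at 8000 Hz is 25 ms, inside the 10 ms to 30 ms range given for S10, and a 100-sample shift is 12.5 ms (50 % overlap between consecutive frames). The helper below is hypothetical and only makes that arithmetic explicit. Its frame-count formula 1 + floor((n - frame_len) / hop) assumes that a trailing partial frame is discarded, which the source does not specify.

```python
def frame_timing(fs_hz=8000, frame_len=200, hop=100, n_samples=None):
    """Return (frame duration in ms, shift in ms, number of full frames or None)."""
    frame_ms = 1000 * frame_len / fs_hz
    hop_ms = 1000 * hop / fs_hz
    n_frames = None
    if n_samples is not None and n_samples >= frame_len:
        n_frames = 1 + (n_samples - frame_len) // hop
    return frame_ms, hop_ms, n_frames


print(frame_timing(n_samples=16000))  # (25.0, 12.5, 159) for a 2 s signal
```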
Fig. 2 is a schematic diagram of endpoint detection on a clean speech waveform after babble noise is added, according to an embodiment of the present invention. In Fig. 2, solid lines mark starting endpoints and dashed lines mark ending endpoints. As Fig. 2 shows, under the same conditions the endpoint detection method provided by the present invention detects the endpoints in the noisy signal at the same positions as in the clean speech waveform, indicating that the method is highly reliable.
Fig. 3 is a schematic diagram of endpoint detection on a clean speech waveform after pink noise is added, according to an embodiment of the present invention.
The speech endpoint detection method based on short-time energy and fractal dimension provided by the present invention has been described in detail above. The embodiments in this description are described progressively; each focuses on its differences from the others, and for the parts they share, reference may be made between them. Since the device disclosed in the embodiments corresponds to the disclosed method, its description is relatively brief; see the method section for the relevant details. It should be noted that those of ordinary skill in the art may make several improvements and modifications to the present invention without departing from its principles, and these improvements and modifications also fall within the protection scope of the claims of the present invention.
Claims (7)
1. A speech endpoint detection method based on short-time energy and fractal dimension, characterized by comprising:
pre-processing a source speech signal to obtain individual frames of the speech signal;
calculating, using fractal dimension theory, the fractal dimension value corresponding to each frame of the speech signal, and calculating the short-time energy value of each frame, to obtain the ratio of the short-time energy value to the fractal dimension value;
judging whether the ratio corresponding to each frame is greater than or equal to a first threshold and, if so, taking each frame whose ratio is greater than or equal to the first threshold as a speech frame;
extracting, in the directions on both sides of the speech frame, a starting endpoint and an ending endpoint contained in the source speech signal.
2. The speech endpoint detection method according to claim 1, characterized in that extracting the starting endpoint and the ending endpoint contained in the source speech signal in the directions on both sides of the speech frame specifically includes:
judging in turn whether the ratio of each frame to the left of the speech frame is less than a second threshold; if not, continuing the judgment until a frame whose ratio is less than the second threshold is found, and taking that frame as the starting endpoint;
judging in turn whether the ratio of each frame to the right of the speech frame is less than the second threshold; if not, continuing the judgment until a frame whose ratio is less than the second threshold is found, and taking that frame as the ending endpoint;
wherein the second threshold is less than the first threshold.
3. The speech endpoint detection method according to claim 2, characterized in that the first threshold is 1.6.
4. The speech endpoint detection method according to claim 3, characterized in that the second threshold is 1.16.
5. The speech endpoint detection method according to claim 1, characterized in that the pre-processing includes pre-emphasis, framing and windowing.
6. The speech endpoint detection method according to claim 5, characterized in that a Hamming window function is used in the windowing.
7. The speech endpoint detection method according to claim 1, characterized in that the fractal dimension is the correlation dimension, and the corresponding fractal dimension value is the correlation dimension value.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611178115.5A CN106448659B (en) | 2016-12-19 | 2016-12-19 | Speech endpoint detection method based on short-time energy and fractal dimension |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611178115.5A CN106448659B (en) | 2016-12-19 | 2016-12-19 | Speech endpoint detection method based on short-time energy and fractal dimension |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106448659A | 2017-02-22 |
CN106448659B | 2019-09-27 |
Family
ID=58215020
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201611178115.5A Expired - Fee Related CN106448659B (en) | Speech endpoint detection method based on short-time energy and fractal dimension | 2016-12-19 | 2016-12-19 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106448659B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2007264567A (en) * | 2006-03-30 | 2007-10-11 | Railway Technical Res Inst | Decision processing method for unspoken voice in voice |
CN101625857A (en) * | 2008-07-10 | 2010-01-13 | 新奥特(北京)视频技术有限公司 | Self-adaptive voice endpoint detection method |
CN101625858A (en) * | 2008-07-10 | 2010-01-13 | 新奥特(北京)视频技术有限公司 | Method for extracting short-time energy frequency value in voice endpoint detection |
CN102184732A (en) * | 2011-04-28 | 2011-09-14 | 重庆邮电大学 | Fractal-feature-based intelligent wheelchair voice identification control method and system |
CN103366739A (en) * | 2012-03-28 | 2013-10-23 | 郑州市科学技术情报研究所 | Self-adaptive endpoint detection method and self-adaptive endpoint detection system for isolate word speech recognition |
Non-Patent Citations (1)
Title |
---|
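Shen Yaqiang et al.: "Noise … based on the short-time fractal dimension of time series" (基于时间序列短时分形维数的噪声), Journal of Zhejiang Normal University (Natural Sciences) (《浙江师大学报(自然科学版)》) * |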
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2019100327A1 (en) * | 2017-11-24 | 2019-05-31 | 深圳传音通讯有限公司 | Signal processing method, device and terminal |
CN108198558A (en) * | 2017-12-28 | 2018-06-22 | 电子科技大学 | Speech recognition method based on CSI data |
CN108305639A (en) * | 2018-05-11 | 2018-07-20 | 南京邮电大学 | Speech emotion recognition method, computer-readable storage medium, and terminal |
CN109346095A (en) * | 2018-10-10 | 2019-02-15 | 广州市讯飞樽鸿信息技术有限公司 | Heart sound endpoint detection method |
CN109346095B (en) * | 2018-10-10 | 2023-07-07 | 广州九路科技有限公司 | Heart sound endpoint detection method |
CN110364187A (en) * | 2019-07-03 | 2019-10-22 | 深圳华海尖兵科技有限公司 | Endpoint recognition method and device for speech signals |
CN113488071A (en) * | 2021-07-16 | 2021-10-08 | 河南牧原智能科技有限公司 | Pig cough recognition method, device, equipment and readable storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN106448659B (en) | 2019-09-27 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106448659A (en) | Speech endpoint detection method based on short-time energy and fractal dimensions | |
US11062699B2 (en) | Speech recognition with trained GMM-HMM and LSTM models | |
CN109473123A (en) | Voice activity detection method and device | |
CN102982811B (en) | Voice endpoint detection method based on real-time decoding | |
CN108710704B (en) | Method and device for determining conversation state, electronic equipment and storage medium | |
CN110706690A (en) | Speech recognition method and device | |
CN108428448A (en) | A kind of sound end detecting method and audio recognition method | |
CN112183099A (en) | Named entity identification method and system based on semi-supervised small sample extension | |
CN105006230A (en) | Voice sensitive information detecting and filtering method based on unspecified people | |
CN104021789A (en) | Self-adaption endpoint detection method using short-time time-frequency value | |
CN109087667B (en) | Voice fluency recognition method and device, computer equipment and readable storage medium | |
CN101887722A (en) | Rapid voiceprint authentication method | |
CN102073676A (en) | Method and system for detecting network pornography videos in real time | |
CN110942776B (en) | Audio splicing prevention detection method and system based on GRU | |
CN109215634A (en) | A kind of method and its system of more word voice control on-off systems | |
CN105869658A (en) | Voice endpoint detection method employing nonlinear feature | |
CN111128128A (en) | Voice keyword detection method based on complementary model scoring fusion | |
CN111477219A (en) | Keyword distinguishing method and device, electronic equipment and readable storage medium | |
CN103021421A (en) | Multilevel screening detecting recognizing method for shots | |
CN101067929B (en) | Method for enhancing and extracting phonetic resonance hump trace utilizing formant | |
CN117116292A (en) | Audio detection method, device, electronic equipment and storage medium | |
CN111613250B (en) | Long voice endpoint detection method and device, storage medium and electronic equipment | |
CN115359323A (en) | Image text information generation method and deep learning model training method | |
CN104240705A (en) | Intelligent voice-recognition locking system for safe box | |
CN112216285B (en) | Multi-user session detection method, system, mobile terminal and storage medium |
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
C10 | Entry into substantive examination | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 2019-09-27; termination date: 2020-12-19 |