CN106448659A - Speech endpoint detection method based on short-time energy and fractal dimensions - Google Patents


Info

Publication number
CN106448659A
Authority
CN
China
Prior art keywords
frame
threshold
speech
ratio
voice signal
Legal status
Granted
Application number
CN201611178115.5A
Other languages
Chinese (zh)
Other versions
CN106448659B (en)
Inventor
魏啸天
鲍鸿
Current Assignee
Guangdong University of Technology
Original Assignee
Guangdong University of Technology
Priority date
2016-12-19
Filing date
2016-12-19
Publication date
2017-02-22
Application filed by Guangdong University of Technology
Priority to CN201611178115.5A
Publication of CN106448659A
Application granted
Publication of CN106448659B
Expired - Fee Related
Anticipated expiration


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/04 Segmentation; Word boundary detection
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78 Detection of presence or absence of voice signals
    • G10L25/87 Detection of discrete points within a voice signal

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Quality & Reliability (AREA)
  • Telephonic Communication Services (AREA)
  • Measurement Of Mechanical Vibrations Or Ultrasonic Waves (AREA)

Abstract

The invention discloses a speech endpoint detection method based on short-time energy and fractal dimension. The method comprises: preprocessing a source speech signal to obtain individual frames; calculating the fractal dimension value of each frame using fractal dimension theory, and calculating the short-time energy value of each frame, to obtain the ratio of the short-time energy value to the fractal dimension value; determining whether the ratio of each frame is greater than or equal to a first threshold, and taking each frame whose ratio is greater than or equal to the first threshold as a speech frame; and extracting the starting and ending endpoints of the source speech signal by searching in both directions from the speech frame. By applying fractal dimension theory to endpoint detection and comparing each frame's ratio of short-time energy to fractal dimension with the first threshold, the method screens out speech frames and then extracts the starting and ending endpoints on either side of them. The method can therefore extract endpoints effectively from speech signals with a low signal-to-noise ratio.

Description

Speech endpoint detection method based on short-time energy and fractal dimension
Technical field
The present invention relates to the technical field of speech recognition, and more particularly to a speech endpoint detection method based on short-time energy and fractal dimension.
Background
In speech recognition, endpoint detection is a very important task. The endpoints are the start and end positions of speech within a segment of signal. Endpoint detection not only reduces the amount of data that speech recognition has to process and saves processing time, but also removes interference from silent or noise-only segments, improving the performance of the recognition system; in speech coding it can also reduce the bit rate spent on noise and silence, improving coding efficiency. Locating the endpoints of a speech signal accurately is difficult, especially at a low signal-to-noise ratio (SNR), where noise energy can drown out the speaker's voice and considerably affect subsequent training and recognition. A more robust method is therefore particularly important.
The traditional endpoint detection method uses a double-threshold, two-parameter scheme based on short-time energy and zero-crossing rate. First, a high threshold is chosen on the short-time energy envelope of the speech to make a coarse decision: segments above this threshold are identified as speech, and the true start and end points of the speech should lie outside the span where the envelope exceeds the high threshold. Then a low threshold is determined from the zero-crossing rate, and the search continues leftward from the start point and rightward from the end point of the coarse decision to find the true start and end of the speech segment. The zero-crossing rate is used for this second decision because a Chinese syllable consists of a final (vowel part) with relatively high average short-time energy and an initial (consonant part) with higher frequency content, i.e. a higher zero-crossing rate.
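The traditional double-threshold scheme can be sketched in Python as follows. This is a simplified illustration, not taken from the patent: it assumes the signal has already been split into frames, treats everything from the first to the last frame above the energy threshold as one coarse segment, and widens that segment while the zero-crossing rate of the neighbouring frames stays above the low threshold. The function names and threshold values are hypothetical.

```python
def short_time_energy(frame):
    """Sum of squared samples in one frame."""
    return sum(v * v for v in frame)


def zero_crossing_rate(frame):
    """Number of sign changes between consecutive samples (0 counts as positive)."""
    return sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))


def dual_threshold_vad(frames, energy_high, zcr_low):
    """Simplified energy/ZCR double-threshold detector.

    Returns (start, end) frame indices of one speech segment, or None if no
    frame's energy exceeds energy_high.
    """
    energy = [short_time_energy(f) for f in frames]
    zcr = [zero_crossing_rate(f) for f in frames]
    above = [i for i, e in enumerate(energy) if e > energy_high]
    if not above:
        return None
    # Coarse decision: the high energy threshold marks the certain-speech region.
    start, end = above[0], above[-1]
    # Fine decision: extend outward while neighbouring frames still look like
    # consonants, i.e. their zero-crossing rate stays above the low threshold.
    while start > 0 and zcr[start - 1] > zcr_low:
        start -= 1
    while end < len(frames) - 1 and zcr[end + 1] > zcr_low:
        end += 1
    return start, end


frames = [[0.0] * 8, [0.01, -0.01] * 4, [1.0, 1.0, -1.0, -1.0] * 2,
          [1.0, 1.0, -1.0, -1.0] * 2, [0.0] * 8]
print(dual_threshold_vad(frames, energy_high=1.0, zcr_low=4))  # (1, 3)
```

In the usage example, frame 1 has very low energy but a high zero-crossing rate (a consonant-like frame), so the energy threshold alone would miss it and the ZCR pass recovers it.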
Without noise interference, or at a high SNR, the above method can accurately locate the start and end of the speaker's speech. When the noise is severe, however, for example when the SNR drops to 10 dB, it can no longer detect the endpoint positions accurately.
Accurately detecting the endpoint positions of a speech signal at low SNR is therefore a problem that those skilled in the art urgently need to solve.
Summary of the invention
The object of the present invention is to provide a speech endpoint detection method based on short-time energy and fractal dimension that accurately detects the endpoint positions of a speech signal when the signal-to-noise ratio is low.
To solve the above technical problem, the present invention provides a speech endpoint detection method based on short-time energy and fractal dimension, comprising:
Preprocessing a source speech signal to obtain individual frames;
Calculating, using fractal dimension theory, the fractal dimension value corresponding to each frame, and calculating the short-time energy value of each frame, to obtain the ratio of the short-time energy value to the fractal dimension value;
Determining whether the ratio corresponding to each frame is greater than or equal to a first threshold, and if so, taking the frame whose ratio is greater than or equal to the first threshold as a speech frame;
Extracting, in the directions of both sides of the speech frame, the starting endpoint and the ending endpoint of the source speech signal.
Preferably, extracting the starting endpoint and the ending endpoint of the source speech signal in the directions of both sides of the speech frame specifically comprises:
Checking in turn whether the ratio of each frame to the left of the speech frame is less than a second threshold; if not, continuing until a frame whose ratio is less than the second threshold is found, and taking that frame as the starting endpoint;
Checking in turn whether the ratio of each frame to the right of the speech frame is less than the second threshold; if not, continuing until a frame whose ratio is less than the second threshold is found, and taking that frame as the ending endpoint;
Wherein the second threshold is less than the first threshold.
Preferably, the first threshold is 1.6.
Preferably, the Second Threshold is 1.16.
Preferably, the preprocessing comprises pre-emphasis, framing, and windowing.
Preferably, the windowing uses a Hamming window function.
Preferably, the fractal dimension is the correlation dimension, and the corresponding fractal dimension value is the correlation dimension value.
The speech endpoint detection method based on short-time energy and fractal dimension provided by the present invention comprises: preprocessing a source speech signal to obtain individual frames; calculating the fractal dimension value of each frame using fractal dimension theory and calculating the short-time energy value of each frame, to obtain the ratio of the short-time energy value to the fractal dimension value; determining whether the ratio of each frame is greater than or equal to a first threshold, and if so, taking that frame as a speech frame; and extracting the starting and ending endpoints of the source speech signal in the directions of both sides of the speech frame. The method applies fractal dimension theory to endpoint detection: it compares each frame's ratio of short-time energy to fractal dimension with the first threshold to screen out speech frames, and then extracts the starting and ending endpoints on both sides of the speech frames. The method can therefore extract endpoints effectively from speech signals with a low signal-to-noise ratio.
Brief description of the drawings
To explain the embodiments of the present invention more clearly, the drawings used in the embodiments are briefly introduced below. The drawings described below show only some embodiments of the invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of the speech endpoint detection method based on short-time energy and fractal dimension provided by the present invention;
Fig. 2 is a schematic diagram of endpoint detection on a clean speech waveform with added babble noise, provided by an embodiment of the present invention;
Fig. 3 is a schematic diagram of endpoint detection on a clean speech waveform with added pink noise, provided by an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by those of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the invention.
The core of the present invention is to provide a speech endpoint detection method based on short-time energy and fractal dimension.
To help those skilled in the art better understand the solution of the present invention, the invention is described in further detail below with reference to the drawings and specific embodiments.
Fig. 1 is a flowchart of the speech endpoint detection method based on short-time energy and fractal dimension provided by the present invention. As shown in Fig. 1, the method includes:
S10: Preprocess the source speech signal to obtain individual frames.
The duration of the source speech signal is not restricted here. Before any calculation, the source speech signal is first preprocessed. Preprocessing here mainly means framing, but it is not limited to framing. In a specific implementation, preprocessing may include pre-emphasis, framing, and windowing. Pre-emphasis boosts the high-frequency part, removes the effect of lip radiation, and attenuates the low-frequency part. Pre-emphasis also reduces the interference of the fundamental frequency with formant detection, which makes formants easier to detect. Windowing smooths the transitions between the frames produced by framing; in a preferred embodiment, a Hamming window function is used. Note that other windows may also be used; the Hamming window is not the only option.
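A minimal sketch of the preprocessing in step S10 (pre-emphasis, framing, Hamming windowing) in plain Python. The pre-emphasis coefficient 0.97 is a common textbook value assumed here; the patent does not state one. The 200-sample frame length and 100-sample frame shift are the values from the experiment described later, and dropping a trailing partial frame instead of zero-padding it is also an assumption.

```python
import math


def pre_emphasis(x, alpha=0.97):
    """First-order high-pass filter y[n] = x[n] - alpha * x[n-1] (alpha assumed)."""
    if not x:
        return []
    return [x[0]] + [x[n] - alpha * x[n - 1] for n in range(1, len(x))]


def hamming(n):
    """Hamming window of length n: w[k] = 0.54 - 0.46 * cos(2*pi*k / (n - 1))."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]


def preprocess(x, frame_len=200, hop=100, alpha=0.97):
    """Pre-emphasis, then overlapping frames, each multiplied by a Hamming window.

    A trailing partial frame is dropped rather than zero-padded (an assumption).
    """
    y = pre_emphasis(x, alpha)
    window = hamming(frame_len)
    frames = []
    for start in range(0, len(y) - frame_len + 1, hop):
        segment = y[start:start + frame_len]
        frames.append([s * w for s, w in zip(segment, window)])
    return frames


frames = preprocess([1.0] * 1000)
print(len(frames), len(frames[0]))  # 9 200
```

With a 100-sample shift, consecutive 200-sample frames overlap by half, which is what lets the window taper each frame without losing the samples near frame edges.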
In this embodiment, the duration of each frame may be set to 10-30 ms, but is not limited to this range.
S11: Calculate the fractal dimension value of each frame using fractal dimension theory, calculate the short-time energy value of each frame, and obtain the ratio of the short-time energy value to the fractal dimension value.
Fractal theory describes objects by means of fractal dimension and related mathematical methods; it comes closer to the true attributes and states of complex systems and better reflects the diversity and complexity of real objects. Because speech signals have fractal properties, fractal theory can be applied to endpoint detection. In practice there are several fractal dimensions, such as the box-counting dimension, the information dimension, and the correlation dimension. These methods compute the dimension differently; because the correlation dimension reflects how the points of the set are distributed, its computed values fluctuate relatively little. Therefore, in a preferred embodiment, the fractal dimension is the correlation dimension, and the corresponding fractal dimension value is the correlation dimension value.
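The box-counting dimension is listed above as one alternative to the correlation dimension. The following is a rough, hypothetical sketch of a box-counting estimate for one frame, not the patent's method: time and amplitude are rescaled into the unit square, the number of occupied boxes N(s) is counted on an s-by-s grid for several grid sizes, and the dimension is taken as the least-squares slope of ln N(s) against ln s. Counting only boxes that contain sample points (rather than every box the curve passes through between samples) is a simplification, and the grid sizes are arbitrary choices.

```python
import math


def box_counting_dimension(frame, grid_sizes=(4, 8, 16, 32)):
    """Rough box-counting dimension of a frame's waveform graph (illustrative sketch).

    Time and amplitude are rescaled to [0, 1]; for each grid size s the unit square
    is split into s*s boxes and the boxes containing at least one sample are counted.
    The estimate is the least-squares slope of ln N(s) against ln s.
    """
    n = len(frame)
    if n < 2:
        raise ValueError("need at least two samples")
    lo = min(frame)
    span = (max(frame) - lo) or 1.0  # constant frame: avoid division by zero
    pts = []
    for s in grid_sizes:
        occupied = set()
        for k, v in enumerate(frame):
            col = min(int(k / (n - 1) * s), s - 1)
            row = min(int((v - lo) / span * s), s - 1)
            occupied.add((col, row))
        pts.append((math.log(s), math.log(len(occupied))))
    mx = sum(p for p, _ in pts) / len(pts)
    my = sum(q for _, q in pts) / len(pts)
    num = sum((p - mx) * (q - my) for p, q in pts)
    den = sum((p - mx) ** 2 for p, _ in pts)
    return num / den


print(box_counting_dimension(list(range(200))))  # 1.0
```

A straight ramp gives exactly 1, as a line should. With only 200 samples per frame, the finest grids start to saturate (there are more boxes than samples), so values for noisy frames are compressed toward 1; this is one practical reason to prefer a pairwise-distance measure such as the correlation dimension on short frames.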
The correlation dimension value is calculated as follows:
where p_i is the probability that N_i points fall into a box of size δ, I is a positive integer, m is the index of a sample point, N is the total number of frames, 1 ≤ m ≤ l, 1 ≤ i ≤ N, l is the length of each frame, x_i and x_j are the i-th and j-th reconstructed phase-space vectors, and H(d_{i,j}, P) is the Heaviside step function.
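The correlation-dimension formula itself is missing from this copy of the document, so the sketch below uses the standard Grassberger-Procaccia estimate instead; it is not necessarily the patent's exact formula. Each frame is embedded in phase space with delay coordinates, the correlation sum C(r) counts the fraction of vector pairs closer than r (a Heaviside step applied to r minus the pair distance), and the correlation dimension is estimated as the slope of ln C(r) against ln r. The embedding dimension (3), delay (1), and radius range (5% to 50% of the largest pair distance) are assumptions chosen for illustration.

```python
import math


def embed(x, dim=3, tau=1):
    """Delay-coordinate phase-space reconstruction: (x[k], x[k+tau], ..., x[k+(dim-1)tau])."""
    n = len(x) - (dim - 1) * tau
    return [tuple(x[k + j * tau] for j in range(dim)) for k in range(n)]


def correlation_dimension(frame, dim=3, tau=1, n_radii=5):
    """Grassberger-Procaccia estimate of the correlation dimension D2 of one frame.

    C(r) is the fraction of vector pairs whose distance is below r (a Heaviside
    step on r - d_ij); D2 is the least-squares slope of ln C(r) against ln r.
    """
    vecs = embed(frame, dim, tau)
    n = len(vecs)
    dists = [math.dist(vecs[i], vecs[j]) for i in range(n) for j in range(i + 1, n)]
    if not dists or max(dists) == 0.0:
        return 0.0  # too few points, or all points coincide (constant frame)
    d_max = max(dists)
    # Radii spaced geometrically from 5% to 50% of the largest distance (assumed range).
    radii = [d_max * 0.05 * 10 ** (k / (n_radii - 1)) for k in range(n_radii)]
    pts = []
    for r in radii:
        c = sum(1 for d in dists if d < r) / len(dists)
        if c > 0:
            pts.append((math.log(r), math.log(c)))
    if len(pts) < 2:
        return 0.0
    mx = sum(p for p, _ in pts) / len(pts)
    my = sum(q for _, q in pts) / len(pts)
    num = sum((p - mx) * (q - my) for p, q in pts)
    den = sum((p - mx) ** 2 for p, _ in pts)
    return num / den


ramp = [k / 100 for k in range(200)]
print(round(correlation_dimension(ramp), 2))  # 0.92
```

For a straight ramp, whose delay vectors lie on a line, the estimate comes out somewhat below 1 because C(r) starts to saturate at the larger radii; this sensitivity to the chosen scaling region is typical of the method. A 200-sample frame gives about 19,500 vector pairs, so the O(n^2) pair loop is affordable per frame but dominates the run time.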
The short-time energy is calculated as:
where y_i(m) is the energy value of the m-th sample point in the i-th frame.
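The short-time energy formula is also missing from this copy. A common definition, assumed in the sketch below, treats y_i(m) as the m-th sample amplitude and sums its square over the frame, E_i = sum of y_i(m)^2 for m = 1..l. The ratio used in step S11 is then E_i divided by the frame's fractal dimension D_i. The small eps floor on D_i is an added guard against division by zero for degenerate frames, not part of the described method. Because E_i scales with the square of the signal amplitude, thresholds such as 1.6 and 1.16 are only meaningful under whatever amplitude normalization the original experiment used, which the text does not specify.

```python
def short_time_energy(frame):
    """E_i = sum_m y_i(m)^2, treating y_i(m) as the m-th sample amplitude (assumed definition)."""
    return sum(v * v for v in frame)


def energy_dimension_ratio(frame, fractal_dim, eps=1e-12):
    """Ratio E_i / D_i from step S11; eps avoids dividing by a zero dimension (added guard)."""
    return short_time_energy(frame) / max(fractal_dim, eps)


print(energy_dimension_ratio([0.5, -0.5, 1.0], 1.25))  # 1.2
```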
S12: Determine whether the ratio of each frame is greater than or equal to the first threshold; if so, proceed to step S13. A frame whose ratio is greater than or equal to the first threshold is a speech frame.
S13: Extract the starting and ending endpoints of the source speech signal in the directions of both sides of the speech frame.
Each frame yields one ratio, and step S12 selects the frames whose ratio is greater than or equal to the first threshold. There may be several such frames, so the starting and ending endpoints corresponding to each of them must then be determined.
In this embodiment, "both sides" means the left and right of a speech frame: the left is the direction of the frames before the current speech frame, and the right is the direction of the frames after it. For example, suppose a segment of speech contains 10 frames, numbered 1 through 10, and the 3rd and 8th frames are speech frames. Then the frame to the left of the 3rd frame is the 2nd frame, and the frame to its right is the 4th frame.
The speech endpoint detection method based on short-time energy and fractal dimension provided by this embodiment comprises: preprocessing a source speech signal to obtain individual frames; calculating the fractal dimension value of each frame using fractal dimension theory and calculating the short-time energy value of each frame, to obtain the ratio of the short-time energy value to the fractal dimension value; determining whether the ratio of each frame is greater than or equal to a first threshold, and if so, taking that frame as a speech frame; and extracting the starting and ending endpoints of the source speech signal in the directions of both sides of the speech frame. The method applies fractal dimension theory to endpoint detection: it compares each frame's ratio of short-time energy to fractal dimension with the first threshold to screen out speech frames, and then extracts the starting and ending endpoints on both sides of the speech frames. The method can therefore extract endpoints effectively from speech signals with a low signal-to-noise ratio.
In a preferred embodiment, extracting the starting and ending endpoints of the source speech signal in the directions of both sides of the speech frame specifically includes:
Checking in turn whether the ratio of each frame to the left of the speech frame is less than a second threshold; if not, continuing until a frame whose ratio is less than the second threshold is found, and taking that frame as the starting endpoint;
Checking in turn whether the ratio of each frame to the right of the speech frame is less than the second threshold; if not, continuing until a frame whose ratio is less than the second threshold is found, and taking that frame as the ending endpoint;
Wherein the second threshold is less than the first threshold.
Take the 10 frames above as an example again, with the 3rd and 8th frames being speech frames. For the 3rd frame, the frames to its left are checked in turn, starting with the one immediately to its left, i.e. the 2nd frame. If the ratio of the 2nd frame is less than the second threshold, the 2nd frame is the starting endpoint; otherwise the 1st frame is checked next. On the right side of the 3rd frame, the ratio of the 4th frame is checked first; if it is not less than the second threshold, the ratio of the 5th frame is checked, and so on until a frame whose ratio is less than the second threshold is found. If no frame with a ratio less than the second threshold is found, the speech frame itself is taken as the starting endpoint.
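Steps S12 and S13, including the second-threshold search just described, can be sketched as follows. Frame indices in the code are 0-based, so the text's 3rd and 8th frames are indices 2 and 7. The text says the speech frame itself becomes the starting endpoint when no lower-ratio frame is found; applying the same fallback to the ending endpoint is an assumption. The ratio values in the usage example are made up for illustration, and the sketch returns one (start, end) pair per speech frame without merging overlapping segments from adjacent speech frames.

```python
def find_endpoints(ratios, speech_idx, second_threshold=1.16):
    """Search outward from one speech frame for the nearest frame on each side whose
    ratio is below second_threshold; those frames are the starting and ending endpoints.

    If the search reaches the edge of the signal without finding such a frame, the
    speech frame itself is used (stated in the text for the starting endpoint;
    assumed here for the ending endpoint as well).
    """
    start = speech_idx
    for k in range(speech_idx - 1, -1, -1):  # frames to the left, nearest first
        if ratios[k] < second_threshold:
            start = k
            break
    end = speech_idx
    for k in range(speech_idx + 1, len(ratios)):  # frames to the right, nearest first
        if ratios[k] < second_threshold:
            end = k
            break
    return start, end


def detect(ratios, first_threshold=1.6, second_threshold=1.16):
    """Step S12 (speech frames: ratio >= first_threshold), then step S13 for each one."""
    speech = [i for i, r in enumerate(ratios) if r >= first_threshold]
    return [(i, find_endpoints(ratios, i, second_threshold)) for i in speech]


# Hypothetical ratios for the 10-frame example; the 3rd and 8th frames are indices 2 and 7.
ratios = [0.5, 1.3, 2.0, 1.4, 0.9, 0.8, 1.2, 1.9, 1.3, 0.7]
print(detect(ratios))  # [(2, (0, 4)), (7, (5, 9))]
```

Using a lower second threshold for the outward search is what lets frames with intermediate ratios (1.3, 1.4, 1.2 in the example) be kept inside the segment even though they did not pass the first threshold.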
The 8th frame is processed in the same way as the 3rd frame, which is not repeated here.
In a preferred embodiment, the first threshold is 1.6. In a preferred embodiment, the second threshold is 1.16.
The first and second thresholds should be set according to the specific situation; the values 1.6 and 1.16 are only one specific embodiment.
To verify the reliability of the endpoint detection method provided by the present invention, simulation experiments were carried out. The experiments ran on a PC with 32-bit Windows 7, using MATLAB R2013a. Other fractal dimensions, such as the box-counting dimension or the information dimension, can of course also be used. Because different fractal dimensions give slightly different run times and detection results, the experiment uses the correlation dimension; in this environment the run time is roughly 5 to 10 minutes, slightly longer than traditional endpoint detection, but the noise robustness is good. The sampling frequency is 8000 Hz, the frame length is 200 samples, the frame shift is 100 samples, and the window function is a Hamming window.
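As a quick check on these settings, the frame length and frame shift at the 8000 Hz sampling rate work out as below; the 25 ms frame falls inside the 10-30 ms range given earlier. The frame-count formula assumes no zero-padding at the end of a signal of L samples, and the one-second signal length is only an example.

```latex
\begin{aligned}
T_{\mathrm{frame}} &= \frac{200}{8000\ \mathrm{Hz}} = 25\ \mathrm{ms}, &
T_{\mathrm{shift}} &= \frac{100}{8000\ \mathrm{Hz}} = 12.5\ \mathrm{ms},\\
N_{\mathrm{frames}} &= \left\lfloor \frac{L - 200}{100} \right\rfloor + 1, &
L &= 8000\ (1\ \mathrm{s}) \;\Rightarrow\; N_{\mathrm{frames}} = 79.
\end{aligned}
```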
Fig. 2 is a schematic diagram of endpoint detection on a clean speech waveform with added babble noise, provided by an embodiment of the present invention. In Fig. 2, solid lines mark starting endpoints and dashed lines mark ending endpoints. As Fig. 2 shows, under the same conditions the endpoint detection method provided by the invention detects the endpoints in the noisy signal at the same positions as in the clean speech waveform, indicating that the method is highly reliable.
Fig. 3 is a schematic diagram of endpoint detection on a clean speech waveform with added pink noise, provided by an embodiment of the present invention.
The speech endpoint detection method based on short-time energy and fractal dimension provided by the present invention has been described in detail above. The embodiments in this description are described progressively; each emphasizes its differences from the others, and identical or similar parts can be cross-referenced. Since the device disclosed in the embodiments corresponds to the disclosed method, its description is brief; see the method section for the relevant details. It should be noted that those of ordinary skill in the art can make improvements and modifications to the invention without departing from its principles, and such improvements and modifications also fall within the protection scope of the claims of the present invention.

Claims (7)

1. A speech endpoint detection method based on short-time energy and fractal dimension, characterized by comprising:
Preprocessing a source speech signal to obtain individual frames;
Calculating, using fractal dimension theory, the fractal dimension value corresponding to each frame, and calculating the short-time energy value of each frame, to obtain the ratio of the short-time energy value to the fractal dimension value;
Determining whether the ratio corresponding to each frame is greater than or equal to a first threshold, and if so, taking the frame whose ratio is greater than or equal to the first threshold as a speech frame;
Extracting, in the directions of both sides of the speech frame, the starting endpoint and the ending endpoint of the source speech signal.
2. The speech endpoint detection method according to claim 1, characterized in that extracting, in the directions of both sides of the speech frame, the starting endpoint and the ending endpoint of the source speech signal specifically comprises:
Checking in turn whether the ratio of each frame to the left of the speech frame is less than a second threshold; if not, continuing until a frame whose ratio is less than the second threshold is found, and taking that frame as the starting endpoint;
Checking in turn whether the ratio of each frame to the right of the speech frame is less than the second threshold; if not, continuing until a frame whose ratio is less than the second threshold is found, and taking that frame as the ending endpoint;
Wherein the second threshold is less than the first threshold.
3. The speech endpoint detection method according to claim 2, characterized in that the first threshold is 1.6.
4. The speech endpoint detection method according to claim 3, characterized in that the second threshold is 1.16.
5. The speech endpoint detection method according to claim 1, characterized in that the preprocessing comprises pre-emphasis, framing, and windowing.
6. The speech endpoint detection method according to claim 5, characterized in that the windowing uses a Hamming window function.
7. The speech endpoint detection method according to claim 1, characterized in that the fractal dimension is the correlation dimension, and the corresponding fractal dimension value is the correlation dimension value.
CN201611178115.5A 2016-12-19 2016-12-19 Speech endpoint detection method based on short-time energy and fractal dimension Expired - Fee Related CN106448659B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611178115.5A CN106448659B (en) 2016-12-19 2016-12-19 Speech endpoint detection method based on short-time energy and fractal dimension

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611178115.5A CN106448659B (en) 2016-12-19 2016-12-19 Speech endpoint detection method based on short-time energy and fractal dimension

Publications (2)

Publication Number Publication Date
CN106448659A true CN106448659A (en) 2017-02-22
CN106448659B CN106448659B (en) 2019-09-27

Family

ID=58215020

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611178115.5A Expired - Fee Related CN106448659B (en) 2016-12-19 2016-12-19 Speech endpoint detection method based on short-time energy and fractal dimension

Country Status (1)

Country Link
CN (1) CN106448659B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2007264567A (en) * 2006-03-30 2007-10-11 Railway Technical Res Inst Decision processing method for unspoken voice in voice
CN101625857A (en) * 2008-07-10 2010-01-13 新奥特(北京)视频技术有限公司 Self-adaptive voice endpoint detection method
CN101625858A (en) * 2008-07-10 2010-01-13 新奥特(北京)视频技术有限公司 Method for extracting short-time energy frequency value in voice endpoint detection
CN102184732A (en) * 2011-04-28 2011-09-14 重庆邮电大学 Fractal-feature-based intelligent wheelchair voice identification control method and system
CN103366739A (en) * 2012-03-28 2013-10-23 郑州市科学技术情报研究所 Self-adaptive endpoint detection method and self-adaptive endpoint detection system for isolate word speech recognition

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
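沈亚强 (Shen Yaqiang) et al.: "Noise based on short-time fractal dimension of time series" (基于时间序列短时分形维数的噪声), Journal of Zhejiang Normal University (Natural Sciences) (《浙江师大学报(自然科学版)》) *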

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019100327A1 (en) * 2017-11-24 2019-05-31 深圳传音通讯有限公司 Signal processing method, device and terminal
CN108198558A * 2017-12-28 2018-06-22 电子科技大学 Speech recognition method based on CSI data
CN108305639A * 2018-05-11 2018-07-20 南京邮电大学 Speech-emotion recognition method, computer readable storage medium, terminal
CN109346095A * 2018-10-10 2019-02-15 广州市讯飞樽鸿信息技术有限公司 Heart sound endpoint detection method
CN109346095B * 2018-10-10 2023-07-07 广州九路科技有限公司 Heart sound endpoint detection method
CN110364187A * 2019-07-03 2019-10-22 深圳华海尖兵科技有限公司 Endpoint recognition method and device for speech signals
CN113488071A (en) * 2021-07-16 2021-10-08 河南牧原智能科技有限公司 Pig cough recognition method, device, equipment and readable storage medium

Also Published As

Publication number Publication date
CN106448659B (en) 2019-09-27

Similar Documents

Publication Publication Date Title
CN106448659A (en) Speech endpoint detection method based on short-time energy and fractal dimensions
US11062699B2 (en) Speech recognition with trained GMM-HMM and LSTM models
CN109473123A (en) Voice activity detection method and device
CN102982811B (en) Voice endpoint detection method based on real-time decoding
CN108710704B (en) Method and device for determining conversation state, electronic equipment and storage medium
CN110706690A (en) Speech recognition method and device
CN108428448A Speech endpoint detection method and speech recognition method
CN112183099A (en) Named entity identification method and system based on semi-supervised small sample extension
CN105006230A (en) Voice sensitive information detecting and filtering method based on unspecified people
CN104021789A (en) Self-adaption endpoint detection method using short-time time-frequency value
CN109087667B (en) Voice fluency recognition method and device, computer equipment and readable storage medium
CN101887722A (en) Rapid voiceprint authentication method
CN102073676A (en) Method and system for detecting network pornography videos in real time
CN110942776B (en) Audio splicing prevention detection method and system based on GRU
CN109215634A Method and system for multi-word voice-controlled on-off switching
CN105869658A (en) Voice endpoint detection method employing nonlinear feature
CN111128128A (en) Voice keyword detection method based on complementary model scoring fusion
CN111477219A (en) Keyword distinguishing method and device, electronic equipment and readable storage medium
CN103021421A (en) Multilevel screening detecting recognizing method for shots
CN101067929B (en) Method for enhancing and extracting phonetic resonance hump trace utilizing formant
CN117116292A (en) Audio detection method, device, electronic equipment and storage medium
CN111613250B (en) Long voice endpoint detection method and device, storage medium and electronic equipment
CN115359323A (en) Image text information generation method and deep learning model training method
CN104240705A (en) Intelligent voice-recognition locking system for safe box
CN112216285B (en) Multi-user session detection method, system, mobile terminal and storage medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20190927

Termination date: 20201219