CN108074585A - A kind of voice method for detecting abnormality based on sound source characteristics - Google Patents
A kind of voice method for detecting abnormality based on sound source characteristics
- Publication number
- CN108074585A CN108074585A CN201810126670.6A CN201810126670A CN108074585A CN 108074585 A CN108074585 A CN 108074585A CN 201810126670 A CN201810126670 A CN 201810126670A CN 108074585 A CN108074585 A CN 108074585A
- Authority
- CN
- China
- Prior art keywords
- voice
- glottis
- sound source
- speech
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
- 238000000034 method Methods 0.000 title claims abstract description 36
- 230000005856 abnormality Effects 0.000 title claims abstract description 16
- 210000004704 glottis Anatomy 0.000 claims abstract description 49
- 238000001514 detection method Methods 0.000 claims abstract description 11
- 238000012545 processing Methods 0.000 claims description 9
- 238000001228 spectrum Methods 0.000 claims description 6
- 230000005284 excitation Effects 0.000 claims description 4
- 238000009432 framing Methods 0.000 claims description 4
- 230000004044 response Effects 0.000 claims description 3
- 210000001260 vocal cord Anatomy 0.000 abstract description 8
- 238000004519 manufacturing process Methods 0.000 abstract description 6
- 230000002159 abnormal effect Effects 0.000 abstract description 3
- 238000004458 analytical method Methods 0.000 abstract description 3
- 238000000605 extraction Methods 0.000 abstract description 3
- 238000005516 engineering process Methods 0.000 abstract description 2
- 238000012706 support-vector machine Methods 0.000 description 10
- 230000008569 process Effects 0.000 description 5
- 230000000694 effects Effects 0.000 description 4
- 238000002474 experimental method Methods 0.000 description 4
- 238000012360 testing method Methods 0.000 description 4
- 230000008859 change Effects 0.000 description 3
- 238000012549 training Methods 0.000 description 3
- 210000004556 brain Anatomy 0.000 description 2
- 238000004891 communication Methods 0.000 description 2
- 238000012546 transfer Methods 0.000 description 2
- 238000012795 verification Methods 0.000 description 2
- 208000019901 Anxiety disease Diseases 0.000 description 1
- 230000036506 anxiety Effects 0.000 description 1
- 230000006399 behavior Effects 0.000 description 1
- 230000008901 benefit Effects 0.000 description 1
- 230000005540 biological transmission Effects 0.000 description 1
- 230000002950 deficient Effects 0.000 description 1
- 238000000151 deposition Methods 0.000 description 1
- 230000008451 emotion Effects 0.000 description 1
- 230000008909 emotion recognition Effects 0.000 description 1
- 238000011156 evaluation Methods 0.000 description 1
- 238000001914 filtration Methods 0.000 description 1
- 230000006872 improvement Effects 0.000 description 1
- 239000004615 ingredient Substances 0.000 description 1
- 230000002045 lasting effect Effects 0.000 description 1
- 230000035800 maturation Effects 0.000 description 1
- 238000005259 measurement Methods 0.000 description 1
- 230000004118 muscle contraction Effects 0.000 description 1
- 230000006855 networking Effects 0.000 description 1
- 210000000056 organ Anatomy 0.000 description 1
- 238000003909 pattern recognition Methods 0.000 description 1
- 108090000765 processed proteins & peptides Proteins 0.000 description 1
- 230000005855 radiation Effects 0.000 description 1
- 238000011160 research Methods 0.000 description 1
- 230000035945 sensitivity Effects 0.000 description 1
- 238000004088 simulation Methods 0.000 description 1
- 230000000638 stimulation Effects 0.000 description 1
- 230000009466 transformation Effects 0.000 description 1
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/27—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/45—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of analysis window
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Emergency Alarm Devices (AREA)
Abstract
The invention discloses a voice abnormality detection method based on sound-source characteristics, comprising the following steps: collect voice data in real time through a sensor; pre-process the obtained speech segments; for the voice data of each speech segment, obtain the glottal wave signal by iterative adaptive inverse filtering; extract characteristic parameters from the glottal wave signal, namely the normalized amplitude quotient and the glottal closed-phase ratio; feed the extracted features into a trained SVM model for classification; obtain the classification label, use it to judge the speaker's condition, output the speaker-condition label, and pass it to an execution module for feedback. The characteristic of the invention is that, for stressed speech produced under pressure, it abandons the traditional linear speech production model and recognition methods that extract acoustic parameters lacking physical meaning; instead it establishes a sound-source estimation model and, using the inverse-filtering technique of speech production, analyzes and extracts characteristic parameters of human vocal-fold vibration to detect abnormal speech.
Description
Technical field
The present invention relates to a voice abnormality detection method based on sound-source characteristics, and belongs to the field of intelligent speech technology.
Background technology
Stress is the body's natural reaction to physical, psychological, or emotional stimulation. When we are subjected to such stimulation, the brain releases hormone- and peptide-like substances into the body, causing a nervous stress reaction. This kind of sustained anxiety about work is reflected in the vocal organs, causing changes in a series of parameters such as the audible frequency and the speaking rate. These changes are of great significance in various fields of speech signal processing, such as stressed-speech recognition and emotion recognition.
An important manifestation of stress is the speaker's voice: stress variation is a significant factor influencing speech production. When the surrounding environment or the speaker's own condition changes abnormally, or when the user is concentrating on some task and speech recognition serves as an auxiliary tool for other work, the existence of work pressure subjects the speaker to stress. This strongly affects the interlocutor's pronunciation, produces an abnormal state, and causes speech variation; the abnormal state is embodied in the speaker's voice and forms a speech signal under a stress-induced abnormal state.
However, stressed speech under pressure, especially under multi-task cognitive load, has low acoustic discriminability; general acoustic features cannot classify it correctly and lack stability and robustness. Moreover, during its production, stressed speech differs significantly from normal speech in its sound-source characteristics. Therefore, in the detection process, we use sound-source characteristics to improve the reliability of stressed-speech classification; by improving the labeling efficiency of stressed speech, a foundation is laid for strongly robust speech recognition systems.
Summary of the invention
The problem to be solved by the invention is to detect the stress state from the sound-source perspective of speech production, proposing a stress detection method based on speech-production modeling. The characteristic of the invention is that it abandons the traditional linear speech production model and recognition methods based on acoustic parameters lacking physical meaning; it establishes a sound-source estimation model and, using the inverse-filtering technique of speech production, analyzes and extracts characteristic parameters of human vocal-fold vibration to detect abnormal speech.
The technical solution of the invention is as follows:
A voice abnormality detection method based on sound-source characteristics comprises the following steps:
(1) collect voice data in real time through a sensor;
(2) distinguish the speech segments and noise segments of the voice data by endpoint detection, to decide whether to proceed with the next speech-signal processing step;
(3) frame and window the voice data of the obtained speech segments, and apply high-frequency pre-emphasis to each frame;
(4) for the voice data of the speech segments, obtain the glottal wave signal by iterative adaptive inverse filtering;
(5) extract the glottal-wave characteristic parameters: the normalized amplitude quotient and the glottal closed-phase ratio;
(6) feed the extracted data into a trained SVM model for classification;
(7) obtain the classification label, use it to judge the speaker's condition, output the speaker-condition label, and pass it to the execution module for feedback.
In the above step (3), a Hamming window is used to window each speech frame.
In the above step (3), the high-frequency pre-emphasis boosts the high-frequency part of each frame through a first-order finite impulse response high-pass filter.
In the above step (4), the glottal wave signal is obtained as follows:
(a) establish a vocal-tract model using iterative adaptive inverse filtering; the iterative adaptive inverse filtering removes the influence of the glottal excitation from the spectrum of the original speech signal;
(b) then remove the influence of the formants by inverse filtering; specifically, an acoustic model is established by linear predictive coding and the discrete all-pole model, and the glottal wave signal is finally obtained by inverse filtering.
In the above step (5), the glottal-wave characteristic parameter, the normalized amplitude quotient, is extracted as follows:
NAQ = AQ / T = f_ac / (dpeak · T)    (1)
where NAQ is the normalized amplitude quotient; T is the pitch period; AQ is the amplitude quotient, the ratio of the maximum glottal-wave amplitude to the maximum negative peak of its first derivative:
AQ = f_ac / dpeak    (2)
where f_ac is the maximum peak value of the glottal wave, and dpeak is the maximum negative peak of the first derivative of the glottal wave.
In the above step (5), the glottal closed-phase ratio is extracted as follows:
CPR = CP / O    (3)
where CPR is the glottal closed-phase ratio; CP is the duration of the glottal closed phase; O is the total glottal opening time.
The beneficial effects achieved by the invention are:
Building on research into how the phonatory physiological system varies under the influence of stress, the invention studies the intrinsic link between physiological features and sound-source parameters, and verifies that glottal-wave characteristics can reflect important factors of the stress state, so that the obtained glottal-wave parameters not only have theoretical grounding but also specific physical meaning. By finding glottal-wave parameters that describe the stress-related sound-source characteristics of the phonation system, the invention establishes the intrinsic link between sound-source and physiological characteristics; these features correlate with stress-variation factors, indicate the vibration mode of the vocal folds, and carry physical meaning, so that the detection of abnormal voice states improves the precision and reliability of the speech recognition system.
The invention can be applied to the in-vehicle environment: the stress state of the driver and passengers is judged by detecting their voice data, the status information is fed back to the execution module through a transmission device, and the execution module then takes effective measures automatically, such as reminding the driver to be careful or notifying nearby vehicles to take avoiding action via the Internet of Vehicles, thereby achieving the purpose of protecting life and property.
Description of the drawings
Fig. 1 is the basic flow chart of the invention;
Fig. 2 is the basic flow chart for obtaining the SVM classification model;
Fig. 3 is the iterative adaptive inverse filtering (IAIF) architecture established by the invention;
Fig. 4 shows the ROC curves of the five parameters in embodiment 1;
Fig. 5 lists the ROC-curve statistics of the five features in embodiment 1, where AUC is the area under the curve, SE is the standard error, and CL is the confidence interval;
Fig. 6 shows the average recognition rate of the classifiers obtained over 50 rounds of experiments in embodiment 1.
Specific embodiments
The invention is further described below with reference to the accompanying drawings. The following embodiments are only used to illustrate the technical solution of the invention more clearly, and are not intended to limit its scope of protection.
As shown in Fig. 1, a voice abnormality detection method based on sound-source characteristics comprises the following steps:
(1) collect voice data in real time through a sensor;
(2) distinguish the speech segments and noise segments of the voice data by endpoint detection, to decide whether to proceed with the next speech-signal processing step;
The invention uses a speech endpoint detection method based on energy and short-time zero-crossing rate to distinguish speech segments efficiently. This is an existing, mature detection method and is not elaborated here.
(3) frame and window the voice data of the obtained speech segments, and apply high-frequency pre-emphasis to each frame;
Pre-emphasis: the average power spectrum of the speech signal is shaped by the glottal excitation and mouth/nose radiation, and above roughly 800 Hz it decays by about 6 dB/oct (octave), so the higher the frequency, the smaller the corresponding component. Therefore, before analyzing the speech signal, its high-frequency part is boosted through a first-order finite impulse response high-pass filter.
Framing: since the speech signal is short-time stationary, the signal can be processed frame by frame. Macroscopically, a frame must be short enough that the signal within it is stationary, i.e. the length of a frame should be less than the length of a phoneme. At a normal speaking rate, a phoneme lasts about 50-200 milliseconds, so the frame length is generally less than 50 milliseconds. Microscopically, a frame must contain enough vibration periods, because the Fourier transform analyzes frequency, and a frequency can only be resolved from sufficiently many repetitions. The fundamental frequency of speech is around 100 Hz for male voices and around 200 Hz for female voices, corresponding to periods of 10 and 5 milliseconds; since a frame should contain multiple periods, a length of at least 20 milliseconds is generally taken.
Windowing: a Hamming window is applied to each speech frame; it has good frequency resolution and also reduces spectral leakage, thereby reducing the influence of the Gibbs effect.
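The pre-emphasis, framing, and Hamming-windowing steps described above can be sketched as below. The 0.97 pre-emphasis coefficient and the 25 ms/10 ms frame layout are common defaults assumed for illustration, not values specified by the patent.

```python
import numpy as np

def preprocess(x, fs=16000, frame_ms=25, hop_ms=10, alpha=0.97):
    """Pre-emphasize, frame, and Hamming-window a speech signal.
    Returns an array of shape (num_frames, frame_len)."""
    # first-order FIR high-pass: y[n] = x[n] - alpha * x[n-1], boosting high frequencies
    y = np.append(x[0], x[1:] - alpha * x[:-1])
    frame_len = int(fs * frame_ms / 1000)   # 25 ms: several pitch periods, under one phoneme
    hop = int(fs * hop_ms / 1000)
    n = 1 + max(0, (len(y) - frame_len) // hop)
    frames = np.stack([y[i * hop:i * hop + frame_len] for i in range(n)])
    # the Hamming window tapers frame edges, reducing spectral leakage (Gibbs effect)
    return frames * np.hamming(frame_len)
```

Each windowed frame can then be passed to the inverse-filtering stage of step (4).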
(4) as shown in Fig. 3, for the voice data of the speech segments, obtain the glottal wave signal by iterative adaptive inverse filtering:
(a) establish a vocal-tract model using iterative adaptive inverse filtering (IAIF); the iterative adaptive inverse filtering removes the influence of the glottal excitation from the spectrum of the original speech signal;
(b) then remove the influence of the formants by inverse filtering (IF); specifically, an acoustic model is established by linear predictive coding (LPC) and the discrete all-pole model (DAP), and the glottal wave signal is finally obtained by inverse filtering (IF), as shown in Fig. 2.
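A minimal numpy sketch of LPC-based inverse filtering in the spirit of IAIF is shown below. It is a simplified single-pass variant (coarse glottal-tilt removal, vocal-tract LPC fit, inverse filtering, integration) and omits the repeated refinement iterations and the DAP modeling of the full algorithm; the model orders are illustrative assumptions.

```python
import numpy as np

def lpc(x, order):
    """Autocorrelation-method LPC; returns inverse-filter coefficients [1, -a1, ..., -ap]."""
    r = np.correlate(x, x, mode='full')[len(x) - 1:len(x) + order]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R + 1e-9 * np.eye(order), r[1:order + 1])
    return np.concatenate(([1.0], -a))

def glottal_wave(frame, fs=16000):
    """Simplified IAIF-style glottal-flow estimate for one windowed frame."""
    g1 = lpc(frame, 1)                              # coarse glottal spectral tilt
    no_tilt = np.convolve(frame, g1)[:len(frame)]   # cancel tilt before the tract fit
    vt = lpc(no_tilt, 2 + fs // 1000)               # vocal-tract all-pole model
    residual = np.convolve(frame, vt)[:len(frame)]  # inverse-filter the vocal tract
    return np.cumsum(residual)                      # integrate: undo lip radiation
```

The returned signal approximates the glottal flow from which the NAQ and CPR parameters of step (5) are computed.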
(5) extract the glottal-wave characteristic parameters: the normalized amplitude quotient and the glottal closed-phase ratio;
Under working-stress conditions, contraction of the vocal-fold muscles randomizes the vocal-fold vibration, which changes the airflow pattern through the glottis and causes the speech signal to vary. These changes in vocal-fold behavior are reflected in the features of the glottal wave, so the glottal wave can reflect working stress to a certain extent. We use the normalized amplitude quotient (NAQ) and the glottal closed-phase ratio (CPR) to characterize the intrinsic properties of the glottal wave; the proposed features have specific physical meaning and reflect the different vibration modes of the vocal folds during speech production.
Glottal-wave characteristic parameter one: the normalized amplitude quotient, which mainly reflects the closing behavior of the vocal folds, is extracted as follows:
NAQ = AQ / T = f_ac / (dpeak · T)    (1)
where NAQ is the normalized amplitude quotient; T is the pitch period; AQ is the amplitude quotient, the ratio of the maximum glottal-wave amplitude to the maximum negative peak of its first derivative:
AQ = f_ac / dpeak    (2)
where f_ac is the maximum peak value of the glottal wave, and dpeak is the maximum negative peak of the first derivative of the glottal wave.
Since the instants of glottal closure and opening need not be measured, AQ is easy to obtain, but its value depends on the measured fundamental frequency (F0) of the signal; therefore, in formula (1), it is normalized by the pitch period to obtain NAQ, eliminating the dependence on the fundamental-frequency measurement.
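Given a glottal-flow estimate and a fundamental-frequency estimate, formulas (1) and (2) translate directly into code; the raised-cosine pulse in the usage example is a synthetic illustration, not data from the patent.

```python
import numpy as np

def naq(g, f0, fs):
    """Normalized amplitude quotient, formulas (1)-(2):
    NAQ = AQ / T = f_ac / (dpeak * T), with T the pitch period in samples."""
    T = fs / f0                      # pitch period in samples
    f_ac = np.max(g)                 # maximum peak value of the glottal wave
    dpeak = -np.min(np.diff(g))      # magnitude of the most negative derivative peak
    return (f_ac / dpeak) / T        # AQ normalized by the pitch period
```

For a raised-cosine test pulse g(k) = 0.5 * (1 - cos(2*pi*k/T)) spanning one period, NAQ evaluates to about 1/pi, independent of f0, which illustrates the F0-normalization that formula (1) provides.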
Glottal-wave characteristic parameter two: the glottal closed-phase ratio (CPR). The CPR parameter reflects the proportion of the glottal closed phase relative to the total glottal opening time, and mainly manifests as the skewness of the glottal signal within the glottal wave.
The glottal closed-phase ratio is extracted as follows:
CPR = CP / O    (3)
where CPR is the glottal closed-phase ratio; CP is the duration of the glottal closed phase; O is the total glottal opening time.
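A sketch of formula (3): in a sampled glottal-flow cycle, the closed phase can be approximated as the samples where the flow stays near its baseline, and the open phase as the rest. The 5% baseline threshold is an assumption for illustration; the patent does not specify how the phase boundaries are located.

```python
import numpy as np

def cpr(g, thresh_frac=0.05):
    """Closed-phase ratio, formula (3): CPR = CP / O, where CP counts the
    samples in the closed phase (flow near baseline) and O the open-phase samples."""
    thresh = g.min() + thresh_frac * (g.max() - g.min())
    cp = int(np.sum(g <= thresh))    # closed-phase duration in samples
    o = len(g) - cp                  # open-phase duration in samples
    return cp / o
```

A more skewed (longer-closed) cycle yields a larger CPR, matching the skewness interpretation given above.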
(6) feed the extracted data into a trained SVM model for classification;
The support vector machine (SVM) has always played an important role in pattern recognition [8]; the so-called support vectors are the training sample points at the edge of the margin. SVM classifies using linear or nonlinear hyperplanes. It is built on the VC-dimension theory of statistical learning theory and the principle of structural risk minimization: based on limited sample information, it seeks the best compromise between model complexity (the learning precision on the given training samples) and learning ability (the ability to recognize arbitrary samples without error), in order to obtain the best generalization ability. The support vector machine is essentially a two-class nonlinear classifier, and is well suited to the distinctive characteristics of stressed-speech recognition: (1) a speaker is not under stress at every moment while speaking — stress appears as short-lived transients in continuous speech, so only a small number of samples can be labeled as stress-varied speech, making stressed-speech recognition a typical small-sample problem; (2) stressed-speech recognition is a typical two-class recognition problem. We establish a speaker-dependent SVM classification model; since the number of samples per subject is relatively small, this is a typical small-sample setting, in which the SVM model achieves relatively good recognition results.
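To keep the sketch self-contained, a tiny linear SVM trained by sub-gradient descent on the hinge loss is shown below, with two-dimensional [NAQ, CPR] feature vectors in mind. In practice a library SVM (possibly with a nonlinear kernel, which the description above allows) would be used instead; all hyperparameters here are illustrative assumptions.

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200, seed=0):
    """Linear soft-margin SVM via hinge-loss sub-gradient descent.
    X: (n, d) feature matrix (e.g. one [NAQ, CPR] pair per sample); y: labels in {-1, +1}."""
    rng = np.random.default_rng(seed)
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            if y[i] * (X[i] @ w + b) < 1:         # inside the margin: hinge active
                w = (1 - lr * lam) * w + lr * y[i] * X[i]
                b += lr * y[i]
            else:                                  # outside: only regularization shrinks w
                w = (1 - lr * lam) * w
    return w, b

def predict(w, b, X):
    """Sign of the decision function; by convention +1 = stressed/abnormal, -1 = normal."""
    return np.sign(X @ w + b)
```

The returned sign plays the role of the classification label of step (7).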
In the invention, the SVM classification model classifies speech under stressed and normal conditions, which evaluates the sensitivity of the proposed sound-source parameters to the stress-varied state and thereby verifies the validity of the proposed method.
(7) obtain the classification label, use it to judge the speaker's condition, output the speaker-condition label, and pass it to the execution module for feedback.
Embodiment 1
We used a database collected by Fujitsu containing the speech samples of 11 speakers (4 male and 7 female). To simulate specific situations that generate psychological stress, three different tasks were set for the speakers, who carried out the tasks while holding a telephone conversation with an operator, simulating stress during a phone call.
The three tasks were: (A) high concentration (the speaker must complete tasks including solving logic puzzles and finding the differences between two pictures); (B) time pressure (the speaker must answer questions under time pressure); (C) risk-taking behavior (a risk-taking task is used to assess the speaker's desire for monetary gain). Each speaker took part in four dialogues with different tasks: in two of the dialogues the speaker was required to complete the tasks within a limited time, while in the other dialogues there was no task and the speaker could chat freely.
The segments intercepted from the speech are the vowels /a/, /i/, /u/, /e/, /o/. The experiments were carried out per speaker, and all results were determined per speaker. The experiments on the 11 chosen subjects were conducted in a speaker-dependent setting; the number of samples depends on the speaker, and the total number of speech samples is 700.
In the invention, the verification data all come from telephone communication data, with 100 subjects (50 male, 50 female) participating in the experiment. In the experiment, the operator chatted with each subject by telephone, with four groups of dialogues per person on average and 10 minutes of chatting per group, recording realistic voice communication data. In the four groups of dialogues, two groups were chatting in a relaxed state, while in the other two groups the subjects were put under different types of stress: (1) multi-task work; (2) time pressure; (3) risk taking — see Table 1 for details. The real speech data of the subjects speaking under stress were recorded for verifying the validity of the stress detection method.
Table 1
To verify the validity of the proposed method, the invention uses the receiver operating characteristic (ROC) curve to evaluate the recognition performance of the different parameters, as shown in Fig. 4 and Fig. 5. The ROC curve is obtained from a series of binary classifications at different cutoff values (decision thresholds), plotting the true positive rate (sensitivity) on the ordinate against the false positive rate (1 - specificity) on the abscissa. The closer the ROC curve lies to the upper-left corner, the larger the area under the curve (AUC), the better the recognition performance of the method, and the higher its accuracy.
True positive rate (TPR): TPR = TP / (TP + FN)
False positive rate (FPR): FPR = FP / (FP + TN)
where TP denotes true positives, TN true negatives, FP false positives, and FN false negatives.
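The ROC/AUC evaluation can be sketched by sweeping the cutoff over every classifier score; this is the standard construction described above, not code from the patent.

```python
import numpy as np

def roc_curve(scores, labels):
    """ROC points and AUC. scores: classifier scores (higher = more abnormal);
    labels: 1 = abnormal (positive), 0 = normal (negative)."""
    order = np.argsort(-np.asarray(scores, dtype=float))
    lab = np.asarray(labels)[order]
    P = lab.sum()                                            # number of positives
    N = len(lab) - P                                         # number of negatives
    tpr = np.concatenate(([0.0], np.cumsum(lab) / P))        # sensitivity at each cutoff
    fpr = np.concatenate(([0.0], np.cumsum(1 - lab) / N))    # 1 - specificity at each cutoff
    auc = float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2))  # trapezoid rule
    return fpr, tpr, auc
```

A perfectly separating score list yields AUC = 1.0, matching the upper-left-corner interpretation given above.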
The invention compares the proposed source-model parameters with traditional parameters in terms of average recognition rate for stress detection, showing that the method based on speech-production modeling has a clear advantage among stress detection methods, thereby achieving the goal of distinguishing the normal state from the abnormal state. Three traditional speech parameters — the fundamental frequency, mel-frequency cepstral coefficients, and the parabolic spectral parameter (F0, MFCC, PSP) — are used as the experimental comparison group.
In the classification stage, NAQ and CPR are taken as a two-dimensional vector to build the SVM model; 125 groups of samples are chosen as the training set and 125 groups as the test set. The test selects the speech samples of 7 different speakers (4 male, 3 female) from the database, aiming to eliminate speech-parameter variation caused by individual specificity; meanwhile F0, MFCC, and PSP are each trained in one-dimensional sample form as the experimental comparison group. As shown in Fig. 6, after 50 rounds of experiments, the average recognition rate of each parameter under the SVM classification model is calculated. It can be seen that, compared with the traditional parameters, the NAQ and CPR sound-source features show good recognition performance for abnormal speech under the abnormal state.
The above are only preferred embodiments of the invention. It should be noted that, for those of ordinary skill in the art, several improvements and variations can be made without departing from the technical principles of the invention, and these improvements and variations should also be regarded as within the protection scope of the invention.
Claims (6)
1. A voice abnormality detection method based on sound-source characteristics, characterized by comprising the following steps:
(1) collecting voice data in real time through a sensor;
(2) distinguishing the speech segments and noise segments of the voice data by endpoint detection, to decide whether to proceed with the next speech-signal processing step;
(3) framing and windowing the voice data of the obtained speech segments, and applying high-frequency pre-emphasis to each frame;
(4) for the voice data of the speech segments, obtaining the glottal wave signal by iterative adaptive inverse filtering;
(5) extracting the glottal-wave characteristic parameters: the normalized amplitude quotient and the glottal closed-phase ratio;
(6) feeding the extracted data into a trained SVM model for classification;
(7) obtaining the classification label, using it to judge the speaker's condition, outputting the speaker-condition label, and passing it to the execution module for feedback.
2. The voice abnormality detection method based on sound-source characteristics according to claim 1, characterized in that: in step (3), a Hamming window is used to window each speech frame.
3. The voice abnormality detection method based on sound-source characteristics according to claim 1, characterized in that: in step (3), the high-frequency pre-emphasis boosts the high-frequency part of each frame through a first-order finite impulse response high-pass filter.
4. The voice abnormality detection method based on sound-source characteristics according to claim 1, characterized in that: in step (4), the glottal wave signal is obtained as follows:
(a) establish a vocal-tract model using iterative adaptive inverse filtering; the iterative adaptive inverse filtering removes the influence of the glottal excitation from the spectrum of the original speech signal;
(b) then remove the influence of the formants by inverse filtering; specifically, an acoustic model is established by linear predictive coding and the discrete all-pole model, and the glottal wave signal is finally obtained by inverse filtering.
5. The voice abnormality detection method based on sound-source characteristics according to claim 1, characterized in that: in step (5), the normalized amplitude quotient of the glottal wave is extracted as follows:
NAQ = AQ / T = f_ac / (dpeak · T)    (1)
where NAQ is the normalized amplitude quotient; T is the pitch period; AQ is the amplitude quotient, the ratio of the maximum glottal-wave amplitude to the maximum negative peak of its first derivative:
AQ = f_ac / dpeak    (2)
where f_ac is the maximum peak value of the glottal wave, and dpeak is the maximum negative peak of the first derivative of the glottal wave.
6. The voice abnormality detection method based on sound-source characteristics according to claim 1, characterized in that: in step (5), the glottal closed-phase ratio is extracted as follows:
CPR = CP / O    (3)
where CPR is the glottal closed-phase ratio; CP is the duration of the glottal closed phase; O is the total glottal opening time.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810126670.6A CN108074585A (en) | 2018-02-08 | 2018-02-08 | A kind of voice method for detecting abnormality based on sound source characteristics |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108074585A true CN108074585A (en) | 2018-05-25 |
Family
ID=62155229
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810126670.6A Pending CN108074585A (en) | 2018-02-08 | 2018-02-08 | A kind of voice method for detecting abnormality based on sound source characteristics |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108074585A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2011149558A2 (en) * | 2010-05-28 | 2011-12-01 | Abelow Daniel H | Reality alternate |
CN102324229A (en) * | 2011-09-08 | 2012-01-18 | 中国科学院自动化研究所 | Method and system for detecting abnormal use of voice input equipment |
CN103730130A (en) * | 2013-12-20 | 2014-04-16 | 中国科学院深圳先进技术研究院 | Detection method and system for pathological voice |
US9338547B2 (en) * | 2012-06-26 | 2016-05-10 | Parrot | Method for denoising an acoustic signal for a multi-microphone audio device operating in a noisy environment |
2018-02-08: Application CN201810126670.6A filed (publication CN108074585A/en), status: Pending
Non-Patent Citations (1)
Title |
---|
Li Ning: "Research on Pathological Noise Classification Based on Acoustic Parameters and Support Vector Machines", East China Normal University * |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113824843A (en) * | 2020-06-19 | 2021-12-21 | 大众问问(北京)信息科技有限公司 | Voice call quality detection method, device, equipment and storage medium |
CN113824843B (en) * | 2020-06-19 | 2023-11-21 | 大众问问(北京)信息科技有限公司 | Voice call quality detection method, device, equipment and storage medium |
CN112735386A (en) * | 2021-01-18 | 2021-04-30 | 苏州大学 | Voice recognition method based on glottal wave information |
CN112735386B (en) * | 2021-01-18 | 2023-03-24 | 苏州大学 | Voice recognition method based on glottal wave information |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Hansen et al. | Speaker recognition by machines and humans: A tutorial review | |
US8428945B2 (en) | Acoustic signal classification system | |
CN109044396B (en) | Intelligent heart sound identification method based on bidirectional long-time and short-time memory neural network | |
CN106941005A (en) | A kind of vocal cords method for detecting abnormality based on speech acoustics feature | |
CN101923855A (en) | Test-irrelevant voice print identifying system | |
Sahoo et al. | Silence removal and endpoint detection of speech signal for text independent speaker identification | |
Vikram et al. | Estimation of Hypernasality Scores from Cleft Lip and Palate Speech. | |
Kim et al. | Hierarchical approach for abnormal acoustic event classification in an elevator | |
CN110265063A (en) | A kind of lie detecting method based on fixed duration speech emotion recognition sequence analysis | |
Subhashree et al. | Speech Emotion Recognition: Performance Analysis based on fused algorithms and GMM modelling | |
CN108074585A (en) | A kind of voice method for detecting abnormality based on sound source characteristics | |
Whitehill et al. | Whosecough: In-the-wild cougher verification using multitask learning | |
Iwok et al. | Evaluation of Machine Learning Algorithms using Combined Feature Extraction Techniques for Speaker Identification | |
CN110415707B (en) | Speaker recognition method based on voice feature fusion and GMM | |
Kalimoldayev et al. | Voice verification and identification using i-vector representation | |
Thomas et al. | Data-driven voice source waveform modelling | |
Warule et al. | Hilbert-Huang Transform-Based Time-Frequency Analysis of Speech Signals for the Identification of Common Cold | |
CN113823267B (en) | Automatic depression recognition method and device based on voice recognition and machine learning | |
Islam et al. | Neural-Response-Based Text-Dependent speaker identification under noisy conditions | |
Shofiyah et al. | Voice recognition system for home security keys with mel-frequency cepstral coefficient method and backpropagation artificial neural network | |
Komlen et al. | Text independent speaker recognition using LBG vector quantization | |
Estrebou et al. | Voice recognition based on probabilistic SOM | |
Godino-Llorente et al. | Automatic detection of voice impairments due to vocal misuse by means of gaussian mixture models | |
Arsikere et al. | Speaker recognition via fusion of subglottal features and MFCCs. | |
Pandiaraj et al. | A confidence measure based—Score fusion technique to integrate MFCC and pitch for speaker verification |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 2018-05-25 |