CN106504772B - Speech-emotion recognition method based on weights of importance support vector machine classifier - Google Patents

Speech-emotion recognition method based on weights of importance support vector machine classifier

Info

Publication number
CN106504772B
CN106504772B CN201610969948.7A CN201610969948A
Authority
CN
China
Prior art keywords
frame
sample
weights
importance
speech
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610969948.7A
Other languages
Chinese (zh)
Other versions
CN106504772A (en
Inventor
黄永明
吴奥
章国宝
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southeast University
Original Assignee
Southeast University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southeast University filed Critical Southeast University
Priority to CN201610969948.7A priority Critical patent/CN106504772B/en
Publication of CN106504772A publication Critical patent/CN106504772A/en
Application granted granted Critical
Publication of CN106504772B publication Critical patent/CN106504772B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L 25/48 Speech or voice analysis techniques specially adapted for particular use
    • G10L 25/51 Speech or voice analysis techniques specially adapted for comparison or discrimination
    • G10L 25/63 Speech or voice analysis techniques specially adapted for estimating an emotional state
    • G10L 25/03 Speech or voice analysis techniques characterised by the type of extracted parameters
    • G10L 25/24 Speech or voice analysis techniques in which the extracted parameters are the cepstrum
    • G10L 15/00 Speech recognition
    • G10L 15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L 15/063 Training
    • G10L 2015/0635 Training updating or merging of old and new templates; Mean values; Weighting

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Hospice & Palliative Care (AREA)
  • Psychiatry (AREA)
  • General Health & Medical Sciences (AREA)
  • Child & Adolescent Psychology (AREA)
  • Artificial Intelligence (AREA)
  • Complex Calculations (AREA)

Abstract

The invention discloses a speech emotion recognition method based on an importance-weighted support vector machine (SVM) classifier. The method comprises quantifying the deviation between training samples and test samples, building an importance-weight coefficient model, and constructing an SVM based on the importance-weight coefficients. The deviation between training and test samples is quantified on the basis of the importance-weight coefficients, so that the deviation can be compensated at the classifier level. By constructing an importance-weight model for the training and test samples whose distributions differ in emotion classification, the invention quantifies the covariate shift between training and test speech samples; an SVM classifier built on the importance-weight model then adjusts the separating hyperplane, compensating the deviation at the classifier level and improving the accuracy and stability of speech emotion recognition.

Description

Speech emotion recognition method based on an importance-weighted support vector machine classifier
Technical field
The present invention relates to a speech emotion recognition method based on an importance-weighted support vector machine classifier, and belongs to the technical field of speech emotion recognition.
Background technique
With the rapid development of information technology and the rise of intelligent terminals, existing human-computer interaction systems face increasingly severe tests. To overcome the obstacles of human-computer interaction and make it more convenient and natural, the emotional intelligence of machines has received growing attention from researchers in many fields. Speech, an efficient interaction medium with great development potential, carries rich emotional information. Speech emotion recognition, an important research topic in affective computing, has broad application prospects in distance education, lie-detection assistance, automated call centers, clinical medicine, intelligent toys, and smartphones, and has attracted extensive attention from research institutions and researchers.
In practical speech emotion recognition, training and test samples are collected at different times and in different environments, so a covariate shift exists between them. To improve the precision and robustness of speech emotion recognition, it is essential to compensate for this deviation. Eliminating the deviation introduced by the recording environment, removing redundancy such as emotion-irrelevant linguistic content from the raw speech data, and extracting effective emotion information are the key points and difficulties in improving the robustness of speech emotion recognition systems.
As an emerging technique in speech signal processing, the importance-weight coefficient model has attracted increasing attention from researchers because of its flexibility and effectiveness. For classification problems, quantifying the deviation between training and test samples on the basis of importance-weight coefficients, and then adjusting for this deviation at the classifier level, reduces the influence of environmental factors on speech emotion recognition and improves its accuracy and stability. Compensating for the covariate shift between training and test samples at the classifier level is therefore of great significance in speech emotion recognition research.
Summary of the invention
Technical problem: the present invention provides a speech emotion recognition method based on an importance-weighted support vector machine classifier that improves the robustness of speech emotion recognition by compensating, at the classifier level, for the covariate shift between training samples and test samples. The method reduces the influence of information irrelevant to recognition, such as the recording environment and the speaker, and improves the precision and robustness of speech emotion recognition.
Technical solution: the speech emotion recognition method of the invention, based on an importance-weighted support vector machine classifier, comprises the following steps:
Step 1: pre-process the input speech signal and extract the feature vector d_i.
Step 2: divide the input sample set into a training sample set {x_i^tr} (i = 1, ..., n_tr) and a test sample set {x_j^te} (j = 1, ..., n_te), and randomly select b template points c_l from the test sample set, forming {c_l} (l = 1, ..., b), where x_i^tr is a sample of the training set, x_j^te is a sample of the test set, n_tr is the number of training samples, n_te is the number of test samples, i is the training-sample index, j is the test-sample index, and l is the index of a template point selected from the test set.
Step 3: compute the optimal Gaussian kernel width σ̂ of the basis functions, as follows.
Step 3.1: set the preset basis-function Gaussian kernel widths σ to 0.1, 0.2, ..., 1.
Step 3.2: compute the pre-compensation parameter vector α according to the following procedure.
Step 3.2.1: with the Gaussian basis functions φ_l(x) = exp(−||x − c_l||² / (2σ²)), compute the b × b matrix Ĥ whose elements are
  Ĥ_{l,l'} = (1/n_tr) Σ_{i=1}^{n_tr} φ_l(x_i^tr) φ_{l'}(x_i^tr),
where l, l' = 1, 2, ..., b, c_{l'} is a point of the randomly selected template set {c_l}, and l' is the index of a randomly selected template point.
Step 3.2.2: compute the b-dimensional vector ĥ whose elements are
  ĥ_l = (1/n_te) Σ_{j=1}^{n_te} φ_l(x_j^te).
Step 3.2.3: compute the pre-compensation parameter vector α: under the constraint α ≥ 0, solve the optimization problem min_α Ĵ(α), i.e. find the value of the parameter vector α that minimizes
  Ĵ(α) = (1/2) α' Ĥ α − ĥ' α,
where Ĵ(α) is the approximate expected squared error of the importance weights, α' is the transpose of the vector α, and ĥ' is the transpose of the vector ĥ.
Step 3.3: select the optimal basis-function Gaussian kernel width σ̂ by cross-validation.
Divide the training sample set {x_i^tr} and the test sample set {x_j^te} into R subsets {X_r^tr} and {X_r^te}, respectively, and compute the approximate expected squared error of the importance weights on the r-th fold as
  Ĵ_r = (1/(2 n_r^tr)) Σ_{s^tr ∈ X_r^tr} β̂(s^tr)² − (1/n_r^te) Σ_{s^te ∈ X_r^te} β̂(s^te),
where Ĵ_r is the approximate expected squared error on the r-th fold, r = 1, 2, ..., R, X_r^tr is the r-th training subset, X_r^te is the r-th test subset, n_r^tr and n_r^te are the numbers of samples in X_r^tr and X_r^te, s^tr is a sample of X_r^tr, s^te is a sample of X_r^te, and β̂(s^tr) and β̂(s^te) are the importance-weight estimates of these samples, computed as
  β̂(s) = Σ_{l=1}^{b} α_l φ_l(s),
where α_l is the l-th element of the pre-compensation parameter vector α obtained in step 3.2.3.
Substitute each of the 10 preset values σ = 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1 in turn and compute the cross-validation score of the importance weights,
  Ĵ^CV = (1/R) Σ_{r=1}^{R} Ĵ_r,  r = 1, 2, ..., R;
take the σ giving the smallest Ĵ^CV as the optimal basis-function Gaussian kernel width σ̂.
Step 4: under the constraint α ≥ 0, solve the optimization problem min_α Ĵ(α) again, now with Ĥ and ĥ computed using the optimal kernel width σ̂, i.e.
  Ĥ_{l,l'} = (1/n_tr) Σ_{i=1}^{n_tr} φ_l(x_i^tr) φ_{l'}(x_i^tr),  ĥ_l = (1/n_te) Σ_{j=1}^{n_te} φ_l(x_j^te),  l, l' = 1, 2, ..., b,
to obtain the optimal parameter vector α̂, where Ĥ_{l,l'} is the element in row l and column l' of the matrix Ĥ and ĥ_l is the l-th element of the vector ĥ.
Step 5: compute the importance weight β(s) by
  β(s) = Σ_{l=1}^{b} α̂_l φ_l(s),
where α̂_l is the l-th element of the optimal parameter vector α̂, s is a sample among the training and test sample points, and s ∈ D, D being the set of training and test sample points.
Step 6: establish the importance-weighted SVM classifier.
Using the importance weight β(s) as a coefficient on the slack variables ξ of the standard SVM classifier gives the SVM classifier expression
  min_{w, b, ξ} (1/2)||w||² + C Σ_{i=1}^{L} β_i ξ_i,
which, together with the constraints
  y_i(<w, d_i> + b) ≥ 1 − ξ_i,  ξ_i ≥ 0,  1 ≤ i ≤ L,
constitutes the importance-weighted SVM classifier, where w is the normal vector of the separating hyperplane, ||w|| is the norm of w, C is the penalty parameter, d_i is the feature vector extracted from the pre-processed training sample set {x_i^tr}, y_i ∈ {+1, −1} is the class label, together they form the training samples (d_1, y_1), (d_2, y_2), ..., (d_L, y_L), β_i is the importance weight of the training sample point (d_i, y_i), and ξ_i is the slack variable of the training sample point (d_i, y_i).
Step 7: perform speech emotion recognition using the feature vectors extracted in step 1 and the importance-weighted SVM classifier established in step 6.
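As a concrete illustration of step 6 (not part of the patent text), the sketch below trains such a classifier with scikit-learn, whose SVC accepts per-sample weights that rescale the penalty parameter C sample by sample; passing the importance weights β_i as sample_weight therefore reproduces the objective (1/2)||w||² + C Σ_i β_i ξ_i. The data, shapes and function name are placeholders.

```python
# Minimal sketch, assuming scikit-learn; beta holds the importance weights
# beta_i obtained from the model of steps 3-5 (placeholder data below).
import numpy as np
from sklearn.svm import SVC

def train_importance_weighted_svm(D_train, y_train, beta, C=1.0):
    """SVM whose slack penalty for training point i is C * beta_i."""
    clf = SVC(kernel="rbf", C=C)
    # sample_weight rescales C per sample, i.e. the fitted objective is
    # 1/2 * ||w||^2 + C * sum_i beta_i * xi_i, as in step 6.
    clf.fit(D_train, y_train, sample_weight=beta)
    return clf

# Hypothetical shapes: 200 utterances, 384-dimensional statement-level vectors.
rng = np.random.default_rng(0)
D_train = rng.normal(size=(200, 384))            # feature vectors d_i
y_train = rng.integers(0, 2, size=200) * 2 - 1   # labels in {+1, -1}
beta = rng.uniform(0.5, 2.0, size=200)           # importance weights beta_i
model = train_importance_weighted_svm(D_train, y_train, beta)
```

In a multi-class emotion task the same weights would simply be passed to the multi-class SVC, which internally trains binary classifiers of the form above.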
Further, in the method of the present invention, the pre-processing in step 1 comprises the following steps:
Step 1.1: apply pre-emphasis to the digital speech signal X to obtain the pre-emphasized speech signal X̃:
  X̃(ñ) = X(ñ) − μ X(ñ − 1),  X(−1) = 0,
where ñ denotes the discrete-sample index of the digital speech signal X, Ñ is the length of X, X(ñ) and X(ñ − 1) are the values of X at the ñ-th and (ñ − 1)-th samples, X̃(ñ) is the value of the pre-emphasized speech signal X̃ at the ñ-th sample, and μ is the pre-emphasis coefficient.
Step 1.2: divide the pre-emphasized speech signal X̃ into frames by overlapping segmentation. The distance between the starting points of two successive frames is called the frame shift; here the frame shift is 8 ms, i.e. 128 samples at the sampling rate F_s = 16 kHz, and each frame is 16 ms long, i.e. 256 samples. Framing yields the speech frame set {x̃_{k'}} (1 ≤ k' ≤ K'), in which the n-th sample of the k'-th speech frame is
  x̃_{k'}(n) = X̃((k' − 1)·128 + n),  0 ≤ n ≤ 255,
where x̃_{k'} is the k'-th speech frame of the set, n is the sample index within a frame, k' is the frame index, and K' is the total number of frames, which satisfies K' = ⌊(Ñ − 256)/128⌋ + 1, ⌊·⌋ denoting rounding down.
Step 1.3: apply a Hamming window w of length 256 to each speech frame x̃_{k'} (1 ≤ k' ≤ K') to obtain the windowed speech frame x_{k'}:
  x_{k'}(n) = x̃_{k'}(n) · w(n),
where x_{k'}(n), x̃_{k'}(n) and w(n) are the values of x_{k'}, x̃_{k'} and w at the n-th sample, and the Hamming window function of length 256 is
  w(n) = 0.54 − 0.46 cos(2πn/255),  0 ≤ n ≤ 255.
Step 1.4: for each windowed speech frame x_{k'}, 1 ≤ k' ≤ K', compute the short-time energy E_{k'} and the short-time zero-crossing rate Z_{k'}:
  E_{k'} = Σ_{n=0}^{255} x_{k'}(n)²,  Z_{k'} = (1/2) Σ_{n=1}^{255} |sgn[x_{k'}(n)] − sgn[x_{k'}(n − 1)]|,
where E_{k'} is the short-time energy of the windowed speech frame x_{k'}, Z_{k'} is its short-time zero-crossing rate, x_{k'}(n) and x_{k'}(n − 1) are the values of x_{k'} at the n-th and (n − 1)-th samples, and sgn[x_{k'}(n)], sgn[x_{k'}(n − 1)] are the sign function applied to these values, that is,
  sgn(λ) = 1 for λ ≥ 0 and sgn(λ) = −1 for λ < 0,
λ being the argument of the sign function.
Step 1.5: determine the short-time energy threshold t_E and the short-time zero-crossing-rate threshold t_Z from the short-time energies and zero-crossing rates of all K' frames, K' being the total number of frames.
Step 1.6: for every windowed speech frame, first make a first-stage decision using the short-time energy: the windowed speech frames whose short-time energy exceeds the threshold t_E are marked as first-stage effective speech frames, the first-stage effective speech frame with the smallest frame index is taken as the start frame of the current effective speech frame set, and the one with the largest frame index is taken as its end frame.
Then make a second-stage decision using the short-time zero-crossing rate: starting from the start frame and moving in order of decreasing frame index, examine the frames one by one and mark the windowed speech frames whose short-time zero-crossing rate exceeds the threshold t_Z as effective speech frames; likewise, starting from the end frame and moving in order of increasing frame index, examine the frames one by one and mark the windowed speech frames whose short-time zero-crossing rate exceeds the threshold t_Z as effective speech frames.
The set of effective speech frames obtained after the two-stage decision is denoted {p_k} (1 ≤ k ≤ K), where k is the effective-speech-frame index, K is the total number of effective speech frames, and p_k is the k-th effective speech frame of the set.
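For illustration only, the following NumPy sketch carries out steps 1.1-1.6 on one utterance. The pre-emphasis coefficient (0.97) and the threshold rule in step 1.5 are assumptions, since the patent's exact formulas for μ, t_E and t_Z are not reproduced in this text; the frame length, frame shift and two-stage decision follow the steps above.

```python
import numpy as np

FS = 16000          # sampling rate F_s = 16 kHz (step 1.2)
FRAME_LEN = 256     # 16 ms frame length
FRAME_SHIFT = 128   # 8 ms frame shift

def preprocess(x, pre_emph=0.97):
    """Steps 1.1-1.6: pre-emphasis, framing, Hamming windowing and
    two-stage energy / zero-crossing-rate endpoint detection."""
    x = np.asarray(x, dtype=float)
    assert len(x) >= FRAME_LEN, "utterance shorter than one frame"

    # Step 1.1: X~(n) = X(n) - mu * X(n-1), with X(-1) = 0
    # (mu = 0.97 is an assumed value; the patent's coefficient is not given here).
    x = np.append(x[0], x[1:] - pre_emph * x[:-1])

    # Step 1.2: overlapping framing.
    n_frames = 1 + (len(x) - FRAME_LEN) // FRAME_SHIFT
    frames = np.stack([x[k * FRAME_SHIFT: k * FRAME_SHIFT + FRAME_LEN]
                       for k in range(n_frames)])

    # Step 1.3: Hamming window of length 256.
    frames = frames * np.hamming(FRAME_LEN)

    # Step 1.4: short-time energy and short-time zero-crossing rate.
    energy = np.sum(frames ** 2, axis=1)
    signs = np.where(frames >= 0.0, 1.0, -1.0)       # sgn(lambda), sgn(0) = +1
    zcr = 0.5 * np.sum(np.abs(np.diff(signs, axis=1)), axis=1)

    # Step 1.5: thresholds t_E and t_Z (assumed rule: fractions of the means;
    # the patent's exact formulas are not reproduced in this text).
    t_E = 0.1 * energy.mean()
    t_Z = 0.5 * zcr.mean()

    # Step 1.6, first stage: frames with energy above t_E bound the segment.
    voiced = np.where(energy > t_E)[0]
    if voiced.size == 0:
        return frames[:0]
    start, end = int(voiced[0]), int(voiced[-1])

    # Step 1.6, second stage: extend outwards while the ZCR stays above t_Z.
    while start > 0 and zcr[start - 1] > t_Z:
        start -= 1
    while end < n_frames - 1 and zcr[end + 1] > t_Z:
        end += 1
    return frames[start:end + 1]      # effective speech frames p_k
```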
Further, in the method of the present invention, the feature vector d_i in step 1 is extracted as follows:
the frame-level short-time features and the first-order and second-order differences of the short-time features are used as low-level descriptors, and the statement-level features are obtained by computing statistics of these low-level descriptors over the sentence.
The statistical features of a sentence sample take the frame-level short-time features (such as fundamental frequency, frame energy, Mel-frequency cepstral coefficients, and the wavelet-packet cepstral coefficient features proposed herein) as low-level descriptors (LLD); the statement-level feature parameters are obtained by computing statistics of all the short-time features over the sentence.
Statistics commonly used in speech emotion feature extraction are listed in Table 1:
Table 1
The short-time features are: fundamental frequency, logarithmic frame energy, band energies (0-250 Hz, 0-650 Hz, 250-650 Hz, 1-4 kHz), the cepstral energies of 26 Mel-frequency bands, 13th-order Mel-frequency cepstral coefficients, the positions of the maximum and minimum of the Mel correlation spectrum, and the 90%, 75%, 50% and 25% roll-off points of the Mel correlation spectrum.
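Table 1 itself is not reproduced in this text, so the sketch below assumes a typical set of statistical functionals (mean, standard deviation, minimum, maximum, range, skewness, kurtosis) applied to each low-level descriptor and to its first- and second-order differences; it only illustrates how a statement-level vector d_i is assembled from frame-level LLDs.

```python
import numpy as np
from scipy.stats import skew, kurtosis

def statement_level_features(lld):
    """lld: array of shape (n_frames, n_lld) holding the frame-level
    low-level descriptors of one sentence (e.g. F0, log frame energy,
    band energies, 26 Mel-band cepstral energies, 13 MFCCs).
    Returns a single statement-level feature vector d_i."""
    # LLDs plus their first- and second-order differences.
    contours = [lld, np.diff(lld, n=1, axis=0), np.diff(lld, n=2, axis=0)]
    feats = []
    for c in contours:
        feats.extend([
            c.mean(axis=0), c.std(axis=0),
            c.min(axis=0), c.max(axis=0),
            c.max(axis=0) - c.min(axis=0),   # range
            skew(c, axis=0), kurtosis(c, axis=0),
        ])
    return np.concatenate(feats)
```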
Beneficial effects: compared with the prior art, the present invention has the following advantages.
Existing speech emotion recognition methods do not account for the covariate shift that exists between training samples and test samples in practical applications, so the performance of speech emotion recognition in real applications is worse than under laboratory conditions. The present invention establishes an importance-weight coefficient model that explicitly considers the differences between the test samples and the training samples encountered in practice, i.e. it quantifies the covariate shift between training and test samples; the computed importance-weight coefficient β is this quantized value, which directly expresses the deviation between training and test samples. In the subsequent speech emotion feature extraction and classifier construction, the deviation can be compensated through the quantized value β, so that the influence of the recording environment on speech emotion recognition is largely eliminated. Compared with other deviation-compensation methods for speech emotion recognition, building an importance-weight coefficient model to quantify the deviation between training and test samples reduces the computational complexity and difficulty of covariate-shift compensation.
Based on the importance-weight coefficient model, the deviation between training and test samples is compensated in the SVM classifier by introducing the importance-weight coefficients. Compared with other SVM-based recognition methods, this method introduces the importance weights into the objective function of the classical SVM classifier, which is equivalent to using a non-fixed penalty factor: according to the importance-weight coefficients, samples with large weights receive a larger penalty coefficient, and the separating hyperplane is adjusted accordingly. This reduces the influence of environmental factors on speech emotion recognition, improves the accuracy and stability of speech emotion recognition in practical applications, and gives better classification performance than a standard SVM.
Description of the drawings
Fig. 1 is the training flow chart of the importance-weighted SVM of the invention.
Fig. 2 is the importance-weight computation flow chart of the invention.
Specific embodiments
The present invention is further illustrated below with reference to the embodiments and the accompanying drawings.
The speech emotion recognition method of the invention, based on the importance-weighted support vector machine classifier, comprises the following steps:
Step 1: pre-process the input sample set to obtain the pre-processed training sample set {x_i^tr} and test sample set {x_j^te}, together with b template points {c_l} randomly selected from the pre-processed test sample set, where x_i^tr is a sample of the pre-processed training set, x_j^te is a sample of the pre-processed test set, n_tr is the number of training samples, n_te is the number of test samples, c_l is a template point randomly selected from {x_j^te}, i is the training-sample index, j is the test-sample index, and l is the index of the randomly selected template point.
The pre-processing specifically comprises the following steps:
Step 1.1: apply pre-emphasis to the digital speech signal X to obtain the pre-emphasized speech signal X̃:
  X̃(ñ) = X(ñ) − μ X(ñ − 1),  X(−1) = 0,
where ñ is the discrete-sample index of X, Ñ is the length of X, X(ñ) and X(ñ − 1) are the values of X at the ñ-th and (ñ − 1)-th samples, X̃(ñ) is the value of X̃ at the ñ-th sample, and μ is the pre-emphasis coefficient.
Step 1.2: divide the pre-emphasized speech signal X̃ into frames by overlapping segmentation. The distance between the starting points of two successive frames is called the frame shift; here the frame shift is 8 ms, i.e. 128 samples at the sampling rate F_s = 16 kHz, and each frame is 16 ms long, i.e. 256 samples. Framing yields the speech frame set {x̃_{k'}} (1 ≤ k' ≤ K'):
  x̃_{k'}(n) = X̃((k' − 1)·128 + n),  0 ≤ n ≤ 255,
where x̃_{k'} is the k'-th speech frame of the set, n is the sample index within a frame, k' is the frame index, and K' is the total number of frames, which satisfies K' = ⌊(Ñ − 256)/128⌋ + 1, ⌊·⌋ denoting rounding down.
Step 1.3: apply a Hamming window w of length 256 to each speech frame x̃_{k'} (1 ≤ k' ≤ K') to obtain the windowed speech frame x_{k'}:
  x_{k'}(n) = x̃_{k'}(n) · w(n),
where x_{k'}(n), x̃_{k'}(n) and w(n) are the values of x_{k'}, x̃_{k'} and w at the n-th sample, and the Hamming window function of length 256 is
  w(n) = 0.54 − 0.46 cos(2πn/255),  0 ≤ n ≤ 255.
Endpoint detection is then completed with the well-known energy / zero-crossing-rate double-threshold method, with the following specific steps:
Step 1.4: for each windowed speech frame x_{k'}, 1 ≤ k' ≤ K', compute the short-time energy E_{k'} and the short-time zero-crossing rate Z_{k'}:
  E_{k'} = Σ_{n=0}^{255} x_{k'}(n)²,  Z_{k'} = (1/2) Σ_{n=1}^{255} |sgn[x_{k'}(n)] − sgn[x_{k'}(n − 1)]|,
where E_{k'} is the short-time energy of the windowed speech frame x_{k'}, Z_{k'} is its short-time zero-crossing rate, x_{k'}(n) and x_{k'}(n − 1) are the values of x_{k'} at the n-th and (n − 1)-th samples, and sgn[·] is the sign function, sgn(λ) = 1 for λ ≥ 0 and sgn(λ) = −1 for λ < 0.
Step 1.5: determine the short-time energy threshold t_E and the short-time zero-crossing-rate threshold t_Z from the short-time energies and zero-crossing rates of all K' frames.
Step 1.6: for every windowed speech frame, first make a first-stage decision using the short-time energy: the windowed speech frames whose short-time energy exceeds the threshold t_E are marked as first-stage effective speech frames, the first-stage effective speech frame with the smallest frame index is taken as the start frame of the current effective speech frame set, and the one with the largest frame index as its end frame. Then make a second-stage decision using the short-time zero-crossing rate: starting from the start frame and moving in order of decreasing frame index, examine the frames one by one and mark the windowed speech frames whose short-time zero-crossing rate exceeds the threshold t_Z as effective speech frames; and starting from the end frame and moving in order of increasing frame index, examine the frames one by one and mark the windowed speech frames whose short-time zero-crossing rate exceeds the threshold t_Z as effective speech frames. The set of effective speech frames obtained after the two-stage decision is denoted {s_k} (1 ≤ k ≤ K), where k is the effective-speech-frame index, K is the total number of effective speech frames, and s_k is the k-th effective speech frame of the set.
Step 2: compute the optimal Gaussian kernel width σ̂ of the basis functions.
The closeness of the training-sample distribution to the test-sample distribution can be expressed by the importance weight β(s):
  β(s) = p_te(s) / p_tr(s),
where p_tr(s) denotes the distribution density of the pre-processed training sample set {x_i^tr} and p_te(s) denotes the distribution density of the pre-processed test sample set {x_j^te}.
Step 2.1: set the preset basis-function Gaussian kernel widths σ to 0.1, 0.2, ..., 1.
Step 2.2: compute the pre-compensation parameter vector α.
β(s) is approximated by the linear model
  β̂(s) = Σ_{l=1}^{b} α_l φ_l(s),
where α = (α_1, α_2, ..., α_b)' and the φ_l are basis functions; b and φ_l can be determined from the samples {x_i^tr} and {x_j^te}.
The squared-error criterion J_0(α) between β̂(s) and β(s) is
  J_0(α) = (1/2) ∫ (β̂(s) − β(s))² p_tr(s) ds.
Its last term is a constant that does not depend on α and can be ignored; the first two terms are denoted J(α):
  J(α) = (1/2) α' H α − h' α,
where α' is the transpose of the vector α, H is the b × b matrix with elements H_{l,l'} = ∫ φ_l(s) φ_{l'}(s) p_tr(s) ds, and h is the b-dimensional vector with elements h_l = ∫ φ_l(s) p_te(s) ds.
Approximating the expectation in J(α) by sample averages gives the approximate expected squared error of the importance weights:
  Ĵ(α) = (1/2) α' Ĥ α − ĥ' α,
where Ĥ is the b × b matrix with elements Ĥ_{l,l'} = (1/n_tr) Σ_{i=1}^{n_tr} φ_l(x_i^tr) φ_{l'}(x_i^tr), ĥ is the b-dimensional vector with elements ĥ_l = (1/n_te) Σ_{j=1}^{n_te} φ_l(x_j^te), and ĥ' is the transpose of ĥ.
Taking the non-negativity of the importance weight β(x) into account, this is converted into the optimization problem
  min_α Ĵ(α)  subject to  α ≥ 0;
the parameter vector α is the optimal solution of this problem.
When computing Ĥ and ĥ, φ_l is the Gaussian kernel function with kernel width σ:
  φ_l(s) = exp(−||s − c_l||² / (2σ²)).
Substituting φ_l into Ĥ and ĥ yields their entries Ĥ_{l,l'} and ĥ_l, where l, l' = 1, 2, ..., b, c_{l'} is a template point randomly selected from {x_j^te}, l' is the index of the randomly selected template point, and σ is one of the preset values.
Step 2.3: select the optimal basis-function Gaussian kernel width σ̂ by cross-validation.
Divide the pre-processed training sample set {x_i^tr} and test sample set {x_j^te} into R subsets {X_r^tr} and {X_r^te}, respectively, and compute
  Ĵ_r = (1/(2 n_r^tr)) Σ_{s^tr ∈ X_r^tr} β̂(s^tr)² − (1/n_r^te) Σ_{s^te ∈ X_r^te} β̂(s^te),
where Ĵ_r is the approximate expected squared error of the importance weights on the r-th fold, r = 1, 2, ..., R, X_r^tr is the r-th training subset, X_r^te is the r-th test subset, n_r^tr and n_r^te are the numbers of samples in X_r^tr and X_r^te, s^tr is a sample of X_r^tr, s^te is a sample of X_r^te, and β̂(s^tr), β̂(s^te) are the importance-weight estimates of these samples.
Compute the cross-validation score of the importance weights,
  Ĵ^CV = (1/R) Σ_{r=1}^{R} Ĵ_r,  r = 1, 2, ..., R.
Minimizing Ĵ^CV over the preset values σ = 0.1, 0.2, ..., 1 yields the optimal solution σ̂, which is the optimal basis-function Gaussian kernel width.
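A numerical sketch of steps 2.1-2.3 is given below (not part of the patent). It builds Ĥ and ĥ from Gaussian basis functions centred on the template points, obtains α by a ridge-stabilised least-squares solve clipped at zero (an assumed simplification of the constrained problem min Ĵ(α) subject to α ≥ 0), and selects the kernel width σ̂ by the cross-validated score Ĵ^CV.

```python
import numpy as np

def gaussian_design(X, centers, sigma):
    """phi_l(x) = exp(-||x - c_l||^2 / (2 sigma^2)) for every row x of X."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def fit_alpha(X_tr, X_te, centers, sigma, ridge=1e-6):
    """Minimise (1/2) a' H a - h' a with a >= 0 (zero-clipped ridge solve)."""
    Phi_tr = gaussian_design(X_tr, centers, sigma)
    Phi_te = gaussian_design(X_te, centers, sigma)
    H = Phi_tr.T @ Phi_tr / len(X_tr)          # H_hat
    h = Phi_te.mean(axis=0)                    # h_hat
    alpha = np.linalg.solve(H + ridge * np.eye(len(centers)), h)
    return np.maximum(alpha, 0.0)              # enforce alpha >= 0

def j_score(X_tr, X_te, centers, sigma, alpha):
    """Empirical J = 1/2 * E_tr[beta^2] - E_te[beta] for a given alpha."""
    b_tr = gaussian_design(X_tr, centers, sigma) @ alpha
    b_te = gaussian_design(X_te, centers, sigma) @ alpha
    return 0.5 * np.mean(b_tr ** 2) - np.mean(b_te)

def select_sigma(X_tr, X_te, centers, sigmas, R=5, seed=0):
    """Cross-validate the kernel width over the preset sigma values."""
    rng = np.random.default_rng(seed)
    tr_folds = np.array_split(rng.permutation(len(X_tr)), R)
    te_folds = np.array_split(rng.permutation(len(X_te)), R)
    scores = []
    for sigma in sigmas:
        alpha = fit_alpha(X_tr, X_te, centers, sigma)
        j = [j_score(X_tr[tr], X_te[te], centers, sigma, alpha)
             for tr, te in zip(tr_folds, te_folds)]
        scores.append(np.mean(j))
    return sigmas[int(np.argmin(scores))]
```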
Step 3: compute the optimal parameter vector α̂.
Using the Gaussian basis functions obtained in step 2 with the optimal basis-function Gaussian kernel width σ̂, recompute Ĥ and ĥ as
  Ĥ_{l,l'} = (1/n_tr) Σ_{i=1}^{n_tr} φ_l(x_i^tr) φ_{l'}(x_i^tr),  ĥ_l = (1/n_te) Σ_{j=1}^{n_te} φ_l(x_j^te),
where l, l' = 1, 2, ..., b and φ_l(s) = exp(−||s − c_l||² / (2σ̂²)).
With these quantities, solve the optimization problem min_α Ĵ(α) = (1/2) α' Ĥ α − ĥ' α under the constraint α ≥ 0; the solution is the optimal parameter vector α̂.
Step 4: compute the approximate importance weights.
From step 2, β(s) is modelled by the linear model β̂(s) = Σ_l α̂_l φ_l(s). Substituting the Gaussian basis functions gives
  β̂(s) = Σ_{l=1}^{b} α̂_l exp(−||s − c_l||² / (2σ̂²)),
where α̂_l is the l-th element of the vector α̂, s is a sample among the training and test sample points, and s ∈ D, D being the set of training and test sample points.
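Continuing the sketch after step 2.3 (the function names are those assumed there, not the patent's), the approximate importance weight of any sample is then evaluated as β̂(s) = Σ_l α̂_l exp(−||s − c_l||² / (2σ̂²)):

```python
def importance_weights(S, centers, sigma_hat, alpha_hat):
    """beta_hat(s) = sum_l alpha_hat_l * exp(-||s - c_l||^2 / (2 sigma_hat^2))
    for every row s of S (uses gaussian_design from the previous sketch)."""
    return gaussian_design(S, centers, sigma_hat) @ alpha_hat

# Illustrative use with the previous sketch (all names are assumptions):
# sigma_hat = select_sigma(X_tr, X_te, centers, sigmas=np.arange(0.1, 1.01, 0.1))
# alpha_hat = fit_alpha(X_tr, X_te, centers, sigma_hat)
# beta_tr   = importance_weights(X_tr, centers, sigma_hat, alpha_hat)  # weights beta_i
```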
Step 5: establish the importance-weighted SVM classifier model.
The importance weights are introduced as coefficients on the slack variables ξ of the standard SVM classifier:
  min_{w, b, ξ} (1/2)||w||² + C Σ_{i=1}^{L} β_i ξ_i,
with the constraints y_i(<w, d_i> + b) ≥ 1 − ξ_i, ξ_i ≥ 0, 1 ≤ i ≤ L, where w is the normal vector of the separating hyperplane, ||w|| is the norm of w, ξ is the slack variable, C is the penalty parameter, d_i is the feature vector extracted from the training sample x_i^tr, y_i ∈ {+1, −1} is the class label, together they form the training samples (d_1, y_1), (d_2, y_2), ..., (d_L, y_L), and β_i is the importance weight of the training sample point (d_i, y_i).
The statistical features of a sentence sample take the frame-level short-time features (such as fundamental frequency, frame energy, Mel-frequency cepstral coefficients, and the wavelet-packet cepstral coefficient features proposed herein) as low-level descriptors (LLD); the statement-level feature parameters are obtained by computing statistics of all the short-time features over the sentence.
Statistics commonly used in speech emotion feature extraction are listed in Table 1:
Table 1
The short-time features are: fundamental frequency, logarithmic frame energy, band energies (0-250 Hz, 0-650 Hz, 250-650 Hz, 1-4 kHz), the cepstral energies of 26 Mel-frequency bands, 13th-order Mel-frequency cepstral coefficients, the positions of the maximum and minimum of the Mel correlation spectrum, and the 90%, 75%, 50% and 25% roll-off points of the Mel correlation spectrum. The objective above together with its constraints constitutes the importance-weighted SVM classifier model.
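For orientation, a hypothetical end-to-end flow combining the sketches above is shown below; every helper (preprocess, lld, statement_level_features, select_sigma, fit_alpha, importance_weights) refers to those illustrative sketches or is a placeholder, not to code disclosed by the patent.

```python
# Hypothetical end-to-end flow: features -> importance weights -> weighted SVM.
import numpy as np
from sklearn.svm import SVC

def recognize(train_wavs, train_labels, test_wavs, b=100, seed=0):
    # lld() is a placeholder for a frame-level descriptor extractor
    # (F0, energies, MFCCs, ...); preprocess() and statement_level_features()
    # refer to the earlier sketches.
    D_tr = np.stack([statement_level_features(lld(preprocess(w))) for w in train_wavs])
    D_te = np.stack([statement_level_features(lld(preprocess(w))) for w in test_wavs])

    # Template points c_l drawn at random from the test feature vectors (step 2).
    rng = np.random.default_rng(seed)
    centers = D_te[rng.choice(len(D_te), size=min(b, len(D_te)), replace=False)]

    # Importance-weight model (steps 2-4, using the earlier sketch).
    sigma_hat = select_sigma(D_tr, D_te, centers, sigmas=np.arange(0.1, 1.01, 0.1))
    alpha_hat = fit_alpha(D_tr, D_te, centers, sigma_hat)
    beta_tr = importance_weights(D_tr, centers, sigma_hat, alpha_hat)

    # Importance-weighted SVM (step 5) and recognition.
    clf = SVC(kernel="rbf", C=1.0)
    clf.fit(D_tr, train_labels, sample_weight=beta_tr)
    return clf.predict(D_te)
```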
The above embodiment is only a preferred embodiment of the present invention. It should be pointed out that, for those of ordinary skill in the art, several improvements and equivalent replacements can be made without departing from the principle of the present invention, and the technical solutions obtained by applying such improvements and equivalent replacements to the claims of the present invention all fall within the protection scope of the present invention.

Claims (3)

1. A speech emotion recognition method based on an importance-weighted support vector machine classifier, characterized in that the method comprises the following steps:
Step 1: pre-processing the input speech signal and extracting the feature vector d_i;
Step 2: dividing the input sample set into a training sample set {x_i^tr} and a test sample set {x_j^te}, and randomly selecting b template points c_l from the test sample set to form {c_l}, where x_i^tr is a sample of the training sample set, x_j^te is a sample of the test sample set, n_tr is the number of samples in the training sample set, n_te is the number of samples in the test sample set, i is the index of a training sample, j is the index of a test sample, and l is the index of a template point selected from the test sample set;
Step 3: computing the optimal Gaussian kernel width σ̂ of the basis functions, as follows:
Step 3.1: setting the preset basis-function Gaussian kernel widths σ to 0.1, 0.2, ..., 1;
Step 3.2: computing the pre-compensation parameter vector α according to the following procedure:
Step 3.2.1: computing, with the Gaussian basis functions φ_l(x) = exp(−||x − c_l||² / (2σ²)), the b × b matrix Ĥ whose elements are Ĥ_{l,l'} = (1/n_tr) Σ_{i=1}^{n_tr} φ_l(x_i^tr) φ_{l'}(x_i^tr), where l, l' = 1, 2, ..., b, c_{l'} is a point of the randomly selected template set, and l' is the index of a randomly selected template point;
Step 3.2.2: computing the b-dimensional vector ĥ whose elements are ĥ_l = (1/n_te) Σ_{j=1}^{n_te} φ_l(x_j^te);
Step 3.2.3: computing the pre-compensation parameter vector α: under the constraint α ≥ 0, solving the optimization problem min_α Ĵ(α), i.e. finding the value of the parameter vector α that minimizes Ĵ(α) = (1/2) α' Ĥ α − ĥ' α, where Ĵ(α) is the approximate expected squared error of the importance weights, α' is the transpose of the vector α, and ĥ' is the transpose of the vector ĥ;
Step 3.3: selecting the optimal basis-function Gaussian kernel width σ̂ by cross-validation: dividing the training sample set {x_i^tr} and the test sample set {x_j^te} into R subsets {X_r^tr} and {X_r^te} respectively, and computing the approximate expected squared error of the importance weights on the r-th fold, Ĵ_r = (1/(2 n_r^tr)) Σ_{s^tr ∈ X_r^tr} β̂(s^tr)² − (1/n_r^te) Σ_{s^te ∈ X_r^te} β̂(s^te), where r = 1, 2, ..., R, X_r^tr is the r-th training subset, X_r^te is the r-th test subset, n_r^tr and n_r^te are the numbers of samples in X_r^tr and X_r^te, s^tr is a sample of X_r^tr, s^te is a sample of X_r^te, and β̂(s^tr), β̂(s^te) are the importance-weight estimates of these samples, computed as β̂(s) = Σ_{l=1}^{b} α_l φ_l(s), α_l being the l-th element of the pre-compensation parameter vector α obtained in step 3.2.3; substituting each of the 10 preset values σ = 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1 in turn, computing the cross-validation score Ĵ^CV = (1/R) Σ_{r=1}^{R} Ĵ_r, and taking the σ giving the smallest Ĵ^CV as the optimal basis-function Gaussian kernel width σ̂;
Step 4: under the constraint α ≥ 0, solving the optimization problem min_α Ĵ(α) with Ĥ and ĥ computed using the optimal kernel width σ̂, i.e. Ĥ_{l,l'} = (1/n_tr) Σ_{i=1}^{n_tr} φ_l(x_i^tr) φ_{l'}(x_i^tr) and ĥ_l = (1/n_te) Σ_{j=1}^{n_te} φ_l(x_j^te), l, l' = 1, 2, ..., b, to obtain the optimal parameter vector α̂, where Ĥ_{l,l'} is the element in row l and column l' of the matrix Ĥ and ĥ_l is the l-th element of the vector ĥ;
Step 5: computing the importance weight β(s) = Σ_{l=1}^{b} α̂_l φ_l(s), where α̂_l is the l-th element of the optimal parameter vector α̂, s is a sample among the training and test sample points, and s ∈ D, D being the set of training and test sample points;
Step 6: establishing the importance-weighted SVM classifier: using the importance weight β(s) as a coefficient on the slack variables ξ of the standard SVM classifier gives the SVM classifier expression min_{w, b, ξ} (1/2)||w||² + C Σ_{i=1}^{L} β_i ξ_i, which together with the constraints
y_i(<w, d_i> + b) ≥ 1 − ξ_i, ξ_i ≥ 0, 1 ≤ i ≤ L
constitutes the importance-weighted SVM classifier, where w is the normal vector of the separating hyperplane, ||w|| is the norm of w, C is the penalty parameter, d_i is the feature vector extracted from the pre-processed training sample set {x_i^tr}, y_i ∈ {+1, −1} is the class label, together they form the training samples (d_1, y_1), (d_2, y_2), ..., (d_L, y_L), β_i is the importance weight of the training sample point (d_i, y_i), and ξ_i is the slack variable of the training sample point (d_i, y_i);
Step 7: performing speech emotion recognition using the feature vectors extracted in step 1 and the importance-weighted SVM classifier established in step 6.
2. The speech emotion recognition method based on an importance-weighted support vector machine classifier according to claim 1, characterized in that the pre-processing in step 1 comprises the following steps:
Step 1.1: applying pre-emphasis to the digital speech signal X to obtain the pre-emphasized speech signal X̃, X̃(ñ) = X(ñ) − μ X(ñ − 1) with X(−1) = 0, where ñ denotes the discrete-sample index of X, Ñ is the length of X, X(ñ) and X(ñ − 1) are the values of X at the ñ-th and (ñ − 1)-th samples, X̃(ñ) is the value of X̃ at the ñ-th sample, and μ is the pre-emphasis coefficient;
Step 1.2: dividing the pre-emphasized speech signal X̃ into frames by overlapping segmentation, the distance between the starting points of two successive frames being called the frame shift, here 8 ms, i.e. 128 samples at the sampling rate F_s = 16 kHz, each frame being 16 ms long, i.e. 256 samples; framing yields the speech frame set {x̃_{k'}} (1 ≤ k' ≤ K'), in which the n-th sample of the k'-th speech frame is x̃_{k'}(n) = X̃((k' − 1)·128 + n), 0 ≤ n ≤ 255, where x̃_{k'} is the k'-th speech frame of the set, n is the sample index within a frame, k' is the frame index, and K' is the total number of frames, which satisfies K' = ⌊(Ñ − 256)/128⌋ + 1, ⌊·⌋ denoting rounding down;
Step 1.3: applying a Hamming window w of length 256 to each speech frame x̃_{k'} (1 ≤ k' ≤ K') to obtain the windowed speech frame x_{k'}, x_{k'}(n) = x̃_{k'}(n) · w(n), where x_{k'}(n), x̃_{k'}(n) and w(n) are the values of x_{k'}, x̃_{k'} and w at the n-th sample, and the Hamming window function of length 256 is w(n) = 0.54 − 0.46 cos(2πn/255), 0 ≤ n ≤ 255;
Step 1.4: for each windowed speech frame x_{k'}, 1 ≤ k' ≤ K', computing the short-time energy E_{k'} = Σ_{n=0}^{255} x_{k'}(n)² and the short-time zero-crossing rate Z_{k'} = (1/2) Σ_{n=1}^{255} |sgn[x_{k'}(n)] − sgn[x_{k'}(n − 1)]|, where x_{k'}(n) and x_{k'}(n − 1) are the values of x_{k'} at the n-th and (n − 1)-th samples and sgn[·] is the sign function, sgn(λ) = 1 for λ ≥ 0 and sgn(λ) = −1 for λ < 0, λ being the argument of the sign function;
Step 1.5: determining the short-time energy threshold t_E and the short-time zero-crossing-rate threshold t_Z from the short-time energies and zero-crossing rates of all K' frames, K' being the total number of frames;
Step 1.6: for every windowed speech frame, first making a first-stage decision using the short-time energy, i.e. marking the windowed speech frames whose short-time energy exceeds the threshold t_E as first-stage effective speech frames, taking the first-stage effective speech frame with the smallest frame index as the start frame of the current effective speech frame set and the one with the largest frame index as its end frame;
then making a second-stage decision using the short-time zero-crossing rate, i.e. starting from the start frame and moving in order of decreasing frame index, examining the frames one by one and marking the windowed speech frames whose short-time zero-crossing rate exceeds the threshold t_Z as effective speech frames, and starting from the end frame and moving in order of increasing frame index, examining the frames one by one and marking the windowed speech frames whose short-time zero-crossing rate exceeds the threshold t_Z as effective speech frames;
the set of effective speech frames obtained after the two-stage decision being denoted {p_k} (1 ≤ k ≤ K), where k is the effective-speech-frame index, K is the total number of effective speech frames, and p_k is the k-th effective speech frame of the set.
3. The speech emotion recognition method based on an importance-weighted support vector machine classifier according to claim 1 or 2, characterized in that the feature vector d_i in step 1 is extracted as follows:
frame-level short-time features and the first-order and second-order differences of the short-time features are used as low-level descriptors, and the statement-level features are obtained by computing statistics of the low-level descriptors over the sentence.
CN201610969948.7A 2016-11-04 2016-11-04 Speech-emotion recognition method based on weights of importance support vector machine classifier Active CN106504772B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610969948.7A CN106504772B (en) 2016-11-04 2016-11-04 Speech-emotion recognition method based on weights of importance support vector machine classifier

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610969948.7A CN106504772B (en) 2016-11-04 2016-11-04 Speech-emotion recognition method based on weights of importance support vector machine classifier

Publications (2)

Publication Number Publication Date
CN106504772A CN106504772A (en) 2017-03-15
CN106504772B true CN106504772B (en) 2019-08-20

Family

ID=58322831

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610969948.7A Active CN106504772B (en) 2016-11-04 2016-11-04 Speech-emotion recognition method based on weights of importance support vector machine classifier

Country Status (1)

Country Link
CN (1) CN106504772B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108735233A (en) * 2017-04-24 2018-11-02 北京理工大学 A kind of personality recognition methods and device
CN108364641A (en) * 2018-01-09 2018-08-03 东南大学 A kind of speech emotional characteristic extraction method based on the estimation of long time frame ambient noise
CN108831450A (en) * 2018-03-30 2018-11-16 杭州鸟瞰智能科技股份有限公司 A kind of virtual robot man-machine interaction method based on user emotion identification
WO2020024210A1 (en) * 2018-08-02 2020-02-06 深圳大学 Method and apparatus for optimizing window parameter of integrated kernel density estimator, and terminal device
CN110991238B (en) * 2019-10-30 2023-04-28 中科南京人工智能创新研究院 Speech assisting system based on speech emotion analysis and micro expression recognition
CN111415680B (en) * 2020-03-26 2023-05-23 心图熵动科技(苏州)有限责任公司 Voice-based anxiety prediction model generation method and anxiety prediction system
CN113434698B (en) * 2021-06-30 2022-08-02 华中科技大学 Relation extraction model establishing method based on full-hierarchy attention and application thereof
CN116801456A (en) * 2023-08-22 2023-09-22 深圳市创洺盛光电科技有限公司 Intelligent control method of LED lamp

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20080077720A (en) * 2007-02-21 2008-08-26 인하대학교 산학협력단 A voice activity detecting method based on a support vector machine(svm) using a posteriori snr, a priori snr and a predicted snr as a feature vector
KR20110021328A (en) * 2009-08-26 2011-03-04 인하대학교 산학협력단 The method to improve the performance of speech/music classification for 3gpp2 codec by employing svm based on discriminative weight training
CN102201237A (en) * 2011-05-12 2011-09-28 浙江大学 Emotional speaker identification method based on reliability detection of fuzzy support vector machine
CN103544963A (en) * 2013-11-07 2014-01-29 东南大学 Voice emotion recognition method based on core semi-supervised discrimination and analysis
CN104091602A (en) * 2014-07-11 2014-10-08 电子科技大学 Speech emotion recognition method based on fuzzy support vector machine

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Yongming Huang et al., "Extraction of adaptive wavelet packet filter-bank-based acoustic feature for speech emotion recognition," IET Signal Processing, vol. 9, no. 4, pp. 341-348, 15 June 2015.
Qin Yuqiang, Zhang Xueying, "Speech signal emotion recognition based on SVM," Journal of Circuits and Systems, vol. 17, no. 5, pp. 55-59, October 2012.

Also Published As

Publication number Publication date
CN106504772A (en) 2017-03-15

Similar Documents

Publication Publication Date Title
CN106504772B (en) Speech-emotion recognition method based on weights of importance support vector machine classifier
CN106328121B (en) Chinese Traditional Instruments sorting technique based on depth confidence network
CN109256150B (en) Speech emotion recognition system and method based on machine learning
CN101599271B (en) Recognition method of digital music emotion
CN109243494B (en) Children emotion recognition method based on multi-attention mechanism long-time memory network
CN109493886A (en) Speech-emotion recognition method based on feature selecting and optimization
CN105261367B (en) A kind of method for distinguishing speek person
CN108899049A (en) A kind of speech-emotion recognition method and system based on convolutional neural networks
CN112259104B (en) Training device for voiceprint recognition model
CN107393554A (en) In a kind of sound scene classification merge class between standard deviation feature extracting method
CN102509547A (en) Method and system for voiceprint recognition based on vector quantization based
CN109815892A (en) The signal recognition method of distributed fiber grating sensing network based on CNN
CN102723078A (en) Emotion speech recognition method based on natural language comprehension
WO2016119604A1 (en) Voice information search method and apparatus, and server
CN113066499B (en) Method and device for identifying identity of land-air conversation speaker
CN102655003B (en) Method for recognizing emotion points of Chinese pronunciation based on sound-track modulating signals MFCC (Mel Frequency Cepstrum Coefficient)
CN110767210A (en) Method and device for generating personalized voice
CN109243493A (en) Based on the vagitus emotion identification method for improving long memory network in short-term
CN104123933A (en) Self-adaptive non-parallel training based voice conversion method
CN103985381A (en) Voice frequency indexing method based on parameter fusion optimized decision
CN111128128B (en) Voice keyword detection method based on complementary model scoring fusion
CN108364641A (en) A kind of speech emotional characteristic extraction method based on the estimation of long time frame ambient noise
Zhang et al. Speech emotion recognition using combination of features
CN114783418B (en) End-to-end voice recognition method and system based on sparse self-attention mechanism
CN111128240B (en) Voice emotion recognition method based on anti-semantic-erasure

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant