CN103810994A - Method and system for voice emotion inference on basis of emotion context - Google Patents


Info

Publication number
CN103810994A
Authority
CN
China
Prior art keywords
emotion
context
statement
analyzed
speech
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201310401319.0A
Other languages
Chinese (zh)
Other versions
CN103810994B (en)
Inventor
毛启容
白李娟
王丽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu University
Original Assignee
Jiangsu University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu University
Priority to CN201310401319.0A
Publication of CN103810994A
Application granted
Publication of CN103810994B
Legal status: Active
Anticipated expiration

Landscapes

  • Machine Translation (AREA)
  • Measurement Of The Respiration, Hearing Ability, Form, And Blood Characteristics Of Living Organisms (AREA)

Abstract

The invention discloses a method and system for speech emotion inference based on emotion context. The method comprises: extracting context speech emotion features and traditional speech emotion features from adjacent emotion sentences, and establishing a context model and a traditional model respectively according to the feature categories; dividing the continuous speech to be analyzed into a sequence of emotion sentences whose emotions are relatively independent, and fusing the decisions of the context model and the traditional model on the current emotion sentence with a fusion method based on an emotion interaction matrix to obtain a preliminary recognition result; and adjusting the emotion category of each sentence with emotion context inference rules from the perspective of the whole continuous speech to be analyzed, thereby obtaining the emotion category sequence of the continuous speech to be analyzed. By adopting the emotion context reasoning algorithm and analyzing and adjusting the emotional states of the emotion sentences to be analyzed with the aid of the emotion interaction matrix, the accuracy of continuous speech emotion recognition is improved.

Description

Speech emotion inference method and system based on emotion context
Technical field
The present invention relates to speech signal processing, sentiment analysis and pattern recognition technology, and in particular to a speech emotion inference method and system based on emotion context.
Background technology
The development of speech emotion recognition technology plays an important role in promoting the development and application of intelligent, humanized novel human-machine interaction. How to automatically identify the speaker's emotional state from speech by means of computer technology has attracted wide attention from researchers in many fields in recent years. In speech emotion recognition research, researchers have gradually begun to pay attention to the effect of contextual information on improving emotion recognition accuracy. The so-called context refers to information such as the personal information of the object to be analyzed and of the object to whom the emotion is expressed (including gender, age, culture, language, education level, conversation background, etc.) and the emotional states of the recent past.
Prior art 1 analyzed the effect of linguistic contextual information such as gender, subject matter, speaker and speaking content on emotion recognition, but the analysis mainly targeted isolated, unnatural single sentences and did not describe or process the continuously expressed emotional speech that occurs in natural environments. Prior art 2 began to pay attention to the contextual information carried between words and the surrounding environment, proposed three classes of environmental features (context environment, dynamic environment and sentence global context) comprising five kinds of features in total, and proved experimentally the contribution of contextual information to improving emotion recognition accuracy. However, that scheme requires building a large and rich emotion lexicon and requires the speaker's speech content to be recognized before emotion recognition; the accuracy of speech content recognition affects the accuracy of emotion recognition, and recognizing the speech content increases the time complexity of emotion recognition. Prior art 3, without recognizing the speaker's speech content, analyzed the mutual influence of the emotional states of two persons in dialogue from the acoustic features of speech, and derived the emotion transfer matrix of the two dialogue parties.
However, in the prior art, emotion recognition of continuous speech analyzes only each current sentence. To address the deficiencies of the prior art, the present invention provides a speech emotion inference method and system based on emotion context. It exploits the fact that human emotional expression and change is a continuous process, so that there is a certain association between the current emotional state of the object to be analyzed and the emotional state about to be expressed. For emotion recognition of the continuous speech of a single speaker, the invention provides an extraction method for emotion context features and a speech emotion inference method based on emotion context, which improves the continuous speech emotion recognition rate without requiring recognition of the speaker's speech content.
Summary of the invention
In view of the defect in the background art that emotion recognition of continuous speech analyzes only each current sentence, the present invention provides a speech emotion inference method and system based on emotion context: an extraction method for speech emotion context features is devised, an efficient speech emotion inference model based on emotion context is established, and a speech emotion inference method based on emotion context is completed, finally improving the accuracy of continuous speech emotion recognition.
To achieve these goals, the technical scheme provided by the embodiments of the present invention is as follows:
A speech emotion inference method based on emotion context, the method comprising:
S1, extracting context speech emotion features and traditional speech emotion features from adjacent emotion sentences, and establishing a context model and a traditional model respectively according to the feature categories;
S2, dividing the continuous speech to be analyzed into a sequence of emotion sentences whose emotions are relatively independent, extracting the context speech emotion features and the traditional speech emotion features of said emotion sentences, and then recognizing them with the context model and the traditional model respectively, to obtain the decision vectors of the two models for the emotion sentence to be analyzed;
S3, fusing the decisions of the context model and the traditional model on the current emotion sentence of the continuous speech to be analyzed with a fusion method based on the emotion interaction matrix, to obtain a preliminary recognition result;
S4, adjusting the emotion category of each sentence with emotion context inference rules from the perspective of the whole continuous speech to be analyzed, to obtain the emotion category sequence of the continuous speech to be analyzed.
As a further improvement of the present invention, said step S3 comprises:
when the traditional model and the context model are used to fuse the two largest classes of the decision vectors of the emotion sentence to be analyzed, introducing the previously compiled emotion interaction matrix and processing it to obtain the emotion context interaction matrix, the context interaction matrix performing fusion reasoning on the emotion category of the emotion sentence together with the two decision vectors.
As a further improvement of the present invention, said step S4 comprises:
the emotion context inference rules exploit the continuity of human emotional expression and adjust the emotion category of the current emotion sentence according to the emotion categories of the preceding and following adjacent sentences.
As a further improvement of the present invention, the adjacent emotion sentences in said step S1 are the last 1/3 voiced segment of the preceding sentence and the whole of the following sentence of a pair of adjacent emotion sentences.
As a further improvement of the present invention, said context speech emotion features comprise: context dynamic emotion features, context difference emotion features, context edge dynamic emotion features and context edge difference emotion features.
As a further improvement of the present invention, the context dynamic emotion features are the 33-dimensional speech emotion dynamic features, related to change rate, mean change and covariance among the 101-dimensional traditional speech emotion features, extracted over the last 1/3 voiced segment of the preceding sentence and the whole voiced segment of the following sentence of the adjacent emotion sentences.
As a further improvement of the present invention, the context difference emotion features are obtained by first extracting the 101-dimensional traditional speech emotion features separately from the last 1/3 voiced segment of the preceding sentence and the whole voiced segment of the following sentence of the adjacent emotion sentences, and then taking the difference between the two.
As a further improvement of the present invention, the context edge dynamic emotion features are the 33-dimensional speech emotion dynamic features extracted from the edge-adjacent sentence composed of the last 1/3 voiced segment of the preceding sentence and the first 1/3 voiced segment of the following sentence of the adjacent emotion sentences.
As a further improvement of the present invention, the context edge difference emotion features are the features extracted from the edge-adjacent sentence by the context difference emotion feature extraction method.
Correspondingly, a speech emotion inference system based on emotion context, the system comprising:
a training unit for extracting context speech emotion features and traditional speech emotion features from adjacent emotion sentences, and establishing a context model and a traditional model respectively according to the feature categories;
a recognition unit for dividing the continuous speech to be analyzed into a sequence of emotion sentences whose emotions are relatively independent, extracting the context speech emotion features and the traditional speech emotion features of said sentences, and then performing emotion recognition on the current sentence with the trained context model and traditional model respectively, to obtain the decision vectors of the current sentence on the two models;
a fusion recognition unit for fusing the decisions of the context model and the traditional model on the current emotion sentence of the continuous speech to be analyzed, to obtain a preliminary recognition result;
an adjustment unit for adjusting the emotion category of each sentence with emotion context inference rules from the perspective of the whole continuous speech to be analyzed, to obtain the emotion category sequence of the continuous speech to be analyzed.
The present invention has the following beneficial effects:
1. The context speech emotion features between continuous emotion sentences are successfully extracted and used to assist the traditional speech emotion features extracted from single emotion sentences, thereby improving the emotion recognition efficiency of continuous speech;
2. The previously compiled emotion interaction matrix is skilfully used to perform emotion reasoning fusion of the emotional state of the emotion sentence to be recognized given by the context speech emotion features and the emotional state given by the traditional speech emotion features, obtaining a preliminary emotion recognition result for the emotion sentence to be recognized;
3. Exploiting the stability of emotion changes across continuous emotion sentences, emotion context inference rules are formulated to perform a context-dependent adjustment of the whole continuous speech to be recognized.
Brief description of the drawings
Fig. 1 is a framework diagram of the speech emotion inference method based on emotion context in an embodiment of the present invention;
Fig. 2 is a flow chart of the emotion reasoning algorithm based on emotion context in an embodiment of the present invention.
Embodiment
The present invention is described below with reference to the embodiments shown in the drawings. However, these embodiments do not limit the present invention; structural, methodological or functional transformations made by those of ordinary skill in the art according to these embodiments are all included within the protection scope of the present invention.
The invention discloses a speech emotion inference method based on emotion context, comprising:
S1, extracting context speech emotion features and traditional speech emotion features from adjacent emotion sentences, and establishing a context model and a traditional model respectively according to the feature categories;
S2, dividing the continuous speech to be analyzed into a sequence of emotion sentences whose emotions are relatively independent, extracting the context speech emotion features and the traditional speech emotion features of said emotion sentences, and then recognizing them with the context model and the traditional model respectively, to obtain the decision vectors of the two models for the emotion sentence to be analyzed;
S3, fusing the decisions of the context model and the traditional model on the current emotion sentence of the continuous speech to be analyzed with a fusion method based on the emotion interaction matrix, to obtain a preliminary recognition result;
S4, adjusting the emotion category of each sentence with emotion context inference rules from the perspective of the whole continuous speech to be analyzed, to obtain the emotion category sequence of the continuous speech to be analyzed.
Specifically, the method comprises:
Step 1: train the speech emotion recognition model based on traditional speech emotion features.
Step 1.1: pre-process the emotional speech signals in the training library, including pre-emphasis, windowing, framing and endpoint detection.
Step 1.2: extract the conventional 101-dimensional traditional speech emotion features from the emotion sentences in the training set, including acoustic and prosodic features of speech such as Mel cepstrum coefficients, fundamental frequency, duration, intensity, amplitude, voice quality and formants.
Step 1.3: normalize the extracted features with the corresponding features of neutral sentences, and then perform feature selection by the SFFS (Sequential Forward Floating Search) method; 56 traditional speech emotion features remain after feature selection.
Step 1.4: train an SVM classifier with the 56-dimensional traditional speech emotion features of the emotion sentences in the training set, obtaining the speech emotion recognition model based on traditional speech emotion features.
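For illustration only, the following is a minimal sketch of steps 1.3 and 1.4. It assumes the 101-dimensional traditional features and the neutral-sentence features are already extracted as matrices; scikit-learn, the RBF kernel, and the use of plain forward selection (SequentialFeatureSelector) as a stand-in for the floating SFFS search are assumptions of the sketch, not part of the disclosed method.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.feature_selection import SequentialFeatureSelector

def normalize_by_neutral(features, neutral_features):
    """Normalize emotional features by the statistics of the neutral sentences (step 1.3)."""
    mean = neutral_features.mean(axis=0)
    std = neutral_features.std(axis=0) + 1e-8
    return (features - mean) / std

def train_traditional_model(X_train, y_train, X_neutral, n_selected=56):
    """X_train: (n_sentences, 101) traditional features; y_train: emotion labels."""
    X_norm = normalize_by_neutral(X_train, X_neutral)
    base_svm = SVC(kernel="rbf", probability=True)   # probabilistic output is needed later for fusion
    selector = SequentialFeatureSelector(base_svm, n_features_to_select=n_selected)
    selector.fit(X_norm, y_train)                     # forward selection as an SFFS stand-in
    model = SVC(kernel="rbf", probability=True).fit(selector.transform(X_norm), y_train)
    return selector, model
```

The same pipeline, with 268-dimensional context features reduced to 91 retained dimensions, would serve analogously for the context model of step 2 below.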
Step 2: train the speech emotion recognition model based on context speech emotion features.
Step 2.1: extract the context speech emotion features from the emotion sentences of the training set pre-processed in step 1.1, comprising the context dynamic emotion features, context difference emotion features, context edge dynamic emotion features and context edge difference emotion features, 268 dimensions in total.
Step 2.2: normalize the context speech emotion features extracted in step 2.1 with the corresponding features of neutral sentences, and then perform feature selection by the SFFS (Sequential Forward Floating Search) method; 91 context speech emotion features remain after feature selection.
Step 2.3: train an SVM (Support Vector Machine) classifier with the 91-dimensional context speech emotion features extracted from the emotion sentences in the training set, obtaining the speech emotion recognition model based on context speech emotion features.
Step 3: recognize the emotional state of the emotion sentence to be recognized.
Step 3.1: pre-process the continuous emotional speech signal to be recognized, including pre-emphasis, windowing, automatic segmentation, framing and endpoint detection.
Step 3.2: extract from the emotional speech signal to be recognized the 56-dimensional traditional speech emotion features selected by the procedure of steps 1.2 and 1.3.
Step 3.3: input them into the speech emotion recognition model based on traditional speech emotion features trained in step 1.4 for recognition; the recognition result is denoted TP.
Step 3.4: extract from the emotional speech signal to be recognized the 91-dimensional context speech emotion features selected by the procedure of steps 2.1 and 2.2.
Step 3.5: input them into the speech emotion recognition model based on context speech emotion features trained in step 2.3 for recognition; the recognition result is denoted CP.
Step 4: according to the recognition result TP of the model based on traditional speech emotion features and the recognition result CP of the model based on context speech emotion features, fuse the recognition results of the two models with the fusion algorithm, preliminarily obtaining the emotion category of the speech signal to be recognized and the confidence of this result.
Step 5: adopt the inference rules based on emotion context to adjust, according to the emotional states of the sentences preceding and following the emotion sentence to be analyzed in the continuous speech, the emotional state embodied by the emotion sentence to be analyzed, obtaining the final emotional state of the emotion sentence to be analyzed.
Correspondingly, the invention also discloses a speech emotion inference system based on emotion context, comprising:
a training unit for extracting context speech emotion features and traditional speech emotion features from adjacent emotion sentences, and establishing a context model and a traditional model respectively according to the feature categories;
a recognition unit for dividing the continuous speech to be analyzed into a sequence of emotion sentences whose emotions are relatively independent, extracting the context speech emotion features and the traditional speech emotion features of these sentences, and then performing emotion recognition on the current sentence with the trained context model and traditional model respectively, to obtain the decision vectors of the current sentence on the two models;
a fusion recognition unit for fusing the decisions of the context model and the traditional model on the current emotion sentence of the continuous speech to be analyzed, to obtain a preliminary recognition result; and an adjustment unit for adjusting the emotion category of each sentence with emotion context inference rules from the perspective of the whole continuous speech to be analyzed, to obtain the emotion category sequence of the continuous speech to be analyzed.
The present invention is further elaborated below in conjunction with the drawings and specific embodiments.
As shown in Fig. 1, the block diagram of the emotion inference system based on emotion context in an embodiment of the invention is divided into four stages: the training stage, the recognition stage, the fusion recognition stage, and the emotion adjustment stage based on the emotion context inference rules.
1. Training stage
The training stage establishes the speech emotion recognition model based on traditional speech emotion features and the speech emotion recognition model based on context speech emotion features, and is divided into three steps:
(1) Emotional speech signal pre-processing.
This step pre-processes the emotional speech signal with traditional speech signal pre-processing methods, including pre-emphasis, windowing, framing and endpoint detection.
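By way of illustration, a minimal sketch of this pre-processing is given below; the sampling rate, frame length, frame shift, pre-emphasis coefficient and the simple energy threshold used as endpoint detection are assumptions of the sketch rather than values fixed by the invention.

```python
import numpy as np

def preprocess(signal, fs=16000, frame_len=0.025, frame_shift=0.010, alpha=0.97):
    """Pre-emphasis, Hamming windowing, framing and crude energy-based endpoint detection."""
    emphasized = np.append(signal[0], signal[1:] - alpha * signal[:-1])   # pre-emphasis
    flen, fshift = int(frame_len * fs), int(frame_shift * fs)
    n_frames = 1 + (len(emphasized) - flen) // fshift                     # assumes len(signal) >= flen
    frames = np.stack([emphasized[i * fshift:i * fshift + flen] * np.hamming(flen)
                       for i in range(n_frames)])
    energy = (frames ** 2).sum(axis=1)
    return frames[energy > 0.02 * energy.max()]                           # keep voiced frames only
```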
(2) Extraction of the traditional speech emotion features and training of the speech emotion recognition model based on traditional speech emotion features.
(2-1) Extract from the current emotion sentence the acoustic and prosodic features of speech, including Mel cepstrum coefficients, fundamental frequency, duration, intensity, amplitude, voice quality and formants, and extract over the emotion sentence statistical features of these features such as the maximum, minimum and variation range. The extraction methods of these features do not belong to the present invention and are therefore not described in detail. The specific features extracted are shown in Table 1.
Table 1. Description of the traditional speech emotion features
(2-2) Normalize the features extracted in step (2-1) with the features of neutral emotion, and then perform SFFS feature selection on the 101-dimensional traditional speech emotion features; 56 dimensions remain after feature selection.
(2-3) Train the speech emotion recognition model based on traditional speech emotion features with the selected 56-dimensional traditional speech emotion features; the recognition model in this embodiment adopts SVM.
(3) Extraction of the context speech emotion features and training of the speech emotion recognition model based on context speech emotion features.
(3-1) Extract the context speech emotion features, comprising the context dynamic emotion features, the context difference emotion features, the context edge dynamic emotion features and the context edge difference emotion features.
(3-1-1) Extraction of the context dynamic emotion features: extract from two continuous emotion sentences the statistical features, such as maximum, minimum and variation range, of short-time energy, zero-crossing rate, Mel cepstrum coefficients (first 12 coefficients), fundamental frequency, voice quality, silence rate and the first three formant coefficients, 33 dimensions in total. The specific features are shown in Table 2.
Table 2. Description of the speech emotion dynamic features
(3-1-2) Extraction of the context difference emotion features: for each of two continuous emotion sentences, extract the short-time energy, zero-crossing rate, Mel cepstrum coefficients (first 12 coefficients), fundamental frequency, voice quality, silence rate and first three formant coefficients of the sentence together with statistical features of these emotion features such as maximum, minimum and variation range, 101 dimensions in total; then subtract the corresponding emotion features of the preceding sentence from those of the following sentence to obtain the 101-dimensional context difference emotion features.
(3-1-3) Extraction of the context edge dynamic emotion features: from the segment starting at the last 1/3 voiced segment of the preceding sentence and ending at the first 1/3 voiced segment of the following sentence of two continuous emotion sentences, extract the statistical features, such as maximum, minimum and variation range, of short-time energy, zero-crossing rate, Mel cepstrum coefficients (first 12 coefficients), fundamental frequency, voice quality, silence rate and first three formant coefficients, 33 dimensions in total.
(3-1-4) Extraction of the context edge difference emotion features: from the two fragments consisting of the last 1/3 voiced segment of the preceding sentence and the first 1/3 voiced segment of the following sentence of two continuous emotion sentences, extract separately the short-time energy, zero-crossing rate, Mel cepstrum coefficients (first 12 coefficients), fundamental frequency, voice quality, silence rate and first three formant coefficients together with statistical features of these features such as maximum, minimum and variation range, 101 dimensions in total; then subtract the corresponding 101-dimensional emotion features of the last 1/3 segment of the preceding sentence from those of the first 1/3 segment of the following sentence to obtain the 101-dimensional context edge difference emotion features.
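The four context-feature groups above can be summarized by the following minimal sketch. The helpers extract_static_101 and extract_dynamic_33, which compute the 101-dimensional statistics of Table 1 and the 33-dimensional dynamic features of Table 2, are assumed to be supplied by the caller, and the exact segment boundaries (last 1/3 of the preceding sentence, first 1/3 and whole of the following sentence, following the claims) are illustrative assumptions.

```python
import numpy as np

def context_features(prev_sentence, cur_sentence, extract_static_101, extract_dynamic_33):
    """Build the 268-dim context feature vector for one pair of adjacent emotion sentences."""
    prev_tail = prev_sentence[-len(prev_sentence) // 3:]    # last 1/3 of the preceding sentence
    cur_head = cur_sentence[:len(cur_sentence) // 3]         # first 1/3 of the following sentence
    edge = np.concatenate([prev_tail, cur_head])              # "edge-adjacent" sentence

    dyn = extract_dynamic_33(np.concatenate([prev_tail, cur_sentence]))       # (3-1-1), 33 dims
    diff = extract_static_101(cur_sentence) - extract_static_101(prev_tail)   # (3-1-2), 101 dims
    edge_dyn = extract_dynamic_33(edge)                                        # (3-1-3), 33 dims
    edge_diff = extract_static_101(cur_head) - extract_static_101(prev_tail)  # (3-1-4), 101 dims
    return np.concatenate([dyn, diff, edge_dyn, edge_diff])                    # 33+101+33+101 = 268 dims
```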
(3-2) Normalize the features extracted in steps (3-1-1), (3-1-2), (3-1-3) and (3-1-4) with the features of neutral emotion, and then perform SFFS feature selection on the 268-dimensional context speech emotion features; 91 dimensions remain after feature selection.
(3-3) Train the speech emotion recognition model based on context speech emotion features with the selected 91-dimensional context speech emotion features; the recognition model here adopts SVM.
2. Recognition stage
In the recognition stage, after the corresponding features of the emotion sentence to be recognized have been extracted, the extracted features are input into the models trained in the first stage, and the emotional state recognition result of the emotion sentence on each model is calculated. This stage is implemented in the following steps.
(1) Segment the continuous emotional speech signal into emotion sentences with the segmentation method based on the energy envelope and pause intervals.
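A minimal sketch of such energy-envelope / pause-based segmentation is given below; the window size, the relative energy threshold and the minimum pause length are assumptions of the sketch, not parameters fixed by the invention.

```python
import numpy as np

def segment_by_pauses(signal, fs=16000, win=0.02, min_pause=0.30, rel_thresh=0.05):
    """Split a continuous waveform into emotion sentences at sufficiently long low-energy pauses."""
    wlen = int(win * fs)
    energy = np.array([np.sum(signal[i:i + wlen] ** 2)                  # short-time energy envelope
                       for i in range(0, len(signal) - wlen, wlen)])
    silent = energy < rel_thresh * energy.max()
    min_run = int(min_pause / win)                                       # required pause length in windows
    boundaries, run = [], 0
    for i, s in enumerate(silent):
        run = run + 1 if s else 0
        if run == min_run:                                               # pause long enough: cut here
            boundaries.append((i - min_run + 1) * wlen)
    return np.split(signal, boundaries) if boundaries else [signal]
```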
(2) Pre-process the segmented emotional speech signal; the method adopted is the same as step (1) of the training stage.
(3) Extraction of the traditional speech features of the emotion sentence to be recognized and speech emotion recognition based on traditional speech emotion features.
(3-1) Extract the feature-selected 56-dimensional traditional speech emotion features of the current sentence among the emotion sentences to be recognized; the method adopted is the same as step (2-1) of the training stage.
(3-2) Recognize the emotional state of the current sentence among the emotion sentences to be recognized.
The traditional speech emotion features of the current sentence extracted in step (3-1) of this stage are input into the speech emotion recognition model based on traditional speech emotion features trained in step (2-3) of the first stage, and the emotional state embodied by this emotion sentence to be recognized is calculated.
(4) Extract the context speech emotion features of the current sentence among the emotion sentences to be recognized, and recognize the emotional state contained in the current sentence with the context speech emotion recognition model trained in step (3-3) of the training stage, using the extracted context speech emotion features.
(4-1) Extraction of the context speech emotion features of the emotion sentence to be recognized; the extraction method adopted is the same as step (3-1) of the training stage, except that only the 91-dimensional context speech emotion features remaining after feature selection are extracted.
(4-2) The 91-dimensional context speech emotion features of the sentence to be recognized extracted in step (4-1) are input into the speech emotion recognition model based on context speech emotion features trained in step (3-3) of the first stage, and the emotional state contained in this emotion sentence to be recognized is obtained.
3. Fusion recognition stage
According to the emotional state of the emotion sentence to be recognized based on traditional speech emotion features obtained in step (3-2) of the recognition stage and the emotional state based on context speech emotion features obtained in step (4-2), the recognition results of the two recognition models are fused with the following fusion method, preliminarily obtaining the final emotional state contained in the emotion sentence to be recognized.
Fusion method:
Let the test sample set be Test = {ts_1, ts_2, ..., ts_n}, let the current emotion sentence to be recognized be denoted ts_j, and let the final emotion category recognized by the model for the sample ts_j be denoted PreLabel(ts_j). Let E = {e_1, e_2, ..., e_m} be the target emotion category set containing m classes of emotion, and let the matrix IM denote the context interaction matrix, as shown in formula (1):
$$IM = \begin{pmatrix} IP_{(1,1)} & \cdots & IP_{(1,j)} & \cdots & IP_{(1,m)} \\ \vdots & & \vdots & & \vdots \\ IP_{(i,1)} & \cdots & IP_{(i,j)} & \cdots & IP_{(i,m)} \\ \vdots & & \vdots & & \vdots \\ IP_{(m,1)} & \cdots & IP_{(m,j)} & \cdots & IP_{(m,m)} \end{pmatrix} \qquad (1)$$
Here the vector IM_i = (IP_(i,1), ..., IP_(i,j), ..., IP_(i,m)) is the emotion context interaction vector, representing the probabilities that the current emotion sentence belongs to each emotional state when the emotion category of the previous emotion sentence is e_i, and the element IP_(i,j) (written c_(i,j) in formula (4)) denotes the probability that the current emotion sentence belongs to emotion category e_j when the emotion category of the previous sentence is e_i. Let TP denote the probability vector output by the recognition model based on traditional emotion features for the emotion sentence to be recognized, sorted in descending order of probability, TP = (tp_1, tp_2, tp_3, ..., tp_i, ..., tp_m), where tp_i denotes the probability, obtained from the model based on traditional speech emotion features, that the emotion sentence to be recognized belongs to emotion e_i. Likewise, let CP denote the probability vector output by the recognition model based on context emotion features, sorted in descending order of probability, CP = (cp_1, cp_2, cp_3, ..., cp_i, ..., cp_m), where cp_i denotes the probability, obtained from the model based on context speech emotion features, that the emotion sentence to be recognized belongs to emotion e_i. Let TrustLevel(ts_j) ∈ {A_1, A_2, A_3} denote the trust level assigned to each reasoning result; the trust levels are divided into the three grades A_1, A_2 and A_3, with degree of trust A_1 > A_2 > A_3. The fusion method is implemented in two steps.
(1) Data preparation.
This step prepares the data for the subsequent fusion. Besides the probability vectors TP and CP, the emotion context interaction vector IM_i corresponding to the emotion category of the emotion sentence ts_{j-1} preceding the current emotion sentence ts_j to be recognized must be selected from the emotion context interaction matrix. The emotion context interaction matrix is refined from the emotion transfer interaction matrix compiled from two-person dialogues under conversational scenarios in typical Chinese dramas; the interaction matrix is shown in Table 3. This emotion interaction matrix was obtained from statistics over 4000 dialogue segments spanning hundreds of hours, covering male-male, male-female and female-female dialogues.
Table 3. Interaction matrix
Table 3 records the emotion interaction rules of dialogues between two persons A and B. The table is divided into a left part and a right part, each showing, given that one person's emotion category is determined, the distribution and probability of the six emotions above appearing in the other person. The left part shows the emotion probability distribution of A given the emotional state of B; likewise, the right part shows the emotion distribution that B may present given the emotional state of A. To eliminate the differences between the left and right distributions caused by individual differences, the distribution probabilities of each emotion pair in the two halves are averaged, yielding the interaction matrix shown in Table 4, which is defined as the context interaction matrix. The probabilities in the context interaction matrix are computed as in formula (2), where AIP_(i,j) denotes the probability that B is in emotion e_j when A is in emotion e_i, BIP_(i,j) denotes the probability that A is in emotion e_j when B is in emotion e_i, and IP_(i,j) denotes the probability that the following sentence is in emotion e_j when the preceding sentence is in emotion e_i.
Table 4. Context interaction matrix
$$IP_{(i,j)} = \frac{AIP_{(i,j)} + BIP_{(i,j)}}{2} \qquad (2)$$
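A minimal sketch of formula (2) follows; it assumes the two directional matrices counted from the corpus are available as m×m row-stochastic NumPy arrays, an assumption made for illustration.

```python
import numpy as np

def context_interaction_matrix(AIP, BIP):
    """Formula (2): IP(i,j) = (AIP(i,j) + BIP(i,j)) / 2, averaging the two dialogue directions."""
    IM = (np.asarray(AIP, dtype=float) + np.asarray(BIP, dtype=float)) / 2.0
    return IM   # rows remain probability distributions if AIP and BIP are row-stochastic
```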
(2) Use the emotion reasoning algorithm based on emotion context to perform emotion reasoning fusion on the data prepared in the previous step. The algorithm is described as follows:
Input: the emotion sentence ts_j to be recognized;
the probability output vector TP of the recognition model based on traditional emotion features;
the probability output vector CP of the recognition model based on context emotion features;
the emotion interaction vector IM_i.
Output: the preliminary emotion category PreLabel(ts_j) of the emotion sentence ts_j.
The concrete reasoning process of the algorithm is as follows:
(1) If the emotion categories of tp_1 and cp_1 are identical, the emotion category PreLabel(ts_j) of the sample to be tested is labelled as the category of tp_1 (cp_1), the trust level is A_3, and the algorithm ends;
(2) Otherwise, the emotion categories of tp_1 and cp_1 differ, and the categories with the second-largest probability now take part in the decision together with the maximal categories.
(2-1) If the emotion categories of tp_2 and cp_2 are identical, there are the following three cases:
(2-1-1) If the emotion category of IP_(i,1) is identical to that of tp_2 (cp_2), the emotion category PreLabel(ts_j) of the sample to be tested is labelled as the category of IP_(i,1), the trust level is A_2, and the algorithm ends;
(2-1-2) If the emotion category of IP_(i,1) is identical to that of tp_1 (or cp_1), the emotion category PreLabel(ts_j) of the sample to be tested is labelled as the category of tp_1 (or cp_1), the trust level is A_2, and the algorithm ends;
(2-1-3) Otherwise, compute the confidences of the elements tp_1 and cp_1; the emotion category corresponding to the larger confidence is taken as the emotion category PreLabel(ts_j) of the sample to be tested, the trust level is A_1, and the algorithm ends.
(2-2) If the emotion categories of tp_2 and cp_2 differ, there are the following two cases:
① If the emotion category of tp_1 is identical to that of cp_2 (or that of tp_2 is identical to that of cp_1), the emotion category PreLabel(ts_j) of the sample to be tested is labelled as the category of tp_1 (or cp_1), the trust level is A_2, and the algorithm ends;
② Otherwise, compute the four confidences of the elements tp_1, tp_2, cp_1 and cp_2; the emotion category corresponding to the largest confidence is taken as the emotion category PreLabel(ts_j) of the sample to be tested, the trust level is A_1, and the algorithm ends.
In the algorithm, the confidence of each element of the probability output vectors TP and CP of the two models is denoted conf_(i,j), representing the confidence that the emotion sentence ts_i belongs to emotion category e_j; it is computed as in formula (4). This confidence consists of two parts: one part comes from the output vector of the model itself and is denoted Pconf_(i,j), computed as in formula (3); the other part comes from the context interaction matrix. In the formulas, p_(i,j) denotes the probability, output by a recognition model, that the emotion sentence ts_i belongs to emotion category e_j.
$$Pconf_{(i,j)} = p_{(i,j)} - \frac{1}{m-1}\sum_{\substack{k=1 \\ k \neq j}}^{m} p_{(i,k)} \qquad (3)$$
$$conf_{(i,j)} = c_{(i,j)} \cdot Pconf_{(i,j)} = c_{(i,j)} \cdot \left( p_{(i,j)} - \frac{1}{m-1}\sum_{\substack{k=1 \\ k \neq j}}^{m} p_{(i,k)} \right) \qquad (4)$$
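A minimal sketch of formulas (3) and (4) is given below; here p is taken as a class-indexed probability vector of one model, j as a class index, and im_i as the corresponding row of the context interaction matrix, which are representational assumptions of the sketch.

```python
import numpy as np

def p_conf(p, j):
    """Formula (3): margin of class j over the mean probability of the other classes."""
    p = np.asarray(p, dtype=float)
    return p[j] - np.delete(p, j).mean()

def conf(p, j, im_i):
    """Formula (4): confidence = IP(i,j) * Pconf(i,j), weighting the margin by the context prior."""
    return im_i[j] * p_conf(p, j)
```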
Instead of the former practice of deciding arbitrarily with only the emotion category of maximum probability, the algorithm additionally uses the emotion category of the second-largest probability and the context interaction matrix to assist the final decision on the emotion category of the sample to be tested. Meanwhile, the different decision branches of the rules are assigned different trust levels; the structure is shown in Fig. 2.
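For illustration, the branching logic of the reasoning algorithm can be sketched as follows; tp and cp are taken as class-indexed probability vectors (their descending ordering is obtained by sorting inside the function), im_i is the context interaction vector selected by the previous sentence's emotion category, classes maps class indices to emotion labels, and conf is the helper sketched after formula (4). These data-structure choices are assumptions; the branch order and trust levels follow the text above.

```python
import numpy as np

def fuse(tp, cp, im_i, classes):
    """Return (preliminary emotion label, trust level) for the current emotion sentence."""
    t1, t2 = np.argsort(tp)[::-1][:2]          # classes of largest / second-largest probability (TP)
    c1, c2 = np.argsort(cp)[::-1][:2]          # classes of largest / second-largest probability (CP)
    im_top = int(np.argmax(im_i))               # most probable class according to the context vector

    if t1 == c1:                                                    # case (1): both models agree
        return classes[t1], "A3"
    if t2 == c2:                                                    # case (2-1)
        if im_top == t2:                                            # (2-1-1)
            return classes[t2], "A2"
        if im_top in (t1, c1):                                      # (2-1-2)
            return classes[im_top], "A2"
        cands = {t1: conf(tp, t1, im_i), c1: conf(cp, c1, im_i)}    # (2-1-3)
        return classes[max(cands, key=cands.get)], "A1"
    if t1 == c2 or t2 == c1:                                        # case (2-2) ①
        return (classes[t1] if t1 == c2 else classes[c1]), "A2"
    cands = {t1: conf(tp, t1, im_i), t2: conf(tp, t2, im_i),        # case (2-2) ②
             c1: conf(cp, c1, im_i), c2: conf(cp, c2, im_i)}
    return classes[max(cands, key=cands.get)], "A1"
```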
4. Emotion adjustment stage based on the emotion context inference rules
For each emotion sentence to be recognized in the test set Test = {ts_1, ts_2, ..., ts_n}, a preliminary decision result is obtained by the above reasoning algorithm. Based on the fact that emotions seldom change abruptly and on the trust levels assigned to the different decision branches of the reasoning algorithm, the recognition result of each emotion sentence of the test set Test is given a context-dependent adjustment according to the emotion context inference rules.
In the emotion context inference rules, let the current sentence requiring adjustment be ts_j (j = 2, 3, ..., n), and use the results of ts_{j-1} and ts_{j+1} to assist the adjustment of ts_j. When the emotion categories of ts_{j-1} and ts_{j+1} are not identical, no processing is performed; otherwise, the emotion category of ts_j is adjusted according to the following three cases, where Label(ts_j) denotes the final emotion category of the emotion sentence ts_j.
Rule 1: when the trust level of ts_j is A_1, the recognition result of ts_j itself is considered credible and no correction is made, that is:
$$TrustLevel(ts_j) = A_1 \Rightarrow Label(ts_j) = PreLabel(ts_j)$$
Rule 2: when the trust level of ts_j is A_2, the recognition result of ts_j is considered doubtful and is handled according to the results of ts_{j-1} and ts_{j+1}: if the trust levels of ts_{j-1} and ts_{j+1} are not A_3, i.e. both are A_1 or A_2, the results of ts_{j-1} and ts_{j+1} are considered credible and the result of ts_j is corrected to their common emotion category; otherwise, no correction is made, that is:
$$\bigl(TrustLevel(ts_j) = A_2\bigr) \wedge \bigl(TrustLevel(ts_{j-1}) \neq A_3\bigr) \wedge \bigl(TrustLevel(ts_{j+1}) \neq A_3\bigr) \Rightarrow Label(ts_j) = Label(ts_{j-1})$$
Rule 3: when the trust level of ts_j is A_3, the recognition result of ts_j is considered not credible, and its emotion category is corrected to the emotion category of ts_{j-1} (ts_{j+1}), that is:
$$TrustLevel(ts_j) = A_3 \Rightarrow Label(ts_j) = Label(ts_{j-1})$$
These rules make full use of the contextual information between sentences; considering that emotion changes seldom involve abrupt jumps, the preliminary results of the reasoning algorithm are adjusted once more, giving the final emotion category of each emotion sentence in the Test set.
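A minimal sketch of this adjustment stage is given below, applying Rules 1-3 to the (label, trust level) pairs produced by the fusion step for each sentence of the test set; representing the results as such pairs, and adjusting only interior sentences with both neighbours present, are assumptions of the sketch.

```python
def adjust(results):
    """results: list of (preliminary label, trust level) pairs, one per emotion sentence."""
    labels = [lab for lab, _ in results]
    for j in range(1, len(results) - 1):
        lab, trust = results[j]
        prev_lab, prev_trust = results[j - 1]
        next_lab, next_trust = results[j + 1]
        if prev_lab != next_lab:                 # neighbours disagree: leave the sentence unchanged
            continue
        if trust == "A1":                        # Rule 1: result credible, no correction
            continue
        if trust == "A2":                        # Rule 2: correct only if both neighbours are trusted
            if prev_trust != "A3" and next_trust != "A3":
                labels[j] = prev_lab
        elif trust == "A3":                      # Rule 3: result not credible, take the neighbours' category
            labels[j] = prev_lab
    return labels
```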
The present invention breaks through the limitation of existing speech emotion recognition methods, which analyze only isolated emotion sentences. Starting from the characteristic that the emotion changes of adjacent emotion sentences are interrelated, it adopts the emotion reasoning algorithm based on emotion context and analyzes and adjusts the emotional state of the emotion sentence to be analyzed by means of the emotion interaction matrix, thereby improving the accuracy of continuous speech emotion recognition.
Compared with the prior art, the present invention has the following beneficial effects:
1. The context speech emotion features between continuous emotion sentences are successfully extracted and used to assist the traditional speech emotion features extracted from single emotion sentences, thereby improving the emotion recognition efficiency of continuous speech;
2. The previously compiled emotion interaction matrix is skilfully used to perform emotion reasoning fusion of the emotional state of the emotion sentence to be recognized given by the context speech emotion features and the emotional state given by the traditional speech emotion features, obtaining a preliminary emotion recognition result for the emotion sentence to be recognized;
3. Exploiting the stability of emotion changes across continuous emotion sentences, emotion context inference rules are formulated to perform a context-dependent adjustment of the whole continuous speech to be recognized.
It should be understood that although this specification is described in terms of embodiments, not every embodiment contains only one independent technical solution; this manner of narration is adopted only for clarity. Those skilled in the art should take the specification as a whole, and the technical solutions in the embodiments may also be appropriately combined to form other embodiments understandable to those skilled in the art.
The detailed descriptions listed above are only specific illustrations of feasible embodiments of the present invention and are not intended to limit the protection scope of the present invention. Any equivalent embodiments or modifications made without departing from the technical spirit of the present invention shall be included within the protection scope of the present invention.

Claims (10)

1. A speech emotion inference method based on emotion context, characterized in that the method comprises:
S1, extracting context speech emotion features and traditional speech emotion features from adjacent emotion sentences, and establishing a context model and a traditional model respectively according to the feature categories;
S2, dividing the continuous speech to be analyzed into a sequence of emotion sentences whose emotions are relatively independent, extracting the context speech emotion features and the traditional speech emotion features of said emotion sentences, and then recognizing them with the context model and the traditional model respectively, to obtain the decision vectors of the two models for the emotion sentence to be analyzed;
S3, fusing the decisions of the context model and the traditional model on the current emotion sentence of the continuous speech to be analyzed with a fusion method based on the emotion interaction matrix, to obtain a preliminary recognition result;
S4, adjusting the emotion category of each sentence with emotion context inference rules from the perspective of the whole continuous speech to be analyzed, to obtain the emotion category sequence of the continuous speech to be analyzed.
2. The method according to claim 1, characterized in that said step S3 comprises:
when the traditional model and the context model are used to fuse the two largest classes of the decision vectors of the emotion sentence to be analyzed, introducing the previously compiled emotion interaction matrix and processing it to obtain the emotion context interaction matrix, the context interaction matrix performing fusion reasoning on the emotion category of the emotion sentence together with the two decision vectors.
3. The method according to claim 1, characterized in that said step S4 comprises:
the emotion context inference rules exploit the continuity of human emotional expression and adjust the emotion category of the current emotion sentence according to the emotion categories of the preceding and following adjacent sentences.
4. The method according to claim 1, characterized in that the adjacent emotion sentences in said step S1 are the last 1/3 voiced segment of the preceding sentence and the whole of the following sentence of a pair of adjacent emotion sentences.
5. The method according to claim 4, characterized in that said context speech emotion features comprise: context dynamic emotion features, context difference emotion features, context edge dynamic emotion features and context edge difference emotion features.
6. The method according to claim 5, characterized in that said context dynamic emotion features are the 33-dimensional speech emotion dynamic features, related to change rate, mean change and covariance among the 101-dimensional traditional speech emotion features, extracted over the last 1/3 voiced segment of the preceding sentence and the whole voiced segment of the following sentence of the adjacent emotion sentences.
7. The method according to claim 5, characterized in that said context difference emotion features are obtained by first extracting the 101-dimensional traditional speech emotion features separately from the last 1/3 voiced segment of the preceding sentence and the whole voiced segment of the following sentence of the adjacent emotion sentences, and then taking the difference between the two.
8. The method according to claim 6, characterized in that said context edge dynamic emotion features are the 33-dimensional speech emotion dynamic features extracted from the edge-adjacent sentence composed of the last 1/3 voiced segment of the preceding sentence and the first 1/3 voiced segment of the following sentence of the adjacent emotion sentences.
9. The method according to claim 8, characterized in that said context edge difference emotion features are the features extracted from the edge-adjacent sentence by the context difference emotion feature extraction method.
10. A speech emotion inference system based on emotion context as claimed in claim 1, characterized in that the system comprises:
a training unit for extracting context speech emotion features and traditional speech emotion features from adjacent emotion sentences, and establishing a context model and a traditional model respectively according to the feature categories;
a recognition unit for dividing the continuous speech to be analyzed into a sequence of emotion sentences whose emotions are relatively independent, extracting the context speech emotion features and the traditional speech emotion features of said sentences, and then performing emotion recognition on the current sentence with the trained context model and traditional model respectively, to obtain the decision vectors of the current sentence on the two models;
a fusion recognition unit for fusing the decisions of the context model and the traditional model on the current emotion sentence of the continuous speech to be analyzed, to obtain a preliminary recognition result;
an adjustment unit for adjusting the emotion category of each sentence with emotion context inference rules from the perspective of the whole continuous speech to be analyzed, to obtain the emotion category sequence of the continuous speech to be analyzed.
CN201310401319.0A 2013-09-05 2013-09-05 Speech emotional inference method based on emotion context and system Active CN103810994B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310401319.0A CN103810994B (en) 2013-09-05 2013-09-05 Speech emotional inference method based on emotion context and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310401319.0A CN103810994B (en) 2013-09-05 2013-09-05 Speech emotional inference method based on emotion context and system

Publications (2)

Publication Number Publication Date
CN103810994A true CN103810994A (en) 2014-05-21
CN103810994B CN103810994B (en) 2016-09-14

Family

ID=50707674

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310401319.0A Active CN103810994B (en) 2013-09-05 2013-09-05 Speech emotional inference method based on emotion context and system

Country Status (1)

Country Link
CN (1) CN103810994B (en)

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106598948A (en) * 2016-12-19 2017-04-26 杭州语忆科技有限公司 Emotion recognition method based on long-term and short-term memory neural network and by combination with autocoder
CN106991172A (en) * 2017-04-05 2017-07-28 安徽建筑大学 Method for establishing multi-mode emotion interaction database
CN107305773A (en) * 2016-04-15 2017-10-31 美特科技(苏州)有限公司 Voice mood discrimination method
CN108039181A (en) * 2017-11-02 2018-05-15 北京捷通华声科技股份有限公司 The emotion information analysis method and device of a kind of voice signal
CN108346436A (en) * 2017-08-22 2018-07-31 腾讯科技(深圳)有限公司 Speech emotional detection method, device, computer equipment and storage medium
CN108664469A (en) * 2018-05-07 2018-10-16 首都师范大学 A kind of emotional category determines method, apparatus and server
CN109256150A (en) * 2018-10-12 2019-01-22 北京创景咨询有限公司 Speech emotion recognition system and method based on machine learning
CN110047517A (en) * 2019-04-24 2019-07-23 京东方科技集团股份有限公司 Speech-emotion recognition method, answering method and computer equipment
CN112418254A (en) * 2019-08-20 2021-02-26 北京易真学思教育科技有限公司 Emotion recognition method, device, equipment and storage medium
WO2021155662A1 (en) * 2020-02-03 2021-08-12 华为技术有限公司 Text information processing method and apparatus, computer device, and readable storage medium
CN113689886A (en) * 2021-07-13 2021-11-23 北京工业大学 Voice data emotion detection method and device, electronic equipment and storage medium
CN113889150A (en) * 2021-10-15 2022-01-04 北京工业大学 Speech emotion recognition method and device
WO2022016580A1 (en) * 2020-07-21 2022-01-27 南京智金科技创新服务中心 Intelligent voice recognition method and device
WO2024010485A1 (en) * 2022-07-07 2024-01-11 Nvidia Corporation Inferring emotion from speech in audio data using deep learning

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102881284A (en) * 2012-09-03 2013-01-16 江苏大学 Unspecific human voice and emotion recognition method and system

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102881284A (en) * 2012-09-03 2013-01-16 江苏大学 Unspecific human voice and emotion recognition method and system

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
CHUN CHEN et al.: "An Enhanced Speech Emotion Recognition System Based on Discourse Information", Lecture Notes in Computer Science, vol. 3991, 31 December 2006 (2006-12-31), pages 449-456, XP019032816 *
BAI Lijuan et al.: "Extraction and Analysis of Speech Emotion Features Based on Acoustic Context" (基于声学上下文的语音情感特征提取与分析), Journal of Chinese Computer Systems (小型微型计算机系统), vol. 34, no. 6, 30 June 2013 (2013-06-30), pages 1451-1456 *
HUANG Chengwei et al.: "Speech Emotion Recognition Based on Feature Space Decomposition and Fusion" (基于特征空间分解与融合的语音情感识别), Journal of Signal Processing (信号处理), vol. 26, no. 6, 30 June 2010 (2010-06-30), pages 835-842 *

Cited By (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107305773A (en) * 2016-04-15 2017-10-31 美特科技(苏州)有限公司 Voice mood discrimination method
CN106598948A (en) * 2016-12-19 2017-04-26 杭州语忆科技有限公司 Emotion recognition method based on long-term and short-term memory neural network and by combination with autocoder
CN106598948B (en) * 2016-12-19 2019-05-03 杭州语忆科技有限公司 Emotion identification method based on shot and long term Memory Neural Networks combination autocoder
CN106991172B (en) * 2017-04-05 2020-04-28 安徽建筑大学 Method for establishing multi-mode emotion interaction database
CN106991172A (en) * 2017-04-05 2017-07-28 安徽建筑大学 Method for establishing multi-mode emotion interaction database
US11922969B2 (en) 2017-08-22 2024-03-05 Tencent Technology (Shenzhen) Company Limited Speech emotion detection method and apparatus, computer device, and storage medium
CN108346436A (en) * 2017-08-22 2018-07-31 腾讯科技(深圳)有限公司 Speech emotional detection method, device, computer equipment and storage medium
US11189302B2 (en) 2017-08-22 2021-11-30 Tencent Technology (Shenzhen) Company Limited Speech emotion detection method and apparatus, computer device, and storage medium
CN108039181B (en) * 2017-11-02 2021-02-12 北京捷通华声科技股份有限公司 Method and device for analyzing emotion information of sound signal
CN108039181A (en) * 2017-11-02 2018-05-15 北京捷通华声科技股份有限公司 The emotion information analysis method and device of a kind of voice signal
CN108664469A (en) * 2018-05-07 2018-10-16 首都师范大学 A kind of emotional category determines method, apparatus and server
CN109256150A (en) * 2018-10-12 2019-01-22 北京创景咨询有限公司 Speech emotion recognition system and method based on machine learning
CN109256150B (en) * 2018-10-12 2021-11-30 北京创景咨询有限公司 Speech emotion recognition system and method based on machine learning
CN110047517A (en) * 2019-04-24 2019-07-23 京东方科技集团股份有限公司 Speech-emotion recognition method, answering method and computer equipment
CN112418254A (en) * 2019-08-20 2021-02-26 北京易真学思教育科技有限公司 Emotion recognition method, device, equipment and storage medium
WO2021155662A1 (en) * 2020-02-03 2021-08-12 华为技术有限公司 Text information processing method and apparatus, computer device, and readable storage medium
WO2022016580A1 (en) * 2020-07-21 2022-01-27 南京智金科技创新服务中心 Intelligent voice recognition method and device
CN113689886A (en) * 2021-07-13 2021-11-23 北京工业大学 Voice data emotion detection method and device, electronic equipment and storage medium
CN113889150A (en) * 2021-10-15 2022-01-04 北京工业大学 Speech emotion recognition method and device
CN113889150B (en) * 2021-10-15 2023-08-29 北京工业大学 Speech emotion recognition method and device
WO2024010485A1 (en) * 2022-07-07 2024-01-11 Nvidia Corporation Inferring emotion from speech in audio data using deep learning

Also Published As

Publication number Publication date
CN103810994B (en) 2016-09-14

Similar Documents

Publication Publication Date Title
CN103810994B (en) Speech emotional inference method based on emotion context and system
Xie et al. Speech emotion classification using attention-based LSTM
CN108597541B (en) Speech emotion recognition method and system for enhancing anger and happiness recognition
Mirsamadi et al. Automatic speech emotion recognition using recurrent neural networks with local attention
CN108564942B (en) Voice emotion recognition method and system based on adjustable sensitivity
CN106503805B (en) A kind of bimodal based on machine learning everybody talk with sentiment analysis method
CN102723078B (en) Emotion speech recognition method based on natural language comprehension
CN103345922B (en) A kind of large-length voice full-automatic segmentation method
CN101064104B (en) Emotion voice creating method based on voice conversion
Xia et al. Using i-Vector Space Model for Emotion Recognition.
CN110675860A (en) Voice information identification method and system based on improved attention mechanism and combined with semantics
CN103258532B (en) A kind of Chinese speech sensibility recognition methods based on fuzzy support vector machine
CN107220235A (en) Speech recognition error correction method, device and storage medium based on artificial intelligence
CN107393554A (en) In a kind of sound scene classification merge class between standard deviation feature extracting method
CN110675859B (en) Multi-emotion recognition method, system, medium, and apparatus combining speech and text
CN107492382A (en) Voiceprint extracting method and device based on neutral net
CN105261367B (en) A kind of method for distinguishing speek person
CN104008754B (en) Speech emotion recognition method based on semi-supervised feature selection
Ren et al. Generating and protecting against adversarial attacks for deep speech-based emotion recognition models
Yaman et al. An integrative and discriminative technique for spoken utterance classification
CN103985381A (en) Voice frequency indexing method based on parameter fusion optimized decision
CN110992988B (en) Speech emotion recognition method and device based on domain confrontation
Song et al. A gesture-to-emotional speech conversion by combining gesture recognition and facial expression recognition
CN107221344A (en) A kind of speech emotional moving method
CN105280181A (en) Training method for language recognition model and language recognition method

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant