CN103810994A - Method and system for voice emotion inference on basis of emotion context - Google Patents


Info

Publication number
CN103810994A
Authority
CN
China
Prior art keywords
emotion
context
statement
analyzed
speech
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201310401319.0A
Other languages
Chinese (zh)
Other versions
CN103810994B (en)
Inventor
毛启容
白李娟
王丽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu University
Original Assignee
Jiangsu University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu University
Priority to CN201310401319.0A
Publication of CN103810994A
Application granted
Publication of CN103810994B
Legal status: Active
Anticipated expiration

Landscapes

  • Machine Translation (AREA)
  • Measurement Of The Respiration, Hearing Ability, Form, And Blood Characteristics Of Living Organisms (AREA)

Abstract

The invention discloses a method and system for speech emotion inference based on emotion context. The method comprises: extracting context speech emotion features and traditional speech emotion features from adjacent emotion sentences, and establishing a context model and a traditional model respectively according to the feature categories; dividing the continuous speech to be analyzed into a sequence of emotion sentences whose emotions are relatively independent, and fusing the decisions of the context model and the traditional model on the current emotion sentence with a fusion method based on an emotion interaction matrix to obtain a preliminary recognition result; and adjusting the emotion category of each sentence with emotion context inference rules from the perspective of the whole continuous speech to be analyzed, thereby obtaining the emotion category sequence of the continuous speech to be analyzed. By adopting the emotion context reasoning algorithm and analyzing and adjusting the emotional states of the emotion sentences to be analyzed with the aid of the emotion interaction matrix, the accuracy of continuous speech emotion recognition is improved.

Description

Speech emotion inference method and system based on emotion context
Technical field
The present invention relates to speech signal processing, sentiment analysis and pattern recognition technology, and in particular to a speech emotion inference method and system based on emotion context.
Background technology
The development of speech emotion recognition technology plays an important role in promoting the development and application of intelligent, humanized novel human-machine interaction. How to automatically identify the speaker's emotional state from speech by means of computer technology has attracted wide attention from researchers in many fields in recent years. In speech emotion recognition research, researchers have gradually begun to pay attention to the effect of contextual information on improving emotion recognition accuracy. The so-called context refers to information such as the personal information of the object to be analyzed and of the object to whom the emotion is expressed (including gender, age, culture, language, education level, conversation background, etc.) and the emotional states of the recent past.
Prior art 1 analyzed the effect of linguistic contextual information such as gender, subject matter, speaker and speaking content on emotion recognition, but the analysis mainly targeted isolated, unnatural single sentences and did not describe or process the continuously expressed emotional speech that occurs in natural environments. Prior art 2 began to pay attention to the contextual information carried between words and the surrounding environment, proposed three classes of environmental features (context environment, dynamic environment and sentence global context) comprising five kinds of features in total, and proved experimentally the contribution of contextual information to improving emotion recognition accuracy. However, that scheme requires building a large and rich emotion lexicon and requires the speaker's speech content to be recognized before emotion recognition; the accuracy of speech content recognition affects the accuracy of emotion recognition, and recognizing the speech content increases the time complexity of emotion recognition. Prior art 3, without recognizing the speaker's speech content, analyzed the mutual influence of the emotional states of two persons in dialogue from the acoustic features of speech, and derived the emotion transfer matrix of the two dialogue parties.
However, in the prior art, emotion recognition of continuous speech analyzes only each current sentence. To address the deficiencies of the prior art, the present invention provides a speech emotion inference method and system based on emotion context. It exploits the fact that human emotional expression and change is a continuous process, so that there is a certain association between the current emotional state of the object to be analyzed and the emotional state about to be expressed. For emotion recognition of the continuous speech of a single speaker, the invention provides an extraction method for emotion context features and a speech emotion inference method based on emotion context, which improves the continuous speech emotion recognition rate without requiring recognition of the speaker's speech content.
Summary of the invention
In view of the defect in the background art that emotion recognition of continuous speech analyzes only each current sentence, the present invention provides a speech emotion inference method and system based on emotion context: an extraction method for speech emotion context features is devised, an efficient speech emotion inference model based on emotion context is established, and a speech emotion inference method based on emotion context is completed, finally improving the accuracy of continuous speech emotion recognition.
To achieve these goals, the technical scheme provided by the embodiments of the present invention is as follows:
A speech emotion inference method based on emotion context, the method comprising:
S1, extracting context speech emotion features and traditional speech emotion features from adjacent emotion sentences, and establishing a context model and a traditional model respectively according to the feature categories;
S2, dividing the continuous speech to be analyzed into a sequence of emotion sentences whose emotions are relatively independent, extracting the context speech emotion features and the traditional speech emotion features of said emotion sentences, and then recognizing them with the context model and the traditional model respectively, to obtain the decision vectors of the two models for the emotion sentence to be analyzed;
S3, fusing the decisions of the context model and the traditional model on the current emotion sentence of the continuous speech to be analyzed with a fusion method based on the emotion interaction matrix, to obtain a preliminary recognition result;
S4, adjusting the emotion category of each sentence with emotion context inference rules from the perspective of the whole continuous speech to be analyzed, to obtain the emotion category sequence of the continuous speech to be analyzed.
As a further improvement of the present invention, said step S3 comprises:
when the traditional model and the context model are used to fuse the two largest classes of the decision vectors of the emotion sentence to be analyzed, introducing the previously compiled emotion interaction matrix and processing it to obtain the emotion context interaction matrix, the context interaction matrix performing fusion reasoning on the emotion category of the emotion sentence together with the two decision vectors.
As a further improvement of the present invention, said step S4 comprises:
the emotion context inference rules exploit the continuity of human emotional expression and adjust the emotion category of the current emotion sentence according to the emotion categories of the preceding and following adjacent sentences.
As a further improvement of the present invention, the adjacent emotion sentences in said step S1 are the last 1/3 voiced segment of the preceding sentence and the whole of the following sentence of a pair of adjacent emotion sentences.
As a further improvement of the present invention, said context speech emotion features comprise: context dynamic emotion features, context difference emotion features, context edge dynamic emotion features and context edge difference emotion features.
As a further improvement of the present invention, the context dynamic emotion features are the 33-dimensional speech emotion dynamic features, related to change rate, mean change and covariance among the 101-dimensional traditional speech emotion features, extracted over the last 1/3 voiced segment of the preceding sentence and the whole voiced segment of the following sentence of the adjacent emotion sentences.
As a further improvement of the present invention, the context difference emotion features are obtained by first extracting the 101-dimensional traditional speech emotion features separately from the last 1/3 voiced segment of the preceding sentence and the whole voiced segment of the following sentence of the adjacent emotion sentences, and then taking the difference between the two.
As a further improvement of the present invention, the context edge dynamic emotion features are the 33-dimensional speech emotion dynamic features extracted from the edge-adjacent sentence composed of the last 1/3 voiced segment of the preceding sentence and the first 1/3 voiced segment of the following sentence of the adjacent emotion sentences.
As a further improvement of the present invention, the context edge difference emotion features are the features extracted from the edge-adjacent sentence by the context difference emotion feature extraction method.
Correspondingly, a speech emotion inference system based on emotion context, the system comprising:
a training unit for extracting context speech emotion features and traditional speech emotion features from adjacent emotion sentences, and establishing a context model and a traditional model respectively according to the feature categories;
a recognition unit for dividing the continuous speech to be analyzed into a sequence of emotion sentences whose emotions are relatively independent, extracting the context speech emotion features and the traditional speech emotion features of said sentences, and then performing emotion recognition on the current sentence with the trained context model and traditional model respectively, to obtain the decision vectors of the current sentence on the two models;
a fusion recognition unit for fusing the decisions of the context model and the traditional model on the current emotion sentence of the continuous speech to be analyzed, to obtain a preliminary recognition result;
an adjustment unit for adjusting the emotion category of each sentence with emotion context inference rules from the perspective of the whole continuous speech to be analyzed, to obtain the emotion category sequence of the continuous speech to be analyzed.
The present invention has the following beneficial effects:
1. The context speech emotion features between continuous emotion sentences are successfully extracted and used to assist the traditional speech emotion features extracted from single emotion sentences, thereby improving the emotion recognition efficiency of continuous speech;
2. The previously compiled emotion interaction matrix is skilfully used to perform emotion reasoning fusion of the emotional state of the emotion sentence to be recognized given by the context speech emotion features and the emotional state given by the traditional speech emotion features, obtaining a preliminary emotion recognition result for the emotion sentence to be recognized;
3. Exploiting the stability of emotion changes across continuous emotion sentences, emotion context inference rules are formulated to perform a context-dependent adjustment of the whole continuous speech to be recognized.
Brief description of the drawings
Fig. 1 is a framework diagram of the speech emotion inference method based on emotion context in an embodiment of the present invention;
Fig. 2 is a flow chart of the emotion reasoning algorithm based on emotion context in an embodiment of the present invention.
Embodiment
The present invention is described below with reference to the embodiments shown in the drawings. However, these embodiments do not limit the present invention; structural, methodological or functional transformations made by those of ordinary skill in the art according to these embodiments are all included within the protection scope of the present invention.
The invention discloses a speech emotion inference method based on emotion context, comprising:
S1, extracting context speech emotion features and traditional speech emotion features from adjacent emotion sentences, and establishing a context model and a traditional model respectively according to the feature categories;
S2, dividing the continuous speech to be analyzed into a sequence of emotion sentences whose emotions are relatively independent, extracting the context speech emotion features and the traditional speech emotion features of said emotion sentences, and then recognizing them with the context model and the traditional model respectively, to obtain the decision vectors of the two models for the emotion sentence to be analyzed;
S3, fusing the decisions of the context model and the traditional model on the current emotion sentence of the continuous speech to be analyzed with a fusion method based on the emotion interaction matrix, to obtain a preliminary recognition result;
S4, adjusting the emotion category of each sentence with emotion context inference rules from the perspective of the whole continuous speech to be analyzed, to obtain the emotion category sequence of the continuous speech to be analyzed.
Specifically, the method comprises:
Step 1: train the speech emotion recognition model based on traditional speech emotion features.
Step 1.1: pre-process the emotional speech signals in the training library, including pre-emphasis, windowing, framing and endpoint detection.
Step 1.2: extract the conventional 101-dimensional traditional speech emotion features from the emotion sentences in the training set, including acoustic and prosodic features of speech such as Mel cepstrum coefficients, fundamental frequency, duration, intensity, amplitude, voice quality and formants.
Step 1.3: normalize the extracted features with the corresponding features of neutral sentences, and then perform feature selection by the SFFS (Sequential Forward Floating Search) method; 56 traditional speech emotion features remain after feature selection.
Step 1.4: train an SVM classifier with the 56-dimensional traditional speech emotion features of the emotion sentences in the training set, obtaining the speech emotion recognition model based on traditional speech emotion features.
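For illustration only, the following is a minimal sketch of steps 1.3 and 1.4. It assumes the 101-dimensional traditional features and the neutral-sentence features are already extracted as matrices; scikit-learn, the RBF kernel, and the use of plain forward selection (SequentialFeatureSelector) as a stand-in for the floating SFFS search are assumptions of the sketch, not part of the disclosed method.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.feature_selection import SequentialFeatureSelector

def normalize_by_neutral(features, neutral_features):
    """Normalize emotional features by the statistics of the neutral sentences (step 1.3)."""
    mean = neutral_features.mean(axis=0)
    std = neutral_features.std(axis=0) + 1e-8
    return (features - mean) / std

def train_traditional_model(X_train, y_train, X_neutral, n_selected=56):
    """X_train: (n_sentences, 101) traditional features; y_train: emotion labels."""
    X_norm = normalize_by_neutral(X_train, X_neutral)
    base_svm = SVC(kernel="rbf", probability=True)   # probabilistic output is needed later for fusion
    selector = SequentialFeatureSelector(base_svm, n_features_to_select=n_selected)
    selector.fit(X_norm, y_train)                     # forward selection as an SFFS stand-in
    model = SVC(kernel="rbf", probability=True).fit(selector.transform(X_norm), y_train)
    return selector, model
```

The same pipeline, with 268-dimensional context features reduced to 91 retained dimensions, would serve analogously for the context model of step 2 below.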
Step 2: train the speech emotion recognition model based on context speech emotion features.
Step 2.1: extract the context speech emotion features from the emotion sentences of the training set pre-processed in step 1.1, comprising the context dynamic emotion features, context difference emotion features, context edge dynamic emotion features and context edge difference emotion features, 268 dimensions in total.
Step 2.2: normalize the context speech emotion features extracted in step 2.1 with the corresponding features of neutral sentences, and then perform feature selection by the SFFS (Sequential Forward Floating Search) method; 91 context speech emotion features remain after feature selection.
Step 2.3: train an SVM (Support Vector Machine) classifier with the 91-dimensional context speech emotion features extracted from the emotion sentences in the training set, obtaining the speech emotion recognition model based on context speech emotion features.
Step 3: recognize the emotional state of the emotion sentence to be recognized.
Step 3.1: pre-process the continuous emotional speech signal to be recognized, including pre-emphasis, windowing, automatic segmentation, framing and endpoint detection.
Step 3.2: extract from the emotional speech signal to be recognized the 56-dimensional traditional speech emotion features selected by the procedure of steps 1.2 and 1.3.
Step 3.3: input them into the speech emotion recognition model based on traditional speech emotion features trained in step 1.4 for recognition; the recognition result is denoted TP.
Step 3.4: extract from the emotional speech signal to be recognized the 91-dimensional context speech emotion features selected by the procedure of steps 2.1 and 2.2.
Step 3.5: input them into the speech emotion recognition model based on context speech emotion features trained in step 2.3 for recognition; the recognition result is denoted CP.
Step 4: according to the recognition result TP of the model based on traditional speech emotion features and the recognition result CP of the model based on context speech emotion features, fuse the recognition results of the two models with the fusion algorithm, preliminarily obtaining the emotion category of the speech signal to be recognized and the confidence of this result.
Step 5: adopt the inference rules based on emotion context to adjust, according to the emotional states of the sentences preceding and following the emotion sentence to be analyzed in the continuous speech, the emotional state embodied by the emotion sentence to be analyzed, obtaining the final emotional state of the emotion sentence to be analyzed.
Correspondingly, the invention also discloses a speech emotion inference system based on emotion context, comprising:
a training unit for extracting context speech emotion features and traditional speech emotion features from adjacent emotion sentences, and establishing a context model and a traditional model respectively according to the feature categories;
a recognition unit for dividing the continuous speech to be analyzed into a sequence of emotion sentences whose emotions are relatively independent, extracting the context speech emotion features and the traditional speech emotion features of these sentences, and then performing emotion recognition on the current sentence with the trained context model and traditional model respectively, to obtain the decision vectors of the current sentence on the two models;
a fusion recognition unit for fusing the decisions of the context model and the traditional model on the current emotion sentence of the continuous speech to be analyzed, to obtain a preliminary recognition result; and an adjustment unit for adjusting the emotion category of each sentence with emotion context inference rules from the perspective of the whole continuous speech to be analyzed, to obtain the emotion category sequence of the continuous speech to be analyzed.
The present invention is further elaborated below in conjunction with the drawings and specific embodiments.
As shown in Fig. 1, the block diagram of the emotion inference system based on emotion context in an embodiment of the invention is divided into four stages: the training stage, the recognition stage, the fusion recognition stage, and the emotion adjustment stage based on the emotion context inference rules.
1. Training stage
The training stage establishes the speech emotion recognition model based on traditional speech emotion features and the speech emotion recognition model based on context speech emotion features, and is divided into three steps:
(1) Emotional speech signal pre-processing.
This step pre-processes the emotional speech signal with traditional speech signal pre-processing methods, including pre-emphasis, windowing, framing and endpoint detection.
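By way of illustration, a minimal sketch of this pre-processing is given below; the sampling rate, frame length, frame shift, pre-emphasis coefficient and the simple energy threshold used as endpoint detection are assumptions of the sketch rather than values fixed by the invention.

```python
import numpy as np

def preprocess(signal, fs=16000, frame_len=0.025, frame_shift=0.010, alpha=0.97):
    """Pre-emphasis, Hamming windowing, framing and crude energy-based endpoint detection."""
    emphasized = np.append(signal[0], signal[1:] - alpha * signal[:-1])   # pre-emphasis
    flen, fshift = int(frame_len * fs), int(frame_shift * fs)
    n_frames = 1 + (len(emphasized) - flen) // fshift                     # assumes len(signal) >= flen
    frames = np.stack([emphasized[i * fshift:i * fshift + flen] * np.hamming(flen)
                       for i in range(n_frames)])
    energy = (frames ** 2).sum(axis=1)
    return frames[energy > 0.02 * energy.max()]                           # keep voiced frames only
```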
(2) Extraction of the traditional speech emotion features and training of the speech emotion recognition model based on traditional speech emotion features.
(2-1) Extract from the current emotion sentence the acoustic and prosodic features of speech, including Mel cepstrum coefficients, fundamental frequency, duration, intensity, amplitude, voice quality and formants, and extract over the emotion sentence statistical features of these features such as the maximum, minimum and variation range. The extraction methods of these features do not belong to the present invention and are therefore not described in detail. The specific features extracted are shown in Table 1.
Table 1. Description of the traditional speech emotion features
(2-2) Normalize the features extracted in step (2-1) with the features of neutral emotion, and then perform SFFS feature selection on the 101-dimensional traditional speech emotion features; 56 dimensions remain after feature selection.
(2-3) Train the speech emotion recognition model based on traditional speech emotion features with the selected 56-dimensional traditional speech emotion features; the recognition model in this embodiment adopts SVM.
(3) Extraction of the context speech emotion features and training of the speech emotion recognition model based on context speech emotion features.
(3-1) Extract the context speech emotion features, comprising the context dynamic emotion features, the context difference emotion features, the context edge dynamic emotion features and the context edge difference emotion features.
(3-1-1) Extraction of the context dynamic emotion features: extract from two continuous emotion sentences the statistical features, such as maximum, minimum and variation range, of short-time energy, zero-crossing rate, Mel cepstrum coefficients (first 12 coefficients), fundamental frequency, voice quality, silence rate and the first three formant coefficients, 33 dimensions in total. The specific features are shown in Table 2.
Table 2. Description of the speech emotion dynamic features
(3-1-2) Extraction of the context difference emotion features: for each of two continuous emotion sentences, extract the short-time energy, zero-crossing rate, Mel cepstrum coefficients (first 12 coefficients), fundamental frequency, voice quality, silence rate and first three formant coefficients of the sentence together with statistical features of these emotion features such as maximum, minimum and variation range, 101 dimensions in total; then subtract the corresponding emotion features of the preceding sentence from those of the following sentence to obtain the 101-dimensional context difference emotion features.
(3-1-3) Extraction of the context edge dynamic emotion features: from the segment starting at the last 1/3 voiced segment of the preceding sentence and ending at the first 1/3 voiced segment of the following sentence of two continuous emotion sentences, extract the statistical features, such as maximum, minimum and variation range, of short-time energy, zero-crossing rate, Mel cepstrum coefficients (first 12 coefficients), fundamental frequency, voice quality, silence rate and first three formant coefficients, 33 dimensions in total.
(3-1-4) Extraction of the context edge difference emotion features: from the two fragments consisting of the last 1/3 voiced segment of the preceding sentence and the first 1/3 voiced segment of the following sentence of two continuous emotion sentences, extract separately the short-time energy, zero-crossing rate, Mel cepstrum coefficients (first 12 coefficients), fundamental frequency, voice quality, silence rate and first three formant coefficients together with statistical features of these features such as maximum, minimum and variation range, 101 dimensions in total; then subtract the corresponding 101-dimensional emotion features of the last 1/3 segment of the preceding sentence from those of the first 1/3 segment of the following sentence to obtain the 101-dimensional context edge difference emotion features.
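The four context-feature groups above can be summarized by the following minimal sketch. The helpers extract_static_101 and extract_dynamic_33, which compute the 101-dimensional statistics of Table 1 and the 33-dimensional dynamic features of Table 2, are assumed to be supplied by the caller, and the exact segment boundaries (last 1/3 of the preceding sentence, first 1/3 and whole of the following sentence, following the claims) are illustrative assumptions.

```python
import numpy as np

def context_features(prev_sentence, cur_sentence, extract_static_101, extract_dynamic_33):
    """Build the 268-dim context feature vector for one pair of adjacent emotion sentences."""
    prev_tail = prev_sentence[-len(prev_sentence) // 3:]    # last 1/3 of the preceding sentence
    cur_head = cur_sentence[:len(cur_sentence) // 3]         # first 1/3 of the following sentence
    edge = np.concatenate([prev_tail, cur_head])              # "edge-adjacent" sentence

    dyn = extract_dynamic_33(np.concatenate([prev_tail, cur_sentence]))       # (3-1-1), 33 dims
    diff = extract_static_101(cur_sentence) - extract_static_101(prev_tail)   # (3-1-2), 101 dims
    edge_dyn = extract_dynamic_33(edge)                                        # (3-1-3), 33 dims
    edge_diff = extract_static_101(cur_head) - extract_static_101(prev_tail)  # (3-1-4), 101 dims
    return np.concatenate([dyn, diff, edge_dyn, edge_diff])                    # 33+101+33+101 = 268 dims
```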
(3-2) Normalize the features extracted in steps (3-1-1), (3-1-2), (3-1-3) and (3-1-4) with the features of neutral emotion, and then perform SFFS feature selection on the 268-dimensional context speech emotion features; 91 dimensions remain after feature selection.
(3-3) Train the speech emotion recognition model based on context speech emotion features with the selected 91-dimensional context speech emotion features; the recognition model here adopts SVM.
2. Recognition stage
In the recognition stage, after the corresponding features of the emotion sentence to be recognized have been extracted, the extracted features are input into the models trained in the first stage, and the emotional state recognition result of the emotion sentence on each model is calculated. This stage is implemented in the following steps.
(1) Segment the continuous emotional speech signal into emotion sentences with the segmentation method based on the energy envelope and pause intervals.
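A minimal sketch of such energy-envelope / pause-based segmentation is given below; the window size, the relative energy threshold and the minimum pause length are assumptions of the sketch, not parameters fixed by the invention.

```python
import numpy as np

def segment_by_pauses(signal, fs=16000, win=0.02, min_pause=0.30, rel_thresh=0.05):
    """Split a continuous waveform into emotion sentences at sufficiently long low-energy pauses."""
    wlen = int(win * fs)
    energy = np.array([np.sum(signal[i:i + wlen] ** 2)                  # short-time energy envelope
                       for i in range(0, len(signal) - wlen, wlen)])
    silent = energy < rel_thresh * energy.max()
    min_run = int(min_pause / win)                                       # required pause length in windows
    boundaries, run = [], 0
    for i, s in enumerate(silent):
        run = run + 1 if s else 0
        if run == min_run:                                               # pause long enough: cut here
            boundaries.append((i - min_run + 1) * wlen)
    return np.split(signal, boundaries) if boundaries else [signal]
```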
(2) Pre-process the segmented emotional speech signal; the method adopted is the same as step (1) of the training stage.
(3) Extraction of the traditional speech features of the emotion sentence to be recognized and speech emotion recognition based on traditional speech emotion features.
(3-1) Extract the feature-selected 56-dimensional traditional speech emotion features of the current sentence among the emotion sentences to be recognized; the method adopted is the same as step (2-1) of the training stage.
(3-2) Recognize the emotional state of the current sentence among the emotion sentences to be recognized.
The traditional speech emotion features of the current sentence extracted in step (3-1) of this stage are input into the speech emotion recognition model based on traditional speech emotion features trained in step (2-3) of the first stage, and the emotional state embodied by this emotion sentence to be recognized is calculated.
(4) Extract the context speech emotion features of the current sentence among the emotion sentences to be recognized, and recognize the emotional state contained in the current sentence with the context speech emotion recognition model trained in step (3-3) of the training stage, using the extracted context speech emotion features.
(4-1) Extraction of the context speech emotion features of the emotion sentence to be recognized; the extraction method adopted is the same as step (3-1) of the training stage, except that only the 91-dimensional context speech emotion features remaining after feature selection are extracted.
(4-2) The 91-dimensional context speech emotion features of the sentence to be recognized extracted in step (4-1) are input into the speech emotion recognition model based on context speech emotion features trained in step (3-3) of the first stage, and the emotional state contained in this emotion sentence to be recognized is obtained.
3. Fusion recognition stage
According to the emotional state of the emotion sentence to be recognized based on traditional speech emotion features obtained in step (3-2) of the recognition stage and the emotional state based on context speech emotion features obtained in step (4-2), the recognition results of the two recognition models are fused with the following fusion method, preliminarily obtaining the final emotional state contained in the emotion sentence to be recognized.
Fusion method:
Let the test sample set be Test = {ts_1, ts_2, ..., ts_n}, let the current emotion sentence to be recognized be denoted ts_j, and let the final emotion category recognized by the model for the sample ts_j be denoted PreLabel(ts_j). Let E = {e_1, e_2, ..., e_m} be the target emotion category set containing m classes of emotion, and let the matrix IM denote the context interaction matrix, as shown in formula (1):
$$IM = \begin{pmatrix} IP_{(1,1)} & \cdots & IP_{(1,j)} & \cdots & IP_{(1,m)} \\ \vdots & & \vdots & & \vdots \\ IP_{(i,1)} & \cdots & IP_{(i,j)} & \cdots & IP_{(i,m)} \\ \vdots & & \vdots & & \vdots \\ IP_{(m,1)} & \cdots & IP_{(m,j)} & \cdots & IP_{(m,m)} \end{pmatrix} \qquad (1)$$
Here the vector IM_i = (IP_(i,1), ..., IP_(i,j), ..., IP_(i,m)) is the emotion context interaction vector, representing the probabilities that the current emotion sentence belongs to each emotional state when the emotion category of the previous emotion sentence is e_i, and the element IP_(i,j) (written c_(i,j) in formula (4)) denotes the probability that the current emotion sentence belongs to emotion category e_j when the emotion category of the previous sentence is e_i. Let TP denote the probability vector output by the recognition model based on traditional emotion features for the emotion sentence to be recognized, sorted in descending order of probability, TP = (tp_1, tp_2, tp_3, ..., tp_i, ..., tp_m), where tp_i denotes the probability, obtained from the model based on traditional speech emotion features, that the emotion sentence to be recognized belongs to emotion e_i. Likewise, let CP denote the probability vector output by the recognition model based on context emotion features, sorted in descending order of probability, CP = (cp_1, cp_2, cp_3, ..., cp_i, ..., cp_m), where cp_i denotes the probability, obtained from the model based on context speech emotion features, that the emotion sentence to be recognized belongs to emotion e_i. Let TrustLevel(ts_j) ∈ {A_1, A_2, A_3} denote the trust level assigned to each reasoning result; the trust levels are divided into the three grades A_1, A_2 and A_3, with degree of trust A_1 > A_2 > A_3. The fusion method is implemented in two steps.
(1) Data preparation.
This step prepares the data for the subsequent fusion. Besides the probability vectors TP and CP, the emotion context interaction vector IM_i corresponding to the emotion category of the emotion sentence ts_{j-1} preceding the current emotion sentence ts_j to be recognized must be selected from the emotion context interaction matrix. The emotion context interaction matrix is refined from the emotion transfer interaction matrix compiled from two-person dialogues under conversational scenarios in typical Chinese dramas; the interaction matrix is shown in Table 3. This emotion interaction matrix was obtained from statistics over 4000 dialogue segments spanning hundreds of hours, covering male-male, male-female and female-female dialogues.
Table 3. Interaction matrix
Table 3 records the emotion interaction rules of dialogues between two persons A and B. The table is divided into a left part and a right part, each showing, given that one person's emotion category is determined, the distribution and probability of the six emotions above appearing in the other person. The left part shows the emotion probability distribution of A given the emotional state of B; likewise, the right part shows the emotion distribution that B may present given the emotional state of A. To eliminate the differences between the left and right distributions caused by individual differences, the distribution probabilities of each emotion pair in the two halves are averaged, yielding the interaction matrix shown in Table 4, which is defined as the context interaction matrix. The probabilities in the context interaction matrix are computed as in formula (2), where AIP_(i,j) denotes the probability that B is in emotion e_j when A is in emotion e_i, BIP_(i,j) denotes the probability that A is in emotion e_j when B is in emotion e_i, and IP_(i,j) denotes the probability that the following sentence is in emotion e_j when the preceding sentence is in emotion e_i.
Table 4. Context interaction matrix
$$IP_{(i,j)} = \frac{AIP_{(i,j)} + BIP_{(i,j)}}{2} \qquad (2)$$
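A minimal sketch of formula (2) follows; it assumes the two directional matrices counted from the corpus are available as m×m row-stochastic NumPy arrays, an assumption made for illustration.

```python
import numpy as np

def context_interaction_matrix(AIP, BIP):
    """Formula (2): IP(i,j) = (AIP(i,j) + BIP(i,j)) / 2, averaging the two dialogue directions."""
    IM = (np.asarray(AIP, dtype=float) + np.asarray(BIP, dtype=float)) / 2.0
    return IM   # rows remain probability distributions if AIP and BIP are row-stochastic
```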
(2) Use the emotion reasoning algorithm based on emotion context to perform emotion reasoning fusion on the data prepared in the previous step. The algorithm is described as follows:
Input: the emotion sentence ts_j to be recognized;
the probability output vector TP of the recognition model based on traditional emotion features;
the probability output vector CP of the recognition model based on context emotion features;
the emotion interaction vector IM_i.
Output: the preliminary emotion category PreLabel(ts_j) of the emotion sentence ts_j.
The concrete reasoning process of the algorithm is as follows:
(1) If the emotion categories of tp_1 and cp_1 are identical, the emotion category PreLabel(ts_j) of the sample to be tested is labelled as the category of tp_1 (cp_1), the trust level is A_3, and the algorithm ends;
(2) Otherwise, the emotion categories of tp_1 and cp_1 differ, and the categories with the second-largest probability now take part in the decision together with the maximal categories.
(2-1) If the emotion categories of tp_2 and cp_2 are identical, there are the following three cases:
(2-1-1) If the emotion category of IP_(i,1) is identical to that of tp_2 (cp_2), the emotion category PreLabel(ts_j) of the sample to be tested is labelled as the category of IP_(i,1), the trust level is A_2, and the algorithm ends;
(2-1-2) If the emotion category of IP_(i,1) is identical to that of tp_1 (or cp_1), the emotion category PreLabel(ts_j) of the sample to be tested is labelled as the category of tp_1 (or cp_1), the trust level is A_2, and the algorithm ends;
(2-1-3) Otherwise, compute the confidences of the elements tp_1 and cp_1; the emotion category corresponding to the larger confidence is taken as the emotion category PreLabel(ts_j) of the sample to be tested, the trust level is A_1, and the algorithm ends.
(2-2) If the emotion categories of tp_2 and cp_2 differ, there are the following two cases:
① If the emotion category of tp_1 is identical to that of cp_2 (or that of tp_2 is identical to that of cp_1), the emotion category PreLabel(ts_j) of the sample to be tested is labelled as the category of tp_1 (or cp_1), the trust level is A_2, and the algorithm ends;
② Otherwise, compute the four confidences of the elements tp_1, tp_2, cp_1 and cp_2; the emotion category corresponding to the largest confidence is taken as the emotion category PreLabel(ts_j) of the sample to be tested, the trust level is A_1, and the algorithm ends.
In the algorithm, the confidence of each element of the probability output vectors TP and CP of the two models is denoted conf_(i,j), representing the confidence that the emotion sentence ts_i belongs to emotion category e_j; it is computed as in formula (4). This confidence consists of two parts: one part comes from the output vector of the model itself and is denoted Pconf_(i,j), computed as in formula (3); the other part comes from the context interaction matrix. In the formulas, p_(i,j) denotes the probability, output by a recognition model, that the emotion sentence ts_i belongs to emotion category e_j.
$$Pconf_{(i,j)} = p_{(i,j)} - \frac{1}{m-1}\sum_{\substack{k=1 \\ k \neq j}}^{m} p_{(i,k)} \qquad (3)$$
$$conf_{(i,j)} = c_{(i,j)} \cdot Pconf_{(i,j)} = c_{(i,j)} \cdot \left( p_{(i,j)} - \frac{1}{m-1}\sum_{\substack{k=1 \\ k \neq j}}^{m} p_{(i,k)} \right) \qquad (4)$$
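A minimal sketch of formulas (3) and (4) is given below; here p is taken as a class-indexed probability vector of one model, j as a class index, and im_i as the corresponding row of the context interaction matrix, which are representational assumptions of the sketch.

```python
import numpy as np

def p_conf(p, j):
    """Formula (3): margin of class j over the mean probability of the other classes."""
    p = np.asarray(p, dtype=float)
    return p[j] - np.delete(p, j).mean()

def conf(p, j, im_i):
    """Formula (4): confidence = IP(i,j) * Pconf(i,j), weighting the margin by the context prior."""
    return im_i[j] * p_conf(p, j)
```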
Instead of the former practice of deciding arbitrarily with only the emotion category of maximum probability, the algorithm additionally uses the emotion category of the second-largest probability and the context interaction matrix to assist the final decision on the emotion category of the sample to be tested. Meanwhile, the different decision branches of the rules are assigned different trust levels; the structure is shown in Fig. 2.
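For illustration, the branching logic of the reasoning algorithm can be sketched as follows; tp and cp are taken as class-indexed probability vectors (their descending ordering is obtained by sorting inside the function), im_i is the context interaction vector selected by the previous sentence's emotion category, classes maps class indices to emotion labels, and conf is the helper sketched after formula (4). These data-structure choices are assumptions; the branch order and trust levels follow the text above.

```python
import numpy as np

def fuse(tp, cp, im_i, classes):
    """Return (preliminary emotion label, trust level) for the current emotion sentence."""
    t1, t2 = np.argsort(tp)[::-1][:2]          # classes of largest / second-largest probability (TP)
    c1, c2 = np.argsort(cp)[::-1][:2]          # classes of largest / second-largest probability (CP)
    im_top = int(np.argmax(im_i))               # most probable class according to the context vector

    if t1 == c1:                                                    # case (1): both models agree
        return classes[t1], "A3"
    if t2 == c2:                                                    # case (2-1)
        if im_top == t2:                                            # (2-1-1)
            return classes[t2], "A2"
        if im_top in (t1, c1):                                      # (2-1-2)
            return classes[im_top], "A2"
        cands = {t1: conf(tp, t1, im_i), c1: conf(cp, c1, im_i)}    # (2-1-3)
        return classes[max(cands, key=cands.get)], "A1"
    if t1 == c2 or t2 == c1:                                        # case (2-2) ①
        return (classes[t1] if t1 == c2 else classes[c1]), "A2"
    cands = {t1: conf(tp, t1, im_i), t2: conf(tp, t2, im_i),        # case (2-2) ②
             c1: conf(cp, c1, im_i), c2: conf(cp, c2, im_i)}
    return classes[max(cands, key=cands.get)], "A1"
```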
4. Emotion adjustment stage based on the emotion context inference rules
For each emotion sentence to be recognized in the test set Test = {ts_1, ts_2, ..., ts_n}, a preliminary decision result is obtained by the above reasoning algorithm. Based on the fact that emotions seldom change abruptly and on the trust levels assigned to the different decision branches of the reasoning algorithm, the recognition result of each emotion sentence of the test set Test is given a context-dependent adjustment according to the emotion context inference rules.
In the emotion context inference rules, let the current sentence requiring adjustment be ts_j (j = 2, 3, ..., n), and use the results of ts_{j-1} and ts_{j+1} to assist the adjustment of ts_j. When the emotion categories of ts_{j-1} and ts_{j+1} are not identical, no processing is performed; otherwise, the emotion category of ts_j is adjusted according to the following three cases, where Label(ts_j) denotes the final emotion category of the emotion sentence ts_j.
Rule 1: when the trust level of ts_j is A_1, the recognition result of ts_j itself is considered credible and no correction is made, that is:
$$TrustLevel(ts_j) = A_1 \Rightarrow Label(ts_j) = PreLabel(ts_j)$$
Rule 2: when the trust level of ts_j is A_2, the recognition result of ts_j is considered doubtful and is handled according to the results of ts_{j-1} and ts_{j+1}: if the trust levels of ts_{j-1} and ts_{j+1} are not A_3, i.e. both are A_1 or A_2, the results of ts_{j-1} and ts_{j+1} are considered credible and the result of ts_j is corrected to their common emotion category; otherwise, no correction is made, that is:
$$\bigl(TrustLevel(ts_j) = A_2\bigr) \wedge \bigl(TrustLevel(ts_{j-1}) \neq A_3\bigr) \wedge \bigl(TrustLevel(ts_{j+1}) \neq A_3\bigr) \Rightarrow Label(ts_j) = Label(ts_{j-1})$$
Rule 3: when the trust level of ts_j is A_3, the recognition result of ts_j is considered not credible, and its emotion category is corrected to the emotion category of ts_{j-1} (ts_{j+1}), that is:
$$TrustLevel(ts_j) = A_3 \Rightarrow Label(ts_j) = Label(ts_{j-1})$$
These rules make full use of the contextual information between sentences; considering that emotion changes seldom involve abrupt jumps, the preliminary results of the reasoning algorithm are adjusted once more, giving the final emotion category of each emotion sentence in the Test set.
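A minimal sketch of this adjustment stage is given below, applying Rules 1-3 to the (label, trust level) pairs produced by the fusion step for each sentence of the test set; representing the results as such pairs, and adjusting only interior sentences with both neighbours present, are assumptions of the sketch.

```python
def adjust(results):
    """results: list of (preliminary label, trust level) pairs, one per emotion sentence."""
    labels = [lab for lab, _ in results]
    for j in range(1, len(results) - 1):
        lab, trust = results[j]
        prev_lab, prev_trust = results[j - 1]
        next_lab, next_trust = results[j + 1]
        if prev_lab != next_lab:                 # neighbours disagree: leave the sentence unchanged
            continue
        if trust == "A1":                        # Rule 1: result credible, no correction
            continue
        if trust == "A2":                        # Rule 2: correct only if both neighbours are trusted
            if prev_trust != "A3" and next_trust != "A3":
                labels[j] = prev_lab
        elif trust == "A3":                      # Rule 3: result not credible, take the neighbours' category
            labels[j] = prev_lab
    return labels
```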
The present invention breaks through the limitation of existing speech emotion recognition methods, which analyze only isolated emotion sentences. Starting from the characteristic that the emotion changes of adjacent emotion sentences are interrelated, it adopts the emotion reasoning algorithm based on emotion context and analyzes and adjusts the emotional state of the emotion sentence to be analyzed by means of the emotion interaction matrix, thereby improving the accuracy of continuous speech emotion recognition.
Compared with the prior art, the present invention has the following beneficial effects:
1. The context speech emotion features between continuous emotion sentences are successfully extracted and used to assist the traditional speech emotion features extracted from single emotion sentences, thereby improving the emotion recognition efficiency of continuous speech;
2. The previously compiled emotion interaction matrix is skilfully used to perform emotion reasoning fusion of the emotional state of the emotion sentence to be recognized given by the context speech emotion features and the emotional state given by the traditional speech emotion features, obtaining a preliminary emotion recognition result for the emotion sentence to be recognized;
3. Exploiting the stability of emotion changes across continuous emotion sentences, emotion context inference rules are formulated to perform a context-dependent adjustment of the whole continuous speech to be recognized.
It should be understood that although this specification is described in terms of embodiments, not every embodiment contains only one independent technical solution; this manner of narration is adopted only for clarity. Those skilled in the art should take the specification as a whole, and the technical solutions in the embodiments may also be appropriately combined to form other embodiments understandable to those skilled in the art.
The detailed descriptions listed above are only specific illustrations of feasible embodiments of the present invention and are not intended to limit the protection scope of the present invention. Any equivalent embodiments or modifications made without departing from the technical spirit of the present invention shall be included within the protection scope of the present invention.

Claims (10)

1. A speech emotion inference method based on emotion context, characterized in that the method comprises:
S1, extracting context speech emotion features and traditional speech emotion features from adjacent emotion sentences, and establishing a context model and a traditional model respectively according to the feature categories;
S2, dividing the continuous speech to be analyzed into a sequence of emotion sentences whose emotions are relatively independent, extracting the context speech emotion features and the traditional speech emotion features of said emotion sentences, and then recognizing them with the context model and the traditional model respectively, to obtain the decision vectors of the two models for the emotion sentence to be analyzed;
S3, fusing the decisions of the context model and the traditional model on the current emotion sentence of the continuous speech to be analyzed with a fusion method based on the emotion interaction matrix, to obtain a preliminary recognition result;
S4, adjusting the emotion category of each sentence with emotion context inference rules from the perspective of the whole continuous speech to be analyzed, to obtain the emotion category sequence of the continuous speech to be analyzed.
2. The method according to claim 1, characterized in that said step S3 comprises:
when the traditional model and the context model are used to fuse the two largest classes of the decision vectors of the emotion sentence to be analyzed, introducing the previously compiled emotion interaction matrix and processing it to obtain the emotion context interaction matrix, the context interaction matrix performing fusion reasoning on the emotion category of the emotion sentence together with the two decision vectors.
3. The method according to claim 1, characterized in that said step S4 comprises:
the emotion context inference rules exploit the continuity of human emotional expression and adjust the emotion category of the current emotion sentence according to the emotion categories of the preceding and following adjacent sentences.
4. The method according to claim 1, characterized in that the adjacent emotion sentences in said step S1 are the last 1/3 voiced segment of the preceding sentence and the whole of the following sentence of a pair of adjacent emotion sentences.
5. The method according to claim 4, characterized in that said context speech emotion features comprise: context dynamic emotion features, context difference emotion features, context edge dynamic emotion features and context edge difference emotion features.
6. The method according to claim 5, characterized in that said context dynamic emotion features are the 33-dimensional speech emotion dynamic features, related to change rate, mean change and covariance among the 101-dimensional traditional speech emotion features, extracted over the last 1/3 voiced segment of the preceding sentence and the whole voiced segment of the following sentence of the adjacent emotion sentences.
7. The method according to claim 5, characterized in that said context difference emotion features are obtained by first extracting the 101-dimensional traditional speech emotion features separately from the last 1/3 voiced segment of the preceding sentence and the whole voiced segment of the following sentence of the adjacent emotion sentences, and then taking the difference between the two.
8. The method according to claim 6, characterized in that said context edge dynamic emotion features are the 33-dimensional speech emotion dynamic features extracted from the edge-adjacent sentence composed of the last 1/3 voiced segment of the preceding sentence and the first 1/3 voiced segment of the following sentence of the adjacent emotion sentences.
9. The method according to claim 8, characterized in that said context edge difference emotion features are the features extracted from the edge-adjacent sentence by the context difference emotion feature extraction method.
10. A speech emotion inference system based on emotion context as claimed in claim 1, characterized in that the system comprises:
a training unit for extracting context speech emotion features and traditional speech emotion features from adjacent emotion sentences, and establishing a context model and a traditional model respectively according to the feature categories;
a recognition unit for dividing the continuous speech to be analyzed into a sequence of emotion sentences whose emotions are relatively independent, extracting the context speech emotion features and the traditional speech emotion features of said sentences, and then performing emotion recognition on the current sentence with the trained context model and traditional model respectively, to obtain the decision vectors of the current sentence on the two models;
a fusion recognition unit for fusing the decisions of the context model and the traditional model on the current emotion sentence of the continuous speech to be analyzed, to obtain a preliminary recognition result;
an adjustment unit for adjusting the emotion category of each sentence with emotion context inference rules from the perspective of the whole continuous speech to be analyzed, to obtain the emotion category sequence of the continuous speech to be analyzed.
CN201310401319.0A 2013-09-05 2013-09-05 Speech emotional inference method based on emotion context and system Active CN103810994B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310401319.0A CN103810994B (en) 2013-09-05 2013-09-05 Speech emotional inference method based on emotion context and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310401319.0A CN103810994B (en) 2013-09-05 2013-09-05 Speech emotional inference method based on emotion context and system

Publications (2)

Publication Number Publication Date
CN103810994A true CN103810994A (en) 2014-05-21
CN103810994B CN103810994B (en) 2016-09-14

Family

ID=50707674

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310401319.0A Active CN103810994B (en) 2013-09-05 2013-09-05 Speech emotional inference method based on emotion context and system

Country Status (1)

Country Link
CN (1) CN103810994B (en)

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106598948A (en) * 2016-12-19 2017-04-26 杭州语忆科技有限公司 Emotion recognition method based on long-term and short-term memory neural network and by combination with autocoder
CN106991172A (en) * 2017-04-05 2017-07-28 安徽建筑大学 Method for establishing multi-mode emotion interaction database
CN107305773A (en) * 2016-04-15 2017-10-31 美特科技(苏州)有限公司 Voice mood discrimination method
CN108039181A (en) * 2017-11-02 2018-05-15 北京捷通华声科技股份有限公司 The emotion information analysis method and device of a kind of voice signal
CN108346436A (en) * 2017-08-22 2018-07-31 腾讯科技(深圳)有限公司 Speech emotional detection method, device, computer equipment and storage medium
CN108664469A (en) * 2018-05-07 2018-10-16 首都师范大学 A kind of emotional category determines method, apparatus and server
CN109256150A (en) * 2018-10-12 2019-01-22 北京创景咨询有限公司 Speech emotion recognition system and method based on machine learning
CN110047517A (en) * 2019-04-24 2019-07-23 京东方科技集团股份有限公司 Speech-emotion recognition method, answering method and computer equipment
CN112418254A (en) * 2019-08-20 2021-02-26 北京易真学思教育科技有限公司 Emotion recognition method, device, equipment and storage medium
WO2021155662A1 (en) * 2020-02-03 2021-08-12 华为技术有限公司 Text information processing method and apparatus, computer device, and readable storage medium
CN113689886A (en) * 2021-07-13 2021-11-23 北京工业大学 Voice data emotion detection method and device, electronic equipment and storage medium
CN113889150A (en) * 2021-10-15 2022-01-04 北京工业大学 Speech emotion recognition method and device
WO2022016580A1 (en) * 2020-07-21 2022-01-27 南京智金科技创新服务中心 Intelligent voice recognition method and device
WO2024010485A1 (en) * 2022-07-07 2024-01-11 Nvidia Corporation Inferring emotion from speech in audio data using deep learning

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102881284A (en) * 2012-09-03 2013-01-16 江苏大学 Unspecific human voice and emotion recognition method and system

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102881284A (en) * 2012-09-03 2013-01-16 江苏大学 Unspecific human voice and emotion recognition method and system

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
CHUN CHEN et al.: "An Enhanced Speech Emotion Recognition System Based on Discourse Information", Lecture Notes in Computer Science, vol. 3991, 31 December 2006 (2006-12-31), pages 449-456, XP019032816 *
BAI Lijuan et al.: "Extraction and Analysis of Speech Emotion Features Based on Acoustic Context" (基于声学上下文的语音情感特征提取与分析), Journal of Chinese Computer Systems (小型微型计算机系统), vol. 34, no. 6, 30 June 2013 (2013-06-30), pages 1451-1456 *
HUANG Chengwei et al.: "Speech Emotion Recognition Based on Feature Space Decomposition and Fusion" (基于特征空间分解与融合的语音情感识别), Journal of Signal Processing (信号处理), vol. 26, no. 6, 30 June 2010 (2010-06-30), pages 835-842 *

Cited By (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107305773A (en) * 2016-04-15 2017-10-31 美特科技(苏州)有限公司 Voice mood discrimination method
CN106598948A (en) * 2016-12-19 2017-04-26 杭州语忆科技有限公司 Emotion recognition method based on long-term and short-term memory neural network and by combination with autocoder
CN106598948B (en) * 2016-12-19 2019-05-03 杭州语忆科技有限公司 Emotion identification method based on shot and long term Memory Neural Networks combination autocoder
CN106991172B (en) * 2017-04-05 2020-04-28 安徽建筑大学 Method for establishing multi-mode emotion interaction database
CN106991172A (en) * 2017-04-05 2017-07-28 安徽建筑大学 Method for establishing multi-mode emotion interaction database
US11922969B2 (en) 2017-08-22 2024-03-05 Tencent Technology (Shenzhen) Company Limited Speech emotion detection method and apparatus, computer device, and storage medium
CN108346436A (en) * 2017-08-22 2018-07-31 腾讯科技(深圳)有限公司 Speech emotional detection method, device, computer equipment and storage medium
US11189302B2 (en) 2017-08-22 2021-11-30 Tencent Technology (Shenzhen) Company Limited Speech emotion detection method and apparatus, computer device, and storage medium
CN108039181B (en) * 2017-11-02 2021-02-12 北京捷通华声科技股份有限公司 Method and device for analyzing emotion information of sound signal
CN108039181A (en) * 2017-11-02 2018-05-15 北京捷通华声科技股份有限公司 The emotion information analysis method and device of a kind of voice signal
CN108664469A (en) * 2018-05-07 2018-10-16 首都师范大学 A kind of emotional category determines method, apparatus and server
CN109256150A (en) * 2018-10-12 2019-01-22 北京创景咨询有限公司 Speech emotion recognition system and method based on machine learning
CN109256150B (en) * 2018-10-12 2021-11-30 北京创景咨询有限公司 Speech emotion recognition system and method based on machine learning
CN110047517A (en) * 2019-04-24 2019-07-23 京东方科技集团股份有限公司 Speech-emotion recognition method, answering method and computer equipment
CN112418254A (en) * 2019-08-20 2021-02-26 北京易真学思教育科技有限公司 Emotion recognition method, device, equipment and storage medium
WO2021155662A1 (en) * 2020-02-03 2021-08-12 华为技术有限公司 Text information processing method and apparatus, computer device, and readable storage medium
WO2022016580A1 (en) * 2020-07-21 2022-01-27 南京智金科技创新服务中心 Intelligent voice recognition method and device
CN113689886A (en) * 2021-07-13 2021-11-23 北京工业大学 Voice data emotion detection method and device, electronic equipment and storage medium
CN113889150A (en) * 2021-10-15 2022-01-04 北京工业大学 Speech emotion recognition method and device
CN113889150B (en) * 2021-10-15 2023-08-29 北京工业大学 Speech emotion recognition method and device
WO2024010485A1 (en) * 2022-07-07 2024-01-11 Nvidia Corporation Inferring emotion from speech in audio data using deep learning

Also Published As

Publication number Publication date
CN103810994B (en) 2016-09-14

Similar Documents

Publication Publication Date Title
CN103810994B (en) Speech emotional inference method based on emotion context and system
Xie et al. Speech emotion classification using attention-based LSTM
CN108597541B (en) Speech emotion recognition method and system for enhancing anger and happiness recognition
Mirsamadi et al. Automatic speech emotion recognition using recurrent neural networks with local attention
CN108564942B (en) Voice emotion recognition method and system based on adjustable sensitivity
CN106503805B (en) A kind of bimodal based on machine learning everybody talk with sentiment analysis method
CN102723078B (en) Emotion speech recognition method based on natural language comprehension
CN103345922B (en) A kind of large-length voice full-automatic segmentation method
CN101064104B (en) Emotion voice creating method based on voice conversion
Xia et al. Using i-Vector Space Model for Emotion Recognition.
CN110675860A (en) Voice information identification method and system based on improved attention mechanism and combined with semantics
CN103258532B (en) A kind of Chinese speech sensibility recognition methods based on fuzzy support vector machine
CN107220235A (en) Speech recognition error correction method, device and storage medium based on artificial intelligence
CN107393554A (en) In a kind of sound scene classification merge class between standard deviation feature extracting method
CN110675859B (en) Multi-emotion recognition method, system, medium, and apparatus combining speech and text
CN107492382A (en) Voiceprint extracting method and device based on neutral net
CN105261367B (en) A kind of method for distinguishing speek person
CN104008754B (en) Speech emotion recognition method based on semi-supervised feature selection
Ren et al. Generating and protecting against adversarial attacks for deep speech-based emotion recognition models
Yaman et al. An integrative and discriminative technique for spoken utterance classification
CN103985381A (en) Voice frequency indexing method based on parameter fusion optimized decision
CN110992988B (en) Speech emotion recognition method and device based on domain confrontation
Song et al. A gesture-to-emotional speech conversion by combining gesture recognition and facial expression recognition
CN107221344A (en) A kind of speech emotional moving method
CN105280181A (en) Training method for language recognition model and language recognition method

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant