CN102142253A - Voice emotion identification equipment and method - Google Patents

Voice emotion identification equipment and method

Info

Publication number
CN102142253A
CN102142253A · CN2010101047793A · CN201010104779A
Authority
CN
China
Prior art keywords
preliminary
affective state
confidence
degree
voice
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2010101047793A
Other languages
Chinese (zh)
Other versions
CN102142253B (en)
Inventor
郭庆
王彬
陆应亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fujitsu Ltd
Original Assignee
Fujitsu Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fujitsu Ltd filed Critical Fujitsu Ltd
Priority to CN2010101047793A
Publication of CN102142253A
Application granted
Publication of CN102142253B
Legal status: Active
Anticipated expiration

Landscapes

  • Measurement Of The Respiration, Hearing Ability, Form, And Blood Characteristics Of Living Organisms (AREA)

Abstract

The invention provides voice emotion identification equipment and a voice emotion identification method. The equipment comprises an emotion recognition unit and a confidence judgment unit. The emotion recognition unit recognizes the current emotional state of a speaker's speech as a preliminary emotional state; the confidence judgment unit calculates the confidence of the preliminary emotional state and uses the confidence to judge whether the preliminary emotional state is credible, and, if so, determines the preliminary emotional state as the final emotional state and outputs it. By performing a confidence judgment on the recognition result of the speech emotional state and determining the final emotional state according to the judgment result, the accuracy of the recognition result of the speech emotional state can be improved.

Description

Speech emotion recognition equipment and method
Technical field
The present invention relates to speech recognition technology, and more particularly to speech emotion recognition equipment and a speech emotion recognition method.
Background art
Emotional capability is an important hallmark of human intelligence. Emotion is essential in person-to-person interaction and plays an important role in processes such as human perception and decision-making. For a long time, research on emotional intelligence existed only in the fields of psychology and cognitive science. In recent years, with the development of artificial intelligence, combining emotional intelligence with computer technology has given rise to the research topic of affective computing, which will greatly promote the development of computing.
Speech is an important means of human communication and the most convenient, fundamental and direct way for people to exchange information. While conveying semantic information, a speech signal also carries rich emotional information. Therefore, with the rapid development of human-computer interaction technology, the emotional information in speech signals is receiving more and more attention from researchers. As an important research direction of affective information processing of speech signals, speech emotion recognition is the key to computers understanding human emotion and a prerequisite for intelligent human-machine interaction.
The first problem that speech emotion recognition must solve is the division of emotional states. At present there are two relatively common approaches: dividing emotional states along continuous dimensions and dividing them into discrete categories.
The continuous (dimensional) approach generally expresses human emotion as several continuously varying dimensions. However, only two dimensions have gained universal acceptance so far, arousal and valence, and these are insufficient to express all basic emotions. In addition, since only a highly reliable quantitative representation makes the space computationally meaningful, another problem of the dimensional representation is how to quantify the position of and distance between emotions. Current dimensional models do not accomplish this, which limits emotion computation based on dimensional spaces.
The discrete approach divides human emotion into a number of discrete states. Because it simplifies the emotion model and is relatively easy to compute, most research so far adopts this approach.
Various methods for automatically recognizing a speaker's emotional state from speech have been disclosed in existing patents, patent applications and papers. For example:
Patent document 1 takes the fundamental-frequency contour, amplitude and formant-frequency contours of speech as features, applies gender-dependent normalization according to the speaker's gender, trains a support vector machine (SVM) model for each emotion, and computes the emotional state of input speech using the SVM models.
Patent document 2 first evaluates the performance of features such as fundamental frequency, energy, speaking rate, formants and their bandwidths, and uses a feature-selection algorithm to screen out the feature set that most influences emotion recognition; in total twelve features related to fundamental frequency, speaking rate, energy, formants and formant bandwidths are selected. These features are then extracted from the input speech and compared with the features of each emotion pre-stored in a database, and the nearest emotion template is taken as the emotional state of the input speech.
Patent document 3 uses three kinds of prosodic information of speech, namely fundamental frequency, speech duration and energy, as features for emotion computation.
Non-patent document 4 applies the SVM method to emotion recognition on real call-center data. In addition to prosodic and spectral features, it also introduces further features from the speech (features that can be obtained with relatively high reliability by speech recognition methods), such as phoneme boundary information and information on whether a segment of speech is disfluent.
Non-patent documents 5 and 6 use the Gaussian mixture model (GMM) method to model sequential spectral features.
Non-patent documents 7, 8 and 9 use the hidden Markov model (HMM) method to model sequential spectral features. Non-patent document 9 further exploits the differences in how the acoustic features of different phoneme types (for example vowels, plosives, fricatives, nasals, etc.) change under different emotional states: it trains an HMM for each emotional state separately for the different phoneme types; during recognition it first performs phoneme recognition on the input speech and then performs emotion recognition using the HMMs under the different emotional states.
Non-patent documents 10 and 11 use the linear discriminant analysis (LDA) method to recognize emotion from prosodic features.
However, judging from the many articles, patents and patent applications disclosed so far, most speech emotion recognition schemes focus only on recognizing the emotional state of speech by means of speech features, emotional state models and the like, without considering the influence of factors such as the speech features and the emotional state models on the accuracy of the recognition result, and without being able to adjust the recognition accuracy dynamically. This leads to an unstable recognition accuracy that needs further improvement.
Therefore, there is still a need for speech emotion recognition equipment and/or a method that can improve the accuracy of the recognition result.
Patent document 1: Chinese patent 200610097301.6, inventors Zhao Li et al., entitled "A speech emotion recognition method based on support vector machines";
Patent document 2: U.S. patent application 09/387,037, inventor Valery A. Petrushin, entitled "System, method and article of manufacture for an emotion detection system";
Patent document 3: U.S. patent application US 2006/0122834 A1, inventor Ian M. Bennett, entitled "Emotion detection device & method for use in distributed systems";
Non-patent document 4: Laurence Vidrascu and Laurence Devillers, "Five emotion classes detection in real-world call center data: the use of various types of paralinguistic features", International Workshop on Paralinguistic Speech - between Models and Data, Saarbrücken, Germany, 2007;
Non-patent document 5: Daniel Neiberg, Kjell Elenius et al., "Emotion recognition in spontaneous speech", Lund University, Centre for Languages & Literature, Dept. of Linguistics & Phonetics, Working Papers 52 (2006), 101-104;
Non-patent document 6: Daniel Neiberg, Kjell Elenius et al., "Emotion recognition in spontaneous speech using GMMs", Interspeech, Pittsburgh, 2006;
Non-patent document 7: Schuller B., Rigoll G. and Lang M., "Hidden Markov model-based speech emotion recognition", Proceedings of the 2003 IEEE International Conference on Acoustics, Speech & Signal Processing, Hong Kong, 2003: 401-404;
Non-patent document 8: Nogueiras A., Moreno A., Bonafonte A. et al., "Speech emotion recognition using hidden Markov models", Proceedings of Eurospeech, Aalborg, 2001: 2679-2682;
Non-patent document 9: Chul Min Lee, Serdar Yildirim et al., "Emotion recognition based on phoneme classes", Proceedings of the International Conference on Spoken Language Processing (ICSLP 2004), Jeju Island, Korea;
Non-patent document 10: Chul Min Lee, Shrikanth S. Narayanan and Roberto Pieraccini, "Combining acoustic and language information for emotion recognition", 7th International Conference on Spoken Language Processing, Denver, USA, 2002;
Non-patent document 11: C. Blouin and V. Maffiolo, "A study on the automatic detection and characterization of emotion in a voice service context", Proc. Interspeech, Lisbon, 2005: 469-472.
Summary of the invention
The following presents a brief summary of the invention in order to provide a basic understanding of some aspects of the invention. It should be understood that this summary is not an exhaustive overview of the invention; it is not intended to identify key or critical parts of the invention, nor to delimit the scope of the invention. Its sole purpose is to present some concepts in a simplified form as a prelude to the more detailed description that follows.
At least one object of the present invention is to provide speech emotion recognition equipment and a speech emotion recognition method that can overcome at least some of the above-mentioned shortcomings and defects of the prior art, so as to improve the accuracy of the recognition result of the speech emotional state.
Another object of the present invention is to provide a corresponding computer program product and/or computer-readable storage medium.
To achieve these objects, according to one embodiment of the present invention, speech emotion recognition equipment is provided, comprising: an emotion recognition unit for recognizing the current emotional state of a speaker's speech as a preliminary emotional state; and a confidence judgment unit for calculating the confidence of the preliminary emotional state and using this confidence to judge whether the preliminary emotional state is credible, and, if the preliminary emotional state is judged credible, determining the preliminary emotional state as the final emotional state and outputting the final emotional state.
The speech emotion recognition equipment may further comprise a correction unit for correcting the preliminary emotional state to obtain the final emotional state of the speech, and outputting the final emotional state, when the confidence judgment unit judges that the preliminary emotional state is not credible.
In the speech emotion recognition equipment, the emotion recognition unit may match the speech against a plurality of emotional state models to determine the preliminary emotional state of the speech, and the confidence judgment unit may calculate the confidence of the preliminary emotional state using an emotional state inverse model corresponding to the preliminary emotional state and/or the emotional state of preceding speech.
In the speech emotion recognition equipment, the confidence judgment unit may be configured to calculate a first preliminary confidence of the preliminary emotional state using the emotional state inverse model corresponding to the preliminary emotional state, to determine the first preliminary confidence as the confidence when the first preliminary confidence is within a predetermined range, and otherwise to calculate a second preliminary confidence of the preliminary emotional state using the emotional state of preceding speech and to combine the first preliminary confidence and the second preliminary confidence to obtain the confidence.
In the speech emotion recognition equipment, the confidence judgment unit may alternatively be configured to calculate a first preliminary confidence of the preliminary emotional state using the emotional state of preceding speech, to determine the first preliminary confidence as the confidence when the first preliminary confidence is within a predetermined range, and otherwise to calculate a second preliminary confidence of the preliminary emotional state using the emotional state inverse model corresponding to the preliminary emotional state and to combine the first preliminary confidence and the second preliminary confidence to obtain the confidence.
To achieve these objects, according to another embodiment of the present invention, a speech emotion recognition method is provided, comprising: recognizing a preliminary emotional state of a speaker's speech; and calculating the confidence of the preliminary emotional state and using the confidence to judge whether the preliminary emotional state is credible, and, if it is credible, determining the preliminary emotional state as the final emotional state and outputting the final emotional state.
The speech emotion recognition method may further comprise correcting the preliminary emotional state to obtain the final emotional state of the speech, and outputting the final emotional state, when the preliminary emotional state is judged not credible.
In the speech emotion recognition method, the step of recognizing the preliminary emotional state of the speech may comprise matching the speech against a plurality of emotional state models to determine the preliminary emotional state of the speech, and the step of calculating the confidence may comprise calculating the confidence of the preliminary emotional state using an emotional state inverse model corresponding to the preliminary emotional state and/or the emotional state of preceding speech.
In the speech emotion recognition method, the step of calculating the confidence of the preliminary emotional state may comprise: calculating a first preliminary confidence of the preliminary emotional state using the emotional state inverse model corresponding to the preliminary emotional state; determining the first preliminary confidence as the confidence when the first preliminary confidence is within a predetermined range; and otherwise calculating a second preliminary confidence of the preliminary emotional state using the emotional state of preceding speech and combining the first preliminary confidence and the second preliminary confidence to obtain the confidence.
In the speech emotion recognition method, the step of calculating the confidence of the preliminary emotional state may alternatively comprise: calculating a first preliminary confidence of the preliminary emotional state using the emotional state of preceding speech; determining the first preliminary confidence as the confidence when the first preliminary confidence is within a predetermined range; and otherwise calculating a second preliminary confidence of the preliminary emotional state using the emotional state inverse model corresponding to the preliminary emotional state and combining the first preliminary confidence and the second preliminary confidence to obtain the confidence.
According to other embodiments of the invention, a corresponding computer-readable storage medium and computer program product are also provided.
According to the embodiments of the invention, a confidence judgment is performed on the recognition result of the speech emotional state and the final emotional state is determined according to the judgment result, which can improve the accuracy of the recognition result of the speech emotional state.
These and other advantages of the present invention will become more apparent from the following detailed description of the preferred embodiments of the present invention taken in conjunction with the accompanying drawings.
Description of drawings
The present invention can be better understood by referring to the description given hereinafter in conjunction with the accompanying drawings, in which the same or similar reference numerals are used throughout to denote the same or similar components. The accompanying drawings, together with the following detailed description, are included in and form a part of this specification, and serve to further illustrate the preferred embodiments of the present invention and to explain the principles and advantages of the present invention. In the drawings:
Fig. 1 is a schematic diagram of speech emotion recognition equipment according to a first embodiment of the invention;
Fig. 2 is a schematic diagram of speech emotion recognition equipment according to a second embodiment of the invention;
Fig. 3 is a schematic diagram of speech emotion recognition equipment according to a third embodiment of the invention;
Fig. 4 is a schematic diagram of speech emotion recognition equipment according to a fourth embodiment of the invention;
Fig. 5 is a schematic diagram of speech emotion recognition equipment according to a fifth embodiment of the invention;
Fig. 6 is a schematic diagram of speech emotion recognition equipment according to a sixth embodiment of the invention;
Fig. 7 is a schematic diagram of the equipment used for emotional state model training in embodiments of the invention;
Fig. 8 is a flowchart of a speech emotion recognition method according to a seventh embodiment of the invention;
Fig. 9 is a flowchart of a speech emotion recognition method according to an eighth embodiment of the invention;
Fig. 10 is a flowchart of a speech emotion recognition method according to a ninth embodiment of the invention;
Fig. 11 is a flowchart of a speech emotion recognition method according to a tenth embodiment of the invention;
Fig. 12 is a flowchart of a speech emotion recognition method according to an eleventh embodiment of the invention; and
Fig. 13 is a block diagram of an exemplary configuration of a computer that implements embodiments of the invention.
Those skilled in the art will appreciate that the elements in the drawings are illustrated only for simplicity and clarity and are not necessarily drawn to scale. For example, the dimensions of some elements in the drawings may be exaggerated relative to other elements in order to help improve the understanding of the embodiments of the present invention.
Embodiment
Exemplary embodiments of the present invention will be described in detail hereinafter in conjunction with the accompanying drawings. For the sake of clarity and conciseness, not all features of an actual implementation are described in this specification. It should be understood, however, that in developing any such actual implementation, many implementation-specific decisions must be made in order to achieve the developer's specific goals, for example compliance with system-related and business-related constraints, and these constraints may vary from one implementation to another. Moreover, although such a development effort might be complex and time-consuming, it would nevertheless be a routine undertaking for those skilled in the art having the benefit of this disclosure.
It should also be noted here that, in order to avoid obscuring the present invention with unnecessary details, only the device structures and/or processing steps closely related to the solution according to the present invention are shown in the drawings and described, while other details of little relevance to the present invention that are known to those of ordinary skill in the art are omitted.
In general, a speech emotion recognition process involves speech emotion training equipment and speech emotion recognition equipment.
The speech emotion training equipment usually adopts a statistical modeling method to train, for the selected features and for each emotional state, a statistical model of that state, thereby obtaining a statistical model for each emotional state.
The speech emotion recognition equipment usually extracts the features of the input speech, matches them against the model of each emotional state, and takes the emotional state with the maximum matching probability as the emotion recognition result.
Fig. 1 is a schematic diagram of speech emotional state recognition equipment 100 according to the first embodiment of the invention.
As shown in Fig. 1, the speech emotional state recognition equipment 100 according to the first embodiment comprises an emotion recognition unit 101 and a confidence judgment unit 102.
The emotion recognition unit 101 is used to recognize the current emotional state of the speaker's speech as a preliminary emotional state.
Generally speaking, speech signals under different emotional states exhibit different structural characteristics and distribution regularities in aspects such as their temporal structure, amplitude structure, fundamental-frequency structure and formant structure. Thus, by calculating and analyzing these structural characteristics and distribution regularities for speech signals of various concrete patterns and building models on that basis, the emotional content implied in a speech signal can be recognized.
Those skilled in the art should understand that the known emotional state classification methods, speech feature selection methods, emotional state modeling methods and so on used by the emotion recognition unit 101 when recognizing the speech emotional state can be chosen flexibly according to the demands of the practical application, and all such choices fall within the spirit and scope of the present invention as claimed.
For example, in related research the number of basic emotional state classes ranges from four to eight. Influential classifications, such as those of Cornelius and Ekman, take six basic categories: happiness, sadness, fear, disgust, anger and surprise. In some specific applications, for example call-center applications, the classification can be further simplified; for example, emotion can be divided into three classes: positive, negative and neutral.
In addition, the speech features most commonly used at present are the prosodic information and spectral information of speech. Prosodic information mainly includes pitch, speaking rate, energy and pauses; the spectral information most commonly used includes Mel-frequency cepstral coefficients (MFCC), linear prediction coefficients (LPC), formants and their related features, and so on.
In addition, for example, the following modeling methods can be used: support vector machine (SVM), Gaussian mixture model (GMM), linear discriminant analysis (LDA), hidden Markov model (HMM), decision tree, and so on.
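The patent does not prescribe any particular implementation of these models. Purely as an illustration, the following sketch (assuming Python with the librosa and scikit-learn libraries; the file lists, function names and parameter values are hypothetical) extracts MFCC features and fits one GMM per emotional state, then takes the best-scoring model as the preliminary emotional state:

```python
# A minimal sketch (not from the patent): per-emotion GMMs over MFCC features.
import librosa
import numpy as np
from sklearn.mixture import GaussianMixture

def mfcc_features(wav_path, n_mfcc=13):
    """Extract frame-level MFCC features as a (frames x n_mfcc) matrix."""
    y, sr = librosa.load(wav_path, sr=16000)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T

def train_emotion_gmms(labeled_files, n_components=8):
    """labeled_files: dict mapping an emotion label to a list of wav paths."""
    models = {}
    for emotion, paths in labeled_files.items():
        frames = np.vstack([mfcc_features(p) for p in paths])
        gmm = GaussianMixture(n_components=n_components, covariance_type='diag')
        models[emotion] = gmm.fit(frames)
    return models

def recognize_preliminary(models, wav_path):
    """Return the emotion whose model gives the highest mean log-likelihood per frame."""
    feats = mfcc_features(wav_path)
    scores = {e: m.score(feats) for e, m in models.items()}
    return max(scores, key=scores.get), scores
```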
Therefore, for brevity of the specification, the concrete recognition method used by the emotion recognition unit 101 when recognizing the speech emotional state is not described in detail here.
The confidence judgment unit 102 is used to calculate the confidence of the preliminary emotional state and to use this confidence to judge whether the preliminary emotional state is credible; if the preliminary emotional state is judged credible, the preliminary emotional state is determined as the final emotional state and the final emotional state is output.
Those skilled in the art should understand that the concrete confidence calculation method adopted by the confidence judgment unit 102 can be chosen flexibly according to the demands of the practical application; for example, the confidence can be determined from the magnitude of the matching probability between the speech and the emotional state model corresponding to the preliminary emotional state, or from the difference or ratio between the matching probabilities of the preliminary emotional state and the plurality of emotional state models. All such choices fall within the spirit and scope of the present invention as claimed.
As can be seen from the above, the speech emotion recognition equipment according to the first embodiment adds, on top of processing and analyzing the speech to recognize its emotional state, a calculation of the confidence of the recognized emotional state. If the recognized emotional state is judged credible according to this confidence (for example when the confidence is sufficiently high, such as above a predetermined threshold or within a predetermined range), the emotional state of the input speech can be determined to be the recognized emotional state; that is, the preliminary emotional state recognized by the emotion recognition unit 101 is determined as the final emotional state.
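As a purely illustrative sketch of this decision (the function name and the placeholder threshold are assumptions, not part of the patent), the behavior of the confidence judgment unit in this embodiment could be expressed as:

```python
def judge_final_state(preliminary_state, confidence, threshold):
    """Accept the preliminary emotional state as final when its confidence is high enough.
    The threshold is application-dependent and set empirically."""
    if confidence >= threshold:
        return preliminary_state   # credible: output as the final emotional state
    return None                    # not credible: handled by a correction unit (see the second embodiment)
```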
Therefore, by introducing the calculation of the confidence of the recognized emotional state, the speech emotion recognition equipment according to the first embodiment can reduce false alarms on the one hand and, on the other hand, improve the accuracy of the recognition result, since some emotional states are easily confused with one another.
In addition, it should be noted that although the speech emotion recognition equipment according to the present embodiment has been described above in conjunction with the schematic diagram shown in Fig. 1, those skilled in the art should understand that the schematic diagram of Fig. 1 is merely exemplary rather than a limitation on the scope of the present invention, and it can be modified as actually needed.
Fig. 2 is a schematic diagram of speech emotion recognition equipment 200 according to the second embodiment of the invention.
As shown in Fig. 2, the speech emotional state recognition equipment 200 according to the second embodiment comprises an emotion recognition unit 201, a confidence judgment unit 202 and a correction unit 203.
The emotion recognition unit 201 and the confidence judgment unit 202 are similar to the emotion recognition unit 101 and the confidence judgment unit 102 in Fig. 1, respectively, and are therefore not described in detail again here.
The correction unit 203 is used to correct the preliminary emotional state to obtain the final emotional state of the speech, and to output the final emotional state, when the confidence judgment unit 202 judges that the preliminary emotional state is not credible.
Specifically, if the confidence judgment unit 202 judges, according to the confidence of the recognized emotional state, that the recognized emotional state is credible, for example when the confidence is sufficiently high (for example above a predetermined threshold or within a predetermined range), the emotional state of the input speech can be determined to be the recognized emotional state.
Otherwise, if the confidence judgment unit 202 judges that the recognized emotional state is not credible, for example when the confidence is low (for example below a predetermined threshold or outside a predetermined range), the correction unit 203 can apply correction processing to the preliminary emotional state according to the concrete application.
For example, from the application point of view, targeted correction processing can be applied to a candidate emotional state whose confidence is not high enough.
For example, in a call-center application, the emotional states of most interest are positive, negative and neutral; if the candidate emotional state is a negative emotion but its confidence is not high enough, the correction unit 203 can correct the emotional state of the input speech to the neutral emotional state.
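As an illustration only (the label strings, function name and placeholder threshold are assumptions based on the call-center example above, not a prescription of the patent), the behavior of the correction unit could be sketched as:

```python
def correct_state(preliminary_state, confidence, threshold):
    """Call-center style correction: a low-confidence negative result is softened to neutral."""
    if confidence >= threshold:
        return preliminary_state   # credible result is kept as the final emotional state
    if preliminary_state == 'negative':
        return 'neutral'           # low-confidence negative emotion is corrected to neutral
    return preliminary_state       # other low-confidence states: application-specific policy
```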
As can be seen from the above, the speech emotion recognition equipment according to the second embodiment adds, on top of processing and analyzing the speech to recognize its emotional state, a calculation of the confidence of the recognized emotional state, judges according to the calculated confidence whether the recognized emotional state is credible, and performs corresponding processing depending on whether it is credible or not.
Therefore, by introducing the calculation of the confidence of the recognized emotional state, the speech emotion recognition equipment according to the second embodiment can reduce false alarms and improve the accuracy of the recognition result.
Those skilled in the art should understand that the concrete way in which the correction unit 203 corrects the preliminary emotional state when it is judged not credible can be chosen and configured flexibly according to the demands of the concrete application, and all such choices fall within the spirit and scope of the present invention as claimed.
In addition, it should be noted that although the speech emotion recognition equipment according to the present embodiment has been described above in conjunction with the schematic diagram shown in Fig. 2, those skilled in the art should understand that the schematic diagram of Fig. 2 is merely exemplary rather than a limitation on the scope of the present invention, and it can be modified as actually needed.
Fig. 3 is a schematic diagram of speech emotion recognition equipment 300 according to the third embodiment of the invention.
As shown in Fig. 3, the speech emotional state recognition equipment 300 according to the third embodiment comprises an emotion recognition unit 301 and a confidence judgment unit 302.
The emotion recognition unit 301 is used to match the speech against a plurality of emotional state models to determine the preliminary emotional state of the speech.
As indicated above, the known emotional state classification methods, speech feature selection methods, emotional state modeling methods and so on used by the emotion recognition unit 301 can be chosen flexibly according to the demands of the practical application, and all such choices fall within the spirit and scope of the present invention as claimed.
For example, the emotion recognition unit 301 matches the speech features against each emotional state model to obtain the probability that the current speech features belong to that model. Taking HMM models over sequential spectral features, Mel-frequency cepstral coefficients (MFCC), as an example: after feature extraction, the input speech segment is represented frame by frame as a sequence of 39-dimensional feature vectors o = {o1, o2, ..., oN}, where N is the number of frames of the segment. The matching probability between this feature sequence and each of, say, five pre-stored HMM emotional state models is computed, i.e., for each HMM emotional state model the probability p(o | e_t = i) that the feature sequence belongs to that model is calculated. Since the use of HMM models is very common in this field and their computation is described in detail in many documents, it is not described in detail here.
Therefore, for brevity of the specification, the concrete recognition method used by the emotion recognition unit 301 is not described in detail here.
The confidence judgment unit 302 is used to calculate the confidence of the preliminary emotional state determined by the emotion recognition unit 301, using the emotional state inverse model corresponding to that preliminary emotional state.
Here, the emotional state models used by the emotion recognition unit 301 and the emotional state inverse models used by the confidence judgment unit 302 are trained on the same emotional speech corpus. See, for example, Fig. 7, which is a schematic diagram of the equipment 700 used for emotional state model training in embodiments of the invention. The equipment 700 comprises an emotional state model training unit 702 and an emotional state inverse model training unit 703. The emotional state model training unit 702 trains the emotional state models 704 based on an annotated emotional speech corpus 701, and the emotional state inverse model training unit 703 trains the emotional state inverse models 705 based on the same annotated emotional speech corpus 701.
In general, there are two ways to obtain the emotional speech data of the corpus. One is to design recording scripts in advance and have people act out various emotions, thereby obtaining speech data of the corresponding emotions. The other is to extract the data from real recordings, for example from the large number of actual recorded conversations between customers and operators in a call-center application.
The emotional speech data of the corpus is segmented and emotion-annotated: the emotional states occurring in the dialogues are classified, for example into happy, angry, worried, surprised and neutral; sentence segmentation and annotation are then performed, and sentences whose emotional state or wording is impaired are removed, finally yielding an annotated emotional speech corpus 701 that contains a considerable amount of well-expressed speech rich in various emotions.
In addition, although Fig. 7 only shows the equipment 700 being trained on the annotated emotional speech corpus 701, the invention is not limited to this. For example, the equipment 700 can also be trained on both general speech data and the annotated emotional speech corpus: a general model can first be trained from the general speech data, adaptive learning can then be performed based on the annotated emotional speech corpus, and an emotion model knowledge base is finally obtained. The emotional state model training unit 702 and the emotional state inverse model training unit 703 can then train the emotional state models and the emotional state inverse models based on this emotion model knowledge base.
The emotional state model training unit 702 can adopt a statistical modeling method (for example the HMM method or the GMM method) and, for each emotional state, train the emotional state model corresponding to that state from the speech data under that state stored in the annotated emotional speech corpus 701, obtaining the emotional state model 704 corresponding to each emotional state.
Alternatively, the emotional state model training unit 702 can first train a general model from general speech data and then perform adaptive learning based on the annotated emotional speech corpus, finally obtaining the emotional state model 704 corresponding to each emotional state. The algorithm adopted for adaptive learning can be an algorithm known from the field of speech recognition, for example the maximum a posteriori (MAP) estimation algorithm or the maximum likelihood linear regression (MLLR) algorithm.
The emotional state inverse model training unit 703 trains, for each emotional state, an emotional state inverse model from the speech data stored in the annotated emotional speech corpus 701, obtaining the emotional state inverse model 705 corresponding to each emotional state.
The emotional state inverse models 705 can be trained with the same method as the emotional state models described above; only the training data differ. For the inverse model of a certain emotional state, the statistical computation is performed using as training data all the emotional data except the data belonging to that emotional state, obtaining the emotional state inverse model 705 corresponding to that state.
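As an illustration only (the function names and data layout are assumptions; any statistical model sharing the training routine of the emotion models, e.g. an HMM or GMM, could be plugged in), the complementary training-data split for the inverse models could be expressed as:

```python
def train_anti_models(training_data, train_single_model):
    """Train one inverse (anti-)model per emotional state on all data EXCEPT that state's data.
    training_data: dict mapping an emotion label to a list of feature matrices;
    train_single_model: callable that fits one statistical model from a list of feature matrices."""
    anti_models = {}
    for emotion in training_data:
        other_segments = [seg for other, segs in training_data.items()
                          if other != emotion
                          for seg in segs]
        anti_models[emotion] = train_single_model(other_segments)
    return anti_models
```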
Returning now to Fig. 3: as described above, after the emotion recognition unit 301 matches the speech against the plurality of emotional state models to determine the preliminary emotional state of the speech, the confidence judgment unit 302 calculates the confidence of this preliminary emotional state using the emotional state inverse model 304 corresponding to the preliminary emotional state determined by the emotion recognition unit 301.
For example, the confidence calculation of the confidence judgment unit 302 can be based on statistical information from the recognition process. A threshold can then be set; if the confidence of the current emotional state is higher than this threshold, the input speech can be judged to be of this emotional state.
For example, the confidence calculation of the confidence judgment unit 302 can apply a threshold to the ratio between two probabilities, namely the matching probability between the input speech and the emotional state model determined as the preliminary emotional state, divided by the matching probability between the input speech and the emotional state inverse model of this preliminary emotional state.
Since the calculation of probability values in a computer program is usually carried out in the logarithmic domain, the confidence can be expressed as follows:
$$\mathrm{CM}(e_t = i) = \log P(o \mid e_t = i) - \log P(o \mid e_t = \bar{i})$$

where $\mathrm{CM}(e_t = i)$ denotes the confidence that the current emotional state is $i$; $P(o \mid e_t = i)$ is the probability that the speech belongs to emotional state $i$; and $P(o \mid e_t = \bar{i})$ is the probability that the speech belongs to the inverse model of emotional state $i$.
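Purely as an illustration (building on the hypothetical per-emotion models and inverse models sketched earlier, and assuming each model object exposes a score() method returning a log-likelihood, as the hmmlearn and scikit-learn models do), this log-domain confidence measure could be computed as:

```python
def first_confidence(models, anti_models, features, state):
    """CM(e_t = i) = log P(o | e_t = i) - log P(o | inverse model of i), in the log-likelihood domain."""
    return models[state].score(features) - anti_models[state].score(features)
```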
Since the emotional state inverse model reflects the probability that the speech belongs to emotional states other than the preliminary emotional state, the confidence judgment unit 302 can judge the confidence of the preliminary emotional state determined by the emotion recognition unit 301 by using the emotional state inverse model.
Therefore, the speech emotional state recognition equipment 300 according to the third embodiment judges the confidence that the input speech belongs to the determined preliminary emotional state according to the degree of matching between the determined preliminary emotional state and the corresponding emotional state inverse model, thereby reducing false alarms and improving the accuracy of the recognition result.
In addition, it should be noted that although the speech emotion recognition equipment according to the present embodiment has been described above in conjunction with the schematic diagram shown in Fig. 3, those skilled in the art should understand that the schematic diagram of Fig. 3 is merely exemplary rather than a limitation on the scope of the present invention, and it can be modified as actually needed. For example, those skilled in the art can select the concrete confidence calculation of the confidence judgment unit 302 flexibly according to the needs of the practical application, and all such choices fall within the spirit and scope of the present invention as claimed.
Fig. 4 is a schematic diagram of speech emotion recognition equipment 400 according to the fourth embodiment of the invention.
As shown in Fig. 4, the speech emotion recognition equipment 400 according to the fourth embodiment comprises an emotion recognition unit 401 and a confidence judgment unit 402.
The emotion recognition unit 401 and the emotional state models 406 it uses are similar to the emotion recognition unit 301 and the emotional state models 303 in Fig. 3, respectively, and are therefore not described in detail again here.
The confidence judgment unit 402 comprises a first preliminary confidence calculation unit 403, a second preliminary confidence calculation unit 404 and a synthesis unit 405.
The first preliminary confidence calculation unit 403 calculates the first preliminary confidence of the preliminary emotional state using the emotional state inverse model 407 corresponding to the preliminary emotional state determined by the emotion recognition unit 401.
For example, the first preliminary confidence calculation unit 403 can adopt a confidence calculation method similar to that of the confidence judgment unit 302 in Fig. 3. Likewise, those skilled in the art can select the concrete confidence calculation of the first preliminary confidence calculation unit 403 flexibly according to the needs of the practical application, and all such choices fall within the spirit and scope of the present invention as claimed.
The second preliminary confidence calculation unit 404 calculates the second preliminary confidence of the preliminary emotional state determined by the emotion recognition unit 401 using the emotional states 408 of preceding speech.
Specifically, during a dialogue the emotional state of a given person is generally stable over a short time: if the emotional state of the previous utterance was happy, the probability that the emotional state of the current utterance is angry is lower than the probability that it is happy or neutral. Similarly, over three consecutive utterances by the same person, sharp swings such as angry-happy-angry are very unlikely, because for most people a change of emotional state is a gradual process. The different transitions between emotional states therefore have different likelihoods. Thus, the second preliminary confidence calculation unit 404 can calculate the second preliminary confidence of the preliminary emotional state determined by the emotion recognition unit 401 according to the emotional states of the preceding speech. The conditional probability of the determined preliminary emotional state of the current speech, given the emotional states determined for the preceding speech, or a function of this probability, can be used as the second preliminary confidence: the larger the conditional probability, the larger the second preliminary confidence, and vice versa. In one example, the second preliminary confidence calculation unit 404 can use conditional probability knowledge obtained in advance by statistics over the annotated emotional speech corpus. This conditional probability knowledge can be regarded as the probability of the emotional state of the N-th utterance given the emotional states of the preceding N-1 utterances. Concrete computation methods are as follows.
Suppose there are M emotional states, $i \in \{e_1, e_2, \ldots, e_M\}$. Then define:
(1) Given that the emotional state of the previous utterance is $i$, the probability that the emotional state of the current utterance is $j$ (bigram training model) is:

$$P_{bi}(e_t = j \mid e_{t-1} = i) = \frac{C(e_{t-1} = i,\, e_t = j)}{C(e_{t-1} = i)},$$

where $C(e_{t-1} = i)$ denotes the number of utterances in the annotated emotional speech corpus whose emotional state is $i$, and $C(e_{t-1} = i,\, e_t = j)$ denotes the number of occurrences in the corpus of two consecutive utterances by the same speaker, within the same conversation, whose emotional states are $i$ and $j$, respectively.
(2) affective state of preceding two words is respectively i, j, and the probability that current statement affective state is k (ternary (Tri-gram) training pattern) is:
P tri ( e t = k | e t - 2 = i , e t - 1 = j ) = C ( e t - 2 = i , e t - 1 = j , e t = k ) C ( e t - 2 = i , e t - 1 = j ) ,
C (e wherein T-2=i, e T-1=j) in the expression mark emotional speech storehouse in same section process of speaking the affective state of continuous two words of same speaker be respectively the occurrence number of the situation of i and j, C (e T-2=i, e T-1=j, e t=k) in the expression mark emotional speech storehouse in same section process of speaking the affective state of continuous three words of same speaker be respectively i, the occurrence number of the situation of j and k.
Similarly, given that the emotional states of the preceding N-1 utterances are $i_1, i_2, \ldots, i_{N-1}$, the probability that the emotional state of the current utterance is $i_N$ (N-gram training model) is:

$$P_{N}(e_t = i_N \mid e_{t-N+1} = i_1,\, \ldots,\, e_{t-1} = i_{N-1}) = \frac{C(e_{t-N+1} = i_1,\, \ldots,\, e_{t-1} = i_{N-1},\, e_t = i_N)}{C(e_{t-N+1} = i_1,\, \ldots,\, e_{t-1} = i_{N-1})},$$

where $C(e_{t-N+1} = i_1,\, \ldots,\, e_{t-1} = i_{N-1})$ denotes the number of occurrences in the corpus of N-1 consecutive utterances by the same speaker, within the same conversation, whose emotional states are $i_1, i_2, \ldots, i_{N-1}$, respectively, and $C(e_{t-N+1} = i_1,\, \ldots,\, e_{t-1} = i_{N-1},\, e_t = i_N)$ denotes the number of occurrences of N consecutive utterances by the same speaker whose emotional states are $i_1, i_2, \ldots, i_{N-1}$ and $i_N$, respectively.
The conditional probability knowledge may comprise conditional probabilities based on a single training model, or based on several different training models. During a conversation, the emotional state of the current utterance is influenced most by the emotions of the preceding two utterances and less by the emotional states of earlier utterances. Therefore, the conditional probability knowledge preferably comprises conditional probabilities based on the trigram training model, conditional probabilities based on the bigram training model, or a combination of the two. That is to say, the conditional probability knowledge preferably includes knowledge about the transitions between the emotional states of two and/or three consecutive utterances by the same speaker during a conversation.
For the calculation of the conditional probabilities, reference can also be made to Chinese patent application 200910150458.4 (entitled "Speech emotion recognition equipment and method for performing speech emotion recognition").
The second preliminary confidence calculation unit 404 can look up, in the conditional probability knowledge, the conditional probability between the emotional states determined for the preceding speech and the determined preliminary emotional state of the current speech, and calculate the second preliminary confidence based on this conditional probability.
The synthesis unit 405 combines the first preliminary confidence calculated by the first preliminary confidence calculation unit 403 and the second preliminary confidence calculated by the second preliminary confidence calculation unit 404 to determine the final confidence. For example, a weighted sum of the first preliminary confidence and the second preliminary confidence can be used as the final confidence.
In one example, two thresholds θ1 and θ2 can be set for the first preliminary confidence of the input speech, where θ1 is the higher threshold and θ2 the lower one. If the first preliminary confidence obtained by the first preliminary confidence calculation unit 403 is greater than θ1, the input speech can be directly judged to be of the candidate emotional state; if the first preliminary confidence is less than θ2, the input speech is directly judged not to be of the candidate emotional state. That is to say, when the first preliminary confidence is greater than θ1 or less than θ2, the second preliminary confidence calculation unit 404 does not need to calculate the second preliminary confidence. Otherwise, if the first preliminary confidence is greater than or equal to θ2 and less than or equal to θ1, the second preliminary confidence calculation unit 404 calculates the second preliminary confidence, and the synthesis unit 405 combines the first preliminary confidence and the second preliminary confidence to determine the final confidence.
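As a sketch under stated assumptions (the weight, the two thresholds and the function name are illustrative placeholders; the final acceptance decision on the returned value remains application-specific), this two-threshold scheme with weighted synthesis could look like:

```python
def final_confidence(cm1, compute_cm2, theta1, theta2, weight=0.5):
    """Two-threshold scheme of the fourth embodiment.
    cm1: first preliminary confidence (inverse-model based);
    compute_cm2: callable that returns the context-based second preliminary confidence on demand;
    theta1 > theta2; the thresholds and the weight are illustrative only."""
    if cm1 > theta1 or cm1 < theta2:
        # Confident either way: the second preliminary confidence need not be computed.
        return cm1
    # Borderline: synthesize the two preliminary confidences, e.g. by a weighted sum.
    return weight * cm1 + (1.0 - weight) * compute_cm2()
```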
As can be seen from the above, the confidence calculation of the speech emotional state recognition equipment 400 according to the fourth embodiment calculates the first preliminary confidence from the current segment of input speech on the one hand, and on the other hand can be adjusted with reference to the second preliminary confidence based on the emotional states of the contextual speech, so that the accuracy of the speech emotional state recognition result can be further improved and false alarms further reduced.
In addition, it should be noted that although the speech emotion recognition equipment according to the present embodiment has been described above in conjunction with the schematic diagram shown in Fig. 4, those skilled in the art should understand that the schematic diagram of Fig. 4 is merely exemplary rather than a limitation on the scope of the present invention, and it can be modified as actually needed. For example, those skilled in the art can select the specific implementations of the first preliminary confidence calculation unit 403, the second preliminary confidence calculation unit 404 and the synthesis unit 405 flexibly according to the needs of the practical application, and all such choices fall within the spirit and scope of the present invention as claimed.
Fig. 5 is a schematic diagram of speech emotion recognition equipment 500 according to the fifth embodiment of the invention.
As shown in Fig. 5, the speech emotion recognition equipment 500 according to the fifth embodiment comprises an emotion recognition unit 501 and a confidence judgment unit 502.
The emotion recognition unit 501 and the emotional state models 503 it uses are similar to the emotion recognition unit 301 and the emotional state models 303 in Fig. 3, respectively, and are therefore not described in detail again here.
The confidence judgment unit 502 is used to calculate the confidence of the preliminary emotional state determined by the emotion recognition unit 501, using the emotional states 504 of preceding speech.
The function and implementation of the confidence judgment unit 502 and of the preceding speech emotional states 504 it uses are similar to those of the second preliminary confidence calculation unit 404 and the preceding speech emotional states 408 in Fig. 4, and are therefore not described in detail again here.
As can be seen from the above, the speech emotional state recognition equipment 500 according to the fifth embodiment judges the confidence that the input speech belongs to the determined preliminary emotional state based on the emotional states of the contextual speech, thereby reducing false alarms and improving the accuracy of the recognition result.
In addition, it should be noted that although the speech emotion recognition equipment according to the present embodiment has been described above in conjunction with the schematic diagram shown in Fig. 5, those skilled in the art should understand that the schematic diagram of Fig. 5 is merely exemplary rather than a limitation on the scope of the present invention, and it can be modified as actually needed.
Fig. 6 is a schematic diagram of the speech emotion recognition apparatus 600 according to the sixth embodiment of the invention.
As shown in Fig. 6, the speech emotion recognition apparatus 600 according to the sixth embodiment comprises an emotion recognition unit 601 and a confidence judging unit 602.
The emotion recognition unit 601 and the emotional state models 606 it uses are similar to the emotion recognition unit 301 and the emotional state models 303 of Fig. 3, respectively, so they are not described in detail again here.
The confidence judging unit 602 comprises a first preliminary confidence computation unit 603, a second preliminary confidence computation unit 604 and a synthesis unit 605.
The first preliminary confidence computation unit 603 uses the prior speech emotional states 607 to calculate the first preliminary confidence of the preliminary emotional state determined by the emotion recognition unit 601.
The second preliminary confidence computation unit 604 uses the emotional state inverse model 608 corresponding to the preliminary emotional state determined by the emotion recognition unit 601 to calculate the second preliminary confidence of that preliminary emotional state.
The functions and implementations of the first preliminary confidence computation unit 603 and the second preliminary confidence computation unit 604 are similar to those of the second preliminary confidence computation unit 404 and the first preliminary confidence computation unit 403 of Fig. 4, respectively, so they are not described in detail again here.
The synthesis unit 605 combines the first preliminary confidence calculated by the first preliminary confidence computation unit 603 and the second preliminary confidence calculated by the second preliminary confidence computation unit 604 to determine the final confidence.
For example, the synthesis unit 605 may directly judge the preliminary emotional state of the input speech to be not credible when the first preliminary confidence, calculated by the first preliminary confidence computation unit 603 based on the emotional states of the preceding speech, is lower than a predetermined threshold; directly judge the preliminary emotional state to be credible when the first preliminary confidence is higher than another predetermined threshold; and, when the first preliminary confidence lies between the two thresholds, determine the final confidence by combining it with the second preliminary confidence calculated by the second preliminary confidence computation unit 604 based on the corresponding emotional state inverse model 608.
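A minimal Python sketch of this two-threshold synthesis follows; the threshold values, the weight of the combination and the final decision cutoff are illustrative assumptions, not values given by the present description.

```python
# Hypothetical sketch of the two-threshold synthesis described above.
LOW, HIGH = 0.3, 0.7        # assumed thresholds

def synthesize(first_conf, second_conf_fn, weight=0.5):
    """Below LOW: not credible outright. Above HIGH: credible outright.
    In between: blend in the inverse-model-based second confidence."""
    if first_conf < LOW:
        return first_conf, "not credible"
    if first_conf > HIGH:
        return first_conf, "credible"
    second_conf = second_conf_fn()      # computed only when needed
    final = weight * first_conf + (1 - weight) * second_conf
    return final, "credible" if final > 0.5 else "not credible"

print(synthesize(0.5, lambda: 0.8))     # mid-range case blends to ~0.65 -> 'credible'
```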
As described above, the speech emotion recognition apparatus 600 according to the sixth embodiment computes a first preliminary confidence based on the emotional states of the contextual (preceding) speech on the one hand, and on the other hand can adjust it with reference to a second preliminary confidence based on the current segment of input speech, so that the accuracy of the speech emotional state recognition result can be further improved and false alarms further reduced.
In addition, it should be noted that although the speech emotion recognition apparatus according to the present embodiment has been described above with reference to the schematic diagram of Fig. 6, those skilled in the art should understand that the diagram of Fig. 6 is merely exemplary and does not limit the scope of the invention; it may be modified or varied as actually needed. For example, the specific implementations of the first preliminary confidence computation unit 603, the second preliminary confidence computation unit 604 and the synthesis unit 605 may be chosen flexibly according to the needs of a practical application, and all such choices fall within the spirit and scope of the invention as claimed.
According to embodiments of the invention, a speech emotion recognition method is also provided.
Fig. 8 is a flowchart of the speech emotion recognition method according to the seventh embodiment of the invention.
As shown in Fig. 8, the speech emotion recognition method according to the seventh embodiment starts from step S801. In step S801, the preliminary emotional state of the speaker's speech is recognized. In step S802, the confidence of this preliminary emotional state is calculated and used to judge whether the preliminary emotional state is credible; if it is credible, the preliminary emotional state is determined as the final emotional state and the final emotional state is output.
As described above, the speech emotion recognition method according to the seventh embodiment adds, on top of processing and analyzing the speech to recognize its emotional state, a calculation of the confidence of the recognized emotional state. If the recognized emotional state is judged to be credible according to this confidence (for example when the confidence is sufficiently high, e.g. above a predetermined threshold or within a predetermined range), the emotional state of the input speech can be determined to be the recognized one; that is, the recognized preliminary emotional state is determined as the final emotional state.
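As a rough illustration of this two-step flow, the Python sketch below uses a placeholder recognizer and a plain threshold; the function names, the returned score and the threshold value are assumptions made for the example.

```python
# Minimal sketch of the flow of Fig. 8: recognize a preliminary emotional
# state, then accept it as final only if its confidence clears a threshold.
def recognize_preliminary(speech):
    # placeholder: pretend matching against emotional state models picked "happy"
    return "happy", 0.82               # (preliminary state, raw model score)

def confidence_of(state, score):
    return score                       # placeholder confidence measure

def recognize_emotion(speech, threshold=0.6):
    state, score = recognize_preliminary(speech)
    if confidence_of(state, score) >= threshold:
        return state                   # credible: output as final state
    return None                        # not credible (handled in later embodiments)

print(recognize_emotion(b"pcm-audio"))  # -> 'happy'
```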
Therefore, by introducing the calculation of the confidence of the recognized emotional state, the speech emotion recognition method according to the seventh embodiment can reduce false alarms on the one hand, and on the other hand, since some emotional states are easily misrecognized as one another, the introduction of the confidence can improve the accuracy of the recognition result.
Fig. 9 is a flowchart of the speech emotion recognition method according to the eighth embodiment of the invention.
As shown in Fig. 9, the speech emotion recognition method according to the eighth embodiment starts from step S901. In step S901, the preliminary emotional state of the speaker's speech is recognized. In step S902, the confidence of this preliminary emotional state is calculated. In step S903, the confidence is used to judge whether the preliminary emotional state is credible. If the preliminary emotional state is judged to be credible ("Yes" in step S903), the method proceeds to step S904, in which the preliminary emotional state is determined as the final emotional state. If the preliminary emotional state is judged to be not credible ("No" in step S903), the method proceeds to step S905, in which the preliminary emotional state is revised to obtain the final emotional state of the speech, and the final emotional state is output.
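The sketch below illustrates the branch of step S905 under one possible revision policy (falling back to the runner-up emotion, or to "neutral"); this policy and the score values are assumptions for illustration, not choices prescribed by the present description.

```python
# Sketch of Fig. 9: keep the preliminary state when credible, otherwise
# revise it. Falling back to the runner-up emotion is an illustrative
# revision policy only.
def recognize_with_revision(scores, threshold=0.6):
    """scores: mapping from emotional state to confidence, e.g. obtained by
    matching the speech against several emotional state models."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    best_state, best_conf = ranked[0]
    if best_conf >= threshold:
        return best_state                         # steps S903/S904
    # step S905: revise, e.g. take the runner-up or default to "neutral"
    return ranked[1][0] if len(ranked) > 1 else "neutral"

print(recognize_with_revision({"angry": 0.55, "neutral": 0.40}))  # 'neutral'
```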
As described above, the speech emotion recognition method according to the eighth embodiment adds, on top of processing and analyzing the speech to recognize its emotional state, a calculation of the confidence of the recognized emotional state, judges from the calculated confidence whether the recognized emotional state is credible, and performs the corresponding processing depending on whether it is credible or not.
Therefore, by introducing the calculation of the confidence of the recognized emotional state, the speech emotion recognition method according to the eighth embodiment can reduce false alarms and improve the accuracy of the recognition result.
In addition, those skilled in the art should understand that, in the eighth embodiment of the invention, the specific way of revising the preliminary emotional state when it is judged, from its confidence, to be not credible can be selected and configured flexibly according to the requirements of the specific application, and all such choices fall within the spirit and scope of the invention as claimed.
Fig. 10 is a flowchart of the speech emotion recognition method according to the ninth embodiment of the invention.
As shown in Fig. 10, the speech emotion recognition method according to the ninth embodiment starts from step S1001. In step S1001, the speaker's speech is matched against a plurality of emotional state models to determine the preliminary emotional state of the speech. In step S1002, the inverse model corresponding to the preliminary emotional state and/or the emotional states of the preceding speech are used to calculate the confidence of the preliminary emotional state, and this confidence is used to judge whether the preliminary emotional state is credible; if it is credible, the preliminary emotional state is determined as the final emotional state and the final emotional state is output.
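For the inverse-model part of step S1002, a common way to turn a model/inverse-model pair into a confidence is a log-likelihood ratio. The Python sketch below uses toy Gaussian scores purely as stand-ins for real acoustic models; the feature value, means and variances are assumed for the example.

```python
# Illustrative inverse-model confidence: compare the score of the matched
# emotional state model with the score of its inverse model (a model of
# "everything except" that emotion). Toy Gaussian scoring only.
import math

def gauss_loglik(x, mean, var):
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def inverse_model_confidence(feature, emotion_model, inverse_model):
    """Log-likelihood ratio; larger values mean the preliminary state is
    better supported against its inverse model, i.e. more credible."""
    return (gauss_loglik(feature, *emotion_model)
            - gauss_loglik(feature, *inverse_model))

angry_model = (2.0, 0.5)     # assumed (mean, variance) of a prosodic feature
angry_inverse = (0.0, 1.0)   # assumed model of "not angry"
print(inverse_model_confidence(1.8, angry_model, angry_inverse))  # ~1.9
```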
As described above, the speech emotion recognition method according to the ninth embodiment judges the confidence that the input speech belongs to the determined preliminary emotional state according to the degree of match between the determined preliminary emotional state and the corresponding emotional state inverse model and/or according to the emotional states of the preceding speech, so that false alarms can be reduced and the accuracy of the recognition result improved.
Fig. 11 is a flowchart of the speech emotion recognition method according to the tenth embodiment of the invention.
As shown in Fig. 11, the speech emotion recognition method according to the tenth embodiment starts from step S1101. In step S1101, the speaker's speech is matched against a plurality of emotional state models to determine the preliminary emotional state of the speech. In step S1102, the inverse model corresponding to the preliminary emotional state is used to calculate the first preliminary confidence of the preliminary emotional state. In step S1103, it is judged whether the first preliminary confidence is within a preset range. If it is ("Yes" in step S1103), the method proceeds to step S1104, in which the first preliminary confidence is determined as the confidence, and then proceeds to step S1107. If it is not ("No" in step S1103), the method proceeds to step S1105, in which the emotional states of the preceding speech are used to calculate the second preliminary confidence, and then to step S1106, in which the first preliminary confidence and the second preliminary confidence are combined to obtain the confidence, before proceeding to step S1107. In step S1107, this confidence is used to judge whether the preliminary emotional state is credible; if it is credible, the preliminary emotional state is determined as the final emotional state and the final emotional state is output.
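Putting the steps of Fig. 11 together, the following sketch shows one possible arrangement; the preset range, the weighting (a weighted sum, as in remark 11 below), the decision threshold and the helper functions are all illustrative assumptions.

```python
# Sketch of the confidence flow of Fig. 11. The context-based confidence
# is only computed when the inverse-model-based one falls outside the
# trusted range.
def final_confidence(inverse_conf, context_conf_fn,
                     trusted=(0.7, 1.0), weight=0.6):
    low, high = trusted
    if low <= inverse_conf <= high:
        return inverse_conf                                      # step S1104
    context_conf = context_conf_fn()                             # step S1105
    return weight * inverse_conf + (1 - weight) * context_conf   # step S1106

def decide(preliminary_state, conf, threshold=0.5):
    return preliminary_state if conf >= threshold else None      # step S1107

conf = final_confidence(0.4, lambda: 0.9)
print(decide("happy", conf))   # 0.6*0.4 + 0.4*0.9 = 0.60 -> 'happy'
```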
As described above, the speech emotion recognition method according to the tenth embodiment computes a first preliminary confidence based on the current segment of input speech on the one hand, and on the other hand can adjust it with reference to a second preliminary confidence based on the emotional states of the contextual (preceding) speech, so that the accuracy of the speech emotional state recognition result can be further improved and false alarms further reduced.
Fig. 12 is a flowchart of the speech emotion recognition method according to the eleventh embodiment of the invention.
As shown in Fig. 12, the speech emotion recognition method according to the eleventh embodiment starts from step S1201. In step S1201, the speaker's speech is matched against a plurality of emotional state models to determine the preliminary emotional state of the speech. In step S1202, the emotional states of the preceding speech are used to calculate the first preliminary confidence of the preliminary emotional state. In step S1203, it is judged whether the first preliminary confidence is within a preset range. If it is ("Yes" in step S1203), the method proceeds to step S1204, in which the first preliminary confidence is determined as the confidence, and then proceeds to step S1207. If it is not ("No" in step S1203), the method proceeds to step S1205, in which the inverse model corresponding to the preliminary emotional state is used to calculate the second preliminary confidence, and then to step S1206, in which the first preliminary confidence and the second preliminary confidence are combined to obtain the confidence, before proceeding to step S1207. In step S1207, this confidence is used to judge whether the preliminary emotional state is credible; if it is credible, the preliminary emotional state is determined as the final emotional state and the final emotional state is output.
As described above, the speech emotion recognition method according to the eleventh embodiment computes a first preliminary confidence based on the emotional states of the contextual (preceding) speech on the one hand, and on the other hand can adjust it with reference to a second preliminary confidence based on the current segment of input speech, so that the accuracy of the speech emotional state recognition result can be further improved and false alarms further reduced.
For the specific implementation of each step in the seventh to eleventh embodiments of the invention, reference may be made to the structure and the functions of the components of the speech emotion recognition apparatus according to the first to sixth embodiments described above. For brevity, the specific implementation of these steps is not described in detail again here.
In addition, it should be noted that although the speech emotion recognition method according to the present embodiments has been described above with reference to the flowcharts of Figs. 8-12, those skilled in the art should understand that the flowcharts of Figs. 8-12 are merely exemplary and do not limit the scope of the invention; they may be modified or varied as actually needed. For example, the known emotional state classification method, speech feature selection method and emotional state modelling method used when recognizing the speech emotional state, the specific confidence calculation method adopted when calculating the confidence of the preliminary emotional state, the specific way of revising the preliminary emotional state, and the like, can all be chosen flexibly according to the requirements of a practical application, and all such choices fall within the spirit and scope of the invention as claimed.
It should also be pointed out that the steps of the series of processes in the flowcharts of Figs. 8-12 may naturally be executed in chronological order following the order of the description, but they need not necessarily be executed in that order; some steps may be executed in parallel or independently of one another.
The speech emotion recognition apparatus and method according to the embodiments of the invention can be widely used in fields such as education, entertainment and art. In the field of electronic games, game software can perceive the player's emotional state to learn the player's preferences, and can then express and respond to that emotion through certain actions, making the game more lifelike and increasing the interaction with the player.
The speech emotion recognition apparatus and method according to the embodiments of the invention can also help to enhance the safety of operating equipment. For example, an intelligent monitoring system for safe driving can dynamically monitor the driver's emotional state and give timely warnings. Speech signals (such as speaking rate, pitch variation, volume, voice quality, articulation clarity and the like) can be used to recognize how sluggish the driver's answers to questions are; when the system detects that the driver's attention is not focused or that the driver's emotion has changed, it can remind the driver at any time, and can also exert automatic control to adjust the state and responses of the car in time.
The speech emotion recognition apparatus and method according to the embodiments of the invention also have great application prospects in the call center field. For example, British Telecom has used affective computing technology to improve its call centers and make them more humane. When an especially rude user is encountered, a speech recognition system with emotion awareness can remind the operator to remain calm. After such a call has been handled, the system can comfort and encourage the operator and help him or her regulate his or her mood.
In addition, the speech emotion recognition apparatus and method according to the embodiments of the invention can be applied in the call center field in the following further ways. Real-time monitoring includes monitoring the operator's mood and monitoring abrupt changes in the customer's mood; for example, when an operator is found to be listless, management personnel can be notified to arrange a rest so that the operator can adjust his or her mood. Off-line batch processing can extract typical cases for management personnel to analyze, for example automatically extracting, from a period of recorded data, some exemplary dialogues in which, under the operator's correct handling, the customer turned from anger to calm.
Although the present invention and its advantages have been described in detail, it should be understood that various changes, substitutions and alterations can be made without departing from the spirit and scope of the invention as defined by the appended claims.
Moreover, the method and apparatus according to the invention can be implemented either by hardware or by software and firmware. In the case of implementation by software or firmware, a program constituting the software is installed from a storage medium or a network into a computer having a dedicated hardware structure, for example the general-purpose computer 1300 shown in Fig. 13, and the computer can perform various functions when the various programs are installed.
In Fig. 13, a central processing unit (CPU) 1301 performs various processes according to programs stored in a read-only memory (ROM) 1302 or loaded from a storage section 1308 into a random access memory (RAM) 1303. Data required when the CPU 1301 performs the various processes is also stored in the RAM 1303 as needed. The CPU 1301, the ROM 1302 and the RAM 1303 are connected to one another via a bus 1304, to which an input/output interface 1305 is also connected. The following components are connected to the input/output interface 1305: an input section 1306 including a keyboard, a mouse and the like; an output section 1307 including a display, such as a cathode ray tube (CRT) or liquid crystal display (LCD), a loudspeaker and the like; a storage section 1308 including a hard disk and the like; and a communication section 1309 including a network interface card such as a LAN card, a modem and the like. The communication section 1309 performs communication processes via a network such as the Internet.
A drive 1310 is also connected to the input/output interface 1305 as needed. A removable medium 1311, such as a magnetic disk, an optical disk, a magneto-optical disk or a semiconductor memory, is mounted on the drive 1310 as needed, so that a computer program read from it can be installed into the storage section 1308 as needed.
In the case where the above series of processes is implemented by software, the program constituting the software is installed from a network such as the Internet or from a storage medium such as the removable medium 1311.
Those skilled in the art will understand that this storage medium is not limited to the removable medium 1311 shown in Fig. 13, which stores the program and is distributed separately from the device in order to provide the program to the user. Examples of the removable medium 1311 include a magnetic disk (including a floppy disk (registered trademark)), an optical disk (including a compact disc read-only memory (CD-ROM) and a digital versatile disc (DVD)), a magneto-optical disk (including a MiniDisc (MD) (registered trademark)) and a semiconductor memory. Alternatively, the storage medium may be the ROM 1302, a hard disk contained in the storage section 1308 or the like, which stores the program and is distributed to the user together with the device containing it.
In this case, as long as the system or device has the function of executing programs, the embodiments of the invention are not limited to a particular program form; the program may take any form, for example an object program, a program executed by an interpreter, or a script program provided to an operating system.
In addition, the invention can also be realized by a computer connecting to a corresponding website on the Internet, downloading the computer program code according to the invention, installing it into the computer, and then executing the program.
Moreover, the program implementing the invention may also take the form of one or more signals, for example a data signal downloadable from an Internet site, a data signal provided on a carrier signal, or a data signal in any other form.
Finally, it should also be noted that, in this document, relational terms such as first and second are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Furthermore, the terms "comprise", "include" or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article or device comprising a list of elements includes not only those elements but also other elements not expressly listed, or also includes elements inherent to such a process, method, article or device. Without further limitation, an element defined by the statement "comprising a ..." does not exclude the existence of other identical elements in the process, method, article or device comprising that element.
Although the embodiments of the invention have been described in detail above with reference to the accompanying drawings, it should be understood that the embodiments described above are intended only to illustrate the invention and are not to be construed as limiting it. Those skilled in the art can make various changes and modifications to the above embodiments without departing from the spirit and scope of the invention. Therefore, the scope of the invention is defined only by the appended claims and their equivalents.
Remarks
Remark 1. A speech emotion recognition apparatus, comprising:
an emotion recognition unit configured to recognize the current emotional state of a speaker's speech as a preliminary emotional state; and
a confidence judging unit configured to calculate the confidence of the preliminary emotional state and to use the confidence to judge whether the preliminary emotional state is credible, wherein if the preliminary emotional state is judged to be credible, the preliminary emotional state is determined as the final emotional state and the final emotional state is output.
Remark 2. The speech emotion recognition apparatus according to remark 1, further comprising:
a correction unit configured to, when the confidence judging unit judges the preliminary emotional state to be not credible, revise the preliminary emotional state to obtain the final emotional state of the speech and output the final emotional state.
Remark 3. The speech emotion recognition apparatus according to remark 1, wherein the emotion recognition unit is configured to match the speech against a plurality of emotional state models to determine the preliminary emotional state of the speech, and the confidence judging unit is configured to calculate the confidence of the preliminary emotional state using the inverse model corresponding to the preliminary emotional state and/or the emotional states of preceding speech.
Remark 4. The speech emotion recognition apparatus according to remark 3, wherein the confidence judging unit is configured to calculate a first preliminary confidence of the preliminary emotional state using the inverse model corresponding to the preliminary emotional state; when the first preliminary confidence is within a preset range, to determine the first preliminary confidence as the confidence; and otherwise, to calculate a second preliminary confidence of the preliminary emotional state using the emotional states of preceding speech and to combine the first preliminary confidence and the second preliminary confidence to obtain the confidence.
Remark 5. The speech emotion recognition apparatus according to remark 3, wherein the confidence judging unit is configured to calculate a first preliminary confidence of the preliminary emotional state using the emotional states of preceding speech; when the first preliminary confidence is within a preset range, to determine the first preliminary confidence as the confidence; and otherwise, to calculate a second preliminary confidence of the preliminary emotional state using the inverse model corresponding to the preliminary emotional state and to combine the first preliminary confidence and the second preliminary confidence to obtain the confidence.
Remark 6. A speech emotion recognition method, comprising:
determining the preliminary emotional state of a speaker's speech; and
calculating the confidence of the preliminary emotional state, and using the confidence to judge whether the preliminary emotional state is credible, wherein if it is credible, the preliminary emotional state is determined as the final emotional state and the final emotional state is output.
Remark 7. The speech emotion recognition method according to remark 6, further comprising:
revising the preliminary emotional state to obtain the final emotional state of the speech when the preliminary emotional state is judged to be not credible, and outputting the final emotional state.
Remark 8. The speech emotion recognition method according to remark 6, wherein the step of determining the preliminary emotional state of the speech comprises matching the speech against a plurality of emotional state models to determine the preliminary emotional state of the speech, and the step of calculating the confidence comprises calculating the confidence of the preliminary emotional state using the inverse model corresponding to the preliminary emotional state and/or the emotional states of preceding speech.
Remark 9. The speech emotion recognition method according to remark 8, wherein the step of calculating the confidence of the preliminary emotional state comprises: calculating a first preliminary confidence of the preliminary emotional state using the inverse model corresponding to the preliminary emotional state; when the first preliminary confidence is within a preset range, determining the first preliminary confidence as the confidence; and otherwise, calculating a second preliminary confidence of the preliminary emotional state using the emotional states of preceding speech and combining the first preliminary confidence and the second preliminary confidence to obtain the confidence.
Remark 10. The speech emotion recognition method according to remark 8, wherein the step of calculating the confidence of the preliminary emotional state comprises: calculating a first preliminary confidence of the preliminary emotional state using the emotional states of preceding speech; when the first preliminary confidence is within a preset range, determining the first preliminary confidence as the confidence; and otherwise, calculating a second preliminary confidence of the preliminary emotional state using the inverse model corresponding to the preliminary emotional state and combining the first preliminary confidence and the second preliminary confidence to obtain the confidence.
Remark 11. The speech emotion recognition method according to remark 9 or 10, wherein combining the first preliminary confidence and the second preliminary confidence to obtain the confidence comprises taking a weighted sum of the first preliminary confidence and the second preliminary confidence as the confidence.

Claims (10)

1. A speech emotion recognition apparatus, comprising:
an emotion recognition unit configured to recognize the current emotional state of a speaker's speech as a preliminary emotional state; and
a confidence judging unit configured to calculate the confidence of the preliminary emotional state and to use the confidence to judge whether the preliminary emotional state is credible, wherein if the preliminary emotional state is judged to be credible, the preliminary emotional state is determined as the final emotional state and the final emotional state is output.
2. The speech emotion recognition apparatus according to claim 1, further comprising:
a correction unit configured to, when the confidence judging unit judges the preliminary emotional state to be not credible, revise the preliminary emotional state to obtain the final emotional state of the speech and output the final emotional state.
3. The speech emotion recognition apparatus according to claim 1, wherein the emotion recognition unit is configured to match the speech against a plurality of emotional state models to determine the preliminary emotional state of the speech, and the confidence judging unit is configured to calculate the confidence of the preliminary emotional state using the emotional state inverse model corresponding to the preliminary emotional state and/or the emotional states of preceding speech.
4. The speech emotion recognition apparatus according to claim 3, wherein the confidence judging unit is configured to calculate a first preliminary confidence of the preliminary emotional state using the emotional state inverse model corresponding to the preliminary emotional state; when the first preliminary confidence is within a preset range, to determine the first preliminary confidence as the confidence; and otherwise, to calculate a second preliminary confidence of the preliminary emotional state using the emotional states of preceding speech and to combine the first preliminary confidence and the second preliminary confidence to obtain the confidence.
5. The speech emotion recognition apparatus according to claim 3, wherein the confidence judging unit is configured to calculate a first preliminary confidence of the preliminary emotional state using the emotional states of preceding speech; when the first preliminary confidence is within a preset range, to determine the first preliminary confidence as the confidence; and otherwise, to calculate a second preliminary confidence of the preliminary emotional state using the emotional state inverse model corresponding to the preliminary emotional state and to combine the first preliminary confidence and the second preliminary confidence to obtain the confidence.
6. A speech emotion recognition method, comprising:
recognizing the preliminary emotional state of a speaker's speech; and
calculating the confidence of the preliminary emotional state, and using the confidence to judge whether the preliminary emotional state is credible, wherein if it is credible, the preliminary emotional state is determined as the final emotional state and the final emotional state is output.
7. The speech emotion recognition method according to claim 6, further comprising:
revising the preliminary emotional state to obtain the final emotional state of the speech when the preliminary emotional state is judged to be not credible, and outputting the final emotional state.
8. The speech emotion recognition method according to claim 6, wherein the step of recognizing the preliminary emotional state of the speech comprises matching the speech against a plurality of emotional state models to determine the preliminary emotional state of the speech, and the step of calculating the confidence comprises calculating the confidence of the preliminary emotional state using the emotional state inverse model corresponding to the preliminary emotional state and/or the emotional states of preceding speech.
9. The speech emotion recognition method according to claim 8, wherein the step of calculating the confidence of the preliminary emotional state comprises: calculating a first preliminary confidence of the preliminary emotional state using the emotional state inverse model corresponding to the preliminary emotional state; when the first preliminary confidence is within a preset range, determining the first preliminary confidence as the confidence; and otherwise, calculating a second preliminary confidence of the preliminary emotional state using the emotional states of preceding speech and combining the first preliminary confidence and the second preliminary confidence to obtain the confidence.
10. The speech emotion recognition method according to claim 8, wherein the step of calculating the confidence of the preliminary emotional state comprises: calculating a first preliminary confidence of the preliminary emotional state using the emotional states of preceding speech; when the first preliminary confidence is within a preset range, determining the first preliminary confidence as the confidence; and otherwise, calculating a second preliminary confidence of the preliminary emotional state using the emotional state inverse model corresponding to the preliminary emotional state and combining the first preliminary confidence and the second preliminary confidence to obtain the confidence.
CN2010101047793A 2010-01-29 2010-01-29 Voice emotion identification equipment and method Active CN102142253B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2010101047793A CN102142253B (en) 2010-01-29 2010-01-29 Voice emotion identification equipment and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2010101047793A CN102142253B (en) 2010-01-29 2010-01-29 Voice emotion identification equipment and method

Publications (2)

Publication Number Publication Date
CN102142253A true CN102142253A (en) 2011-08-03
CN102142253B CN102142253B (en) 2013-05-29

Family

ID=44409711

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2010101047793A Active CN102142253B (en) 2010-01-29 2010-01-29 Voice emotion identification equipment and method

Country Status (1)

Country Link
CN (1) CN102142253B (en)

Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103258532A (en) * 2012-11-28 2013-08-21 河海大学常州校区 Method for recognizing Chinese speech emotions based on fuzzy support vector machine
CN103258537A (en) * 2013-05-24 2013-08-21 安宁 Method utilizing characteristic combination to identify speech emotions and device thereof
CN104008132A (en) * 2014-05-04 2014-08-27 深圳市北科瑞声科技有限公司 Voice map searching method and system
CN105118518A (en) * 2015-07-15 2015-12-02 百度在线网络技术(北京)有限公司 Sound semantic analysis method and device
CN105744090A (en) * 2014-12-09 2016-07-06 阿里巴巴集团控股有限公司 Voice information processing method and device
CN106096717A (en) * 2016-06-03 2016-11-09 北京光年无限科技有限公司 Information processing method and system towards intelligent robot
CN106683688A (en) * 2015-11-05 2017-05-17 ***通信集团公司 Emotion detection method and device
CN107507629A (en) * 2017-08-16 2017-12-22 重庆科技学院 Hot tactile Music perception system and its control method
CN107545905A (en) * 2017-08-21 2018-01-05 北京合光人工智能机器人技术有限公司 Emotion identification method based on sound property
CN108346436A (en) * 2017-08-22 2018-07-31 腾讯科技(深圳)有限公司 Speech emotional detection method, device, computer equipment and storage medium
CN108597541A (en) * 2018-04-28 2018-09-28 南京师范大学 A kind of speech-emotion recognition method and system for enhancing indignation and happily identifying
CN109727599A (en) * 2017-10-31 2019-05-07 苏州傲儒塑胶有限公司 The children amusement facility and control method of interactive voice based on internet communication
CN109976513A (en) * 2019-02-20 2019-07-05 方科峰 A kind of system interface design method
JP2020061177A (en) * 2019-12-24 2020-04-16 本田技研工業株式会社 Information provision device
US10929652B2 (en) 2017-06-07 2021-02-23 Honda Motor Co., Ltd. Information providing device and information providing method
CN110085211B (en) * 2018-01-26 2021-06-29 上海智臻智能网络科技股份有限公司 Voice recognition interaction method and device, computer equipment and storage medium
CN113611286A (en) * 2021-10-08 2021-11-05 之江实验室 Cross-language speech emotion recognition method and system based on common feature extraction
CN113611326A (en) * 2021-08-26 2021-11-05 中国地质大学(武汉) Real-time speech emotion recognition method and device
CN116528438A (en) * 2023-04-28 2023-08-01 广州力铭光电科技有限公司 Intelligent dimming method and device for lamp
CN117079673A (en) * 2023-10-17 2023-11-17 青岛铭威软创信息技术有限公司 Intelligent emotion recognition method based on multi-mode artificial intelligence

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101118745B (en) * 2006-08-04 2011-01-19 中国科学院声学研究所 Confidence degree quick acquiring method in speech identification system
KR101029786B1 (en) * 2006-09-13 2011-04-19 니뽄 덴신 덴와 가부시키가이샤 Emotion detecting method, emotion detecting apparatus, emotion detecting program that implements the same method, and storage medium that stores the same program
CN101030369B (en) * 2007-03-30 2011-06-29 清华大学 Built-in speech discriminating method based on sub-word hidden Markov model
CN101447183A (en) * 2007-11-28 2009-06-03 中国科学院声学研究所 Processing method of high-performance confidence level applied to speech recognition system
CN101452701B (en) * 2007-12-05 2011-09-07 株式会社东芝 Confidence degree estimation method and device based on inverse model

Cited By (31)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103258532B (en) * 2012-11-28 2015-10-28 河海大学常州校区 A kind of Chinese speech sensibility recognition methods based on fuzzy support vector machine
CN103258532A (en) * 2012-11-28 2013-08-21 河海大学常州校区 Method for recognizing Chinese speech emotions based on fuzzy support vector machine
CN103258537A (en) * 2013-05-24 2013-08-21 安宁 Method utilizing characteristic combination to identify speech emotions and device thereof
CN104008132A (en) * 2014-05-04 2014-08-27 深圳市北科瑞声科技有限公司 Voice map searching method and system
CN105744090A (en) * 2014-12-09 2016-07-06 阿里巴巴集团控股有限公司 Voice information processing method and device
US10708423B2 (en) 2014-12-09 2020-07-07 Alibaba Group Holding Limited Method and apparatus for processing voice information to determine emotion based on volume and pacing of the voice
CN105118518A (en) * 2015-07-15 2015-12-02 百度在线网络技术(北京)有限公司 Sound semantic analysis method and device
CN106683688A (en) * 2015-11-05 2017-05-17 ***通信集团公司 Emotion detection method and device
CN106096717B (en) * 2016-06-03 2018-08-14 北京光年无限科技有限公司 Information processing method towards intelligent robot and system
CN106096717A (en) * 2016-06-03 2016-11-09 北京光年无限科技有限公司 Information processing method and system towards intelligent robot
US10929652B2 (en) 2017-06-07 2021-02-23 Honda Motor Co., Ltd. Information providing device and information providing method
CN107507629A (en) * 2017-08-16 2017-12-22 重庆科技学院 Hot tactile Music perception system and its control method
CN107507629B (en) * 2017-08-16 2020-08-25 重庆科技学院 Thermal touch music perception system and control method thereof
CN107545905A (en) * 2017-08-21 2018-01-05 北京合光人工智能机器人技术有限公司 Emotion identification method based on sound property
CN107545905B (en) * 2017-08-21 2021-01-05 北京合光人工智能机器人技术有限公司 Emotion recognition method based on sound characteristics
US11922969B2 (en) 2017-08-22 2024-03-05 Tencent Technology (Shenzhen) Company Limited Speech emotion detection method and apparatus, computer device, and storage medium
US11189302B2 (en) 2017-08-22 2021-11-30 Tencent Technology (Shenzhen) Company Limited Speech emotion detection method and apparatus, computer device, and storage medium
CN108346436A (en) * 2017-08-22 2018-07-31 腾讯科技(深圳)有限公司 Speech emotional detection method, device, computer equipment and storage medium
CN109727599A (en) * 2017-10-31 2019-05-07 苏州傲儒塑胶有限公司 The children amusement facility and control method of interactive voice based on internet communication
CN110085211B (en) * 2018-01-26 2021-06-29 上海智臻智能网络科技股份有限公司 Voice recognition interaction method and device, computer equipment and storage medium
CN108597541B (en) * 2018-04-28 2020-10-02 南京师范大学 Speech emotion recognition method and system for enhancing anger and happiness recognition
CN108597541A (en) * 2018-04-28 2018-09-28 南京师范大学 A kind of speech-emotion recognition method and system for enhancing indignation and happily identifying
CN109976513A (en) * 2019-02-20 2019-07-05 方科峰 A kind of system interface design method
JP2020061177A (en) * 2019-12-24 2020-04-16 本田技研工業株式会社 Information provision device
CN113611326A (en) * 2021-08-26 2021-11-05 中国地质大学(武汉) Real-time speech emotion recognition method and device
CN113611326B (en) * 2021-08-26 2023-05-12 中国地质大学(武汉) Real-time voice emotion recognition method and device
CN113611286A (en) * 2021-10-08 2021-11-05 之江实验室 Cross-language speech emotion recognition method and system based on common feature extraction
CN116528438A (en) * 2023-04-28 2023-08-01 广州力铭光电科技有限公司 Intelligent dimming method and device for lamp
CN116528438B (en) * 2023-04-28 2023-10-10 广州力铭光电科技有限公司 Intelligent dimming method and device for lamp
CN117079673A (en) * 2023-10-17 2023-11-17 青岛铭威软创信息技术有限公司 Intelligent emotion recognition method based on multi-mode artificial intelligence
CN117079673B (en) * 2023-10-17 2023-12-19 青岛铭威软创信息技术有限公司 Intelligent emotion recognition method based on multi-mode artificial intelligence

Also Published As

Publication number Publication date
CN102142253B (en) 2013-05-29

Similar Documents

Publication Publication Date Title
CN102142253B (en) Voice emotion identification equipment and method
CN106228977B (en) Multi-mode fusion song emotion recognition method based on deep learning
Jing et al. Prominence features: Effective emotional features for speech emotion recognition
CN101930735B (en) Speech emotion recognition equipment and speech emotion recognition method
CN101346758B (en) Emotion recognizer
Li et al. Spoken language recognition: from fundamentals to practice
Batliner et al. The automatic recognition of emotions in speech
CN102831891B (en) Processing method and system for voice data
CN102194454B (en) Equipment and method for detecting key word in continuous speech
CN108564942A (en) One kind being based on the adjustable speech-emotion recognition method of susceptibility and system
CN102800314A (en) English sentence recognizing and evaluating system with feedback guidance and method of system
CN109741734B (en) Voice evaluation method and device and readable medium
CN110222841A (en) Neural network training method and device based on spacing loss function
Gajšek et al. Speaker state recognition using an HMM-based feature extraction method
Ryant et al. Highly accurate mandarin tone classification in the absence of pitch information
Rajoo et al. Influences of languages in speech emotion recognition: A comparative study using malay, english and mandarin languages
Shahin et al. Talking condition recognition in stressful and emotional talking environments based on CSPHMM2s
López-Cózar et al. Enhancement of emotion detection in spoken dialogue systems by combining several information sources
Xiao et al. Hierarchical classification of emotional speech
CN114898779A (en) Multi-mode fused speech emotion recognition method and system
Scherer et al. Real-time emotion recognition from speech using echo state networks
Farooq et al. Mispronunciation detection in articulation points of Arabic letters using machine learning
Chen et al. Integrated design of financial self-service terminal based on artificial intelligence voice interaction
Patil et al. Emotion detection from speech using Mfcc & GMM
US11887583B1 (en) Updating models with trained model update objects

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant