CN104464756A

CN104464756A - Small-scale speaker emotion recognition system

Info

Publication number: CN104464756A
Application number: CN201410750977.5A
Authority: CN
Inventors: 冯秀霞
Original assignee: Heilongjiang Zhenmei Broadcasting Communications Equipment Co Ltd
Current assignee: Heilongjiang Zhenmei Broadcasting Communications Equipment Co Ltd
Priority date: 2014-12-10
Filing date: 2014-12-10
Publication date: 2015-03-25

Abstract

The invention discloses a small-scale speaker emotion recognition system. First, a small emotional speech corpus is built. One part of the speech in the corpus serves as training samples for building a reference template, and the other part serves as test samples for subsequent emotion recognition tests. The speech obtained from the corpus is then preprocessed, and emotion parameters are extracted from the preprocessed speech signals. These parameters include the fundamental frequency, the formants, the Mel-frequency cepstral coefficients (MFCCs) and their related statistics. Finally, speech emotion recognition tests are carried out: the emotion parameters of the training speech are classified by an emotion classifier based on support vector machines, the test speech is then predicted with this classifier, and the emotion of each test utterance is determined.

Description

A small-scale speaker emotion recognition system

Technical field

The present invention relates to a speech emotion recognition system, and in particular to a small-scale speaker emotion recognition system.

Background art

Speech is an important means of interpersonal communication. Sound is a carrier of information: people obtain information from it, and that information naturally includes emotion. Speech is not only an important tool for human communication but also an important medium for conveying emotion, because a speech signal carries not just the literal words but also the speaker's emotion. The same sentence can carry different emotions, and when the emotion differs, the meaning of the words may change. If a computer cannot perceive the emotion in an operator's speech, it cannot communicate optimally and may even misunderstand the operator's intent, leading to incorrect operations and inconvenience for the operator.

Speech signal processing is an important research field with a long history. Emotion research on speech signals, by contrast, is an emerging field, and it is an interdisciplinary topic drawing mainly on physiology, psychology and signal processing. The product of this research, a speech emotion recognition system, also has broad application prospects, including:

1. Distance learning: an emotion recognition system can be added to a distance-education system to judge whether a learner's emotional expression is appropriate, helping learners read aloud with richer emotion and improve their reading ability.

2. Criminal investigation: an emotion recognition system can be built into a lie detector and used to infer how truthful a subject's statements are. As the technology improves, the lie detector's functions can be continually refined and put into practical use, so emotion recognition systems also have considerable practical significance for criminal investigation.

3. Entertainment and games: most games currently convey information through text. Adding speech emotion recognition and expression to games would enrich the ways information is conveyed and make games more attractive to players. This novel form of interaction can, to some extent, reduce player fatigue during play while offering both auditory and visual enjoyment, making games more playable.

Summary of the invention

The object of this invention is to provide a speaker emotion recognition system that uses a small emotional speech corpus as its speech samples, uses part of that corpus as training samples to build a reference template, and compiles statistics on the recognition rate of each emotion.

The object of the present invention is achieved as follows. In the first step, based on a review of a large body of domestic and international literature, a small emotional speech corpus is established. Part of the speech is used as training samples to build the reference template, and the rest is used as test samples for the subsequent emotion recognition tests. In the second step, the speech obtained from the corpus is preprocessed; the main preprocessing steps are pre-emphasis, windowing and framing, and speech endpoint detection. In the third step, emotion parameters are extracted from the preprocessed speech signal; they include the fundamental frequency, the formants, the Mel-frequency cepstral coefficients and their related statistics. Parameter extraction is simulated in software to obtain the distribution range of each parameter for the different emotion types, and the results are briefly analyzed. In the fourth step, speech emotion recognition experiments are carried out: the emotion parameters of the training speech are classified with an emotion classifier based on support vector machines, which is then used to predict the test speech and determine which emotion each utterance belongs to. After the experiments, the recognition rate of each emotion is computed and the final statistics are analyzed. Finally, a simple human-machine interface is designed for the whole system; it lets the user input test speech, displays the system's recognition result for that speech, and clears the result.
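The preprocessing in the second step (pre-emphasis, windowing and framing, endpoint detection) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the patent's implementation: the pre-emphasis coefficient 0.97, the Hamming window, the 200-sample frame with a 100-sample hop, the 8 kHz sampling rate and the single energy threshold (frames above 10% of the peak short-time energy count as speech) are illustrative choices the text does not specify, and the function names are hypothetical.

```python
import math


def pre_emphasis(signal, alpha=0.97):
    """First-order high-pass filter y[n] = x[n] - alpha * x[n-1] (boosts high frequencies)."""
    if not signal:
        return []
    return [signal[0]] + [signal[n] - alpha * signal[n - 1] for n in range(1, len(signal))]


def frame_and_window(signal, frame_len, hop):
    """Split the signal into overlapping frames and multiply each by a Hamming window."""
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1)) for n in range(frame_len)]
    return [
        [s * w for s, w in zip(signal[start:start + frame_len], window)]
        for start in range(0, len(signal) - frame_len + 1, hop)
    ]


def detect_endpoints(frames, ratio=0.1):
    """Simplified energy-threshold endpoint detection.

    Returns (first, last) indices of frames whose short-time energy exceeds
    ratio * (maximum frame energy), or None if every frame is silent."""
    energies = [sum(s * s for s in frame) for frame in frames]
    if not energies or max(energies) == 0:
        return None
    threshold = ratio * max(energies)
    active = [i for i, e in enumerate(energies) if e > threshold]
    return active[0], active[-1]


# Usage: 0.1 s silence, 0.1 s of a 200 Hz tone, 0.1 s silence at an assumed 8 kHz rate.
fs = 8000
tone = [math.sin(2 * math.pi * 200 * n / fs) for n in range(800)]
signal = [0.0] * 800 + tone + [0.0] * 800
frames = frame_and_window(pre_emphasis(signal), frame_len=200, hop=100)  # 25 ms frames, 12.5 ms hop
print(len(frames), detect_endpoints(frames))  # → 23 (7, 15)
```

Frames 7 and 15 are the half-overlapping frames at the tone's onset and offset. A practical detector would usually combine short-time energy with the zero-crossing rate and use two thresholds to catch weak unvoiced onsets; a single energy threshold keeps the sketch short.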

A small Chinese emotional speech corpus was recorded in-house, with the speech divided into four emotion classes: happy, angry, sad and surprised. The speakers were 6 people, all male students. Each speaker read 4 texts aloud in each of the 4 emotions, repeating each emotion 4 times, giving 384 samples in total (6 × 4 × 4 × 4) for the experimental emotional speech corpus. Emotions are classified with an SVM, which uses the "one-to-one" method to solve the multi-class problem. Recognition was then performed separately using three feature sets: the prosodic features of the speech (parameters related to pitch and formants), the acoustic feature MFCC and its related parameters, and the combination of both, and the recognition results were compared. In the experiments, when all 11 parameters were used for recognition, the average recognition rate over the 4 emotions was 79.15%, and sadness had the highest recognition rate at 83.3%. The experiments also showed that misrecognition occurs most readily between happy and angry.
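The "one-to-one" SVM scheme can be sketched without external libraries. This is an assumption-laden illustration, not the inventor's classifier: each pairwise model here is a linear SVM trained by Pegasos-style subgradient descent on the primal hinge-loss objective (the text names neither a kernel nor a solver), samples are visited in a fixed order rather than randomly, the bias is folded in as a constant feature, and the two-dimensional toy points only stand in for the real emotion feature vectors. All function names are hypothetical.

```python
import itertools
from collections import Counter


def train_linear_svm(xs, ys, lam=0.01, epochs=200):
    """Pegasos-style subgradient descent on the linear SVM primal
    lam/2 * ||w||^2 + mean(max(0, 1 - y * w.x)), with labels y in {-1, +1}.
    A constant 1.0 is appended to each sample so w[-1] acts as a (regularized) bias."""
    data = [(list(x) + [1.0], y) for x, y in zip(xs, ys)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        for x, y in data:  # fixed cyclic order instead of random sampling
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1.0 - eta * lam) * wi for wi in w]  # step on the L2 regularization term
            if margin < 1.0:  # hinge loss is active for this sample
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w


def decision_value(w, x):
    """Signed score of one binary SVM; >= 0 means the first class of the pair."""
    return sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))


def train_one_vs_one(xs, labels, **svm_kwargs):
    """Train one binary SVM for every unordered pair of classes."""
    models = {}
    for a, b in itertools.combinations(sorted(set(labels)), 2):
        subset = [(x, 1 if lab == a else -1) for x, lab in zip(xs, labels) if lab in (a, b)]
        models[(a, b)] = train_linear_svm([s[0] for s in subset], [s[1] for s in subset], **svm_kwargs)
    return models


def predict_one_vs_one(models, x):
    """Each pairwise SVM casts one vote; the class with the most votes wins."""
    votes = Counter()
    for (a, b), w in models.items():
        votes[a if decision_value(w, x) >= 0 else b] += 1
    return votes.most_common(1)[0][0]


# Usage with toy 2-D "feature vectors" (hypothetical stand-ins for real emotion features).
train = {
    "happy": [(2.0, 2.0), (2.5, 1.5), (1.5, 2.5)],
    "angry": [(-2.0, 2.0), (-2.5, 1.5), (-1.5, 2.5)],
    "sad": [(-2.0, -2.0), (-2.5, -1.5), (-1.5, -2.5)],
    "surprised": [(2.0, -2.0), (2.5, -1.5), (1.5, -2.5)],
}
xs = [x for pts in train.values() for x in pts]
labels = [emotion for emotion, pts in train.items() for _ in pts]
models = train_one_vs_one(xs, labels)
print(len(models), predict_one_vs_one(models, (1.8, 2.2)))
```

With K = 4 emotions, `train_one_vs_one` builds K(K-1)/2 = 4·3/2 = 6 binary models. Real pitch, formant and MFCC statistics lie on very different scales, so they would normally be standardized before training. For comparison, scikit-learn's `SVC` also uses the one-versus-one strategy internally for multi-class problems.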

Description of the drawings

Fig. 1 is a flowchart of speech emotion recognition.

Embodiment

The present invention is described in more detail below by way of example with reference to the accompanying drawing:

Embodiment 1

With reference to Fig. 1, which is a flowchart of speech emotion recognition:

1. Acquisition of the emotional speech corpus. Current speech emotion recognition research mostly targets other languages; comparatively little such research has been done for Chinese, and no Chinese emotional speech corpus designed specifically for emotion recognition could be found. The first task before the recognition study was therefore to record a small Chinese emotional speech corpus in-house, on which all subsequent work is based.

2. Preprocessing of the speech signal. Emotion feature parameters cannot be extracted directly from the speech signals in the corpus; a front-end processing step must be performed first, comprising pre-emphasis, windowing and framing, and endpoint detection.

3. Extraction of emotion feature parameters. Emotion feature parameters are then extracted from the preprocessed signal. They fall into two main classes: acoustic feature parameters, including 12th-order MFCCs and formant parameters; and prosodic feature parameters, including the fundamental frequency, short-time energy and average zero-crossing rate of the speech. This set was then refined, and the emotion feature parameters finally chosen were the mean, maximum and minimum of the fundamental frequency, the mean and maximum of the first formant, and the 10th, 11th and 12th MFCC coefficients.

4. Design of the emotion classifier. The present invention uses a speech emotion classifier based on the support vector machine (SVM). A standard SVM is a binary classifier, so to achieve multi-class classification one SVM is designed for every pair of classes. When an unknown sample is to be classified, each of these SVMs votes, and the sample's class is finally determined by majority vote. This is the so-called "one-to-one" method.
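Of the features chosen in step 3, the pitch statistics (mean, maximum and minimum fundamental frequency) are the simplest to illustrate. The sketch below assumes a plain autocorrelation pitch tracker, which the text does not specify: F0 is searched between an assumed 50 and 400 Hz, a frame counts as voiced only if its autocorrelation peak reaches an assumed 30% of the zero-lag energy, and the function names and dictionary keys are hypothetical. Formant and MFCC extraction are not shown.

```python
import math


def autocorr_f0(frame, fs, f0_min=50.0, f0_max=400.0, voicing=0.3):
    """Estimate the F0 of one frame from the peak of its autocorrelation.

    Returns fs / best_lag in Hz, or None for silent or unvoiced frames."""
    r0 = sum(s * s for s in frame)  # zero-lag autocorrelation (frame energy)
    if r0 == 0:
        return None  # silent frame
    lag_min = int(fs / f0_max)
    lag_max = min(int(fs / f0_min), len(frame) - 1)
    best_lag, best_r = None, 0.0
    for lag in range(lag_min, lag_max + 1):
        r = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    if best_lag is None or best_r < voicing * r0:
        return None  # peak too weak: treat the frame as unvoiced
    return fs / best_lag


def pitch_statistics(frames, fs):
    """Mean, maximum and minimum F0 over the voiced frames, or None if none are voiced."""
    f0s = [f for f in (autocorr_f0(frame, fs) for frame in frames) if f is not None]
    if not f0s:
        return None
    return {"f0_mean": sum(f0s) / len(f0s), "f0_max": max(f0s), "f0_min": min(f0s)}


# Usage: four 25 ms frames of a 200 Hz tone plus one silent frame, assumed 8 kHz rate.
fs = 8000
tone = [math.sin(2 * math.pi * 200 * n / fs) for n in range(800)]
frames = [tone[i:i + 200] for i in range(0, 800, 200)] + [[0.0] * 200]
print(pitch_statistics(frames, fs))  # → {'f0_mean': 200.0, 'f0_max': 200.0, 'f0_min': 200.0}
```

The silent frame is skipped, so the statistics describe only voiced speech. On real recordings the F0 contour varies from frame to frame, which is what makes its mean, maximum and minimum informative about emotion.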

Claims

1. A small-scale speaker emotion recognition system, characterized in that: in the first step, based on a review of a large body of domestic and international literature, a small emotional speech corpus is established, part of whose speech is used as training samples to build a reference template and the rest as test samples for subsequent emotion recognition tests; in the second step, the speech obtained from the corpus is preprocessed, the main steps being pre-emphasis, windowing and framing, and speech endpoint detection; in the third step, emotion parameters are extracted from the preprocessed speech signal, the emotion parameters comprising the fundamental frequency, the formants, the Mel-frequency cepstral coefficients and their related statistics, and parameter extraction is simulated in software to obtain the distribution range of each parameter for the different emotion types, with the results briefly analyzed; in the fourth step, speech emotion recognition experiments are carried out, in which the emotion parameters of the training speech are classified with an emotion classifier based on support vector machines, which is then used to predict the test speech and determine which emotion it belongs to, and after the experiments the recognition rate of each emotion is computed and the final statistics are analyzed; finally, a simple human-machine interface is designed for the whole system, capable of inputting test speech, displaying the system's recognition result for that speech, and clearing the result.

2. The small-scale speaker emotion recognition system according to claim 1, characterized in that: a small Chinese emotional speech corpus is recorded, in which the speech emotions are divided into four classes: happy, angry, sad and surprised; emotions are classified with an SVM, which uses the "one-to-one" method to solve the multi-class problem; and recognition is performed separately using, as emotion features, the prosodic features of the speech (including pitch- and formant-related parameters), the MFCC-related parameters, and the combination of both.