CN104464756A - Small speaker emotion recognition system - Google Patents
- Publication number
- CN104464756A
- Authority
- CN
- China
- Prior art keywords
- emotion
- voice
- small
- parameter
- voices
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Electrically Operated Instructional Devices (AREA)
Abstract
The invention discloses a small speaker emotion recognition system. First, a small emotional speech library is built; one part of the voices in the library serves as training samples for building a reference template, and the other part serves as test samples for subsequent emotion recognition testing. The voices in the library are then preprocessed, and emotion parameters are extracted from the preprocessed voice signals; the emotion parameters include the fundamental frequency, the formants, the Mel-frequency cepstral coefficients and their related statistical parameters. Finally, speech emotion recognition testing is carried out: an emotion classifier based on support vector machines is trained on the emotion parameters of the training voices, the classifier then predicts the test voices, and the emotion of each test voice is determined.
Description
Technical field
The present invention relates to a speech emotion recognition system, in particular to a small speaker emotion recognition system.
Background technology
Speech is an important means of communication between people. Sound is a carrier of information, and the information people obtain from it naturally includes emotional information. A speech signal carries not only the words spoken but also the speaker's emotion. The same sentence can be spoken with different emotions, and when the emotion differs the meaning of the sentence may change. If a computer cannot obtain the operator's emotion from his or her speech, it cannot achieve the best communicative effect; it may even misunderstand the operator's meaning and perform a wrong operation, causing trouble for the operator.
Speech processing is an important research field with a long history, whereas emotion research on speech signals is an emerging field and a research topic that combines several disciplines, chiefly physiology, psychology and signal processing. The research outcome of this topic, a speech emotion recognition system, also has broad application prospects. Specifically, it can be applied in:
1. Distance network teaching: an emotion recognition system can be added to a distance education system to judge whether the learner's emotional expression is appropriate, which helps the learner read aloud with richer emotion and improve reading ability.
2. Criminal investigation: the emotion recognition system can be built into a lie detector and used to infer how truthful a subject's statements are. As the technology improves, the lie detector's functions can be steadily improved and put into practical use, so the emotion recognition system is also of considerable practical significance for criminal investigation.
3. Entertainment and games: at present most games convey information through text. Adding speech emotion recognition and expression to a game enriches the way information is conveyed and makes the game more attractive to players. This novel approach can relieve player fatigue during play to some extent, gives players both auditory and visual enjoyment, and makes the game more playable.
Summary of the invention
The object of the present invention is to provide a speaker emotion recognition system that uses part of a small emotional speech library as training samples to build a reference template, and that counts the recognition rate for each emotion.
The object of the present invention is achieved as follows. In the first step, on the basis of a review of a large amount of domestic and foreign literature, a small emotional speech library is established; one part of the voices serves as training samples for building a reference template, and the other part serves as test samples for subsequent emotion recognition testing. In the second step, the voices in the library are preprocessed; the preprocessing mainly comprises pre-emphasis, windowing and framing, and speech endpoint detection. In the third step, emotion parameters are extracted from the preprocessed speech signals; the emotion parameters comprise the fundamental frequency, the formants, the Mel-frequency cepstral coefficients and their related statistical parameters. The parameter extraction is simulated in software to obtain the distribution range of each parameter for the different emotion types, and the results are briefly analysed. In the fourth step, a speech emotion recognition experiment is carried out: an emotion classifier based on a support vector machine is trained on the emotion parameters of the training speech and is then used to predict the test speech and judge which emotion it belongs to. After the experiment, the recognition rate of each emotion is counted and the final statistics are analysed. Finally, a simple man-machine interface is designed for the whole system; it supports inputting a test voice, displaying the system's recognition result for that voice, and clearing the result.
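The preprocessing named in the second step (pre-emphasis, windowing and framing, endpoint detection) can be sketched as follows. This is a minimal illustration, not the patented implementation: the pre-emphasis coefficient 0.97, the 400-sample frame with a 160-sample hop (25 ms and 10 ms at an assumed 16 kHz sampling rate), the Hamming window and the relative-energy threshold are all assumptions, since the text does not specify them, and the function names are hypothetical.

```python
import numpy as np


def pre_emphasis(signal, alpha=0.97):
    """First-order high-pass filter: y[n] = x[n] - alpha * x[n-1] (alpha is assumed)."""
    signal = np.asarray(signal, dtype=float)
    return np.append(signal[0], signal[1:] - alpha * signal[:-1])


def frame_signal(signal, frame_len=400, hop=160):
    """Split the signal into overlapping frames and apply a Hamming window to each."""
    signal = np.asarray(signal, dtype=float)
    if len(signal) < frame_len:
        # Zero-pad very short signals so that at least one frame exists.
        signal = np.pad(signal, (0, frame_len - len(signal)))
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Row i holds the sample indices of frame i.
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return signal[idx] * np.hamming(frame_len)


def detect_endpoints(frames, ratio=0.1):
    """Return (first, last) indices of frames whose short-time energy exceeds
    ratio * max energy, or None if every frame is silent.
    A single relative-energy threshold is an assumed, simplified detector."""
    energy = np.sum(frames ** 2, axis=1)
    if energy.max() == 0:
        return None
    active = np.flatnonzero(energy > ratio * energy.max())
    return int(active[0]), int(active[-1])


if __name__ == "__main__":
    # Synthetic input: 0.5 s silence, 1 s of a 200 Hz tone, 0.5 s silence.
    sr = 16000
    t = np.arange(sr) / sr
    x = np.concatenate([np.zeros(sr // 2), np.sin(2 * np.pi * 200 * t), np.zeros(sr // 2)])
    frames = frame_signal(pre_emphasis(x))
    print(frames.shape, detect_endpoints(frames))
```

Only the frames between the two detected endpoints would be passed on to parameter extraction in the third step.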
A small-scale Chinese emotional speech library was recorded for this work. The voices in the library fall into four emotion classes: happy, angry, sad and surprised. The 6 speakers were all male students; each read 4 speech texts in each of the 4 emotions, 4 times per emotion, giving 6 × 4 × 4 × 4 = 384 samples in total as the experimental emotional speech library. Emotions are classified with an SVM, which uses the "one-versus-one" method to handle the multi-class problem. Recognition was performed with three feature sets in turn: the prosodic features of speech (parameters related to the fundamental tone and the formants), the MFCC-related parameters, and the combination of both; the recognition results were then analysed and compared. In the experiment, when all 11 parameters were used, the average recognition rate over the 4 emotions was 79.15%, and the recognition rate for sad was as high as 83.3%. It was also found that happy and angry are the two emotions most easily mistaken for each other.
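The per-emotion recognition rate, the average rate and the most frequent confusions mentioned above could be computed from true and predicted labels along the following lines. This is a sketch only: the function names are hypothetical, the recognition rate is assumed to mean correctly recognised samples divided by all samples of that emotion, and any labels fed to it in an example are made up rather than taken from the experiment.

```python
from collections import Counter

EMOTIONS = ("happy", "angry", "sad", "surprised")


def recognition_rates(true_labels, predicted_labels, classes=EMOTIONS):
    """Per-emotion rate: correctly recognised samples / all samples of that emotion."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label sequences must have the same length")
    totals = Counter(true_labels)
    correct = Counter(t for t, p in zip(true_labels, predicted_labels) if t == p)
    # Emotions with no test samples are skipped to avoid dividing by zero.
    return {c: correct[c] / totals[c] for c in classes if totals[c] > 0}


def average_rate(rates):
    """Unweighted mean of the per-emotion recognition rates."""
    return sum(rates.values()) / len(rates)


def confusion_pairs(true_labels, predicted_labels):
    """Count (true, predicted) pairs over the misrecognised samples only."""
    return Counter((t, p) for t, p in zip(true_labels, predicted_labels) if t != p)
```

Calling `confusion_pairs(...).most_common(1)` would surface the pair of emotions that is mixed up most often, which is the kind of observation reported for happy and angry.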
Description of the drawings
Fig. 1 is a flowchart of speech emotion recognition.
Embodiment
The present invention is described in more detail below with reference to the accompanying drawing:
Embodiment 1
With reference to Fig. 1, which is a flowchart of speech emotion recognition:
1. Acquisition of the emotional speech library. Existing speech emotion recognition research mostly targets other languages, comparatively little research of this kind has been carried out for Chinese, and no Chinese emotional speech library dedicated to emotion recognition could be found. The preparatory work before the recognition study is therefore to record a small-scale Chinese emotional speech library, on which the follow-up study is based.
2. Preprocessing of the speech signal. The emotional feature parameters cannot be extracted directly from the speech signals in the library; a front-end processing step must be carried out first, comprising pre-emphasis, windowing and framing, and endpoint detection.
3. Extraction of emotional feature parameters. After preprocessing, the emotional feature parameters are extracted from the signal. They fall into two classes. One class is acoustic feature parameters, comprising the 12th-order MFCC parameters and the formant parameters. The other class is prosodic feature parameters, comprising the fundamental frequency, the short-time energy, the average zero-crossing rate and similar parameters. These were refined further, and the parameters finally chosen as emotional feature parameters were the mean, maximum and minimum of the fundamental frequency, the mean and maximum of the first formant, and the 10th, 11th and 12th MFCC parameters.
4. Design of the emotion classifier. The invention adopts a speech emotion classifier based on the support vector machine (SVM). Because an SVM by itself handles only two-class problems, multi-class classification requires an SVM to be designed between every two classes; when an unknown sample needs to be classified, its class is finally determined by voting. This is the so-called "one-versus-one" method.
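The one-versus-one voting scheme described in step 4 can be illustrated as below. To keep the sketch self-contained, the pairwise binary learner is a nearest-class-mean rule, which is a stand-in and not an SVM; the class names `CentroidBinary` and `OneVsOne` are hypothetical. With four emotions, the scheme trains 4 × 3 / 2 = 6 pairwise classifiers.

```python
from collections import Counter
from itertools import combinations

import numpy as np


class CentroidBinary:
    """Stand-in binary classifier (nearest class mean). NOT an SVM; it only
    keeps the example dependency-free. Any object with fit/predict works."""

    def fit(self, X, y):
        X, y = np.asarray(X, dtype=float), np.asarray(y)
        self.labels_ = sorted(set(y.tolist()))
        self.centroids_ = np.stack([X[y == c].mean(axis=0) for c in self.labels_])
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        # Distance from every sample to each of the two class means.
        d = np.linalg.norm(X[:, None, :] - self.centroids_[None, :, :], axis=2)
        return np.array([self.labels_[i] for i in d.argmin(axis=1)])


class OneVsOne:
    """Train one binary classifier per pair of classes; predict by majority vote."""

    def __init__(self, make_binary=CentroidBinary):
        self.make_binary = make_binary

    def fit(self, X, y):
        X, y = np.asarray(X, dtype=float), np.asarray(y)
        self.classes_ = sorted(set(y.tolist()))
        self.models_ = {}
        for a, b in combinations(self.classes_, 2):
            mask = (y == a) | (y == b)  # keep only samples of the two classes
            self.models_[(a, b)] = self.make_binary().fit(X[mask], y[mask])
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        votes = [m.predict(X) for m in self.models_.values()]
        # Each pairwise model casts one vote per sample; ties go to the
        # label that received its first vote earliest.
        return [Counter(str(v[i]) for v in votes).most_common(1)[0][0]
                for i in range(len(X))]
```

If scikit-learn is available, an SVM could be plugged in by passing, for example, `make_binary=lambda: sklearn.svm.SVC(kernel="linear")`; the kernel choice here is an assumption, as the text does not state which kernel is used.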
Claims (2)
1. A small speaker emotion recognition system, characterized in that: in the first step, on the basis of a review of a large amount of domestic and foreign literature, a small emotional speech library is established, in which one part of the voices serves as training samples for building a reference template and the other part serves as test samples for subsequent emotion recognition testing; in the second step, the voices in the library are preprocessed, the preprocessing mainly comprising pre-emphasis, windowing and framing, and speech endpoint detection; in the third step, emotion parameters are extracted from the preprocessed speech signals, the emotion parameters comprising the fundamental frequency, the formants, the Mel-frequency cepstral coefficients and their related statistical parameters; the parameter extraction is simulated in software to obtain the distribution range of each parameter for the different emotion types, and the results are briefly analysed; in the fourth step, a speech emotion recognition experiment is carried out, in which an emotion classifier based on a support vector machine is trained on the emotion parameters of the training speech and is then used to predict the test speech and judge which emotion it belongs to; after the experiment, the recognition rate of each emotion is counted and the final statistics are analysed; finally, a simple man-machine interface is designed for the whole system, the interface supporting inputting a test voice, displaying the system's recognition result for that voice, and clearing the result.
2. The small speaker emotion recognition system according to claim 1, characterized in that: a small-scale Chinese emotional speech library is recorded, in which the voices fall into four emotion classes: happy, angry, sad and surprised; emotions are classified with an SVM, which uses the "one-versus-one" method to handle the multi-class problem; and recognition is performed using, respectively, the prosodic features of speech (parameters related to the fundamental tone and the formants), the MFCC-related parameters, and the combination of both as the emotional features.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410750977.5A CN104464756A (en) | 2014-12-10 | 2014-12-10 | Small speaker emotion recognition system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN104464756A (en) | 2015-03-25 |
Family
ID=52910700
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410750977.5A Pending CN104464756A (en) | 2014-12-10 | 2014-12-10 | Small speaker emotion recognition system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104464756A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2016183961A1 (en) * | 2015-05-18 | 2016-11-24 | 百度在线网络技术(北京)有限公司 | Method, system and device for switching interface of smart device, and nonvolatile computer storage medium |
CN106531158A (en) * | 2016-11-30 | 2017-03-22 | 北京理工大学 | Method and device for recognizing answer voice |
WO2018095167A1 (en) * | 2016-11-22 | 2018-05-31 | 北京京东尚科信息技术有限公司 | Voiceprint identification method and voiceprint identification system |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20080086791A (en) * | 2007-03-23 | 2008-09-26 | 엘지전자 주식회사 | Feeling recognition system based on voice |
CN101599271A (en) * | 2009-07-07 | 2009-12-09 | 华中科技大学 | A kind of recognition methods of digital music emotion |
CN101685634A (en) * | 2008-09-27 | 2010-03-31 | 上海盛淘智能科技有限公司 | Children speech emotion recognition method |
US20110141258A1 (en) * | 2007-02-16 | 2011-06-16 | Industrial Technology Research Institute | Emotion recognition method and system thereof |
CN102723078A (en) * | 2012-07-03 | 2012-10-10 | 武汉科技大学 | Emotion speech recognition method based on natural language comprehension |
CN103021406A (en) * | 2012-12-18 | 2013-04-03 | 台州学院 | Robust speech emotion recognition method based on compressive sensing |
CN103440863A (en) * | 2013-08-28 | 2013-12-11 | 华南理工大学 | Speech emotion recognition method based on manifold |
CN104008754A (en) * | 2014-05-21 | 2014-08-27 | 华南理工大学 | Speech emotion recognition method based on semi-supervised feature selection |
CN104091602A (en) * | 2014-07-11 | 2014-10-08 | 电子科技大学 | Speech emotion recognition method based on fuzzy support vector machine |
CN104681039A (en) * | 2013-11-26 | 2015-06-03 | 哈尔滨智晟天诚科技开发有限公司 | Voice information-based small speaker emotion recognition system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20150325 |