CN108010516A - Semantic independent speech emotion feature recognition method and device - Google Patents

Semantic independent speech emotion feature recognition method and device Download PDF

Info

Publication number
CN108010516A
CN108010516A (application CN201711258175.2A)
Authority
CN
China
Prior art keywords
preset
mood
features
sound spectrum
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201711258175.2A
Other languages
Chinese (zh)
Inventor
郑渊中
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Speakin Technologies Co ltd
Original Assignee
Speakin Technologies Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Speakin Technologies Co ltd filed Critical Speakin Technologies Co ltd
Priority to CN201711258175.2A priority Critical patent/CN108010516A/en
Publication of CN108010516A publication Critical patent/CN108010516A/en
Pending legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/02Feature extraction for speech recognition; Selection of recognition unit
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/63Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • Acoustics & Sound (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Child & Adolescent Psychology (AREA)
  • Hospice & Palliative Care (AREA)
  • Psychiatry (AREA)
  • Signal Processing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiment of the invention discloses a semantics-independent speech emotion feature recognition method and device. The method judges the speaker's emotion directly, without relying on semantics: the PCM data is matched against the sound-spectrum, prosodic, and voice-quality features stored in an emotion database, and the emotion category corresponding to the PCM data is determined from the matching degree. Extracting these physical features is simple and convenient, the processing is efficient and fast, and comprehensively matching several classes of speech features enables accurate recognition of emotional features. This solves the technical problems of current speech emotion recognition: a complex processing procedure, high implementation difficulty, excessive dependence on semantics, and long processing time.

Description

Semantics-independent speech emotion feature recognition method and device
Technical field
The present invention relates to the field of audio recognition, and in particular to a semantics-independent speech emotion feature recognition method and device.
Background technology
With the deep integration of computer technology into daily life, people are no longer content with audio recognition that merely confirms the speaker's identity or transcribes speech; they expect computers to be more intelligent and able to recognize higher-level information such as semantics and emotion.
Emotional information is a very important information resource in speech. Unlike a speech recognition system, an emotion recognition system is concerned more with how the speaker speaks, that is, with the tone and attitude hidden beneath the surface, which can be regarded as hidden higher-order information in the speech signal.
In fact, in human conversation the same speaker saying the same words with different moods can convey entirely different meanings.
In traditional intelligent speech data analysis, however, emotional information is treated as mere variation between individuals, so very valuable information is lost.
At present, speech emotion recognition is mostly implemented by combining speech recognition with other recognition methods such as facial expression recognition and semantic recognition. Combining multiple recognition methods for emotion recognition not only makes the processing procedure complex and the implementation difficult, requiring processing methods such as image and video processing, but also takes a long time. This results in the technical problems of current speech emotion recognition: a complex processing procedure, high implementation difficulty, excessive dependence on semantics, and long processing time.
Summary of the invention
The present invention provides a semantics-independent speech emotion feature recognition method and device, solving the technical problems of current speech emotion recognition: a complex processing procedure, high implementation difficulty, excessive dependence on semantics, and long processing time.
The present invention provides a semantics-independent speech emotion feature recognition method, comprising:
S1: obtaining the PCM data in an audio file in WAV format;
S2: performing speech feature extraction on the PCM data to obtain the sound-spectrum, prosodic, and voice-quality features of the PCM data;
S3: pattern-matching the sound-spectrum, prosodic, and voice-quality features of the PCM data against the preset sound-spectrum, prosodic, and voice-quality features corresponding to each emotion category in the emotion database, and outputting the emotion category with the highest matching degree according to the result of the pattern matching.
Preferably, step S3 specifically comprises:
S301: obtaining the preset weights corresponding to the preset sound-spectrum, prosodic, and voice-quality features in the emotion database;
S302: pattern-matching the sound-spectrum, prosodic, and voice-quality features of the PCM data against the preset sound-spectrum, prosodic, and voice-quality features corresponding to each emotion category in the emotion database;
S303: for each emotion category, computing the weighted average of the matching degrees of the PCM data's sound-spectrum, prosodic, and voice-quality features against the category's corresponding preset features, using the preset weights of those features; taking the weighted average as the matching degree and outputting the emotion category with the highest matching degree.
Preferably, the sound-spectrum features specifically comprise: MFCC features and GFCC features.
Preferably, the prosodic features specifically comprise: pitch, short-term energy, ZCR, and speech-rate (Speed) features.
Preferably, the voice-quality features specifically comprise: formant features.
The present invention also provides a semantics-independent speech emotion feature recognition device, comprising:
an audio acquisition module for obtaining the PCM data in an audio file in WAV format;
a feature extraction module for performing speech feature extraction on the PCM data to obtain the sound-spectrum, prosodic, and voice-quality features of the PCM data;
a matching output module for pattern-matching the sound-spectrum, prosodic, and voice-quality features of the PCM data against the preset sound-spectrum, prosodic, and voice-quality features corresponding to each emotion category in the emotion database, and outputting the emotion category with the highest matching degree according to the result of the pattern matching.
Preferably, the matching output module specifically comprises:
a weights submodule for obtaining the preset weights corresponding to the preset sound-spectrum, prosodic, and voice-quality features in the emotion database;
a matching submodule for pattern-matching the sound-spectrum, prosodic, and voice-quality features of the PCM data against the preset features corresponding to each emotion category in the emotion database;
an output submodule for computing, for each emotion category, the weighted average of the matching degrees of the PCM data's sound-spectrum, prosodic, and voice-quality features against the category's corresponding preset features, using the preset weights of those features; the weighted average is taken as the matching degree, and the emotion category with the highest matching degree is output.
Preferably, the sound-spectrum features specifically comprise: MFCC features and GFCC features.
Preferably, the prosodic features specifically comprise: pitch, short-term energy, ZCR, and speech-rate (Speed) features.
Preferably, the voice-quality features specifically comprise: formant features.
As can be seen from the above technical solutions, the embodiments of the present invention have the following advantages:
The present invention provides a semantics-independent speech emotion feature recognition method, comprising: S1: obtaining the PCM data in an audio file in WAV format; S2: performing speech feature extraction on the PCM data to obtain its sound-spectrum, prosodic, and voice-quality features; S3: pattern-matching those features against the preset sound-spectrum, prosodic, and voice-quality features corresponding to each emotion category in the emotion database, and outputting the emotion category with the highest matching degree according to the result of the pattern matching.
The present invention can judge the speaker's emotion directly, without relying on semantics, by matching the PCM data against the sound-spectrum, prosodic, and voice-quality features in the emotion database and determining the corresponding emotion category from the matching degree. The method of extracting these physical features is simple and convenient, the processing procedure is efficient and fast, and comprehensively matching several classes of speech features achieves accurate recognition of emotional features, solving the technical problems of current speech emotion recognition: a complex processing procedure, high implementation difficulty, excessive dependence on semantics, and long processing time.
Brief description of the drawings
To illustrate the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flow chart of one embodiment of a semantics-independent speech emotion feature recognition method provided by an embodiment of the present invention;
Fig. 2 is a schematic flow chart of another embodiment of a semantics-independent speech emotion feature recognition method provided by an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of one embodiment of a semantics-independent speech emotion feature recognition device provided by an embodiment of the present invention.
Embodiment
An embodiment of the present invention provides a semantics-independent speech emotion feature recognition method and device, solving the technical problems of current speech emotion recognition: a complex processing procedure, high implementation difficulty, excessive dependence on semantics, and long processing time.
To make the purpose, features, and advantages of the present invention more obvious and understandable, the technical solutions in the embodiments of the present invention are described clearly and completely below in conjunction with the drawings. Obviously, the embodiments described below are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
Referring to Fig. 1, an embodiment of the present invention provides one embodiment of a semantics-independent speech emotion feature recognition method, comprising:
Step 101: obtain the PCM data in the audio file in WAV format;
It should be noted that, in practical application, the PCM data in the WAV-format audio file must first be obtained and loaded directly into memory, so that subsequent steps can proceed.
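Step 101 can be sketched as follows. This is a minimal illustration using Python's standard `wave` and `struct` modules; the function name `read_pcm_samples` and the assumption of 16-bit mono PCM are illustrative choices, not taken from the patent.

```python
import struct
import wave

def read_pcm_samples(path):
    """Load the raw PCM samples of a 16-bit WAV file directly into memory."""
    with wave.open(path, "rb") as wf:
        assert wf.getsampwidth() == 2, "sketch assumes 16-bit PCM"
        frames = wf.readframes(wf.getnframes())
    # '<h' = little-endian signed 16-bit, the usual WAV sample layout
    count = len(frames) // 2
    return list(struct.unpack("<%dh" % count, frames))
```

A real system would also record the sample rate (`wf.getframerate()`), since the features extracted in the later steps depend on it.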
Step 102: perform speech feature extraction on the PCM data to obtain the sound-spectrum, prosodic, and voice-quality features of the PCM data;
It should be noted that after the PCM data in the WAV-format audio file has been obtained, speech feature extraction must be performed on it to obtain the sound-spectrum, prosodic, and voice-quality features of the PCM data;
For accuracy, features can be extracted along every dimension of the various speech feature types, composing a vector of more than 100 dimensions for subsequent pattern matching.
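One way to read "a vector of more than 100 dimensions" is to flatten and concatenate the per-category feature arrays into a single vector before matching. A minimal sketch, with hypothetical per-category arrays; the dimension counts in the usage note are illustrative, not from the patent.

```python
import numpy as np

def assemble_feature_vector(mfcc, gfcc, prosody, formants):
    """Flatten and concatenate per-category features into one match vector."""
    return np.concatenate([np.ravel(mfcc), np.ravel(gfcc),
                           np.ravel(prosody), np.ravel(formants)])
```

For example, 40 MFCC statistics, 64 GFCC statistics, 4 prosodic values, and 3 formant frequencies would yield a 111-dimensional vector.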
Step 103: pattern-match the sound-spectrum, prosodic, and voice-quality features of the PCM data against the preset sound-spectrum, prosodic, and voice-quality features corresponding to each emotion category in the emotion database, and output the emotion category with the highest matching degree according to the result of the pattern matching.
It should be noted that this embodiment matches the PCM data against the sound-spectrum, prosodic, and voice-quality features in the emotion database and determines the corresponding emotion category from the matching degree. The method of extracting these physical features is simple and convenient, the processing procedure is efficient and fast, and comprehensively matching several classes of speech features achieves accurate recognition of emotional features. It improves the flexibility, convenience, rigor, and efficiency of emotion recognition, better adapts to the future needs of intelligent hardware, and allows complete and rapid configuration as intelligent hardware grows in complexity, solving the technical problems of current speech emotion recognition: a complex processing procedure, high implementation difficulty, excessive dependence on semantics, and long processing time.
The above is one embodiment of a semantics-independent speech emotion feature recognition method provided by an embodiment of the present invention; another embodiment follows.
Referring to Fig. 2, an embodiment of the present invention provides another embodiment of a semantics-independent speech emotion feature recognition method, comprising:
Step 201: obtain the PCM data in the audio file in WAV format;
Step 202: perform speech feature extraction on the PCM data to obtain the sound-spectrum, prosodic, and voice-quality features of the PCM data;
Step 203: obtain the preset weights corresponding to the preset sound-spectrum, prosodic, and voice-quality features in the emotion database;
Step 204: pattern-match the sound-spectrum, prosodic, and voice-quality features of the PCM data against the preset sound-spectrum, prosodic, and voice-quality features corresponding to each emotion category in the emotion database;
Step 205: for each emotion category, compute the weighted average of the matching degrees of the PCM data's sound-spectrum, prosodic, and voice-quality features against the category's corresponding preset features, using the preset weights of those features; take the weighted average as the matching degree and output the emotion category with the highest matching degree.
It should be noted that the matching degree can be calculated by a weighted average, a neural network model, a clustering algorithm, or other means; the weighted average is only one embodiment thereof;
The weighted-average matching degree is calculated by the following formula:
P = A*a + B*b + C*c
where P is the matching degree; A is the matching degree of the sound-spectrum features of the PCM data against the preset sound-spectrum features; B is the matching degree of the prosodic features of the PCM data against the preset prosodic features; C is the matching degree of the voice-quality features of the PCM data against the preset voice-quality features; a is the preset weight of the preset sound-spectrum features; b is the preset weight of the preset prosodic features; and c is the preset weight of the preset voice-quality features.
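The formula above translates directly into code. A minimal sketch; the function names and the example emotion categories are illustrative, not from the patent.

```python
def weighted_match(degrees, weights):
    """P = A*a + B*b + C*c over the (spectrum, prosody, voice-quality) triple."""
    A, B, C = degrees
    a, b, c = weights
    return A * a + B * b + C * c

def best_emotion(per_emotion_degrees, weights):
    """Output the emotion category whose weighted matching degree P is largest."""
    return max(per_emotion_degrees,
               key=lambda emo: weighted_match(per_emotion_degrees[emo], weights))
```

With weights (0.5, 0.3, 0.2) and per-category matching degrees (0.9, 0.8, 0.7), the formula gives P = 0.45 + 0.24 + 0.14 = 0.83.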
Further, the sound-spectrum features specifically comprise: MFCC features and GFCC features.
It should be noted that MFCC is the abbreviation of Mel-frequency cepstral coefficients;
The Mel frequency scale is derived from the characteristics of human hearing and has a nonlinear correspondence with frequency in Hz; Mel-frequency cepstral coefficients (MFCC) are spectral features calculated using this relationship between the two;
GFCC features are auditory features based on a Gammatone filter bank.
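The nonlinear Hz-to-Mel correspondence mentioned above is commonly given by mel = 2595 * log10(1 + f/700). A minimal sketch of the conversion; the exact constants vary slightly between MFCC implementations.

```python
import math

def hz_to_mel(f_hz):
    """Map linear frequency (Hz) to the perceptual Mel scale."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    """Inverse mapping, used when placing Mel filter-bank centers."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)
```

With these constants, 1000 Hz maps to roughly 1000 mel, which is how the scale is usually anchored.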
Further, the prosodic features specifically comprise: pitch, short-term energy, ZCR, and speech-rate (Speed) features.
It should be noted that the pitch feature is related to the fundamental frequency of the sound and reflects pitch information;
The short-term energy feature is the energy of the signal over a short frame;
The ZCR (zero-crossing rate) feature is the rate at which the sign of the signal changes, for example the signal going from positive to negative or back, and is a key feature for classifying percussive sounds;
The Speed feature is the speech-rate feature.
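Two of the prosodic features above are simple enough to sketch directly. A minimal numpy illustration; operating on a single pre-cut frame is an assumption here, since the patent does not specify framing.

```python
import numpy as np

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose sign differs."""
    signs = np.sign(frame)
    return float(np.mean(signs[1:] != signs[:-1]))

def short_term_energy(frame):
    """Sum of squared samples over one short frame."""
    frame = np.asarray(frame, dtype=np.float64)
    return float(np.sum(frame ** 2))
```

In practice both are computed per frame (e.g. 20-30 ms windows) and summarized over the utterance before matching.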
Further, the voice-quality features specifically comprise: formant features.
It should be noted that "Formants" translates as formant features: a formant is a region of the sound spectrum where energy is relatively concentrated. Formants are not only a determinant of voice quality but also reflect the physical characteristics of the vocal tract (resonant cavity).
This embodiment matches the sound-spectrum, prosodic, and voice-quality features of the PCM data against those in the emotion database and determines the corresponding emotion category from the matching degree; the method of extracting these physical features is simple and convenient, and the processing procedure is efficient and fast;
At the same time, comprehensively matching several classes of speech features achieves accurate recognition of emotional features;
The present invention improves the flexibility, convenience, rigor, and efficiency of emotion recognition, better adapts to the future needs of intelligent hardware, and allows complete and rapid configuration as intelligent hardware grows in complexity;
It solves the technical problems of current speech emotion recognition: a complex processing procedure, high implementation difficulty, and long processing time.
The above is another embodiment of a semantics-independent speech emotion feature recognition method provided by an embodiment of the present invention; an embodiment of a semantics-independent speech emotion feature recognition device follows.
Referring to Fig. 3, an embodiment of the present invention provides one embodiment of a semantics-independent speech emotion feature recognition device, comprising:
an audio acquisition module 301 for obtaining the PCM data in an audio file in WAV format;
a feature extraction module 302 for performing speech feature extraction on the PCM data to obtain the sound-spectrum, prosodic, and voice-quality features of the PCM data;
a matching output module 303 for pattern-matching the sound-spectrum, prosodic, and voice-quality features of the PCM data against the preset features corresponding to each emotion category in the emotion database, and outputting the emotion category with the highest matching degree according to the result of the pattern matching.
Further, the matching output module 303 specifically comprises:
a weights submodule 3031 for obtaining the preset weights corresponding to the preset sound-spectrum, prosodic, and voice-quality features in the emotion database;
a matching submodule 3032 for pattern-matching the sound-spectrum, prosodic, and voice-quality features of the PCM data against the preset features corresponding to each emotion category in the emotion database;
an output submodule 3033 for computing, for each emotion category, the weighted average of the matching degrees of the PCM data's features against the category's corresponding preset features, using the preset weights of those features; the weighted average is taken as the matching degree, and the emotion category with the highest matching degree is output.
Further, the sound-spectrum features specifically comprise: MFCC features and GFCC features.
Further, the prosodic features specifically comprise: pitch, short-term energy, ZCR, and speech-rate (Speed) features.
Further, the voice-quality features specifically comprise: formant features.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the device and modules described above may refer to the corresponding processes in the foregoing method embodiments, and are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed device and method may be implemented in other ways. For example, the device embodiment described above is only schematic; the division into modules is merely a division of logical functions, and other divisions are possible in actual implementation: multiple modules or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through interfaces, devices, or modules, and may be electrical, mechanical, or in other forms.
Modules described as separate components may or may not be physically separate, and components shown as modules may or may not be physical modules; they may be located in one place or distributed over multiple network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional modules in the embodiments of the present invention may be integrated into one processing module, each module may exist physically on its own, or two or more modules may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module.
If the integrated module is implemented in the form of a software functional module and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods of the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, and the like.
The above embodiments are only intended to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents, without departing in essence from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

  1. A kind of 1. independent voice mood characteristic recognition method of semanteme, it is characterised in that including:
    S1:Obtain the PCM data in the audio file of wav forms;
    S2:PCM data is subjected to speech feature extraction, obtains the sound spectrum, prosodic features and tonequality feature of PCM data;
    S3:By the sound spectrum in PCM data, prosodic features and tonequality feature respectively with various mood classes in mood data storehouse Not corresponding preset sound spectrum, preset prosodic features and preset tonequality feature carry out pattern match, according to pattern match As a result the mood classification of matching degree maximum is exported.
  2. A kind of 2. independent voice mood characteristic recognition method of semanteme according to claim 1, it is characterised in that the step Rapid S3 is specifically included:
    S301:Obtain corresponding pre- with preset sound spectrum, preset prosodic features and preset tonequality feature in mood data storehouse Put weights;
    S302:By the sound spectrum in PCM data, prosodic features and tonequality feature respectively with various moods in mood data storehouse The corresponding preset sound spectrum of classification, preset prosodic features and preset tonequality feature carry out pattern match;
    S303:Various mood classes in sound spectrum, prosodic features and tonequality feature and mood data storehouse in PCM data It is preset in not corresponding preset sound spectrum, preset prosodic features and the matching degree of preset tonequality feature and mood data storehouse The weighted average of sound spectrum, preset prosodic features and the corresponding preset various mood classifications of weight computing of preset tonequality feature Number, using weighted average as matching degree, exports the mood classification of matching degree maximum.
  3. A kind of 3. independent voice mood characteristic recognition method of semanteme according to claim 1, it is characterised in that the sound Spectrum signature specifically includes:MFCC features and GFCC features.
  4. A kind of 4. independent voice mood characteristic recognition method of semanteme according to claim 1, it is characterised in that the rhythm Study of law feature specifically includes:Pitch features, Short Term Energy features, ZCR features and Speed features.
  5. A kind of 5. independent voice mood characteristic recognition method of semanteme according to claim 1, it is characterised in that the sound Matter feature specifically includes:Formants features.
  6. A kind of 6. independent voice mood specific identification device of semanteme, it is characterised in that including:
    Audio acquisition module, the PCM data in audio file for obtaining wav forms;
    Characteristic extracting module, for PCM data to be carried out speech feature extraction, sound spectrum, the metrics for obtaining PCM data are special Tonequality of seeking peace feature;
    Match output module, for by the sound spectrum in PCM data, prosodic features and tonequality feature respectively with mood data The corresponding preset sound spectrum of various mood classifications, preset prosodic features and preset tonequality feature carry out pattern match in storehouse, The mood classification of matching degree maximum is exported according to the result of pattern match.
  7. The semantic-independent speech emotion feature recognition device according to claim 6, characterized in that the matching output module specifically includes:
    a weight submodule for obtaining the preset weights corresponding to the preset spectral features, preset prosodic features and preset voice-quality features in the emotion database;
    a matching submodule for pattern-matching the spectral features, prosodic features and voice-quality features of the PCM data against the preset spectral features, preset prosodic features and preset voice-quality features corresponding to each emotion category in the emotion database;
    an output submodule for computing, for each emotion category, the weighted average of the matching degrees between the spectral, prosodic and voice-quality features of the PCM data and the corresponding preset features, using the preset weights in the emotion database, taking the weighted average as the matching degree, and outputting the emotion category with the highest matching degree.
  8. The semantic-independent speech emotion feature recognition device according to claim 6, characterized in that the spectral features specifically include: MFCC features and GFCC features.
  9. The semantic-independent speech emotion feature recognition device according to claim 6, characterized in that the prosodic features specifically include: pitch features, short-term energy features, zero-crossing rate (ZCR) features and speech-rate features.
  10. The semantic-independent speech emotion feature recognition device according to claim 6, characterized in that the voice-quality features specifically include: formant features.
CN201711258175.2A 2017-12-04 2017-12-04 Semantic independent speech emotion feature recognition method and device Pending CN108010516A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711258175.2A CN108010516A (en) 2017-12-04 2017-12-04 Semantic independent speech emotion feature recognition method and device

Publications (1)

Publication Number Publication Date
CN108010516A true CN108010516A (en) 2018-05-08

Family

ID=62056007

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711258175.2A Pending CN108010516A (en) 2017-12-04 2017-12-04 Semantic independent speech emotion feature recognition method and device

Country Status (1)

Country Link
CN (1) CN108010516A (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101261832A (en) * 2008-04-21 2008-09-10 北京航空航天大学 Extraction and modeling method for Chinese speech sensibility information
CN102737629A (en) * 2011-11-11 2012-10-17 东南大学 Embedded type speech emotion recognition method and device
CN103854645A (en) * 2014-03-05 2014-06-11 东南大学 Speech emotion recognition method based on punishment of speaker and independent of speaker
KR20150045967A (en) * 2015-04-09 2015-04-29 이상민 Algorithm that converts the voice data into emotion data
CN105159979A (en) * 2015-08-27 2015-12-16 广东小天才科技有限公司 friend recommendation method and device
CN106297826A (en) * 2016-08-18 2017-01-04 竹间智能科技(上海)有限公司 Speech emotional identification system and method
CN106448652A (en) * 2016-09-12 2017-02-22 珠海格力电器股份有限公司 Control method and device of air conditioner
CN107221318A (en) * 2017-05-12 2017-09-29 广东外语外贸大学 Oral English Practice pronunciation methods of marking and system
CN107305773A (en) * 2016-04-15 2017-10-31 美特科技(苏州)有限公司 Voice mood discrimination method

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
张海龙: "Research on Emotion Recognition Technology Based on Speech Signals", Journal of Yan'an University (Natural Science Edition) *
曹鹏: "Research and Implementation of Speech Emotion Recognition Technology", China Master's Theses Full-text Database, Information Science and Technology *
韩文静: "Research on Key Technologies of Speech Emotion Recognition", China Doctoral Dissertations Full-text Database, Information Science and Technology *

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108806667A (en) * 2018-05-29 2018-11-13 重庆大学 The method for synchronously recognizing of voice and mood based on neural network
CN109087670A (en) * 2018-08-30 2018-12-25 西安闻泰电子科技有限公司 Mood analysis method, system, server and storage medium
CN109087670B (en) * 2018-08-30 2021-04-20 西安闻泰电子科技有限公司 Emotion analysis method, system, server and storage medium
CN110970113A (en) * 2018-09-30 2020-04-07 宁波方太厨具有限公司 Intelligent menu recommendation method based on user emotion
CN110970113B (en) * 2018-09-30 2023-04-14 宁波方太厨具有限公司 Intelligent menu recommendation method based on user emotion
CN110110135A (en) * 2019-04-17 2019-08-09 西安极蜂天下信息科技有限公司 Voice characteristics data library update method and device
CN111182409A (en) * 2019-11-26 2020-05-19 广东小天才科技有限公司 Screen control method based on intelligent sound box, intelligent sound box and storage medium
CN111182409B (en) * 2019-11-26 2022-03-25 广东小天才科技有限公司 Screen control method based on intelligent sound box, intelligent sound box and storage medium
CN111583968A (en) * 2020-05-25 2020-08-25 桂林电子科技大学 Speech emotion recognition method and system
CN112002304A (en) * 2020-08-27 2020-11-27 上海添力网络科技有限公司 Speech synthesis method and device
CN112002304B (en) * 2020-08-27 2024-03-29 上海添力网络科技有限公司 Speech synthesis method and device
CN113408503A (en) * 2021-08-19 2021-09-17 明品云(北京)数据科技有限公司 Emotion recognition method and device, computer readable storage medium and equipment

Similar Documents

Publication Publication Date Title
CN108010516A (en) Semantic independent speech emotion feature recognition method and device
CN108564942B (en) Voice emotion recognition method and system based on adjustable sensitivity
CN110400579B (en) Speech emotion recognition based on direction self-attention mechanism and bidirectional long-time and short-time network
Koolagudi et al. IITKGP-SEHSC: Hindi speech corpus for emotion analysis
Iliev et al. Spoken emotion recognition through optimum-path forest classification using glottal features
Demircan et al. Feature extraction from speech data for emotion recognition
Sinith et al. Emotion recognition from audio signals using Support Vector Machine
CN110473566A (en) Audio separation method, device, electronic equipment and computer readable storage medium
Meyer et al. Robustness of spectro-temporal features against intrinsic and extrinsic variations in automatic speech recognition
Bhat et al. Automatic assessment of sentence-level dysarthria intelligibility using BLSTM
CN108597496A (en) Voice generation method and device based on generation type countermeasure network
Yeh et al. Segment-based emotion recognition from continuous Mandarin Chinese speech
CN107610707A (en) A kind of method for recognizing sound-groove and device
CN107731233A (en) A kind of method for recognizing sound-groove based on RNN
CN104867489B (en) A kind of simulation true man read aloud the method and system of pronunciation
CN110827857B (en) Speech emotion recognition method based on spectral features and ELM
Deshmukh et al. Speech based emotion recognition using machine learning
Samantaray et al. A novel approach of speech emotion recognition with prosody, quality and derived features using SVM classifier for a class of North-Eastern Languages
Casale et al. Multistyle classification of speech under stress using feature subset selection based on genetic algorithms
Hasrul et al. Human affective (emotion) behaviour analysis using speech signals: a review
Javidi et al. Speech emotion recognition by using combinations of C5. 0, neural network (NN), and support vector machines (SVM) classification methods
Patni et al. Speech emotion recognition using MFCC, GFCC, chromagram and RMSE features
CN109065073A (en) Speech-emotion recognition method based on depth S VM network model
Besbes et al. Multi-class SVM for stressed speech recognition
Gallardo-Antolín et al. On combining acoustic and modulation spectrograms in an attention LSTM-based system for speech intelligibility level classification

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180508