CN108010516A - Semantic independent speech emotion feature recognition method and device - Google Patents
- Publication number
- CN108010516A CN108010516A CN201711258175.2A CN201711258175A CN108010516A CN 108010516 A CN108010516 A CN 108010516A CN 201711258175 A CN201711258175 A CN 201711258175A CN 108010516 A CN108010516 A CN 108010516A
- Authority
- CN
- China
- Prior art keywords
- preset
- mood
- features
- sound spectrum
- feature
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
- G10L25/63—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state
Landscapes
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- Acoustics & Sound (AREA)
- Human Computer Interaction (AREA)
- Multimedia (AREA)
- General Health & Medical Sciences (AREA)
- Theoretical Computer Science (AREA)
- Artificial Intelligence (AREA)
- Computer Vision & Pattern Recognition (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Child & Adolescent Psychology (AREA)
- Hospice & Palliative Care (AREA)
- Psychiatry (AREA)
- Signal Processing (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The embodiment of the invention discloses a semantics-independent speech emotion feature recognition method and device. The method judges the speaker's emotion directly, without relying on semantics: it matches PCM data against the sound spectrum features, prosodic features, and voice quality features stored in an emotion database and determines the emotion category corresponding to the PCM data according to the matching degree. Extracting these physical features is simple and convenient, the processing is efficient and fast, and comprehensively matching several classes of speech features enables accurate recognition of emotional features. This solves the technical problems of current speech emotion recognition: complex processing, high implementation difficulty, excessive dependence on semantics, and long processing time.
Description
Technical field
The present invention relates to the field of audio recognition, and more particularly to a semantics-independent speech emotion feature recognition method and device.
Background art
With the deep integration of computer technology into daily life, people are no longer content with computers that merely perform audio recognition to confirm the speaker's identity and transcribe speech; they expect computers to be more intelligent and to recognize higher-level information such as semantics and emotion.
Emotional information is a very important information resource in speech. Unlike speech recognition, an emotion recognition system is more concerned with how the speaker speaks: the deeper tone and attitude hidden beneath the surface of the words, which can be regarded as a layer of information concealed in the speech signal.
In fact, in human communication, the same speaker saying the same words with different moods can convey entirely different meanings.
In traditional intelligent speech data analysis, however, emotional information is treated as mere variation between individuals, so this very valuable information is lost.
At present, speech emotion recognition is mostly implemented by combining speech recognition with other recognition methods such as facial expression recognition and semantic recognition. But combining multiple recognition methods for emotion recognition not only makes the processing complex and hard to implement, it also requires processing methods such as image and video processing, and the processing time is long. This results in the technical problems of current speech emotion recognition: complex processing, high implementation difficulty, excessive dependence on semantics, and long processing time.
Summary of the invention
The present invention provides a semantics-independent speech emotion feature recognition method and device, which solve the technical problems of current speech emotion recognition: complex processing, high implementation difficulty, excessive dependence on semantics, and long processing time.
The present invention provides a semantics-independent speech emotion feature recognition method, including:
S1: obtaining the PCM data in an audio file in WAV format;
S2: performing speech feature extraction on the PCM data to obtain sound spectrum features, prosodic features, and voice quality features of the PCM data;
S3: pattern-matching the sound spectrum features, prosodic features, and voice quality features of the PCM data against the preset sound spectrum features, preset prosodic features, and preset voice quality features corresponding to each emotion category in an emotion database, and outputting the emotion category with the highest matching degree according to the result of the pattern matching.
Preferably, step S3 specifically includes:
S301: obtaining the preset weights corresponding to the preset sound spectrum features, preset prosodic features, and preset voice quality features in the emotion database;
S302: pattern-matching the sound spectrum features, prosodic features, and voice quality features of the PCM data against the preset sound spectrum features, preset prosodic features, and preset voice quality features corresponding to each emotion category in the emotion database;
S303: computing, for each emotion category, the weighted average of the matching degrees between the features of the PCM data and the corresponding preset features, using the preset weights of the preset sound spectrum features, preset prosodic features, and preset voice quality features; taking the weighted average as the matching degree and outputting the emotion category with the highest matching degree.
Preferably, the sound spectrum features specifically include: MFCC features and GFCC features.
Preferably, the prosodic features specifically include: pitch features, short-term energy features, zero-crossing rate (ZCR) features, and speech rate features.
Preferably, the voice quality features specifically include: formant features.
The present invention also provides a semantics-independent speech emotion feature recognition device, including:
an audio acquisition module, for obtaining the PCM data in an audio file in WAV format;
a feature extraction module, for performing speech feature extraction on the PCM data to obtain the sound spectrum features, prosodic features, and voice quality features of the PCM data;
a matching output module, for pattern-matching the sound spectrum features, prosodic features, and voice quality features of the PCM data against the preset sound spectrum features, preset prosodic features, and preset voice quality features corresponding to each emotion category in an emotion database, and outputting the emotion category with the highest matching degree according to the result of the pattern matching.
Preferably, the matching output module specifically includes:
a weights submodule, for obtaining the preset weights corresponding to the preset sound spectrum features, preset prosodic features, and preset voice quality features in the emotion database;
a matching submodule, for pattern-matching the sound spectrum features, prosodic features, and voice quality features of the PCM data against the preset sound spectrum features, preset prosodic features, and preset voice quality features corresponding to each emotion category in the emotion database;
an output submodule, for computing, for each emotion category, the weighted average of the matching degrees between the features of the PCM data and the corresponding preset features using the preset weights, taking the weighted average as the matching degree, and outputting the emotion category with the highest matching degree.
Preferably, the sound spectrum features specifically include: MFCC features and GFCC features.
Preferably, the prosodic features specifically include: pitch features, short-term energy features, zero-crossing rate (ZCR) features, and speech rate features.
Preferably, the voice quality features specifically include: formant features.
As can be seen from the above technical solutions, the embodiments of the present invention have the following advantages:
The present invention provides a semantics-independent speech emotion feature recognition method, including: S1: obtaining the PCM data in an audio file in WAV format; S2: performing speech feature extraction on the PCM data to obtain sound spectrum features, prosodic features, and voice quality features of the PCM data; S3: pattern-matching these features against the preset sound spectrum features, preset prosodic features, and preset voice quality features corresponding to each emotion category in an emotion database, and outputting the emotion category with the highest matching degree according to the result of the pattern matching.
The present invention can judge the speaker's emotion directly, without relying on semantics: the PCM data are matched against the sound spectrum features, prosodic features, and voice quality features in the emotion database, and the emotion category corresponding to the PCM data is determined according to the matching degree. The method of extracting these physical features is simple and convenient, the processing is efficient and fast, and the comprehensive matching of several classes of speech features enables accurate recognition of emotional features. This solves the technical problems of current speech emotion recognition: complex processing, high implementation difficulty, excessive dependence on semantics, and long processing time.
Brief description of the drawings
In order to explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of one embodiment of a semantics-independent speech emotion feature recognition method provided by an embodiment of the present invention;
Fig. 2 is a schematic flowchart of another embodiment of a semantics-independent speech emotion feature recognition method provided by an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of one embodiment of a semantics-independent speech emotion feature recognition device provided by an embodiment of the present invention.
Detailed description of the embodiments
The embodiments of the present invention provide a semantics-independent speech emotion feature recognition method and device, which solve the technical problems of current speech emotion recognition: complex processing, high implementation difficulty, excessive dependence on semantics, and long processing time.
To make the objects, features, and advantages of the present invention more obvious and understandable, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the embodiments described below are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
Referring to Fig. 1, an embodiment of the present invention provides one embodiment of a semantics-independent speech emotion feature recognition method, including:
Step 101: obtaining the PCM data in an audio file in WAV format;
It should be noted that, in practical applications, it is necessary first to obtain the PCM data in the WAV-format audio file and load the PCM data directly into memory, so that the subsequent steps can proceed.
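A minimal sketch of this loading step, assuming Python with the standard-library wave module and NumPy; the file name and the 16-bit sample width are illustrative assumptions, and real code should check wf.getsampwidth():

```python
# Load the raw PCM samples of a WAV file directly into memory.
import wave
import numpy as np

with wave.open("input.wav", "rb") as wf:   # hypothetical input file
    sample_rate = wf.getframerate()
    n_channels = wf.getnchannels()
    pcm = np.frombuffer(wf.readframes(wf.getnframes()), dtype=np.int16)

if n_channels > 1:                         # interleaved samples -> keep first channel
    pcm = pcm[::n_channels]
print(sample_rate, pcm.shape)
```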
Step 102: performing speech feature extraction on the PCM data to obtain the sound spectrum features, prosodic features, and voice quality features of the PCM data;
It should be noted that, after obtaining the PCM data in the WAV-format audio file, speech feature extraction must be performed on the PCM data to obtain its sound spectrum features, prosodic features, and voice quality features.
For accuracy, features can be extracted along each dimension of the various speech features to form a vector of more than 100 dimensions for the subsequent pattern matching, as sketched below.
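For illustration, such a vector could be assembled as follows, assuming the librosa library; the particular features, dimensions, and summary statistics are illustrative choices, the patent only stating that the vector exceeds 100 dimensions.

```python
# Assemble a high-dimensional feature vector from several speech features.
import librosa
import numpy as np

y, sr = librosa.load("input.wav", sr=None)           # hypothetical input file

mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=40)   # sound spectrum features
zcr = librosa.feature.zero_crossing_rate(y)          # prosodic: zero-crossing rate
rms = librosa.feature.rms(y=y)                       # prosodic: short-term energy

# Summarize each frame-level feature by its mean and standard deviation,
# giving a 40*2 + 2 + 2 = 84-dimensional base that grows past 100 once
# pitch, speech rate, GFCC, and formant statistics are appended.
vector = np.concatenate([
    mfcc.mean(axis=1), mfcc.std(axis=1),
    [zcr.mean(), zcr.std()],
    [rms.mean(), rms.std()],
])
print(vector.shape)
```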
Step 103: pattern-matching the sound spectrum features, prosodic features, and voice quality features of the PCM data against the preset sound spectrum features, preset prosodic features, and preset voice quality features corresponding to each emotion category in the emotion database, and outputting the emotion category with the highest matching degree according to the result of the pattern matching.
It should be noted that this embodiment determines the emotion category corresponding to the PCM data according to the matching degree, by matching the PCM data against the sound spectrum features, prosodic features, and voice quality features in the emotion database. The method of extracting these physical features is simple and convenient, the processing is efficient and fast, and the comprehensive matching of several classes of speech features enables accurate recognition of emotional features. It improves the flexibility, convenience, rigor, and efficiency of emotion recognition, can better adapt to the future needs of intelligent hardware, and supports complete and rapid configuration as intelligent hardware grows in complexity. This solves the technical problems of current speech emotion recognition: complex processing, high implementation difficulty, excessive dependence on semantics, and long processing time.
The above is one embodiment of the semantics-independent speech emotion feature recognition method provided by an embodiment of the present invention; another embodiment of the method follows.
Referring to Fig. 2, an embodiment of the present invention provides another embodiment of a semantics-independent speech emotion feature recognition method, including:
Step 201: obtaining the PCM data in an audio file in WAV format;
Step 202: performing speech feature extraction on the PCM data to obtain the sound spectrum features, prosodic features, and voice quality features of the PCM data;
Step 203: obtaining the preset weights corresponding to the preset sound spectrum features, preset prosodic features, and preset voice quality features in the emotion database;
Step 204: pattern-matching the sound spectrum features, prosodic features, and voice quality features of the PCM data against the preset sound spectrum features, preset prosodic features, and preset voice quality features corresponding to each emotion category in the emotion database;
Step 205: computing, for each emotion category, the weighted average of the matching degrees between the sound spectrum features, prosodic features, and voice quality features of the PCM data and the corresponding preset features in the emotion database, using the preset weights of the preset sound spectrum features, preset prosodic features, and preset voice quality features; taking the weighted average as the matching degree and outputting the emotion category with the highest matching degree.
It should be noted that the matching degree can be computed by a weighted average, a neural network model, a clustering algorithm, or similar means; computing it by weighted average is only one embodiment.
The weighted average of the matching degree is computed as follows:
P = A*a + B*b + C*c
where P is the overall matching degree; A is the matching degree between the sound spectrum features of the PCM data and the preset sound spectrum features; B is the matching degree between the prosodic features of the PCM data and the preset prosodic features; C is the matching degree between the voice quality features of the PCM data and the preset voice quality features; a is the preset weight corresponding to the preset sound spectrum features; b is the preset weight corresponding to the preset prosodic features; and c is the preset weight corresponding to the preset voice quality features.
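A direct implementation of this formula might look as follows; the example weights and matching degrees are illustrative values only.

```python
# Weighted average P = A*a + B*b + C*c of the per-feature-group matching degrees.
def weighted_matching_degree(A, B, C, a, b, c):
    """A: sound spectrum match, B: prosody match, C: voice quality match;
    a, b, c: the corresponding preset weights."""
    return A * a + B * b + C * c

# Example: spectrum matches strongly, prosody moderately, voice quality weakly.
P = weighted_matching_degree(A=0.9, B=0.6, C=0.4, a=0.5, b=0.3, c=0.2)
print(P)  # 0.9*0.5 + 0.6*0.3 + 0.4*0.2 = 0.71
```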
Further, the sound spectrum features specifically include: MFCC features and GFCC features.
It should be noted that MFCC is the abbreviation of Mel-frequency cepstral coefficients.
The Mel frequency scale is derived from characteristics of human hearing and has a nonlinear correspondence with frequency in Hz; Mel-frequency cepstral coefficients (MFCC) are spectral features computed from the Hz spectrum using this relation.
GFCC features are auditory features based on a Gammatone filter bank.
Further, the prosodic features specifically include: pitch features, short-term energy features, ZCR features, and speech rate features.
It should be noted that pitch features are related to the fundamental frequency of the sound and reflect pitch information;
short-term energy features measure the energy of the signal over short frames;
ZCR (zero-crossing rate) features measure the rate at which the sign of a signal changes, for example from positive to negative or back, and are a key feature for classifying percussive sounds;
speed features describe the speaking rate.
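As an illustration, short-term energy and ZCR can be computed per frame with plain NumPy as sketched below; the frame length and hop size are illustrative values.

```python
# Frame-level short-term energy and zero-crossing rate.
import numpy as np

def frame_signal(x, frame_len=400, hop=160):
    n = 1 + max(0, (len(x) - frame_len) // hop)
    return np.stack([x[i * hop: i * hop + frame_len] for i in range(n)])

def short_term_energy(x):
    frames = frame_signal(np.asarray(x, dtype=float))
    return (frames ** 2).sum(axis=1)            # energy of each frame

def zero_crossing_rate(x):
    frames = frame_signal(np.asarray(x, dtype=float))
    signs = np.sign(frames)
    # fraction of adjacent sample pairs whose sign changes within each frame
    return (np.abs(np.diff(signs, axis=1)) > 0).mean(axis=1)

pcm = np.sin(np.linspace(0, 100 * np.pi, 16000))  # 1 s synthetic test tone
print(short_term_energy(pcm)[:3], zero_crossing_rate(pcm)[:3])
```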
Further, the voice quality features specifically include: formant features.
It should be noted that "Formants features" means formant features. Formants are regions of the sound spectrum where energy is relatively concentrated; they are not only a determinant of voice quality but also reflect the physical characteristics of the vocal tract (the resonant cavity).
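For illustration, formants are commonly estimated from a voiced frame via linear predictive coding (LPC), taking LPC polynomial roots near the unit circle as resonances; the sketch below assumes librosa for the LPC fit, and the order and thresholds are common illustrative choices rather than values from the patent.

```python
# Estimate formant frequencies of a voiced frame from LPC polynomial roots.
import librosa
import numpy as np

def estimate_formants(frame, sr, order=12):
    a = librosa.lpc(frame.astype(float), order=order)   # LPC coefficients
    roots = [r for r in np.roots(a) if np.imag(r) > 0]  # keep upper half-plane
    formants = []
    for r in roots:
        freq = np.angle(r) * sr / (2 * np.pi)
        bw = -(sr / np.pi) * np.log(np.abs(r))          # resonance bandwidth in Hz
        if freq > 90 and bw < 400:                      # keep sharp, non-DC resonances
            formants.append(freq)
    return sorted(formants)

# Synthetic vowel-like frame: two damped resonances near 700 Hz and 1200 Hz.
sr = 16000
t = np.arange(0, 0.03, 1 / sr)
frame = np.exp(-60 * t) * (np.sin(2 * np.pi * 700 * t) + np.sin(2 * np.pi * 1200 * t))
print(estimate_formants(frame, sr)[:2])  # roughly [700, 1200]
```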
This embodiment matches the PCM data against the sound spectrum features, prosodic features, and voice quality features in the emotion database and determines the emotion category corresponding to the PCM data according to the matching degree. The method of extracting these physical features is simple and convenient, and the processing is efficient and fast;
by using the comprehensive matching of several classes of speech features at the same time, accurate recognition of emotional features can be achieved;
the present invention improves the flexibility, convenience, rigor, and efficiency of emotion recognition, can better adapt to the future needs of intelligent hardware, and supports complete and rapid configuration as intelligent hardware grows in complexity;
it solves the technical problems of current speech emotion recognition: complex processing, high implementation difficulty, and long processing time.
The above is another embodiment of the semantics-independent speech emotion feature recognition method provided by an embodiment of the present invention; an embodiment of the semantics-independent speech emotion feature recognition device follows.
Referring to Fig. 3, an embodiment of the present invention provides one embodiment of a semantics-independent speech emotion feature recognition device, including:
an audio acquisition module 301, for obtaining the PCM data in an audio file in WAV format;
a feature extraction module 302, for performing speech feature extraction on the PCM data to obtain the sound spectrum features, prosodic features, and voice quality features of the PCM data;
a matching output module 303, for pattern-matching the sound spectrum features, prosodic features, and voice quality features of the PCM data against the preset sound spectrum features, preset prosodic features, and preset voice quality features corresponding to each emotion category in the emotion database, and outputting the emotion category with the highest matching degree according to the result of the pattern matching.
Further, the matching output module 303 specifically includes:
a weights submodule 3031, for obtaining the preset weights corresponding to the preset sound spectrum features, preset prosodic features, and preset voice quality features in the emotion database;
a matching submodule 3032, for pattern-matching the sound spectrum features, prosodic features, and voice quality features of the PCM data against the preset sound spectrum features, preset prosodic features, and preset voice quality features corresponding to each emotion category in the emotion database;
an output submodule 3033, for computing, for each emotion category, the weighted average of the matching degrees between the sound spectrum features, prosodic features, and voice quality features of the PCM data and the corresponding preset features in the emotion database, using the preset weights; taking the weighted average as the matching degree and outputting the emotion category with the highest matching degree.
Further, the sound spectrum features specifically include: MFCC features and GFCC features.
Further, the prosodic features specifically include: pitch features, short-term energy features, ZCR features, and speech rate features.
Further, the voice quality features specifically include: formant features.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the device and modules described above may refer to the corresponding processes in the foregoing method embodiments and are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the device embodiments described above are merely schematic: the division into modules is only a division of logical functions, and other divisions are possible in actual implementation; multiple modules or components may be combined or integrated into another system, and some features may be ignored or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through interfaces, devices, or modules, and may be electrical, mechanical, or in other forms.
The modules described as separate components may or may not be physically separate, and the components shown as modules may or may not be physical modules; they may be located in one place or distributed over multiple network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional modules in the embodiments of the present invention may be integrated into one processing module, or each module may exist physically on its own, or two or more modules may be integrated into one module. The above integrated module may be implemented in the form of hardware or in the form of a software functional module.
If the integrated module is implemented in the form of a software functional module and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods of the embodiments of the present invention. The foregoing storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The above embodiments are merely intended to describe the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments or make equivalent substitutions for some of the technical features; such modifications and substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims (10)
- 1. A semantics-independent speech emotion feature recognition method, characterized by comprising: S1: obtaining the PCM data in an audio file in WAV format; S2: performing speech feature extraction on the PCM data to obtain sound spectrum features, prosodic features, and voice quality features of the PCM data; S3: pattern-matching the sound spectrum features, prosodic features, and voice quality features of the PCM data against the preset sound spectrum features, preset prosodic features, and preset voice quality features corresponding to each emotion category in an emotion database, and outputting the emotion category with the highest matching degree according to the result of the pattern matching.
- 2. The semantics-independent speech emotion feature recognition method according to claim 1, characterized in that step S3 specifically includes: S301: obtaining the preset weights corresponding to the preset sound spectrum features, preset prosodic features, and preset voice quality features in the emotion database; S302: pattern-matching the sound spectrum features, prosodic features, and voice quality features of the PCM data against the preset sound spectrum features, preset prosodic features, and preset voice quality features corresponding to each emotion category in the emotion database; S303: computing, for each emotion category, the weighted average of the matching degrees between the features of the PCM data and the corresponding preset features, using the preset weights of the preset sound spectrum features, preset prosodic features, and preset voice quality features; taking the weighted average as the matching degree and outputting the emotion category with the highest matching degree.
- 3. The semantics-independent speech emotion feature recognition method according to claim 1, characterized in that the sound spectrum features specifically include: MFCC features and GFCC features.
- 4. The semantics-independent speech emotion feature recognition method according to claim 1, characterized in that the prosodic features specifically include: pitch features, short-term energy features, ZCR features, and speech rate features.
- 5. The semantics-independent speech emotion feature recognition method according to claim 1, characterized in that the voice quality features specifically include: formant features.
- 6. A semantics-independent speech emotion feature recognition device, characterized by comprising: an audio acquisition module, for obtaining the PCM data in an audio file in WAV format; a feature extraction module, for performing speech feature extraction on the PCM data to obtain sound spectrum features, prosodic features, and voice quality features of the PCM data; a matching output module, for pattern-matching the sound spectrum features, prosodic features, and voice quality features of the PCM data against the preset sound spectrum features, preset prosodic features, and preset voice quality features corresponding to each emotion category in an emotion database, and outputting the emotion category with the highest matching degree according to the result of the pattern matching.
- 7. The semantics-independent speech emotion feature recognition device according to claim 6, characterized in that the matching output module specifically includes: a weights submodule, for obtaining the preset weights corresponding to the preset sound spectrum features, preset prosodic features, and preset voice quality features in the emotion database; a matching submodule, for pattern-matching the sound spectrum features, prosodic features, and voice quality features of the PCM data against the preset sound spectrum features, preset prosodic features, and preset voice quality features corresponding to each emotion category in the emotion database; an output submodule, for computing, for each emotion category, the weighted average of the matching degrees between the features of the PCM data and the corresponding preset features using the preset weights, taking the weighted average as the matching degree, and outputting the emotion category with the highest matching degree.
- 8. The semantics-independent speech emotion feature recognition device according to claim 6, characterized in that the sound spectrum features specifically include: MFCC features and GFCC features.
- 9. The semantics-independent speech emotion feature recognition device according to claim 6, characterized in that the prosodic features specifically include: pitch features, short-term energy features, ZCR features, and speech rate features.
- 10. The semantics-independent speech emotion feature recognition device according to claim 6, characterized in that the voice quality features specifically include: formant features.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711258175.2A CN108010516A (en) | 2017-12-04 | 2017-12-04 | Semantic independent speech emotion feature recognition method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711258175.2A CN108010516A (en) | 2017-12-04 | 2017-12-04 | Semantic independent speech emotion feature recognition method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108010516A true CN108010516A (en) | 2018-05-08 |
Family
ID=62056007
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711258175.2A Pending CN108010516A (en) | 2017-12-04 | 2017-12-04 | Semantic independent speech emotion feature recognition method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108010516A (en) |
Patent Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101261832A (en) * | 2008-04-21 | 2008-09-10 | 北京航空航天大学 | Extraction and modeling method for Chinese speech sensibility information |
CN102737629A (en) * | 2011-11-11 | 2012-10-17 | 东南大学 | Embedded type speech emotion recognition method and device |
CN103854645A (en) * | 2014-03-05 | 2014-06-11 | 东南大学 | Speech emotion recognition method based on punishment of speaker and independent of speaker |
KR20150045967A (en) * | 2015-04-09 | 2015-04-29 | 이상민 | Algorithm that converts the voice data into emotion data |
CN105159979A (en) * | 2015-08-27 | 2015-12-16 | 广东小天才科技有限公司 | friend recommendation method and device |
CN107305773A (en) * | 2016-04-15 | 2017-10-31 | 美特科技(苏州)有限公司 | Voice mood discrimination method |
CN106297826A (en) * | 2016-08-18 | 2017-01-04 | 竹间智能科技(上海)有限公司 | Speech emotional identification system and method |
CN106448652A (en) * | 2016-09-12 | 2017-02-22 | 珠海格力电器股份有限公司 | Control method and device of air conditioner |
CN107221318A (en) * | 2017-05-12 | 2017-09-29 | 广东外语外贸大学 | Oral English Practice pronunciation methods of marking and system |
Non-Patent Citations (3)
Title |
---|
Zhang Hailong: "Research on Emotion Recognition Technology Based on Speech Signals", Journal of Yan'an University (Natural Science Edition) *
Cao Peng: "Research and Implementation of Speech Emotion Recognition Technology", China Master's Theses Full-text Database, Information Science and Technology *
Han Wenjing: "Research on Key Technologies of Speech Emotion Recognition", China Doctoral Dissertations Full-text Database, Information Science and Technology *
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108806667A (en) * | 2018-05-29 | 2018-11-13 | 重庆大学 | The method for synchronously recognizing of voice and mood based on neural network |
CN109087670A (en) * | 2018-08-30 | 2018-12-25 | 西安闻泰电子科技有限公司 | Mood analysis method, system, server and storage medium |
CN109087670B (en) * | 2018-08-30 | 2021-04-20 | 西安闻泰电子科技有限公司 | Emotion analysis method, system, server and storage medium |
CN110970113A (en) * | 2018-09-30 | 2020-04-07 | 宁波方太厨具有限公司 | Intelligent menu recommendation method based on user emotion |
CN110970113B (en) * | 2018-09-30 | 2023-04-14 | 宁波方太厨具有限公司 | Intelligent menu recommendation method based on user emotion |
CN110110135A (en) * | 2019-04-17 | 2019-08-09 | 西安极蜂天下信息科技有限公司 | Voice characteristics data library update method and device |
CN111182409A (en) * | 2019-11-26 | 2020-05-19 | 广东小天才科技有限公司 | Screen control method based on intelligent sound box, intelligent sound box and storage medium |
CN111182409B (en) * | 2019-11-26 | 2022-03-25 | 广东小天才科技有限公司 | Screen control method based on intelligent sound box, intelligent sound box and storage medium |
CN111583968A (en) * | 2020-05-25 | 2020-08-25 | 桂林电子科技大学 | Speech emotion recognition method and system |
CN112002304A (en) * | 2020-08-27 | 2020-11-27 | 上海添力网络科技有限公司 | Speech synthesis method and device |
CN112002304B (en) * | 2020-08-27 | 2024-03-29 | 上海添力网络科技有限公司 | Speech synthesis method and device |
CN113408503A (en) * | 2021-08-19 | 2021-09-17 | 明品云(北京)数据科技有限公司 | Emotion recognition method and device, computer readable storage medium and equipment |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108010516A (en) | Semantic independent speech emotion feature recognition method and device | |
CN108564942B (en) | Voice emotion recognition method and system based on adjustable sensitivity | |
CN110400579B (en) | Speech emotion recognition based on direction self-attention mechanism and bidirectional long-time and short-time network | |
Koolagudi et al. | IITKGP-SEHSC: Hindi speech corpus for emotion analysis | |
Iliev et al. | Spoken emotion recognition through optimum-path forest classification using glottal features | |
Demircan et al. | Feature extraction from speech data for emotion recognition | |
Sinith et al. | Emotion recognition from audio signals using Support Vector Machine | |
CN110473566A (en) | Audio separation method, device, electronic equipment and computer readable storage medium | |
Meyer et al. | Robustness of spectro-temporal features against intrinsic and extrinsic variations in automatic speech recognition | |
Bhat et al. | Automatic assessment of sentence-level dysarthria intelligibility using BLSTM | |
CN108597496A (en) | Voice generation method and device based on generation type countermeasure network | |
Yeh et al. | Segment-based emotion recognition from continuous Mandarin Chinese speech | |
CN107610707A (en) | A kind of method for recognizing sound-groove and device | |
CN107731233A (en) | A kind of method for recognizing sound-groove based on RNN | |
CN104867489B (en) | A kind of simulation true man read aloud the method and system of pronunciation | |
CN110827857B (en) | Speech emotion recognition method based on spectral features and ELM | |
Deshmukh et al. | Speech based emotion recognition using machine learning | |
Samantaray et al. | A novel approach of speech emotion recognition with prosody, quality and derived features using SVM classifier for a class of North-Eastern Languages | |
Casale et al. | Multistyle classification of speech under stress using feature subset selection based on genetic algorithms | |
Hasrul et al. | Human affective (emotion) behaviour analysis using speech signals: a review | |
Javidi et al. | Speech emotion recognition by using combinations of C5. 0, neural network (NN), and support vector machines (SVM) classification methods | |
Patni et al. | Speech emotion recognition using MFCC, GFCC, chromagram and RMSE features | |
CN109065073A (en) | Speech-emotion recognition method based on depth S VM network model | |
Besbes et al. | Multi-class SVM for stressed speech recognition | |
Gallardo-Antolín et al. | On combining acoustic and modulation spectrograms in an attention LSTM-based system for speech intelligibility level classification |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20180508 |