CN113409821A - Method for recognizing unknown emotional state of voice signal - Google Patents
- Publication number: CN113409821A (application CN202110584445.9A)
- Authority: CN (China)
- Prior art keywords: emotion, unknown, category, sample, emotion category
- Prior art date: 2021-05-27
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.): Granted
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
- G10L25/63—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
Abstract
The invention discloses a method for recognizing unknown emotional states in speech signals: paralinguistic features are extracted from speech segment samples whose emotional state information is unknown, and classification decisions are made by a synthesized-classifier method combined with semantic embeddings of the emotional state labels. In the training stage, paralinguistic features are extracted from the known-emotion-category training speech segment samples, the prototype weights of the known emotion categories are obtained by processing the known emotion category names, and the optimal virtual classifiers are solved for in combination with the labels of the known-emotion-category training samples. In the testing stage, the optimal virtual classifiers are used, in combination with the paralinguistic features of the unknown-emotion-category test samples and the prototype weights of the unknown emotion categories, to make the unknown-emotion-category decision for each test sample. The invention introduces semantic-embedding-based recognition of unknown emotions into speech signal emotion recognition and can effectively distinguish unknown emotions in speech signals.
Description
Technical Field
The invention belongs to the field of speech signal emotion recognition, and particularly relates to a method for recognizing unknown emotion states of speech signals.
Background
Speech Emotion Recognition (SER) has wide applications in fields such as human-computer interaction. By studying the emotion information in a speech signal, the subjective intention a speaker conveys in an utterance, as well as the speaker's deeper emotional expression, can be judged. In addition, analyzing the emotion information in speech enables emotion-expressive speech synthesis. In psychological disease diagnosis, related technology can support preliminary screening of patients with depression and similar conditions, providing a basis for further diagnosis and treatment; in virtual reality, it can give robots stronger emotion analysis and expression capabilities.
The prior art cannot effectively recognize unknown emotional states in speech signals: in the large body of existing SER work, emotional states never seen in the training samples cannot be recognized, so unknown emotion categories cannot be judged on speech signal samples. For example, in human-computer interaction, a machine receiving an utterance may need to decide whether the speaker is conveying a complex emotion such as trustworthiness, friendliness, or aggression. Without being taught how to estimate such complex emotions or intentions of speakers, the machine cannot accomplish this task.
In the prior art, for example, the publication Xu X, Deng J, Cummins N, et al. Autonomous Emotion Learning in Speech: A View of Zero-Shot Speech Emotion Recognition [C]// Proc. INTERSPEECH 2019, 2019: 949- discloses a scheme for recognizing unknown emotions in speech signals. In the disclosed recognition scheme, the emotion-space dimension values corresponding to each sample in the known-emotion training set must be annotated, which brings a high manual workload and labeling cost and increases the complexity of the computation process.
Disclosure of Invention
The invention provides a method for recognizing unknown emotional states of speech signals, aiming to solve two problems of the prior art: unknown emotions in speech signals cannot be recognized, and existing schemes for recognizing unknown speech emotions require each known-emotion sample to be annotated in every dimension of the emotion space.
In order to solve the technical problems, the invention adopts the following technical scheme:
a speech signal unknown emotion state recognition method comprises the steps of firstly establishing a speech emotion database which comprises a plurality of speech segment samples, wherein each sample has an emotion category label corresponding to the sample; dividing a voice emotion database into a training set consisting of known emotion type samples and a test set consisting of unknown emotion type samples; each sample has a known and unique emotion category label. The method comprises the following steps of:
Step one, extract and generate the n_F-dimensional original features: process each speech segment sample in the training set and the test set separately, extract the corresponding paralinguistic features as original features, and regularize the original features to obtain the regularized features X^(S) = [x_1, x_2, ..., x_{N^(S)}] of the N^(S) training samples and the regularized feature x^(U) of any test sample;

Step two, apply semantic embedding mapping to the known emotion category names to generate the semantic embedding prototypes of the known emotion categories, A^(S) = [a_1, a_2, ..., a_{c^(S)}] ∈ R^{n_A × c^(S)}, where c^(S) is the number of known emotion categories and n_A is the semantic embedding dimension of the emotion category names;

Step three, from the known emotion category prototype matrix A^(S) and the virtual category prototype matrix B = [b_1, b_2, ..., b_{c^(P)}] ∈ R^{n_A × c^(P)}, compute the prototype weight matrix S ∈ R^{c^(P) × c^(S)} of the known emotion categories;

Step four, use the paralinguistic features X^(S) of the known-emotion-category speech segment samples, the emotion category labels of the corresponding samples, the semantic embedding prototypes A^(S) of the known emotion categories and the emotion category prototype weight matrix S to optimize the linear virtual classifiers V = [v_1, v_2, ..., v_{c^(P)}] according to the optimization target f in formula (1);

Step five, testing: for the paralinguistic feature x^(U) of each unknown-emotion-category test sample, use the classifier obtained in step four to make the unknown-emotion-category classification decision for each test sample.
Further, the normalization processing method in step one is as follows:

Before normalization, the feature column vector of any speech segment sample is x^(0); the training sample set formed by the feature column vectors of the N^(S) known-emotion-category training samples is X^(0) = [x^(0)_1, x^(0)_2, ..., x^(0)_{N^(S)}]; let x^(0)_{·j} denote the j-th feature element of x^(0).

For the element x^(0)_{·j} corresponding to feature j of any sample's feature column vector x^(0), the regularization formula is:

$$x_{\cdot j}=\frac{x^{(0)}_{\cdot j}-\min_{i} X^{(0)}_{ji}}{\max_{i} X^{(0)}_{ji}-\min_{i} X^{(0)}_{ji}} \qquad (2)$$

where max_i X^(0)_{ji} is the largest element in row j of X^(0) and min_i X^(0)_{ji} is the smallest element in row j of X^(0); x_{·j} is the regularization result of x^(0)_{·j}.

Computing all the elements of any sample according to formula (2) yields the normalized feature column vector x = [x_{·1}, x_{·2}, ..., x_{·n_F}]^T of that training or test sample, and the normalized feature vectors of the speech segment samples belonging to the known-emotion-category training set form the normalized training feature set X^(S).
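A minimal numpy sketch of this min-max regularization (formula (2)); reusing the training set's per-row min and max for test samples and guarding against constant feature rows are our assumptions, as the patent does not fix these details:

```python
import numpy as np

def minmax_normalize(X_train, X_test):
    # Columns are samples, rows are the n_F features, matching X^(0).
    x_min = X_train.min(axis=1, keepdims=True)  # smallest element of each row j
    x_max = X_train.max(axis=1, keepdims=True)  # largest element of each row j
    span = np.where(x_max > x_min, x_max - x_min, 1.0)  # avoid division by zero
    return (X_train - x_min) / span, (X_test - x_min) / span
```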
Further, the semantic embedding mapping in step two can be implemented by applying a word-vector pre-training model to the emotion category names:

Inputting an emotion category name into the pre-trained model yields the n_A-dimensional semantic embedding vector of that category. The semantic embedding vectors of the known emotion categories corresponding to the training set are expressed as A^(S) = [a_1, a_2, ..., a_{c^(S)}]; for the c^(U) unknown emotion categories to be predicted for the test sample set, the semantic embedding vectors are expressed as A^(U) = [a^(U)_1, a^(U)_2, ..., a^(U)_{c^(U)}].
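As a sketch, the lookup of step two can be done with gensim and a pre-trained 300-dimensional word2vec model; averaging the word vectors of multi-word category names (e.g. "hot anger") is our assumption, since the patent leaves this detail open, and the model path is a placeholder:

```python
import numpy as np
from gensim.models import KeyedVectors

# Hypothetical local path to the Google News word2vec model (n_A = 300).
wv = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True)

def embed_categories(names):
    # One n_A-dimensional column per emotion category name.
    cols = [np.mean([wv[w] for w in name.split()], axis=0) for name in names]
    return np.stack(cols, axis=1)  # A^(S) or A^(U), shape (n_A, c)

A_S = embed_categories(["amusement", "anxiety", "pleasure", "sadness"])
```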
Further, the element of the prototype weight matrix S of the known emotion categories in step three corresponding to virtual category c_P and known emotion category c_S is:

$$s_{c_P c_S}=\frac{\exp\left(-\left\|a_{c_S}-b_{c_P}\right\|_2^2/\sigma^2\right)}{\sum_{c_P'=1}^{c^{(P)}}\exp\left(-\left\|a_{c_S}-b_{c_P'}\right\|_2^2/\sigma^2\right)}$$

where σ² is the distance weight.
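A numpy sketch of this weight computation; prototypes are stored column-wise, and the function name is ours:

```python
import numpy as np

def prototype_weights(A, B, sigma2):
    # d[p, s] = ||a_s - b_p||^2 / sigma^2 for every prototype pair.
    d = ((A[:, None, :] - B[:, :, None]) ** 2).sum(axis=0) / sigma2
    e = np.exp(-d)                           # unnormalized similarities
    return e / e.sum(axis=0, keepdims=True)  # softmax over the c^(P) virtual classes
```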
Further, the virtual category prototype matrix B in step three can be constructed in either of the following two ways (see the sketch after this list):

(1) randomly generate an n_A × c^(P)-dimensional matrix from the uniform distribution on [0, 1];

(2) set B to the semantic embedding matrix A^(S) of the known emotion categories.
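A small helper sketching both construction ways; the function and mode names are ours and mirror the SYNC(origin)/SYNC(rand) variants evaluated later:

```python
import numpy as np

def make_virtual_prototypes(A_S, mode="rand", c_P=1000, seed=0):
    if mode == "origin":   # way (2): B = A^(S)
        return A_S.copy()
    n_A = A_S.shape[0]     # way (1): uniform prototypes in [0, 1]
    return np.random.default_rng(seed).uniform(0.0, 1.0, size=(n_A, c_P))
```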
Further, the optimization target for solving the optimal virtual classifiers V* = [v*_1, v*_2, ..., v*_{c^(P)}] in step four is:

$$\min_{v_1,\ldots,v_{c^{(P)}}}\;\sum_{c_S=1}^{c^{(S)}}\sum_{n=1}^{N^{(S)}}\ell\left(x_n, y_{n c_S}; w_{c_S}\right)+\frac{\tau}{2}\sum_{c_S=1}^{c^{(S)}}\left\|w_{c_S}\right\|_2^2 \qquad (1)$$

where the regularization term weight is τ > 0, the known-emotion-category linear classifiers are synthesized as:

$$w_{c_S}=\sum_{c_P=1}^{c^{(P)}}s_{c_P c_S}\,v_{c_P}$$

and the loss function is the squared hinge loss:

$$\ell\left(x_n, y_{n c_S}; w_{c_S}\right)=\max\left(0,\;1-y_{n c_S}\,w_{c_S}^{\mathsf{T}}x_n\right)^2$$
where y_{n c_S} is the label information of the c_S-th known emotion category for sample n: when sample n belongs to category c_S, y_{n c_S} = 1; otherwise, y_{n c_S} = -1.
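A plain gradient-descent sketch of this optimization, assuming the squared hinge loss above; dividing the data term by N is a scaling convenience of ours, and the learning rate, epoch count and initialization are illustrative only:

```python
import numpy as np

def train_virtual_classifiers(X, y, S, tau=2**-12, lr=1e-3, epochs=500):
    # X: (n_F, N) regularized features; y: (N,) known-class indices in [0, c_S).
    n_F, N = X.shape
    c_P, c_S = S.shape
    Y = np.where(y[None, :] == np.arange(c_S)[:, None], 1.0, -1.0).T  # (N, c_S)
    V = 0.01 * np.random.default_rng(0).standard_normal((n_F, c_P))
    for _ in range(epochs):
        W = V @ S                                      # synthesized classifiers w_c
        active = np.maximum(0.0, 1.0 - Y * (X.T @ W))  # squared-hinge margins
        dW = X @ (-2.0 * active * Y) / N + tau * W     # gradient w.r.t. W
        V -= lr * (dW @ S.T)                           # chain rule through W = V S
    return V
```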
Further, the classification decision on an unknown-emotion-category test sample in step five comprises the following steps executed in sequence (see the sketch after this list):

(1) from the computed semantic embedding prototypes A^(U) of the unknown emotion categories, compute the emotion category prototype weight vector for each unknown category m:

$$s^{(U)}_{c_P m}=\frac{\exp\left(-\left\|a^{(U)}_{m}-b_{c_P}\right\|_2^2/\sigma^2\right)}{\sum_{c_P'=1}^{c^{(P)}}\exp\left(-\left\|a^{(U)}_{m}-b_{c_P'}\right\|_2^2/\sigma^2\right)};$$

(2) from the paralinguistic feature x^(U) of the test sample, predict the label m* of the unknown emotion category to which the sample belongs:

$$m^{*}=\arg\max_{m}\left(\sum_{c_P=1}^{c^{(P)}}s^{(U)}_{c_P m}\,v^{*}_{c_P}\right)^{\mathsf{T}}x^{(U)}$$

thereby obtaining the unknown-emotion-category decision result for the test sample.
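A sketch of this decision rule, reusing prototype_weights from the sketch above; A_U holds the unknown-category prototypes column-wise:

```python
import numpy as np

def predict_unknown(x_u, A_U, B, V, sigma2):
    S_u = prototype_weights(A_U, B, sigma2)  # (c_P, c_U) weights for unknown classes
    W_u = V @ S_u                            # synthesized unknown-class classifiers
    return int(np.argmax(W_u.T @ x_u))       # index m* of the predicted category
```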
Beneficial effects: as shown in Fig. 1, the method for recognizing unknown emotional states of speech signals according to the present invention extracts paralinguistic features from speech segment samples whose emotional state information is unknown, and performs classification decisions with a synthesized-classifier method combined with semantic embeddings of the emotional state labels. Specifically, in the training stage, paralinguistic features are extracted from the known-emotion-category training speech segment samples; at the same time, the prototype weights of the known emotion categories are obtained by processing the known emotion category names, and the optimal virtual classifiers are then solved for in combination with the labels of the known-emotion-category training samples. In the testing stage, the optimal virtual classifiers are used, in combination with the paralinguistic features of the unknown-emotion-category test speech segment samples and the prototype weights of the unknown emotion categories, to make the unknown-emotion-category decision for each test sample.
The existing speech emotion recognition methods have two problems: a general SER method can only recognize the emotion categories for which samples are provided in the training set, and has difficulty handling unknown categories; and although solutions for recognizing unknown emotions in speech signals have been presented, their success still relies on adequate labeling of the emotion-space dimensions of the training samples.
Therefore, the method for recognizing unknown emotions in speech signals disclosed by the invention, based on synthesized classifiers and semantic embedding, helps with the cognition of unknown emotions in speech signals without increasing manual labeling cost, so that unknown emotions in speech can be effectively recognized.
Experiments prove that the method, which introduces semantic embedding into speech signal emotion recognition, can effectively distinguish unknown emotions in speech signals.
Drawings
Fig. 1 is a flow chart of a method for recognizing an unknown emotional state of a speech signal according to the present invention.
Detailed Description
The present invention will be further described with reference to the accompanying drawings and the detailed description.
As shown in Fig. 1, the method of the present invention first extracts paralinguistic features from speech segment samples whose emotional state information is unknown, and makes classification decisions with a synthesized-classifier method combined with semantic embeddings of the emotional state labels. In the training stage, paralinguistic features are extracted from the known-emotion-category training speech segment samples, the prototype weights of the known emotion categories are obtained by processing the known emotion category names, and the optimal virtual classifiers are solved for in combination with the labels of the known-emotion-category training samples; in the testing stage, the optimal virtual classifiers are used, in combination with the paralinguistic features of the unknown-emotion-category test speech segment samples and the prototype weights of the unknown emotion categories, to make the unknown-emotion-category decision for each test sample.
In the following, the method of the present invention is compared with existing zero-shot learning methods in terms of the Unweighted Accuracy (UA) recognition rate.
The effectiveness of the method according to the embodiments of the invention is verified on the speech signal portion of the GEMEP (GEneva Multimodal Emotion Portrayals) database.
The bimodal GEMEP database includes a set of speech samples and its corresponding set of video samples, GEMEP-FERA. The database contains 18 emotion categories: admiration, amusement, anxiety, cold anger, contempt, despair, disgust, elation, hot anger, interest, panic fear, pleasure, pride, relief, sadness, shame, surprise and tenderness. The database was recorded in French and comprises 1260 samples, distributed over 10 speakers, including 5 females. The experiments use 12 emotion categories, specifically amusement, anxiety, cold anger, despair, elation, hot anger, interest, panic fear, pleasure, pride, relief and sadness, with an average of 90 samples per category, 1080 samples in total. All samples of every pair of emotion categories in the data set are used as the unknown-emotion test speech segment set, and the samples of the other emotion categories as the known-emotion training speech segment set; there are 66 such sample-category combinations, so the experiment is trained and tested 66 times.
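The 66 leave-two-categories-out splits can be enumerated directly; a sketch:

```python
from itertools import combinations

labels = ["amusement", "anxiety", "cold anger", "despair", "elation",
          "hot anger", "interest", "panic fear", "pleasure", "pride",
          "relief", "sadness"]

# Each pair of categories is held out as the unknown test classes,
# the remaining ten serve as the known training classes.
splits = [(sorted(set(labels) - set(pair)), pair)
          for pair in combinations(labels, 2)]
assert len(splits) == 66  # C(12, 2)
```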
The original paralinguistic features in the experiments use the extended Geneva Minimalistic Acoustic Parameter Set (eGeMAPS) feature set, with original feature dimension n_F = 88; the features are derived from 25 Low-Level Descriptors (LLDs) combined with high-level statistical functionals (HSFs), plus temporal features and equivalent sound level, and were extracted with the openSMILE 2.3 toolbox.
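The experiments used the openSMILE 2.3 toolbox; as a sketch, the same 88 eGeMAPS functionals can be extracted with audEERING's opensmile Python wrapper (the file name is a placeholder):

```python
import opensmile

smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.eGeMAPSv02,       # 88 functionals
    feature_level=opensmile.FeatureLevel.Functionals,
)
features = smile.process_file("sample.wav")  # 1 x 88 pandas DataFrame
assert features.shape[1] == 88               # n_F = 88
```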
The semantic embedding prototypes of the emotion categories use n_A = 300-dimensional English word semantic vectors; these prototypes were derived from pre-trained word2vec, GloVe and fastText models. The word2vec model in the experiments is the Google pre-trained model trained on the Google News corpus, which contains 3 million words; the GloVe model uses Wikipedia 2014 and Gigaword 5 as training data and contains 400,000 words; fastText uses 2 million word vectors trained on web crawl data and 1 million word vectors trained on Wikipedia 2017, the UMBC webbase corpus and the statmt.org news dataset.
In the experiments, to demonstrate the effect of the method of the present invention, the comparison methods are: SAE (Semantic AutoEncoder), DEM (Deep Embedding Model), LatEm (Latent Embeddings), ESZSL (Embarrassingly Simple Zero-Shot Learning) and EXEM (EXemplar synthesis).
The speech signal emotion state recognition models of the embodiments are two: SYNC(origin) (Example 1, using the prototypes of the known emotion categories as the prototype matrix B of the virtual categories in step three, i.e. B = A^(S)) and SYNC(rand) (Example 2, randomly generating the prototype matrix B of the virtual categories in step three, with the number of virtual categories c^(P) = 1000).
In the experiments, the optimal parameters are selected on the training set by emotion-category-independent five-fold cross-validation, and for the random generation of the virtual category prototypes in step three the experiment is repeated 10 times. For the embodiments of the invention, the parameter ranges are: regularization term weight τ ∈ {2^-24, 2^-23, ..., 2^-9}, distance weight σ² ∈ {2^-5, 2^-4, ..., 2^5}.
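The grids above contain 16 x 11 = 176 candidate pairs; a sketch of their enumeration:

```python
from itertools import product

taus = [2.0 ** k for k in range(-24, -8)]   # tau in {2^-24, ..., 2^-9}, 16 values
sigma2s = [2.0 ** k for k in range(-5, 6)]  # sigma^2 in {2^-5, ..., 2^5}, 11 values
grid = list(product(taus, sigma2s))         # 176 (tau, sigma^2) candidates
assert len(grid) == 176
```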
The average optimal UA results over all semantic embedding prototypes on the GEMEP database are shown in Table 1:
TABLE 1

| | Method | UA |
|---|---|---|
| Comparative example 1 | SAE | 57.2% |
| Comparative example 2 | DEM | 59.3% |
| Comparative example 3 | LatEm | 64.2% |
| Comparative example 4 | ESZSL | 64.6% |
| Comparative example 5 | EXEM | 62.3% |
| Example 1 | SYNC(origin) | 64.4% |
| Example 2 | SYNC(rand) | 65.0% ± 0.9% |
As can be seen from Table 1, the SYNC methods of Example 1 and Example 2 achieve better UA performance on the recognition of unknown speech emotions than the related comparative-example methods.
Further, we present the three best results of the SYNC(rand) method of Example 2 among the 10 repetitions, compared with the result of SYNC(origin), as shown in Table 2. As can be seen from Table 2, a randomly generated virtual category prototype matrix enables the method of the present invention to achieve better performance.
TABLE 2

| Method | UA |
|---|---|
| SYNC(origin) | 64.4% |
| SYNC(rand) best | 66.6% |
| SYNC(rand) 2nd best | 66.2% |
| SYNC(rand) 3rd best | 65.7% |
In summary, by learning the discriminative information between known emotion categories from the paralinguistic features used in speech emotion recognition, the SYNC method adopted in these embodiments provides better performance on the problem of recognizing unknown emotions in speech signals.
The above description covers only the preferred embodiments of the present invention. It should be noted that those skilled in the art can make various modifications and adaptations without departing from the principles of the invention, and such modifications and adaptations shall also fall within the protection scope of the invention.
Claims (10)
1. A method for recognizing an unknown emotional state of a speech signal, the method comprising: extracting paralinguistic features from known-emotion-category training speech segment samples, obtaining the prototype weights of the known emotion categories by processing the known emotion category names, and solving for the optimal virtual classifiers in combination with the labels of the known-emotion-category training speech segment samples; and extracting paralinguistic features from speech segment samples whose emotional state information is unknown, and performing the unknown-emotion-category decision on the test samples by using the optimal virtual classifiers in combination with the semantic embeddings of the emotional state labels and the prototype weights of the unknown emotion categories.
2. The method for recognizing an unknown emotional state of a speech signal according to claim 1, wherein the method specifically comprises the following steps:
step one, processing each speech segment sample in the training sample set and the test sample set separately, extracting the corresponding paralinguistic features as original features to generate the n_F-dimensional original features, and regularizing the original features to obtain the regularized features X^(S) of the N^(S) training samples and the regularized feature x^(U) of any test sample;

step two, inputting the emotion category names into a pre-trained model to obtain the n_A-dimensional semantic embedding vector of each category, and expressing the semantic embedding vectors of the known emotion categories corresponding to the training set as A^(S) = [a_1, a_2, ..., a_{c^(S)}], where c^(S) is the number of known emotion categories and n_A is the semantic embedding dimension of the emotion category names; for the c^(U) unknown emotion categories to be predicted for the test sample set, expressing the semantic embedding vectors as A^(U);

step three, from the known emotion category prototype matrix A^(S) and the virtual category prototype matrix B ∈ R^{n_A × c^(P)}, computing the prototype weight matrix S of the known emotion categories;

step four, using the paralinguistic features X^(S) of the known-emotion-category speech segment samples, the emotion category labels of the corresponding samples, the semantic embedding prototypes A^(S) of the known emotion categories and the emotion category prototype weight matrix S to optimize the linear virtual classifiers V according to the optimization target f in formula (1); and

step five, performing the unknown-emotion-category classification decision on each test sample by using the optimal virtual classifiers.
3. The method for recognizing an unknown emotional state of a speech signal according to claim 2, wherein the element of the prototype weight matrix S of the known emotion categories corresponding to virtual category c_P and known emotion category c_S is:

$$s_{c_P c_S}=\frac{\exp\left(-\left\|a_{c_S}-b_{c_P}\right\|_2^2/\sigma^2\right)}{\sum_{c_P'=1}^{c^{(P)}}\exp\left(-\left\|a_{c_S}-b_{c_P'}\right\|_2^2/\sigma^2\right)}$$
4. The method for recognizing an unknown emotional state of a speech signal according to claim 3, wherein the distance weight σ² ∈ {2^-5, 2^-4, ..., 2^5}.
5. The method for recognizing an unknown emotional state of a speech signal according to claim 2, wherein the virtual category prototype matrix B in step three is constructed by randomly generating an n_A × c^(P)-dimensional matrix from the uniform distribution on [0, 1].
6. The method for recognizing an unknown emotional state of a speech signal according to claim 2, wherein the virtual category prototype matrix B in step three is constructed by setting it to the semantic embedding matrix A^(S) of the known emotion categories.
7. The method for recognizing an unknown emotional state of a speech signal according to claim 2, wherein the optimization target for solving the optimal virtual classifiers V* in step four is:

$$\min_{v_1,\ldots,v_{c^{(P)}}}\;\sum_{c_S=1}^{c^{(S)}}\sum_{n=1}^{N^{(S)}}\ell\left(x_n, y_{n c_S}; w_{c_S}\right)+\frac{\tau}{2}\sum_{c_S=1}^{c^{(S)}}\left\|w_{c_S}\right\|_2^2$$

where the regularization term weight is τ > 0, the known-emotion-category linear classifiers are w_{c_S} = Σ_{c_P=1}^{c^(P)} s_{c_P c_S} v_{c_P}, and the loss function is ℓ(x_n, y_{n c_S}; w_{c_S}) = max(0, 1 - y_{n c_S} w_{c_S}^T x_n)².
8. The method for recognizing an unknown emotional state of a speech signal according to claim 7, wherein the regularization term weight τ ∈ {2^-24, 2^-23, ..., 2^-9}.
9. The method for recognizing an unknown emotional state of a speech signal according to claim 2, wherein performing the unknown-emotion-category decision on the test sample with the optimal virtual classifiers in step five specifically comprises: from the computed semantic embedding prototypes A^(U) of the unknown emotion categories, computing the emotion category prototype weight vector for each unknown category m:

$$s^{(U)}_{c_P m}=\frac{\exp\left(-\left\|a^{(U)}_{m}-b_{c_P}\right\|_2^2/\sigma^2\right)}{\sum_{c_P'=1}^{c^{(P)}}\exp\left(-\left\|a^{(U)}_{m}-b_{c_P'}\right\|_2^2/\sigma^2\right)};$$

and, from the paralinguistic feature x^(U) of the test sample, predicting the label m* of the unknown emotion category to which the sample belongs:

$$m^{*}=\arg\max_{m}\left(\sum_{c_P=1}^{c^{(P)}}s^{(U)}_{c_P m}\,v^{*}_{c_P}\right)^{\mathsf{T}}x^{(U)},$$

thereby obtaining the unknown-emotion-category decision result for the test sample.
10. The method for recognizing an unknown emotional state of a speech signal according to claim 2, wherein the original features use the extended Geneva Minimalistic Acoustic Parameter Set (eGeMAPS), with original feature dimension n_F = 88.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202110584445.9A (granted as CN113409821B) | 2021-05-27 | 2021-05-27 | Method for recognizing unknown emotional state of voice signal |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202110584445.9A (granted as CN113409821B) | 2021-05-27 | 2021-05-27 | Method for recognizing unknown emotional state of voice signal |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN113409821A | 2021-09-17 |
| CN113409821B | 2023-04-18 |
Family
ID=77674667
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202110584445.9A (CN113409821B, Active) | Method for recognizing unknown emotional state of voice signal | 2021-05-27 | 2021-05-27 |
Country Status (1)
| Country | Link |
|---|---|
| CN | CN113409821B |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20110141258A1 (en) * | 2007-02-16 | 2011-06-16 | Industrial Technology Research Institute | Emotion recognition method and system thereof |
CN103854645A (en) * | 2014-03-05 | 2014-06-11 | 东南大学 | Speech emotion recognition method based on punishment of speaker and independent of speaker |
CN107886942A (en) * | 2017-10-31 | 2018-04-06 | 东南大学 | A kind of voice signal emotion identification method returned based on local punishment random spectrum |
CN108615052A (en) * | 2018-04-13 | 2018-10-02 | 南京邮电大学 | A kind of image-recognizing method without under similar training sample situation |
CN109933664A (en) * | 2019-03-12 | 2019-06-25 | 中南大学 | A kind of fine granularity mood analysis improved method based on emotion word insertion |
CN111324734A (en) * | 2020-02-17 | 2020-06-23 | 昆明理工大学 | Case microblog comment emotion classification method integrating emotion knowledge |
US20200335086A1 (en) * | 2019-04-19 | 2020-10-22 | Behavioral Signal Technologies, Inc. | Speech data augmentation |
CN112466284A (en) * | 2020-11-25 | 2021-03-09 | 南京邮电大学 | Mask voice identification method |
Non-Patent Citations (1)
| Title |
|---|
| 赵小蕾 (Zhao Xiaolei) et al.: "Speech emotion recognition method fusing functional paralanguage" (融合功能性副语言的语音情感识别方法), 《计算机科学与探索》 (Journal of Frontiers of Computer Science and Technology) |
Also Published As
| Publication Number | Publication Date |
|---|---|
| CN113409821B | 2023-04-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Zehra et al. | Cross corpus multi-lingual speech emotion recognition using ensemble learning | |
US20200335086A1 (en) | Speech data augmentation | |
Luo et al. | Audio Sentiment Analysis by Heterogeneous Signal Features Learned from Utterance-Based Parallel Neural Network. | |
Anagnostopoulos et al. | Features and classifiers for emotion recognition from speech: a survey from 2000 to 2011 | |
CN107943784B (en) | Relationship extraction method based on generation of countermeasure network | |
Kumar et al. | Multilayer Neural Network Based Speech Emotion Recognition for Smart Assistance. | |
Sultana et al. | Bangla speech emotion recognition and cross-lingual study using deep CNN and BLSTM networks | |
Mamyrbayev et al. | Voice identification using classification algorithms | |
Albadr et al. | Extreme learning machine for automatic language identification utilizing emotion speech data | |
Chen et al. | Music emotion recognition using deep Gaussian process | |
KR20200105057A (en) | Apparatus and method for extracting inquiry features for alalysis of inquery sentence | |
CN111563373A (en) | Attribute-level emotion classification method for focused attribute-related text | |
CN110992988A (en) | Speech emotion recognition method and device based on domain confrontation | |
CN112397092A (en) | Unsupervised cross-library speech emotion recognition method based on field adaptive subspace | |
CN112466284B (en) | Mask voice identification method | |
CN112489689A (en) | Cross-database voice emotion recognition method and device based on multi-scale difference confrontation | |
CN113409821B (en) | Method for recognizing unknown emotional state of voice signal | |
Sasidharan Rajeswari et al. | Speech Emotion Recognition Using Machine Learning Techniques | |
CN113626553B (en) | Cascade binary Chinese entity relation extraction method based on pre-training model | |
Deshmukh et al. | Application of probabilistic neural network for speech emotion recognition | |
Saputri et al. | Identifying Indonesian local languages on spontaneous speech data | |
Durrani et al. | Transfer learning based speech affect recognition in Urdu | |
CN107886942B (en) | Voice signal emotion recognition method based on local punishment random spectral regression | |
Das et al. | Towards interpretable and transferable speech emotion recognition: Latent representation based analysis of features, methods and corpora | |
Vasuki | Design of Hierarchical Classifier to Improve Speech Emotion Recognition. |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | GR01 | Patent grant | |