CN113409821A - Method for recognizing unknown emotional state of voice signal - Google Patents

Method for recognizing unknown emotional state of voice signal

Info

Publication number
CN113409821A
Authority
CN
China
Prior art keywords
emotion
unknown
category
sample
emotion category
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110584445.9A
Other languages
Chinese (zh)
Other versions
CN113409821B (en)
Inventor
徐新洲
顾正
吕震
刘硕
吴尘
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Posts and Telecommunications filed Critical Nanjing University of Posts and Telecommunications
Priority to CN202110584445.9A priority Critical patent/CN113409821B/en
Publication of CN113409821A publication Critical patent/CN113409821A/en
Application granted granted Critical
Publication of CN113409821B publication Critical patent/CN113409821B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 — Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 — Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 — Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/63 — Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state
    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 — Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 — Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Hospice & Palliative Care (AREA)
  • Psychiatry (AREA)
  • General Health & Medical Sciences (AREA)
  • Child & Adolescent Psychology (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a method for recognizing unknown emotional states of speech signals. Paralinguistic features are extracted from speech segment samples whose emotional state information is unknown, combined with semantic embeddings of the emotional state labels, and classified by a synthesized-classifier approach. In the training stage, paralinguistic features are extracted from training speech segment samples of known emotion categories, the prototype weights of the known emotion categories are obtained by processing the known emotion category names, and the optimal virtual classifiers are solved for by combining the labels of the known-emotion training speech segment samples. In the testing stage, the optimal virtual classifiers are used, together with the paralinguistic features of the unknown-emotion test speech segment samples and the prototype weights of the unknown emotion categories, to decide the unknown emotion category of each test sample. The invention provides a semantic-embedding-based method for recognizing unknown emotions in speech signals and can effectively distinguish unknown emotions in speech.

Description

Method for recognizing unknown emotional state of voice signal
Technical Field
The invention belongs to the field of speech signal emotion recognition, and particularly relates to a method for recognizing unknown emotion states of speech signals.
Background
Speech Emotion Recognition (SER) has wide application in fields such as human-computer interaction. By analyzing the emotion information in a speech signal, it is possible to infer the subjective intention a speaker wishes to convey in an utterance, as well as the speaker's deeper emotional expression. In addition, analyzing the emotion information in speech makes it possible to synthesize speech with expressive emotion. In psychological disease diagnosis, related techniques can support preliminary screening of patients with depression and similar conditions, providing a basis for further diagnosis and treatment; in virtual reality, they can give robots stronger capabilities for emotion analysis and expression.
Prior art schemes cannot effectively recognize unknown emotional states in speech signals: in a large body of prior SER work, an emotional state that never appears in the training samples cannot be recognized, so an unknown emotion category cannot be decided for a speech signal sample. For example, in human-computer interaction, a machine may need to decide, upon receiving an utterance, whether the speaker is conveying a complex emotion such as trustworthiness, friendliness, or aggressiveness. Without being taught how to estimate such complex emotions or intentions, the machine cannot accomplish this task.
In the prior art, for example, the following publication: Xu X, Deng J, Cummins N, et al. Autonomous Emotion Learning in Speech: A View of Zero-Shot Speech Emotion Recognition [C]// Proc. INTERSPEECH 2019, 2019: 949-. In the disclosed recognition scheme, the emotion-space dimension values of each known-emotion training sample must be annotated, which brings a high manual workload and labeling cost and increases the complexity of the computation.
Disclosure of Invention
The invention provides a method for recognizing unknown emotional states of speech signals, aiming to solve two problems: the prior art cannot recognize unknown emotions in speech signals, and the existing scheme for recognizing unknown emotions in speech signals requires every known-emotion training sample to be annotated in every dimension of the emotion space.
In order to solve the technical problems, the invention adopts the following technical scheme:
a speech signal unknown emotion state recognition method comprises the steps of firstly establishing a speech emotion database which comprises a plurality of speech segment samples, wherein each sample has an emotion category label corresponding to the sample; dividing a voice emotion database into a training set consisting of known emotion type samples and a test set consisting of unknown emotion type samples; each sample has a known and unique emotion category label. The method comprises the following steps of:
step one, extracting and generating nFOriginal features of dimension: processing each language segment sample in the training sample set and the test sample set respectively, extracting corresponding secondary language features as original features, and regularizing the original features to obtain N(S)Regularization features corresponding to individual training samples
Figure BDA0003087629330000021
And normalized feature x corresponding to any one test sample(U)
Step two, perform semantic embedding mapping on the known emotion category names to generate the semantic embedding prototypes of the known emotion categories A^(S) = [a_1^(S), a_2^(S), ..., a_{c^(S)}^(S)], where c^(S) is the number of known emotion categories and n_A is the semantic embedding dimension of the emotion category names;
step three, a prototype matrix A of the known emotion category(S)And a virtual class prototype matrix
Figure BDA0003087629330000023
Calculating to obtain a prototype weight matrix of the known emotion category
Figure BDA0003087629330000024
Step four, use the paralinguistic features X^(S) of the known-emotion speech segment samples and the emotion category labels Y^(S) of the corresponding samples, together with the semantic embedding prototypes A^(S) of the known emotion categories and the emotion category prototype weight matrix S, to optimize the linear virtual classifiers W^(P) according to the optimization target f (detailed below), and obtain the optimal virtual classifiers W^(P)*;
Step five, testing: for the paralinguistic feature x^(U) of each unknown-emotion test speech segment sample, use the optimal virtual classifiers obtained in step four to perform the classification decision of the unknown emotion category for each test sample.
Further, the normalization processing in step one is as follows:
The pre-normalization feature column vector of any speech segment sample is x^(0), and the set of pre-normalization feature column vectors of the N^(S) known-emotion training samples is X^(0) = [x_1^(0), x_2^(0), ..., x_{N^(S)}^(0)]; let x_{·j}^(0) denote the j-th feature element of x^(0).
For the element x_{·j}^(0) corresponding to feature j of the feature column vector x^(0) of any sample, the normalization formula is:
x_{·j} = (x_{·j}^(0) − min_j(X^(0))) / (max_j(X^(0)) − min_j(X^(0)))        (2)
where max_j(X^(0)) denotes the largest element in the j-th row of X^(0), min_j(X^(0)) denotes the smallest element in the j-th row of X^(0), and x_{·j} is the normalization result of x_{·j}^(0).
Applying formula (2) to all elements of any sample yields the normalized feature column vector x = [x_{·1}, x_{·2}, ..., x_{·n_F}]^T of any training or test sample; the normalized feature vectors of the speech segment samples belonging to the known-emotion training sample set form the normalized training feature set X^(S).
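A minimal sketch of this min-max normalization, assuming the features are held in NumPy arrays with one column per sample as in the notation above (variable names are illustrative, not from the patent):

```python
import numpy as np

def minmax_normalize(X_train, X_test):
    """Min-max normalize each feature (row) with the training-set
    minimum and maximum, as in formula (2).

    X_train: (n_F, N_S) array of known-emotion training features X^(0).
    X_test:  (n_F, N_U) array of test features, normalized with the
             same per-row statistics as the training set.
    """
    row_min = X_train.min(axis=1, keepdims=True)   # smallest element of row j over training samples
    row_max = X_train.max(axis=1, keepdims=True)   # largest element of row j over training samples
    span = np.where(row_max - row_min > 0, row_max - row_min, 1.0)  # guard against constant features
    X_train_norm = (X_train - row_min) / span
    X_test_norm = (X_test - row_min) / span
    return X_train_norm, X_test_norm
```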
Further, the semantic embedding mapping in step two can be implemented by applying a pre-trained word vector model to the emotion category names: feeding an emotion category name into the pre-trained model yields the n_A-dimensional semantic embedding vector of that category. The semantic embedding vectors of the known emotion categories corresponding to the training set are denoted A^(S) = [a_1^(S), ..., a_{c^(S)}^(S)]; for the c^(U) unknown emotion categories to be predicted for the test set, the semantic embedding vectors are denoted A^(U) = [a_1^(U), ..., a_{c^(U)}^(U)].
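As one possible realization of this mapping, the sketch below looks up pre-trained word vectors for the category names with gensim; the model path and the handling of multi-word names (averaging the word vectors) are assumptions for illustration, not details fixed by the patent:

```python
import numpy as np
from gensim.models import KeyedVectors

# Hypothetical path to a pre-trained 300-dimensional word2vec model (n_A = 300).
wv = KeyedVectors.load_word2vec_format("GoogleNews-vectors-negative300.bin", binary=True)

def embed_category(name):
    """Return the n_A-dimensional semantic embedding of an emotion category name.
    Multi-word names such as 'hot anger' are averaged word by word (an assumption)."""
    words = [w for w in name.split() if w in wv]
    return np.mean([wv[w] for w in words], axis=0)

known_names = ["amusement", "anxiety", "pride", "relief"]          # example known categories
A_S = np.stack([embed_category(n) for n in known_names], axis=1)   # n_A x c_S prototype matrix
```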
Further, in the prototype weight matrix S of the known emotion categories in step three, the element corresponding to virtual category c_P and known emotion category c_S is:
s(c_P, c_S) = exp(−d(a_{c_S}^(S), b_{c_P})) / Σ_{c=1}^{c^(P)} exp(−d(a_{c_S}^(S), b_c))
where the distance measure between the semantic embedding prototype a_{c_S}^(S) of known emotion category c_S and the virtual category prototype b_{c_P} is
d(a_{c_S}^(S), b_{c_P}) = ||a_{c_S}^(S) − b_{c_P}||^2 / σ^2,
and σ^2 is the distance weight.
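A small sketch of this weight computation, assuming the softmax-over-distance form written above (column c_S of S holds the weights of all virtual categories for known category c_S):

```python
import numpy as np

def prototype_weights(A_S, B, sigma2):
    """Compute the c_P x c_S prototype weight matrix S.

    A_S: (n_A, c_S) semantic embedding prototypes of the known emotion categories.
    B:   (n_A, c_P) virtual category prototypes.
    sigma2: distance weight sigma^2.
    """
    # Squared Euclidean distances d(a_cS, b_cP) divided by sigma^2, shape (c_P, c_S).
    diff = B[:, :, None] - A_S[:, None, :]          # (n_A, c_P, c_S)
    d = (diff ** 2).sum(axis=0) / sigma2
    expd = np.exp(-d)
    return expd / expd.sum(axis=0, keepdims=True)   # normalize over the virtual categories
```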
Further, the virtual category prototype matrix B in step three can be constructed in either of the following two ways:
(1) randomly generate an n_A × c^(P) matrix with elements drawn from a uniform distribution on [0, 1];
(2) set B to the semantic embedding matrix A^(S) of the known emotion categories.
Further, the optimization target for obtaining the optimal virtual classifiers W^(P)* in step four is:
W^(P)* = argmin_{W^(P)} Σ_{n=1}^{N^(S)} Σ_{c=1}^{c^(S)} L(y_c^(n), (w_c^(S))^T x_n^(S)) + (τ/2) ||W^(P)||_F^2
where the regularization term weight τ > 0, and the linear classifiers of the known emotion categories are synthesized from the virtual classifiers through the prototype weight matrix:
W^(S) = [w_1^(S), w_2^(S), ..., w_{c^(S)}^(S)] = W^(P) S;
the loss function is the squared hinge loss:
L(y, f) = max(0, 1 − y·f)^2;
where y_c^(n) is the label information of the c-th known emotion category for the n-th training sample: y_c^(n) = 1 when training sample n belongs to known emotion category c, and y_c^(n) = −1 otherwise.
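A minimal training sketch under the assumptions stated above (synthesized classifiers W^(S) = W^(P)·S and squared hinge loss); the plain gradient-descent optimizer and its learning rate are illustrative choices, not specified by the patent:

```python
import numpy as np

def train_virtual_classifiers(X_S, Y, S, tau, lr=1e-3, iters=2000):
    """Optimize the virtual classifiers W_P (n_F x c_P) by gradient descent.

    X_S: (n_F, N_S) normalized training features.
    Y:   (c_S, N_S) label matrix with +1 / -1 entries (y_c^(n)).
    S:   (c_P, c_S) prototype weight matrix.
    tau: regularization term weight (> 0).
    """
    n_F, _ = X_S.shape
    c_P = S.shape[0]
    W_P = np.zeros((n_F, c_P))
    for _ in range(iters):
        W_S = W_P @ S                              # synthesized known-class classifiers, (n_F, c_S)
        margins = Y * (W_S.T @ X_S)                # y_c^(n) * w_c^T x_n, (c_S, N_S)
        slack = np.maximum(0.0, 1.0 - margins)     # squared hinge: max(0, 1 - y*f)^2
        grad_WS = -2.0 * X_S @ (slack * Y).T       # gradient w.r.t. W_S, (n_F, c_S)
        grad_WP = grad_WS @ S.T + tau * W_P        # chain rule back to W_P plus regularizer
        W_P -= lr * grad_WP
    return W_P
```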
Further, the classification decision on an unknown-emotion test sample in step five comprises the following sub-steps executed in sequence (see the sketch following this list):
(1) from the computed semantic embedding prototype a_m^(U) of unknown emotion category m, compute the emotion category prototype weight vector s_m^(U) for category m, using the same weight formula as in step three with a_m^(U) in place of the known-category prototype;
(2) for the paralinguistic feature x^(U) of a test sample, predict the label of the unknown emotion category to which the sample belongs as
m* = argmax_m (W^(P) s_m^(U))^T x^(U),
and thereby obtain the unknown-emotion classification decision for the test sample.
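The test-time decision can then be sketched as follows, reusing the prototype_weights helper from the earlier sketch (illustrative names, not from the patent):

```python
import numpy as np

def predict_unknown_emotion(x_U, A_U, B, W_P, sigma2):
    """Predict the unknown emotion category index for one test feature x_U.

    x_U: (n_F,) normalized paralinguistic feature of the test sample.
    A_U: (n_A, c_U) semantic embedding prototypes of the unknown categories.
    B:   (n_A, c_P) virtual category prototypes; W_P: (n_F, c_P) virtual classifiers.
    """
    S_U = prototype_weights(A_U, B, sigma2)   # (c_P, c_U) weights for the unknown categories
    W_U = W_P @ S_U                           # synthesized unknown-class classifiers, (n_F, c_U)
    scores = W_U.T @ x_U                      # one score per unknown category
    return int(np.argmax(scores))             # index m* of the predicted category
```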
Beneficial effects: As shown in Fig. 1, the method for recognizing unknown emotional states of speech signals according to the invention extracts paralinguistic features from speech segment samples whose emotional state information is unknown, and performs classification decisions by a synthesized-classifier approach combined with semantic embeddings of the emotional state labels. Specifically, in the training stage, paralinguistic features are extracted from the known-emotion training speech segment samples, the prototype weights of the known emotion categories are obtained by processing the known emotion category names, and the optimal virtual classifiers are then solved for by combining the labels of the known-emotion training samples; in the testing stage, the optimal virtual classifiers are used, together with the paralinguistic features of the unknown-emotion test speech segment samples and the prototype weights of the unknown emotion categories, to decide the unknown emotion category of each test sample.
Existing speech emotion recognition methods have two problems: a general SER method can only recognize the emotion categories for which samples are provided in the training set, and has difficulty handling unknown categories; and although solutions have been proposed for unknown-emotion recognition in speech signals, their successful implementation still relies on adequate labeling of the emotion-space dimensions of the training samples.
Therefore, the method for recognizing unknown emotions in speech signals disclosed by the invention, based on a synthesized classifier and semantic embedding, can support the recognition of unknown emotions in speech signals without increasing the cost of manual labeling, so that unknown emotions in speech can be effectively recognized.
Experiments prove that the semantic-embedding-based method for recognizing unknown emotions in speech signals proposed by the invention can effectively distinguish unknown emotions in speech signals.
Drawings
Fig. 1 is a flow chart of a method for recognizing an unknown emotional state of a speech signal according to the present invention.
Detailed Description
The present invention will be further described with reference to the accompanying drawings and the detailed description.
As shown in Fig. 1, the method of the invention first extracts paralinguistic features from speech segment samples whose emotional state information is unknown, and performs classification decisions by a synthesized-classifier approach combined with semantic embeddings of the emotional state labels. In the training stage, paralinguistic features are extracted from the known-emotion training speech segment samples, the prototype weights of the known emotion categories are obtained by processing the known emotion category names, and the optimal virtual classifiers are solved for by combining the labels of the known-emotion training samples; in the testing stage, the optimal virtual classifiers are used, together with the paralinguistic features of the unknown-emotion test speech segment samples and the prototype weights of the unknown emotion categories, to decide the unknown emotion category of each test sample.
In the following, the method of the invention is compared with existing zero-shot learning methods in experiments measuring the Unweighted Accuracy (UA) recognition rate.
The effectiveness of the method of the embodiments of the invention is verified using the speech signal part of the GEMEP (GEneva Multimodal Emotion Portrayals) database.
The bimodal GEMEP database comprises a set of speech samples and the corresponding set of video samples, GEMEP-FERA. The database contains 18 emotion categories: admiration, amusement, anxiety, cold anger, contempt, despair, disgust, elation, hot anger, interest, panic fear, pleasure, pride, relief, sadness, shame, surprise, and tenderness. The database was recorded in French and contains 1260 samples from 10 speakers, including 5 females. The experiment uses 12 of the emotion categories, specifically amusement, anxiety, cold anger, despair, elation, hot anger, interest, panic fear, pleasure, pride, relief and sadness, with an average of 90 samples per category and 1080 samples in total. All samples of every pair of emotion categories in the data set are used in turn as the unknown-emotion test speech segment sample set, with the samples of the remaining emotion categories used as the known-emotion training speech segment sample set; there are 66 such combinations, so the experiment is trained and tested 66 times.
The original paralinguistic features in the experiment use the extended Geneva Minimalistic Acoustic Parameter Set (eGeMAPS) feature set, with original feature dimension n_F = 88. The features are derived from 25 Low-Level Descriptors (LLDs) combined with high-level statistical functions (HSFs), together with temporal features and the equivalent sound level, and are extracted with the openSMILE 2.3 toolbox.
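For reference, eGeMAPS functionals can be extracted in Python with the opensmile package; this is a sketch assuming eGeMAPSv02 as a close stand-in for the 88-dimensional feature set used in the experiment, and the file name is illustrative:

```python
import opensmile

# eGeMAPS functionals (88 features per utterance in eGeMAPSv02).
smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.eGeMAPSv02,
    feature_level=opensmile.FeatureLevel.Functionals,
)
features = smile.process_file("sample_utterance.wav")  # pandas DataFrame, one row per utterance
print(features.shape)  # expect (1, 88)
```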
The semantic embedding prototypes of the emotion categories use n_A = 300-dimensional English word semantic vectors, derived from pre-trained word2vec, GloVe, and fastText models. The word2vec model in the experiment is the Google pre-trained model trained on a Google News corpus containing 3 million words; the GloVe model in the experiment uses Wikipedia 2014 and Gigaword 5 as training data and contains 400,000 words; fastText uses 2 million word vectors trained on web crawl data and 1 million word vectors trained on Wikipedia 2017, the UMBC webbase corpus and the statmt.org news dataset.
In the experiment, in order to show the effect of the method of the invention, the methods used for comparison are: SAE (Semantic AutoEncoder), DEM (Deep Embedding Model), LatEm (Latent Embeddings), ESZSL (Embarrassingly Simple Zero-Shot Learning), and EXEM (exemplar synthesis).
The speech signal emotion state recognition models of the invention comprise two variants: SYNC(origin) (embodiment 1, which uses the prototypes of the known emotion categories as the virtual category prototype matrix B in step three, i.e. B = A^(S)), and SYNC(rand) (embodiment 2, which uses a randomly generated virtual category prototype matrix B in step three, with the number of virtual categories c^(P) = 1000).
In the experiment, optimal parameters are selected on the training set using emotion-category-independent five-fold cross validation, and the experiment is repeated 10 times for the random generation of the virtual category prototypes in step three. For the embodiments of the invention, the parameter ranges are: regularization term weight τ ∈ {2^−24, 2^−23, ..., 2^−9}, distance weight σ^2 ∈ {2^−5, 2^−4, ..., 2^5}.
The average optimal UA over all semantic embedding prototypes on the GEMEP database is shown in Table 1:
TABLE 1

Method                          UA
Comparative example 1 (SAE)     57.2%
Comparative example 2 (DEM)     59.3%
Comparative example 3 (LatEm)   64.2%
Comparative example 4 (ESZSL)   64.6%
Comparative example 5 (EXEM)    62.3%
Embodiment 1 (SYNC(origin))     64.4%
Embodiment 2 (SYNC(rand))       65.0% ± 0.9%
As can be seen from Table 1, the SYNC methods of embodiments 1 and 2 achieve better UA performance for recognizing unknown emotions in speech signals than the other comparative methods.
Further, the three best results of embodiment 2 (SYNC(rand)) over the 10 repetitions are compared with the result of SYNC(origin), as shown in Table 2. As can be seen from Table 2, a randomly selected virtual category prototype matrix enables the method of the invention to achieve better performance.
TABLE 2

Method                  UA
SYNC(origin)            64.4%
SYNC(rand), best        66.6%
SYNC(rand), 2nd best    66.2%
SYNC(rand), 3rd best    65.7%
In summary, by learning the discriminative information between the known emotion categories from the paralinguistic features used in speech emotion recognition, the SYNC method adopted in these embodiments provides better performance on the problem of recognizing unknown emotions in speech signals.
The above description is only of the preferred embodiments of the present invention, and it should be noted that: it will be apparent to those skilled in the art that various modifications and adaptations can be made without departing from the principles of the invention and these are intended to be within the scope of the invention.

Claims (10)

1. A method for recognizing an unknown emotional state of a speech signal, characterized in that: paralinguistic features are extracted from training speech segment samples of known emotion categories, the prototype weights of the known emotion categories are obtained by processing the known emotion category names, and the optimal virtual classifiers are solved for by combining the labels of the known-emotion training speech segment samples; paralinguistic features are extracted from speech segment samples whose emotional state information is unknown, and the unknown emotion category of each test sample is decided using the optimal virtual classifiers in combination with the semantic embeddings of the emotional state labels and the prototype weights of the unknown emotion categories.
2. The method for recognizing an unknown emotional state of a speech signal according to claim 1, characterized in that the method specifically comprises the following steps:
step one, process each speech segment sample in the training sample set and the test sample set, extract the corresponding paralinguistic features as original features to generate n_F-dimensional original features, and normalize the original features to obtain the normalized features X^(S) corresponding to the N^(S) training samples and the normalized feature x^(U) corresponding to any test sample;
step two, input each emotion category name into a pre-trained model to obtain the n_A-dimensional semantic embedding vector of that category; the semantic embedding vectors of the known emotion categories corresponding to the training set are denoted A^(S), where c^(S) is the number of known emotion categories and n_A is the semantic embedding dimension of the emotion category names; for the c^(U) unknown emotion categories to be predicted for the test set, the semantic embedding vectors are denoted A^(U);
step three, from the known emotion category prototype matrix A^(S) and a virtual category prototype matrix B, compute the prototype weight matrix S of the known emotion categories;
step four, use the paralinguistic features X^(S) of the known-emotion speech segment samples and the emotion category labels Y^(S) of the corresponding samples, together with the semantic embedding prototypes A^(S) of the known emotion categories and the emotion category prototype weight matrix S, to optimize the linear virtual classifiers W^(P) according to the optimization target f, and obtain the optimal virtual classifiers W^(P)*;
step five, for the paralinguistic feature x^(U) of each unknown-emotion test speech segment sample, use the optimal virtual classifiers W^(P)* obtained in step four to perform the classification decision of the unknown emotion category for each test sample.
3. The method for recognizing an unknown emotional state of a speech signal according to claim 2, characterized in that, in the prototype weight matrix S of the known emotion categories, the element corresponding to virtual category c_P and known emotion category c_S is:
s(c_P, c_S) = exp(−d(a_{c_S}^(S), b_{c_P})) / Σ_{c=1}^{c^(P)} exp(−d(a_{c_S}^(S), b_c))
where the distance measure between the semantic embedding prototype a_{c_S}^(S) and the virtual category prototype b_{c_P} is d(a_{c_S}^(S), b_{c_P}) = ||a_{c_S}^(S) − b_{c_P}||^2 / σ^2, and σ^2 is the distance weight.
4. The method for recognizing an unknown emotional state of a speech signal according to claim 3, characterized in that the distance weight σ^2 ∈ {2^−5, 2^−4, ..., 2^5}.
5. The method for recognizing an unknown emotional state of a speech signal according to claim 2, characterized in that the virtual category prototype matrix B in step three is constructed by randomly generating an n_A × c^(P) matrix with elements drawn from a uniform distribution on [0, 1].
6. The method for recognizing an unknown emotional state of a speech signal according to claim 2, characterized in that the virtual category prototype matrix B in step three is constructed by setting it to the semantic embedding matrix A^(S) of the known emotion categories.
7. The method for recognizing an unknown emotional state of a speech signal according to claim 2, characterized in that the optimization target for obtaining the optimal virtual classifiers W^(P)* in step four is:
W^(P)* = argmin_{W^(P)} Σ_{n=1}^{N^(S)} Σ_{c=1}^{c^(S)} L(y_c^(n), (w_c^(S))^T x_n^(S)) + (τ/2) ||W^(P)||_F^2
where the regularization term weight τ > 0, the linear classifiers of the known emotion categories are W^(S) = [w_1^(S), ..., w_{c^(S)}^(S)] = W^(P) S, and the loss function is the squared hinge loss L(y, f) = max(0, 1 − y·f)^2, in which y_c^(n) is the label information of the c-th known emotion category for the n-th training sample: y_c^(n) = 1 when training sample n belongs to known emotion category c, and y_c^(n) = −1 otherwise.
8. The method for recognizing an unknown emotional state of a speech signal according to claim 7, characterized in that the regularization term weight τ ∈ {2^−24, 2^−23, ..., 2^−9}.
9. The method for recognizing an unknown emotional state of a speech signal according to claim 2, characterized in that, in step five, the unknown emotion category decision is performed on the test sample using the optimal virtual classifiers, specifically: from the computed semantic embedding prototype a_m^(U) of unknown emotion category m, the emotion category prototype weight vector s_m^(U) for category m is computed; for the paralinguistic feature x^(U) of a test sample, the label of the unknown emotion category to which the sample belongs is predicted as
m* = argmax_m (W^(P) s_m^(U))^T x^(U),
and the unknown-emotion classification decision of the test sample is obtained.
10. The method for recognizing an unknown emotional state of a speech signal according to claim 2, characterized in that the original features adopt the extended Geneva Minimalistic Acoustic Parameter Set (eGeMAPS), with original feature dimension n_F = 88.
CN202110584445.9A 2021-05-27 2021-05-27 Method for recognizing unknown emotional state of voice signal Active CN113409821B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110584445.9A CN113409821B (en) 2021-05-27 2021-05-27 Method for recognizing unknown emotional state of voice signal

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110584445.9A CN113409821B (en) 2021-05-27 2021-05-27 Method for recognizing unknown emotional state of voice signal

Publications (2)

Publication Number Publication Date
CN113409821A true CN113409821A (en) 2021-09-17
CN113409821B CN113409821B (en) 2023-04-18

Family

ID=77674667

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110584445.9A Active CN113409821B (en) 2021-05-27 2021-05-27 Method for recognizing unknown emotional state of voice signal

Country Status (1)

Country Link
CN (1) CN113409821B (en)


Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110141258A1 (en) * 2007-02-16 2011-06-16 Industrial Technology Research Institute Emotion recognition method and system thereof
CN103854645A (en) * 2014-03-05 2014-06-11 东南大学 Speech emotion recognition method based on punishment of speaker and independent of speaker
CN107886942A (en) * 2017-10-31 2018-04-06 东南大学 A kind of voice signal emotion identification method returned based on local punishment random spectrum
CN108615052A (en) * 2018-04-13 2018-10-02 南京邮电大学 A kind of image-recognizing method without under similar training sample situation
CN109933664A (en) * 2019-03-12 2019-06-25 中南大学 A kind of fine granularity mood analysis improved method based on emotion word insertion
US20200335086A1 (en) * 2019-04-19 2020-10-22 Behavioral Signal Technologies, Inc. Speech data augmentation
CN111324734A (en) * 2020-02-17 2020-06-23 昆明理工大学 Case microblog comment emotion classification method integrating emotion knowledge
CN112466284A (en) * 2020-11-25 2021-03-09 南京邮电大学 Mask voice identification method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZHAO Xiaolei et al.: "Speech emotion recognition method fusing functional paralanguage", Journal of Frontiers of Computer Science and Technology *

Also Published As

Publication number Publication date
CN113409821B (en) 2023-04-18

Similar Documents

Publication Publication Date Title
Zehra et al. Cross corpus multi-lingual speech emotion recognition using ensemble learning
US20200335086A1 (en) Speech data augmentation
Luo et al. Audio Sentiment Analysis by Heterogeneous Signal Features Learned from Utterance-Based Parallel Neural Network.
Anagnostopoulos et al. Features and classifiers for emotion recognition from speech: a survey from 2000 to 2011
CN107943784B (en) Relationship extraction method based on generation of countermeasure network
Kumar et al. Multilayer Neural Network Based Speech Emotion Recognition for Smart Assistance.
Sultana et al. Bangla speech emotion recognition and cross-lingual study using deep CNN and BLSTM networks
Mamyrbayev et al. Voice identification using classification algorithms
Albadr et al. Extreme learning machine for automatic language identification utilizing emotion speech data
Chen et al. Music emotion recognition using deep Gaussian process
KR20200105057A (en) Apparatus and method for extracting inquiry features for alalysis of inquery sentence
CN111563373A (en) Attribute-level emotion classification method for focused attribute-related text
CN110992988A (en) Speech emotion recognition method and device based on domain confrontation
CN112397092A (en) Unsupervised cross-library speech emotion recognition method based on field adaptive subspace
CN112466284B (en) Mask voice identification method
CN112489689A (en) Cross-database voice emotion recognition method and device based on multi-scale difference confrontation
CN113409821B (en) Method for recognizing unknown emotional state of voice signal
Sasidharan Rajeswari et al. Speech Emotion Recognition Using Machine Learning Techniques
CN113626553B (en) Cascade binary Chinese entity relation extraction method based on pre-training model
Deshmukh et al. Application of probabilistic neural network for speech emotion recognition
Saputri et al. Identifying Indonesian local languages on spontaneous speech data
Durrani et al. Transfer learning based speech affect recognition in Urdu
CN107886942B (en) Voice signal emotion recognition method based on local punishment random spectral regression
Das et al. Towards interpretable and transferable speech emotion recognition: Latent representation based analysis of features, methods and corpora
Vasuki Design of Hierarchical Classifier to Improve Speech Emotion Recognition.

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant