CN110516696B - Self-adaptive weight bimodal fusion emotion recognition method based on voice and expression - Google Patents

Self-adaptive weight bimodal fusion emotion recognition method based on voice and expression

Info

Publication number
CN110516696B
CN110516696B
Authority
CN
China
Prior art keywords
emotion
voice
expression
data
self
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910632006.3A
Other languages
Chinese (zh)
Other versions
CN110516696A (en)
Inventor
肖婧
黄永明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southeast University
Original Assignee
Southeast University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southeast University filed Critical Southeast University
Priority to CN201910632006.3A priority Critical patent/CN110516696B/en
Publication of CN110516696A publication Critical patent/CN110516696A/en
Application granted granted Critical
Publication of CN110516696B publication Critical patent/CN110516696B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention relates to a self-adaptive weight bimodal fusion emotion recognition method based on voice and facial expression, which comprises the following steps: acquiring emotion voice and facial expression data, matching the emotion data to emotion categories, and selecting a training sample set and a test sample set; extracting voice emotion features from the voice data and dynamic expression features from the expression data; based on the voice emotion features and the expression features respectively, learning with a deep learning method based on a semi-supervised autoencoder, and obtaining classification results and per-class output probabilities from a softmax classifier; and finally, fusing the two single-modality emotion recognition results at the decision layer and obtaining the final emotion recognition result with a self-adaptive weighting method. Because the emotion features of different modalities differ in how well they characterize an individual's emotion, the invention adopts a self-adaptive weight fusion method and therefore achieves higher accuracy and objectivity.

Description

Self-adaptive weight bimodal fusion emotion recognition method based on voice and expression
Technical Field
The invention relates to the field of emotion recognition in affective computing, and in particular to a self-adaptive weight bimodal fusion emotion recognition method based on voice and facial expression.
Background
In recent years, with the development of artificial intelligence and robotics, conventional human-computer interaction can no longer meet user needs; novel human-computer interaction requires emotional communication, so emotion recognition has become key to the development of human-computer interaction technology and an academic research hotspot. Emotion recognition is a multidisciplinary research topic: by enabling computers to understand and recognize human emotion, and thereby predict and understand human behavioral tendencies and psychological states, efficient and harmonious human-computer emotional interaction can be realized.
Human emotion is expressed in many ways, such as speech, facial expression, gesture, and text, from which effective information can be extracted to analyze emotion correctly. Facial expression and voice are the most salient and most easily analyzed cues and have therefore been widely studied and applied. The psychologist Mehrabian proposed the formula: emotional expression = 7% words + 38% vocal tone + 55% facial expression; that is, a person's speech and facial expression together cover 93% of emotional information and form the core of human communication. In the process of emotional expression, facial deformation conveys inner emotion effectively and intuitively and is one of the most important sources of feature information for emotion recognition, while voice features can also express rich emotion.
In recent years, the development of the Internet and the proliferation of social media have greatly enriched the ways people communicate, for example through video and audio, making multimodal emotion recognition possible. Conventional single-modality recognition suffers from the problem that a single emotion feature may not represent the emotional state well; for example, a person expressing sadness may show little change in facial expression, yet the sad, dejected emotion can still be distinguished from the low, slow voice. Multimodal recognition makes the information of different modalities complementary, provides more emotional information for emotion recognition, and improves recognition accuracy. However, while single-modality emotion recognition research is relatively mature, multimodal emotion recognition methods still need development and refinement. Multimodal emotion recognition therefore has very important practical significance, and bimodal emotion recognition based on the most dominant cues, facial expression and voice, has important research significance and practical value. Conventional weighting methods ignore individual variability, so an adaptive weighting method is needed for weight assignment.
Disclosure of Invention
The invention aims to provide a self-adaptive weight bimodal fusion emotion recognition method based on voice and facial expression, realizing complementarity between modal information and self-adaptive weight assignment that accounts for individual differences.
For this purpose, the invention adopts the following technical scheme:
an identification method based on self-adaptive weight bimodal fusion of voice and facial expression is characterized by comprising the following steps:
S1, acquiring emotion voice and facial expression data, matching the emotion data to emotion categories, and selecting a training sample set and a test sample set;
S2, extracting voice emotion features from the voice data and dynamic expression features from the expression data: first automatically extracting the expression peak frame, obtaining the dynamic image sequence from expression onset to the expression peak, and normalizing the variable-length image sequence into a fixed-length image sequence as the dynamic expression features;
S3, based on the voice emotion features and the expression features respectively, learning with a deep learning method based on a semi-supervised autoencoder, and obtaining classification results and per-class output probabilities from a softmax classifier;
and S4, fusing the two single-mode emotion recognition results in a decision layer, and obtaining a final emotion recognition result by adopting a self-adaptive weight distribution method.
Further, the specific steps of the step S2 are as follows:
S2A.1: for the voice emotion data, dividing the obtained voice sample into multiple frames by framing, and windowing the framed speech segments to obtain the voice emotion signal;
S2A.2: for the voice emotion signal obtained in S2A.1, extracting frame-level low-level features such as fundamental frequency F0, short-time energy, frequency perturbation (jitter), amplitude perturbation (shimmer), harmonic-to-noise ratio, and Mel-frequency cepstral coefficients;
S2A.3: aggregating the frame-level low-level features over the utterance level formed by multiple frames, applying statistical functionals such as maximum, minimum, mean, and standard deviation to the low-level features to obtain the voice emotion features (a minimal code sketch follows);
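By way of illustration, a minimal NumPy sketch of steps S2A.1–S2A.3 is given below. The 25 ms / 10 ms framing at an assumed 16 kHz sampling rate, the Hamming window, and the reduced descriptor set (short-time energy and zero-crossing rate only) are illustrative assumptions; the full descriptor and functional set described above is not reproduced here.

```python
import numpy as np

def frame_and_window(signal, frame_len=400, hop=160):
    """Split a 1-D speech signal into overlapping frames and apply a Hamming
    window (S2A.1). frame_len=400 / hop=160 correspond to 25 ms / 10 ms frames
    at an assumed 16 kHz sampling rate."""
    assert len(signal) >= frame_len, "signal shorter than one frame"
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return signal[idx] * np.hamming(frame_len)

def low_level_descriptors(frames):
    """Frame-level LLD contours (S2A.2). Only short-time energy and
    zero-crossing rate are shown; F0, jitter, shimmer, HNR and MFCCs would be
    computed per frame in the same way."""
    energy = np.sum(frames ** 2, axis=1)
    zcr = np.mean(np.abs(np.diff(np.sign(frames), axis=1)) > 0, axis=1)
    return np.stack([energy, zcr], axis=1)          # shape (n_frames, n_lld)

def statistical_functionals(lld):
    """Aggregate the LLD contours over the whole utterance (S2A.3)."""
    return np.concatenate([lld.max(0), lld.min(0), lld.mean(0), lld.std(0)])

# usage: x is a mono speech waveform as a NumPy array
# feats = statistical_functionals(low_level_descriptors(frame_and_window(x)))
```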
S2B.1: for the facial expression data, first applying a coordinate transformation to the obtained three-dimensional coordinates of the facial expression feature points: taking the nose tip as the center point, a rotation matrix is obtained using the SVD principle, and the points are rotated by multiplying with this matrix so as to eliminate the influence of head pose changes (see the alignment sketch below).
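The exact SVD-based procedure is not spelled out above; the sketch below assumes the standard Kabsch/Procrustes alignment, mapping the nose-tip-centered landmarks of each frame onto a reference (e.g. neutral) frame. The landmark index of the nose tip and the choice of reference frame are assumptions.

```python
import numpy as np

def align_landmarks(frame_pts, ref_pts, nose_idx=0):
    """Remove head-pose rotation from one frame of 3-D facial landmarks (S2B.1).
    frame_pts, ref_pts: (N, 3) arrays; nose_idx marks the nose-tip landmark
    (index 0 is an assumption). Returns the rotated, nose-centered landmarks."""
    p = frame_pts - frame_pts[nose_idx]          # center on the nose tip
    q = ref_pts - ref_pts[nose_idx]
    u, _, vt = np.linalg.svd(p.T @ q)            # SVD of the cross-covariance (Kabsch)
    d = np.sign(np.linalg.det(vt.T @ u.T))       # guard against an improper reflection
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T      # rotation that maps p onto q
    return p @ r.T
```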
S2B.2, extracting peak expression frames by using a slow feature analysis method, wherein the specific steps are as follows:
1) Treating each dynamic image sequence sample as a time-varying input signal x(t) = [x_1(t), x_2(t), …, x_I(t)]^T;
2) Normalizing x(t) so that it has zero mean and unit variance;
3) Applying a nonlinear expansion to the input signal, converting the problem into a linear SFA problem;
4) Whitening the expanded data;
5) Solving with the linear SFA method (a numerical sketch follows).
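A compact NumPy sketch of steps 1)–5) follows, assuming a quadratic nonlinear expansion and the standard linear SFA solution (whiten the expanded signal, then take the directions in which its temporal derivative varies least). How the slow-feature output is used to pick the peak frame is not detailed above, so the final selection rule is only indicated as a comment.

```python
import numpy as np

def quadratic_expansion(x):
    """Nonlinear (quadratic) expansion of a (T, I) signal: original components
    plus all monomials x_i * x_j with i <= j (step 3)."""
    T, I = x.shape
    cross = [x[:, i] * x[:, j] for i in range(I) for j in range(i, I)]
    return np.column_stack([x] + cross)

def slow_feature_analysis(x, n_slow=1):
    """Linear SFA (steps 2, 4, 5): normalize, whiten, then find the directions
    in which the temporal derivative varies least."""
    x = (x - x.mean(0)) / (x.std(0) + 1e-12)                 # zero mean, unit variance
    evals, evecs = np.linalg.eigh(np.cov(x, rowvar=False))
    keep = evals > 1e-10
    whiten = evecs[:, keep] / np.sqrt(evals[keep])           # whitening matrix
    z = x @ whiten
    dz = np.diff(z, axis=0)                                  # temporal derivative
    _, d_evecs = np.linalg.eigh(np.cov(dz, rowvar=False))
    w = whiten @ d_evecs[:, :n_slow]                         # slowest directions first
    return x @ w                                             # slow-feature outputs (T, n_slow)

# usage on one expression sequence x_seq of shape (T, I):
# slow = slow_feature_analysis(quadratic_expansion(x_seq), n_slow=1)
# the frame where the slow-feature value is extremal is then taken as the
# expression peak (an assumption about the selection rule)
```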
S2B.3: after the dynamic expression sequence from the expression onset frame to the expression peak frame is obtained, normalizing the variable-length dynamic features to a fixed length by linear interpolation.
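A minimal sketch of the fixed-length normalization in S2B.3, assuming the dynamic expression sequence is stored as a (T, D) array of per-frame features and that per-dimension linear interpolation over a normalized time axis is intended; the target length of 16 frames is an arbitrary illustrative choice.

```python
import numpy as np

def resample_sequence(seq, target_len=16):
    """Linearly interpolate a variable-length sequence (T, D) to a fixed
    length (target_len, D)."""
    seq = np.asarray(seq, dtype=float)
    t_src = np.linspace(0.0, 1.0, len(seq))
    t_dst = np.linspace(0.0, 1.0, target_len)
    return np.column_stack([np.interp(t_dst, t_src, seq[:, d])
                            for d in range(seq.shape[1])])
```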
Further, the specific steps of the step S3 are as follows:
S3.1, for the data of one modality, inputting both unlabeled and labeled training samples; the encoding and decoding of the autoencoder produce the reconstructed data, and the softmax classifier produces the class output;
S3.2, calculating the unsupervised representation-learning reconstruction error E_r and the supervised classification error E_c;
S3.3, constructing the optimization objective, taking both the reconstruction error and the classification error into account, where α balances the two terms:
E(θ) = αE_r + (1-α)E_c
and S3.4, updating parameters by a gradient descent method until the objective function converges.
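A hedged PyTorch sketch of the semi-supervised autoencoder objective in S3.1–S3.4 is given below. The single hidden layer, its size, the use of mean-squared reconstruction error for E_r and cross-entropy for E_c, and the SGD optimizer are illustrative assumptions, not the exact architecture of the invention.

```python
import torch
import torch.nn as nn

class SemiSupervisedAE(nn.Module):
    """Shared encoder feeding both a decoder (reconstruction) and a softmax head."""
    def __init__(self, in_dim, hid_dim=256, n_classes=4):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, hid_dim), nn.ReLU())
        self.decoder = nn.Linear(hid_dim, in_dim)
        self.classifier = nn.Linear(hid_dim, n_classes)   # softmax is applied inside the loss

    def forward(self, x):
        h = self.encoder(x)
        return self.decoder(h), self.classifier(h)

def semi_supervised_loss(model, x_unlab, x_lab, y_lab, alpha=0.5):
    """E(theta) = alpha * E_r + (1 - alpha) * E_c  (S3.3)."""
    x_all = torch.cat([x_unlab, x_lab], dim=0)
    recon, _ = model(x_all)
    e_r = nn.functional.mse_loss(recon, x_all)            # reconstruction error, all samples
    _, logits = model(x_lab)
    e_c = nn.functional.cross_entropy(logits, y_lab)      # classification error, labeled samples only
    return alpha * e_r + (1 - alpha) * e_c

# S3.4: update parameters by gradient descent until convergence, e.g.
# opt = torch.optim.SGD(model.parameters(), lr=1e-3)
# loss = semi_supervised_loss(model, xu, xl, yl); opt.zero_grad(); loss.backward(); opt.step()
```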
Further, the specific steps of the step S4 are as follows:
S4.1, acquiring the per-class output probabilities of the two modalities for a test sample from the softmax classifier, and calculating a variable δ_k; δ_k can be used to measure how well the modality characterizes the emotion, and self-adaptive weight assignment is realized according to the value of δ_k for each sample, where J is the number of classes in the system, P = {p_j | j = 1, …, J} is the vector formed by the sample output probabilities, p_j is the probability of each class output by the softmax classifier, and d denotes the Euclidean distance between two vectors.
S4.2, mapping δ_k into [0, 1] according to the following formula and using the result as the weight, where a and b are self-selected parameters determined by the specific situation:
u_k = 1 - 1/[1 + exp(-a(δ_k - b))];
S4.3, obtaining the fused output probability vector P_final = {p_final_j | j = 1, …, J} according to the following formula; the class with the highest probability is the recognized class, where p_j,k is the probability of class j output by single-modality emotion recognition with modality k, and there are K modalities in total.
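The explicit formulas for δ_k and for the fused probabilities are not reproduced in the text above, so the sketch below fills them with plausible stand-ins: δ_k is taken as the Euclidean distance d between the softmax output vector P of modality k and the one-hot vector of its top class, and the fusion is a weight-normalized sum of the per-modality class probabilities. Both choices are assumptions consistent with, but not necessarily identical to, the description.

```python
import numpy as np

def modality_weight(p, a=10.0, b=0.5):
    """Adaptive weight for one modality (S4.1-S4.2). delta_k is assumed to be
    the Euclidean distance d between the softmax output P and the one-hot
    vector of its top class (smaller delta = sharper, more reliable output).
    a and b are the user-selected mapping parameters."""
    p = np.asarray(p, dtype=float)
    one_hot = np.eye(len(p))[np.argmax(p)]
    delta = np.linalg.norm(p - one_hot)                  # d(P, one-hot of top class)
    return 1.0 - 1.0 / (1.0 + np.exp(-a * (delta - b)))  # u_k = 1 - 1/[1+exp(-a(delta-b))]

def fuse(p_speech, p_face, a=10.0, b=0.5):
    """Decision-level fusion (S4.3), assuming a weight-normalized sum of the
    per-modality class probabilities; the recognized class is the arg-max."""
    u = np.array([modality_weight(p_speech, a, b), modality_weight(p_face, a, b)])
    probs = np.vstack([p_speech, p_face])                # (K=2, J)
    p_final = (u[:, None] * probs).sum(0) / (u.sum() + 1e-12)
    return p_final, int(np.argmax(p_final))

# usage with softmax outputs over J=4 classes (neutral, happy, sad, angry):
# p_final, label = fuse([0.1, 0.7, 0.1, 0.1], [0.3, 0.3, 0.2, 0.2])
```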
Compared with the prior art, the invention has the following beneficial effects: the self-adaptive weight bimodal fusion emotion recognition method based on voice and facial expression achieves accurate and efficient recognition on a standard database; because the emotion features of different modalities differ in how well they characterize an individual's emotion, the self-adaptive weight fusion method provides higher accuracy and objectivity, achieving an 83% recognition rate on the IEMOCAP emotion database, an improvement of about 3% over conventional fixed weight assignment.
Drawings
FIG. 1 is a schematic diagram of the overall flow of the identification method of the present invention.
Fig. 2 is a flow chart of step S3 of the present invention.
FIG. 3 is a flow chart of adaptive weight distribution according to the present invention.
Detailed Description
The principles and features of the present invention are described below with reference to the drawings; the examples are provided only to illustrate the invention and are not to be construed as limiting its scope.
Example 1: referring to fig. 1-3, a recognition method based on adaptive weight bimodal fusion of speech and facial expressions, the method comprising the steps of:
S1, acquiring emotion voice and facial expression data, matching the emotion data to emotion categories, and selecting a training sample set and a test sample set;
S2, extracting voice emotion features from the voice data and dynamic expression features from the expression data: first automatically extracting the expression peak frame, obtaining the dynamic image sequence from expression onset to the expression peak, and normalizing the variable-length image sequence into a fixed-length image sequence as the dynamic expression features;
S3, based on the voice emotion features and the expression features respectively, learning with a deep learning method based on a semi-supervised autoencoder, and obtaining classification results and per-class output probabilities from a softmax classifier;
and S4, fusing the two single-mode emotion recognition results in a decision layer, and obtaining a final emotion recognition result by adopting a self-adaptive weight distribution method.
Further, the specific steps of the step S2 are as follows:
S2A.1: for the voice emotion data, dividing the obtained voice sample into multiple frames by framing, and windowing the framed speech segments to obtain the voice emotion signal;
S2A.2: for the voice emotion signal obtained in S2A.1, extracting frame-level low-level features such as fundamental frequency F0, short-time energy, frequency perturbation (jitter), amplitude perturbation (shimmer), harmonic-to-noise ratio, and Mel-frequency cepstral coefficients;
S2A.3: aggregating the frame-level low-level features over the utterance level formed by multiple frames, applying statistical functionals such as maximum, minimum, mean, and standard deviation to the low-level features to obtain the voice emotion features;
S2B.1: for the facial expression data, first applying a coordinate transformation to the obtained three-dimensional coordinates of the facial expression feature points: taking the nose tip as the center point, a rotation matrix is obtained using the SVD principle, and the points are rotated by multiplying with this matrix so as to eliminate the influence of head pose changes.
S2B.2, extracting peak expression frames by using a slow feature analysis method, wherein the specific steps are as follows:
1) Treating each dynamic image sequence sample as a time-varying input signal x(t) = [x_1(t), x_2(t), …, x_I(t)]^T;
2) Normalizing x(t) so that it has zero mean and unit variance;
3) Applying a nonlinear expansion to the input signal, converting the problem into a linear SFA problem;
4) Whitening the expanded data;
5) Solving with the linear SFA method.
S2B.3: after the dynamic expression sequence from the expression onset frame to the expression peak frame is obtained, normalizing the variable-length dynamic features to a fixed length by linear interpolation.
Further, the specific steps of the step S3 are as follows:
S3.1, for the data of one modality, inputting both unlabeled and labeled training samples; the encoding and decoding of the autoencoder produce the reconstructed data, and the softmax classifier produces the class output;
S3.2, calculating the unsupervised representation-learning reconstruction error E_r and the supervised classification error E_c;
S3.3, constructing the optimization objective, taking both the reconstruction error and the classification error into account, where α balances the two terms:
E(θ) = αE_r + (1-α)E_c
and S3.4, updating parameters by a gradient descent method until the objective function converges.
Further, the specific steps of the step S4 are as follows:
S4.1, acquiring the per-class output probabilities of the two modalities for a test sample from the softmax classifier, and calculating a variable δ_k; δ_k can be used to measure how well the modality characterizes the emotion, and self-adaptive weight assignment is realized according to the value of δ_k for each sample, where J is the number of classes in the system, P = {p_j | j = 1, …, J} is the vector formed by the sample output probabilities, p_j is the probability of each class output by the softmax classifier, and d denotes the Euclidean distance between two vectors.
S4.2, mapping δ_k into [0, 1] according to the following formula and using the result as the weight, where a and b are self-selected parameters:
u_k = 1 - 1/[1 + exp(-a(δ_k - b))];
S4.3, obtaining the fused output probability vector P_final = {p_final_j | j = 1, …, J} according to the following formula; the class with the highest probability is the recognized class, where p_j,k is the probability of class j output by single-modality emotion recognition with modality k, and there are K modalities in total.
Application examples: referring to fig. 1-3, in this example, the IEMOCAP emotion database is used as a material, and the simulation platform is MATLAB R2014a.
As shown in FIG. 1, the emotion recognition method based on the self-adaptive weight double-mode fusion of voice and expression mainly comprises the following steps:
S1, acquiring emotion voice and facial expression data, matching the emotion data to emotion categories, and selecting a training sample set and a test sample set. Four emotion categories are selected: neutral, happy, sad, and angry.
S2, extracting voice emotion features from the voice data and dynamic expression features from the expression data: first the expression peak frame is extracted automatically, the dynamic image sequence from expression onset to the expression peak is obtained, and the variable-length image sequence is normalized into a fixed-length image sequence to serve as the dynamic expression features. The voice features are extracted with the open-source feature extraction toolkit openSMILE using the INTERSPEECH 2010 Paralinguistic Challenge standard feature set, yielding 1582-dimensional features. For the dynamic facial expression features, the peak expression frame is extracted with the slow feature analysis method, a threshold is then set to locate the expression onset frame, the dynamic expression sequence from the onset frame to the peak frame is obtained, and the variable-length dynamic features are normalized by linear interpolation.
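For reference, the 1582-dimensional IS10 feature set is typically produced by invoking the openSMILE command-line extractor; a minimal wrapper sketch is shown below. The binary name and config path are installation-dependent assumptions, since the description only states that openSMILE and the INTERSPEECH 2010 feature set are used.

```python
import subprocess

def extract_is10_features(wav_path, out_path,
                          binary="SMILExtract",
                          config="config/is09-13/IS10_paraling.conf"):
    """Run the openSMILE command-line extractor with the INTERSPEECH 2010
    Paralinguistic Challenge configuration (1582 features per utterance).
    -C / -I / -O are the standard openSMILE options; the config path varies
    between openSMILE versions and is an assumption here."""
    subprocess.run([binary, "-C", config, "-I", wav_path, "-O", out_path],
                   check=True)

# extract_is10_features("sample.wav", "sample_is10.arff")
```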
S3, learning is carried out by adopting a deep learning method based on a semi-supervised automatic encoder based on the voice emotion characteristics and the expression characteristics respectively, and a classification result and various class output probabilities are obtained through a softmax classifier.
And S4, fusing the two single-mode emotion recognition results in a decision layer, and obtaining a final emotion recognition result by adopting a self-adaptive weight distribution method.
As shown in fig. 2, the step S3 of semi-supervised classification specifically includes:
S3.1, for the data of one modality, inputting training samples both without and with labels; the encoding and decoding of the autoencoder produce the reconstructed data, while the softmax classifier produces the class output.
And S3.2, computing the unsupervised representation-learning reconstruction error E_r and the supervised classification error E_c.
And S3.3, constructing the optimization objective, taking both the reconstruction error and the classification error into account, where α balances the two terms:
E(θ) = αE_r + (1-α)E_c
And S3.4, updating parameters by a gradient descent method until the objective function converges.
As shown in fig. 3, the specific steps of the step S4 are as follows:
S4.1, acquiring the per-class output probabilities of the two modalities for a test sample from the softmax classifier, and calculating a variable δ_k; δ_k can be used to measure how well the modality characterizes the emotion, and self-adaptive weight assignment is realized according to the value of δ_k for each sample. Here J is the number of classes in the system, P = {p_j | j = 1, …, J} is the vector formed by the sample output probabilities, p_j is the probability of each class output by the softmax classifier, and d denotes the Euclidean distance between two vectors.
S4.2, mapping δ_k into [0, 1] according to the following formula and using the result as the weight, where a and b are self-selected parameters:
u_k = 1 - 1/[1 + exp(-a(δ_k - b))]
S4.3, obtaining the fused output probability vector P_final = {p_final_j | j = 1, …, J} according to the following formula; the class with the highest probability is the recognized class, where p_j,k is the probability of class j output by single-modality emotion recognition with modality k, and there are K modalities in total.
It should be noted that the above-mentioned embodiments are only preferred embodiments of the present invention, and are not intended to limit the scope of the present invention, and the equivalent substitutions or alternatives made on the basis of the above-mentioned technical solutions are all included in the scope of the present invention.

Claims (4)

1. The self-adaptive weight bimodal fusion emotion recognition method based on voice and facial expression is characterized by comprising the following steps of:
S1, acquiring emotion voice data and facial expression data, matching the emotion data to emotion categories, and selecting a training sample set and a test sample set;
S2, extracting voice emotion features from the voice data and dynamic expression features from the expression data, by first automatically extracting the expression peak frame, obtaining the dynamic image sequence from expression onset to the expression peak, and normalizing the variable-length image sequence into a fixed-length image sequence serving as the dynamic expression features;
S3, based on the voice emotion features and the expression features respectively, learning with a deep learning method based on a semi-supervised autoencoder, and obtaining classification results and per-class output probabilities from a softmax classifier;
s4, fusing the two single-mode emotion recognition results in a decision layer, adopting a self-adaptive weight distribution method to obtain a final emotion recognition result,
the decision layer fusion step based on the self-adaptive weight in the step S4 is as follows:
S4.1, acquiring the per-class output probabilities of the two modalities for a test sample from the softmax classifier, and calculating a variable δ_k, wherein δ_k can be used to measure how well the modality characterizes the emotion, and self-adaptive weight assignment is realized according to the value of δ_k for each sample, J is the number of classes in the system, P = {p_j | j = 1, …, J} is the vector consisting of the sample output probabilities, p_j is the probability of each class output by the softmax classifier, and d denotes the Euclidean distance between two vectors;
S4.2, mapping δ_k into [0, 1] according to the following formula and using the result as the weight, wherein a and b are self-selected parameters:
u_k = 1 - 1/[1 + exp(-a(δ_k - b))];
S4.3, obtaining the fused output probability vector P_final = {p_final_j | j = 1, …, J} according to the following formula, the class with the highest probability being the recognized class, wherein p_j,k is the probability of class j output by single-modality emotion recognition with modality k, and there are K modalities in total.
2. the method for identifying the self-adaptive weight bimodal fusion emotion based on voice and facial expression according to claim 1, wherein the specific steps of extracting emotion features in the step S2 are as follows:
S2A.1: for the voice emotion data, dividing the obtained voice sample into multiple frames by framing, and windowing the framed speech segments to obtain the voice emotion signal;
S2A.2: for the voice emotion signal obtained in S2A.1, extracting frame-level low-level features such as fundamental frequency F0, short-time energy, frequency perturbation (jitter), amplitude perturbation (shimmer), harmonic-to-noise ratio, and Mel-frequency cepstral coefficients;
S2A.3: aggregating the frame-level low-level features over the utterance level formed by multiple frames, applying statistical functionals such as maximum, minimum, mean, and standard deviation to the low-level features to obtain the voice emotion features;
S2B.1: for the facial expression data, first applying a coordinate transformation to the obtained three-dimensional coordinates of the facial expression feature points: taking the nose tip as the center point, a rotation matrix is obtained using the SVD principle, and the points are rotated by multiplying with this matrix so as to eliminate the influence of head pose changes;
S2B.2. extracting peak expression frame by slow feature analysis method,
S2B.3, after the dynamic expression sequence from the expression onset frame to the expression peak frame is obtained, normalizing the variable-length dynamic features to a fixed length by a linear interpolation method.
3. The method for identifying the self-adaptive weight bimodal fusion emotion based on voice and facial expression according to claim 1, wherein the specific steps of semi-supervised learning in the step S3 are as follows:
S3.1, for data of one modality, inputting both unlabeled and labeled training samples, the encoding and decoding of the autoencoder producing the reconstructed data and the softmax classifier producing the class output;
S3.2, calculating the unsupervised representation-learning reconstruction error E_r and the supervised classification error E_c;
S3.3, constructing the optimization objective function while taking both the reconstruction error E_r and the classification error E_c into account:
E(θ) = αE_r + (1-α)E_c
And S3.4, updating parameters by a gradient descent method until the objective function converges.
4. The method for identifying the self-adaptive weight bimodal fusion emotion based on voice and facial expressions according to claim 2, characterized in that S2B.2 comprises the following steps:
1) Treating each dynamic image sequence sample as a time-varying input signal x(t) = [x_1(t), x_2(t), …, x_I(t)]^T;
2) Normalizing x(t) so that it has zero mean and unit variance;
3) Performing nonlinear expansion on the input signal, converting the problem into a linear SFA problem;
4) Performing data whitening;
5) Solving with the linear SFA method.
CN201910632006.3A 2019-07-12 2019-07-12 Self-adaptive weight bimodal fusion emotion recognition method based on voice and expression Active CN110516696B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910632006.3A CN110516696B (en) 2019-07-12 2019-07-12 Self-adaptive weight bimodal fusion emotion recognition method based on voice and expression

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910632006.3A CN110516696B (en) 2019-07-12 2019-07-12 Self-adaptive weight bimodal fusion emotion recognition method based on voice and expression

Publications (2)

Publication Number Publication Date
CN110516696A CN110516696A (en) 2019-11-29
CN110516696B true CN110516696B (en) 2023-07-25

Family

ID=68623425

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910632006.3A Active CN110516696B (en) 2019-07-12 2019-07-12 Self-adaptive weight bimodal fusion emotion recognition method based on voice and expression

Country Status (1)

Country Link
CN (1) CN110516696B (en)

Families Citing this family (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110677598B (en) * 2019-09-18 2022-04-12 北京市商汤科技开发有限公司 Video generation method and device, electronic equipment and computer storage medium
CN111027215B (en) * 2019-12-11 2024-02-20 中国人民解放军陆军工程大学 Character training system and method for virtual person
CN111401268B (en) * 2020-03-19 2022-11-15 内蒙古工业大学 Multi-mode emotion recognition method and device for open environment
CN111460494B (en) * 2020-03-24 2023-04-07 广州大学 Multi-mode deep learning-oriented privacy protection method and system
CN112006697B (en) * 2020-06-02 2022-11-01 东南大学 Voice signal-based gradient lifting decision tree depression degree recognition system
CN112101096B (en) * 2020-08-02 2023-09-22 华南理工大学 Multi-mode fusion suicide emotion perception method based on voice and micro-expression
CN112401886B (en) * 2020-10-22 2023-01-31 北京大学 Processing method, device and equipment for emotion recognition and storage medium
CN112418034A (en) * 2020-11-12 2021-02-26 元梦人文智能国际有限公司 Multi-modal emotion recognition method and device, electronic equipment and storage medium
CN112528835B (en) * 2020-12-08 2023-07-04 北京百度网讯科技有限公司 Training method and device of expression prediction model, recognition method and device and electronic equipment
CN113076847B (en) * 2021-03-29 2022-06-17 济南大学 Multi-mode emotion recognition method and system
CN113033450B (en) * 2021-04-02 2022-06-24 山东大学 Multi-mode continuous emotion recognition method, service inference method and system
CN113343860A (en) * 2021-06-10 2021-09-03 南京工业大学 Bimodal fusion emotion recognition method based on video image and voice
CN113780198B (en) * 2021-09-15 2023-11-24 南京邮电大学 Multi-mode emotion classification method for image generation
CN114912502B (en) * 2021-12-28 2024-03-29 天翼数字生活科技有限公司 Double-mode deep semi-supervised emotion classification method based on expressions and voices
CN114626430B (en) * 2021-12-30 2022-10-18 华院计算技术(上海)股份有限公司 Emotion recognition model training method, emotion recognition device and emotion recognition medium
CN115240649B (en) * 2022-07-19 2023-04-18 于振华 Voice recognition method and system based on deep learning
CN116561533B (en) * 2023-07-05 2023-09-29 福建天晴数码有限公司 Emotion evolution method and terminal for virtual avatar in educational element universe

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105976809B (en) * 2016-05-25 2019-12-17 中国地质大学(武汉) Identification method and system based on speech and facial expression bimodal emotion fusion

Also Published As

Publication number Publication date
CN110516696A (en) 2019-11-29

Similar Documents

Publication Publication Date Title
CN110516696B (en) Self-adaptive weight bimodal fusion emotion recognition method based on voice and expression
Wani et al. A comprehensive review of speech emotion recognition systems
Jahangir et al. Deep learning approaches for speech emotion recognition: State of the art and research challenges
CN110853680B (en) double-BiLSTM speech emotion recognition method with multi-input multi-fusion strategy
Bhat et al. Automatic assessment of sentence-level dysarthria intelligibility using BLSTM
He et al. Multimodal depression recognition with dynamic visual and audio cues
Huang et al. Natural language processing methods for acoustic and landmark event-based features in speech-based depression detection
CN103996155A (en) Intelligent interaction and psychological comfort robot service system
Samantaray et al. A novel approach of speech emotion recognition with prosody, quality and derived features using SVM classifier for a class of North-Eastern Languages
CN112006697A (en) Gradient boosting decision tree depression recognition method based on voice signals
CN110147548A (en) The emotion identification method initialized based on bidirectional valve controlled cycling element network and new network
CN113297383B (en) Speech emotion classification method based on knowledge distillation
Huang et al. Speech emotion recognition using convolutional neural network with audio word-based embedding
Swain et al. A DCRNN-based ensemble classifier for speech emotion recognition in Odia language
CN116304973A (en) Classroom teaching emotion recognition method and system based on multi-mode fusion
CN114898779A (en) Multi-mode fused speech emotion recognition method and system
CN110348482A (en) A kind of speech emotion recognition system based on depth model integrated architecture
Ling An acoustic model for English speech recognition based on deep learning
Shah et al. Articulation constrained learning with application to speech emotion recognition
Zhao et al. [Retracted] Standardized Evaluation Method of Pronunciation Teaching Based on Deep Learning
Rangra et al. Emotional speech-based personality prediction using NPSO architecture in deep learning
Zhang et al. Emotion recognition in speech using multi-classification SVM
Cao et al. Emotion recognition from children speech signals using attention based time series deep learning
Yang [Retracted] Design of Service Robot Based on User Emotion Recognition and Environmental Monitoring
CN112951270B (en) Voice fluency detection method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant