CN112800875A - Multi-mode emotion recognition method based on mixed feature fusion and decision fusion - Google Patents


Info

Publication number
CN112800875A
Authority
CN
China
Prior art keywords
image
text
fusion
emotion
size
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110048664.5A
Other languages
Chinese (zh)
Inventor
刘兴旺
廣田薰
程智鹏
李文龙
戴亚平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Institute of Technology BIT
Original Assignee
Beijing Institute of Technology BIT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Institute of Technology BIT filed Critical Beijing Institute of Technology BIT
Priority to CN202110048664.5A priority Critical patent/CN112800875A/en
Publication of CN112800875A publication Critical patent/CN112800875A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174Facial expression recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/254Fusion techniques of classification results, e.g. of results related to same input data
    • G06F18/256Fusion techniques of classification results, e.g. of results related to same input data of results relating to different input data, e.g. multimodal recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Human Computer Interaction (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Image Analysis (AREA)

Abstract

A multi-mode emotion recognition method based on mixed feature fusion and decision fusion, belonging to the fields of pattern recognition and emotion recognition. The implementation comprises the following steps: first, constructing an image emotion recognition network using a convolutional neural network framework to acquire image features and an image emotion state; second, constructing a text emotion recognition network using a recurrent neural network framework to acquire text features and a text emotion state; and third, constructing a multi-modal information fusion emotion recognition network, in which a main classifier fuses the image emotion state and the text emotion state to obtain a main emotion classification, an auxiliary classifier fuses the image features and the text features to obtain an auxiliary emotion classification, and the main and auxiliary emotion classifications are fused to obtain the final emotion classification. By exploiting the complementarity among multi-modal information, the invention avoids the low emotion recognition accuracy that single-modal information suffers when information is blurred or missing, and provides a new approach to multi-modal data fusion and emotion recognition.

Description

Multi-mode emotion recognition method based on mixed feature fusion and decision fusion
Technical Field
The invention relates to the fields of data fusion, neural networks, emotion recognition and the like, in particular to a multi-mode information fusion emotion recognition method based on hybrid fusion.
Background
Human beings express emotional information through various modalities such as facial expressions, posture, voice, and language, and emotional behavior is an important indicator of human satisfaction. With the development of artificial intelligence technology, emotion recognition has become an important means of achieving good human-computer interaction. Emotion recognition extracts features from emotion signals to obtain the mapping between outwardly observable features and internal emotional states, and thereby identifies the internal emotion category of the recognized subject. Emotion recognition has very broad application prospects in fields such as machine services, healthcare, distance education, and autonomous driving.
A modality is a way of representing information, such as images, text, or sound. Multimodal refers to combinations of two or more modalities. The same object can be expressed in different modalities, and the information carried by each modality is independent yet potentially correlated. Currently, emotion recognition mainly acquires and analyzes single-modal emotion information to infer the emotional state of the subject. Because single-modal information has weak anti-interference capability and is prone to containing redundant signals or missing part of the information, classification accuracy is low and misclassification can even occur.
Human cognition is multi-modal: an individual perceives a scene through visual, auditory, and even tactile signals, and obtains high-level information such as emotion through fusion and semantic understanding of this information. Multi-modal information fusion aims to imitate this human process of perception and understanding: by building a model that can process, associate, and reason over information from multiple modalities, and by exploiting the complementarity among modalities to capture their latent associations, redundant information within a modality can be removed and missing information of another modality can be supplemented.
According to the level at which fusion occurs, multimodal fusion is mainly divided into three approaches: data-level fusion, feature-level fusion, and decision-level fusion. Data-level fusion is suitable only for signals of similar types and cannot handle signals that differ greatly, such as image and sound signals. Feature-level fusion converts data of different modalities into high-dimensional feature representations, combines the high-order features of the different modalities in some way into a new feature vector, and can capture complementary information among modalities. Decision-level fusion feeds data of different modalities into separately trained classifiers and combines the individual classification results into a final decision vector according to a fusion rule; it fully accounts for the differences among modalities, and because its errors come from different classifiers, these errors are usually uncorrelated and do not accumulate.
Disclosure of Invention
The invention aims to overcome the weak anti-interference capability of existing single-modal emotion recognition methods and, by exploiting the complementarity among multi-modal information, to provide a high-accuracy multi-modal emotion recognition method. The invention adopts an information fusion scheme that mixes feature-level fusion and decision-level fusion, and constructs a multi-modal emotion recognition method with mixed feature fusion and decision fusion by fusing multi-modal information.
The purpose of the invention is realized by the following technical scheme.
The invention discloses a multi-modal information fusion emotion recognition method based on hybrid fusion, which comprises the following steps:
Step 1: constructing an image emotion recognition network based on a convolutional neural network (CNN) framework; a stacked convolutional structure extracts features from the image information, captures high-dimensional features to obtain the image features, and classifies the emotional state of the image information;
Step 2: constructing a text emotion recognition network based on a recurrent neural network (RNN) framework, extracting features from the text information to obtain the text features, and classifying the emotional state of the text information.
Step 3: constructing a hybrid multi-modal information fusion network. A main classifier performs decision-level fusion of the image emotion label and the text emotion label to obtain a fused main classification result. An auxiliary classifier performs feature-level fusion of the image features and the text features to obtain an auxiliary classification result. The main classification result and the auxiliary classification result are fused to obtain the final emotional state. A feature fusion layer and a decision fusion layer are constructed, and the correlation and complementarity between the two modalities are comprehensively exploited to accomplish the final emotion recognition and classification task.
The implementation method of step 1 comprises the following steps:
An image emotion recognition network is constructed with a convolutional neural network (CNN) to extract image features and obtain the emotion classification. This part may employ a variety of image feature extraction networks, such as VGGNet, ResNet, and the like. Image data are input into the image emotion recognition network in the format (B, C, H, W), where B is the batch size, i.e., the number of images input at the same time; C is the number of image channels (an RGB color image has three channels, a grayscale image has one); and H and W are the height and width of the image, respectively. The network extracts the image features I1, sends I1 to a fully connected layer, and obtains the final image emotional state I, where I is a vector of dimension [batch_size, num_class] and num_class is the number of predicted categories.
Step 2 is realized as follows:
A text emotion recognition network is constructed with a recurrent neural network (RNN) to extract text features and obtain the emotion classification. This part may adopt a variety of mainstream text feature extraction frameworks, such as LSTM and BiLSTM. For text data, each word in the text is fed into a word embedding layer and encoded into a word vector; the input dimension of the network model is [batch_size, seq_len], where batch_size is the size of the text batch and seq_len is the sentence length. After the word embedding layer is randomly initialized, the word vectors have dimension [batch_size, seq_len, embed_size], where embed_size is the word vector dimension. The word vectors are fed into the RNN to obtain the hidden-layer vectors [batch_size, seq_len, hidden_size × 2] at all time steps, where hidden_size is the hidden-layer size. The network extracts the text features T1, sends T1 to a fully connected layer, and obtains the final text emotional state T, where T is a vector of dimension [batch_size, num_class] and num_class is the number of predicted categories.
The implementation method of step 3 is as follows:
Step 3.1: constructing the main classifier for multi-modal information fusion. The image emotional state A and the text emotional state B are concatenated and fed into the main classifier to obtain the main classification result (Class) with dimension 1 × 4;
Step 3.2: obtaining the image-feature and text-feature weights for feature fusion and performing a concatenation (Concat) operation on the image features and the text features along the batch dimension. For image data, the feature weight is:
(image feature weight formula — shown only as an image in the original document)
where B is the batch size and C is the number of image channels. For text data, the feature weight is:
(text feature weight formula — shown only as an image in the original document)
where B is the text batch size and S is the text length. Both weights are mapped to the interval [0, 1] through normalization, and the new feature Fused_feature is obtained as follows:
(Fused_feature formula — shown only as an image in the original document)
and taking the new features as the input of an Auxiliary classifier to obtain an Auxiliary classification result (Auxiliary).
Step 3.3: the fusion layer routes the input vectors to a plurality of nodes by adopting a dynamic routing mode, and generates final fusion vectors through vector compression and splicing. Firstly, the input feature vector passes through a hidden layer:
u1 = W1·v1,  u2 = W2·v2,
where v1 and v2 are the feature vectors of the input text and image and W1 and W2 are weight matrices. The feature vectors obtained in the previous step are then routed to three nodes in a dynamic routing manner:
s1 = c11·u1 + c12·u2,
s2 = c21·u1 + c22·u2,
s3 = c31·u1 + c32·u2,
generating an auxiliary classifier with dimension 1 x 4 by compressing and splicing vectors:
v = Concat(Squash(si)),
(Squash function formula — shown only as an image in the original document)
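The Squash function itself is shown only as an image in the original; the sketch below assumes the standard capsule-network squash, Squash(s) = (||s||^2 / (1 + ||s||^2)) · s / ||s||, and treats the routing coefficients c_ij as learnable parameters, both of which are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def squash(s: torch.Tensor, dim: int = -1, eps: float = 1e-8) -> torch.Tensor:
    # assumed capsule-network squash: ||s||^2 / (1 + ||s||^2) * s / ||s||
    norm_sq = (s ** 2).sum(dim=dim, keepdim=True)
    return (norm_sq / (1.0 + norm_sq)) * s / torch.sqrt(norm_sq + eps)

class FusionLayer(nn.Module):
    """Sketch of the fusion layer (step 3.3): hidden projections u1 = W1·v1 and
    u2 = W2·v2, routing of u1/u2 to three nodes s1..s3 with coefficients c_ij,
    then compression (squash) and concatenation into the fusion vector v."""
    def __init__(self, txt_dim: int, img_dim: int, hidden_dim: int = 64, num_nodes: int = 3):
        super().__init__()
        self.w1 = nn.Linear(txt_dim, hidden_dim, bias=False)   # W1
        self.w2 = nn.Linear(img_dim, hidden_dim, bias=False)   # W2
        # routing coefficients c_ij, treated here as learnable (an assumption)
        self.c = nn.Parameter(torch.ones(num_nodes, 2) / 2)

    def forward(self, v1: torch.Tensor, v2: torch.Tensor) -> torch.Tensor:
        u1, u2 = self.w1(v1), self.w2(v2)       # v1: text feature, v2: image feature
        c = F.softmax(self.c, dim=1)            # keep the coefficients normalised per node
        nodes = [squash(c[i, 0] * u1 + c[i, 1] * u2) for i in range(c.size(0))]
        return torch.cat(nodes, dim=-1)         # fusion vector v = Concat(Squash(s_i))
```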
step 3.4: fusing a main classification result and an auxiliary classification result by a decision-level fusion method, and acquiring a final classification result by using a softmax function:
Finally_class = softmax(Auxiliary + Class).
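Steps 3.1 and 3.4 can be summarized in a short sketch; the hidden width of the main classifier is an assumption, and num_class is kept generic (the text states a 1 × 4 result while the embodiment uses 5 emotion classes).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MainClassifier(nn.Module):
    """Sketch of step 3.1: the image emotion state and the text emotion state
    (each [batch_size, num_class]) are concatenated and mapped to the main
    classification result "Class"; the hidden width of 32 is an assumption."""
    def __init__(self, num_class: int = 5):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_class * 2, 32), nn.ReLU(),
            nn.Linear(32, num_class))

    def forward(self, img_state: torch.Tensor, txt_state: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([img_state, txt_state], dim=1))

def final_class(main_class: torch.Tensor, auxiliary: torch.Tensor) -> torch.Tensor:
    """Step 3.4: softmax over the sum of the auxiliary and main classification
    vectors gives the final classification result."""
    return F.softmax(auxiliary + main_class, dim=-1)
```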
compared with the prior art, the invention has the following advantages:
1. The mixed feature fusion and decision fusion multi-modal emotion recognition method disclosed by the invention extracts features from the image and text information and recognizes their emotion classifications, constructs a decision-fusion-based main classifier and a feature-fusion-based auxiliary classifier, and obtains the final classification result by weighting the outputs of the two classifiers; this overcomes the poor performance of emotion recognition caused by missing or ambiguous information under single-modal conditions and achieves a good recognition effect;
2. The multi-modal emotion recognition method with mixed feature fusion and decision fusion disclosed by the invention constructs a fusion layer for the features of multiple modalities. The feature fusion layer is built in a dynamic routing manner: the input vectors are routed to several nodes, and the fusion vector is generated through vector compression and concatenation, which fully accounts for the correlation and difference among the information of different modalities;
3. In the multi-modal emotion recognition method with mixed feature fusion and decision fusion disclosed by the invention, the feature extraction network of each modality can be replaced by any network framework with good feature extraction capability, so the method has good flexibility and extensibility.
Drawings
The invention is further described with reference to the following drawings and embodiments, in which:
FIG. 1 is a flowchart of a mixed feature fusion and decision fusion multimodal emotion recognition method in an embodiment of the present invention;
FIG. 2 is a block diagram of a mixed feature fusion and decision fusion multimodal emotion recognition method according to an embodiment of the present invention;
FIG. 3 is a fusion layer framework diagram of a multi-modal emotion recognition method with hybrid feature fusion and decision fusion according to an embodiment of the present invention;
Detailed Description
The invention will be further described with reference to the accompanying drawings and specific embodiments, which are given by way of illustration only and are not intended to limit the scope of the invention.
Fig. 1 shows the flowchart, Fig. 2 the overall framework, and Fig. 3 the fusion-layer framework of the mixed feature fusion and decision fusion multimodal emotion recognition method in this embodiment. The specific implementation steps are as follows:
Step 1: synthesis of the multi-modal data set. The training data of the network model consist of an image data set and a text data set, which are used to train the model and to verify the feasibility and superiority of the algorithm. The text data set is derived from yf_amazon, which contains 720,000 shopping review/rating records from 140,000 users with 5 emotion categories; after data cleaning, texts with null values, garbled characters, or no actual meaning are removed. The image data set is derived from public data sets such as FER2013 (Kaggle) with 6 emotion categories; to match the text data set, 5 emotion categories are retained: angry, sad, neutral, happy, and surprised. The text data and the image data are paired one to one to construct a triple data set with the structure <label, image, text> for training the multi-modal emotion recognition model.
Step 2: generation of the main classifier. As shown in Fig. 2, the embedding layer for images adopts a ResNet50 residual-network architecture; the original image tensor is M × N, and a feature I of size S × T is obtained after embedding-layer encoding. The embedding network for text adopts a BiLSTM (bidirectional long short-term memory) architecture; the original text vector is fixed to a length of 128 × 1 (shorter texts are zero-padded and texts longer than 128 are truncated), and a 256 × 1 feature T is obtained after embedding-layer encoding. Feature I and feature T are dimensionally aligned through separate fully connected layers to produce features of the same dimension B1, which are finally concatenated to generate the B2-dimensional main classifier vector.
Step 3: generation of the auxiliary classifier. As shown in Fig. 1, the feature vectors A and B of the image and the text are input into a weighting layer to obtain shallow feature vectors; to retain as much semantic information as possible, a dot-product operation is performed with the original feature vectors, and the two resulting vectors serve as the input of the fusion layer. As shown in Fig. 2, the fusion layer routes the input vectors to N nodes in a dynamic routing manner, where N = 3, and then generates the final fusion vector through vector compression and concatenation, so that the image and text features are fused as fully as possible.
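The text only names a "weighting layer" and a "dot product" with the original features; as a sketch under those assumptions, a sigmoid-gated linear layer producing the shallow vector and an element-wise product could be written as follows.

```python
import torch
import torch.nn as nn

class WeightingLayer(nn.Module):
    """Sketch of the auxiliary-classifier input preparation: a weighting layer
    produces a shallow feature vector, which is multiplied element-wise with
    the original feature vector to retain its semantic information. The
    sigmoid-gated linear layer and the element-wise product are assumptions."""
    def __init__(self, dim: int):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(dim, dim), nn.Sigmoid())

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        shallow = self.gate(feat)          # shallow feature vector from the weighting layer
        return shallow * feat              # element-wise product with the original features

# the two resulting vectors (one per modality) are then fed to the fusion layer, e.g.:
# v1 = WeightingLayer(256)(text_feature); v2 = WeightingLayer(128)(image_feature)
```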
Step 4: decision-level fusion of the main classifier and the auxiliary classifier, with a Softmax regression model identifying the emotional features to obtain the emotion category. There are 5 expression categories: angry, sad, neutral, happy, and surprised. The main classifier and the auxiliary classifier are fused by three methods, namely mean fusion, DS evidence theory fusion, and dynamic weight fusion, after which the Softmax regression model yields the probabilities of the 5 emotion classes; the class with the highest probability is taken as the expression recognition result.
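The embodiment lists mean fusion, DS evidence theory fusion, and dynamic weight fusion as candidate fusion rules; purely as an illustration, the sketch below shows mean fusion and a simple confidence-based dynamic weighting (the DS combination rule is omitted), followed by the Softmax decision step.

```python
import torch
import torch.nn.functional as F

def mean_fusion(main_out: torch.Tensor, aux_out: torch.Tensor) -> torch.Tensor:
    """Mean fusion of the main and auxiliary classifier outputs, followed by Softmax."""
    return F.softmax((main_out + aux_out) / 2.0, dim=-1)

def dynamic_weight_fusion(main_out: torch.Tensor, aux_out: torch.Tensor) -> torch.Tensor:
    """Simple dynamic weighting (an assumption): each classifier is weighted by
    the confidence of its own prediction before the Softmax decision step."""
    w_main = F.softmax(main_out, dim=-1).max(dim=-1, keepdim=True).values
    w_aux = F.softmax(aux_out, dim=-1).max(dim=-1, keepdim=True).values
    total = w_main + w_aux
    probs = F.softmax((w_main / total) * main_out + (w_aux / total) * aux_out, dim=-1)
    return probs                           # argmax over probs gives the recognised emotion
```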
With the above steps, experiments are carried out on the pre-synthesized multi-modal data set, which is randomly divided into a training set (70% of the total), a validation set (15%), and a test set (15%). Three comparative experiments are performed. Experiment one: an LSTM network is trained on the text data set alone to obtain the text emotion recognition accuracy. Experiment two: a ResNet50 network is trained on the image data set alone to obtain the image emotion recognition accuracy. Experiment three: the multi-modal emotion recognition model is trained on the image-text data set, with mean fusion, DS evidence theory fusion, and dynamic weight fusion used as the data fusion methods, to obtain the accuracy of the multi-modal emotion recognition model. Compared with the results of experiments one and two, the results of experiment three are improved by 3.22%, 3.68%, and 10.54%, respectively.
The above embodiments are preferred implementations of the present invention, but the present invention is not limited to them; various changes can be made within the knowledge of those skilled in the art without departing from the spirit of the present invention. Any modification, equivalent replacement, or improvement made without departing from the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (4)

1. A multi-mode emotion recognition method based on mixed feature fusion and decision fusion, characterized by comprising the following steps:
step 1: constructing an image emotion recognition network based on a convolutional neural network (CNN) framework, in which features are extracted from the image information through a stacked convolutional structure so as to capture multi-dimensional features, thereby obtaining the image features and classifying the emotional state of the image information;
step 2: constructing a text emotion recognition network based on a recurrent neural network (RNN) framework, in which the RNN takes the output of the previous node as the input of the next node, realizing its memory function, so that the model can better extract features from long text information and recognize the emotional state of the text information;
step 3: constructing a hybrid multi-modal information fusion network, in which a main classifier performs decision-level fusion of the image emotion label and the text emotion label to obtain a fused main classification result; an auxiliary classifier performs feature-level fusion of the image features and the text features to obtain an auxiliary classification result; the main classification result and the auxiliary classification result are fused to obtain the final emotional state; and a feature fusion layer and a decision fusion layer are constructed, comprehensively exploiting the correlation and complementarity between the two modalities to accomplish the final emotion recognition and classification task.
2. The method according to claim 1, characterized in that step 1 is implemented as follows:
an image emotion recognition network is constructed with a convolutional neural network (CNN) to extract image features and obtain the emotion classification; this part may employ a variety of image feature extraction networks, such as VGGNet, ResNet, and the like; image data are input into the image emotion recognition network in the format (B, C, H, W), where B is the batch size, i.e., the number of images input at the same time, C is the number of image channels (an RGB color image has three channels, a grayscale image has one), and H and W are the height and width of the image, respectively; the network extracts the image features I1, sends I1 to a fully connected layer, and obtains the final image emotional state I, where I is a vector of dimension [batch_size, num_class] and num_class is the number of predicted categories.
3. The method according to claim 1, characterized in that step 2 is implemented as follows:
a text emotion recognition network is constructed with a recurrent neural network (RNN) to extract text features and obtain the emotion classification; this part may adopt a variety of mainstream text feature extraction frameworks, such as LSTM and BiLSTM; for text data, each word in the text is fed into a word embedding layer and encoded into a word vector, and the input dimension of the network model is [batch_size, seq_len], where batch_size is the size of the text batch and seq_len is the sentence length; after the word embedding layer is randomly initialized, the word vectors have dimension [batch_size, seq_len, embed_size], where embed_size is the word vector dimension; the word vectors are fed into the RNN to obtain the hidden-layer vectors [batch_size, seq_len, hidden_size × 2] at all time steps, where hidden_size is the hidden-layer size; the network extracts the text features T1, sends T1 to a fully connected layer, and obtains the final text emotional state T, where T is a vector of dimension [batch_size, num_class] and num_class is the number of predicted categories.
4. The method according to claim 1, characterized in that step 3 is implemented as follows:
step 3.1: constructing the main classifier for multi-modal information fusion; the image emotional state I and the text emotional state T are concatenated and fed into the main classifier to obtain the main classification result (Class) with dimension 1 × 4;
step 3.2: obtaining the image-feature and text-feature weights for feature fusion and performing a concatenation (Concat) operation on the image features and the text features along the batch dimension; for image data, the feature weight is:
(image feature weight formula — shown only as an image in the original document)
where B is the image batch size and C is the number of image channels. For text data, the feature weight is:
(text feature weight formula — shown only as an image in the original document)
where B is the text batch size and S is the text length. Both weights are mapped to the interval [0, 1] through normalization, and the new feature Fused_feature is obtained as follows:
(Fused_feature formula — shown only as an image in the original document)
and taking the new features as the input of an Auxiliary classifier to obtain an Auxiliary classification result (Auxiliary).
step 3.3: the fusion layer routes the input vectors to several nodes in a dynamic routing manner and generates the final fusion vector through vector compression and concatenation. First, the input feature vectors pass through a hidden layer:
u1 = W1·v1,  u2 = W2·v2,
where v1 and v2 are the feature vectors of the input text and image and W1 and W2 are weight matrices; the feature vectors obtained in the previous step are routed to three nodes in a dynamic routing manner:
s1 = c11·u1 + c12·u2,
s2 = c21·u1 + c22·u2,
s3 = c31·u1 + c32·u2,
generating an auxiliary classifier with dimension 1 x 4 by compressing and splicing vectors:
v = Concat(Squash(si)),
(Squash function formula — shown only as an image in the original document)
step 3.4: fusing a main classification result and an auxiliary classification result by a decision-level fusion method, and acquiring a final classification result by using a softmax function:
Finally_class = softmax(Auxiliary + Class).
CN202110048664.5A 2021-01-14 2021-01-14 Multi-mode emotion recognition method based on mixed feature fusion and decision fusion Pending CN112800875A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110048664.5A CN112800875A (en) 2021-01-14 2021-01-14 Multi-mode emotion recognition method based on mixed feature fusion and decision fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110048664.5A CN112800875A (en) 2021-01-14 2021-01-14 Multi-mode emotion recognition method based on mixed feature fusion and decision fusion

Publications (1)

Publication Number Publication Date
CN112800875A true CN112800875A (en) 2021-05-14

Family

ID=75810844

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110048664.5A Pending CN112800875A (en) 2021-01-14 2021-01-14 Multi-mode emotion recognition method based on mixed feature fusion and decision fusion

Country Status (1)

Country Link
CN (1) CN112800875A (en)



Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109508640A (en) * 2018-10-12 2019-03-22 咪咕文化科技有限公司 A kind of crowd's sentiment analysis method, apparatus and storage medium
US20190311188A1 (en) * 2018-12-05 2019-10-10 Sichuan University Face emotion recognition method based on dual-stream convolutional neural network
CN109934260A (en) * 2019-01-31 2019-06-25 中国科学院信息工程研究所 Image, text and data fusion sensibility classification method and device based on random forest
CN110674339A (en) * 2019-09-18 2020-01-10 北京工业大学 Chinese song emotion classification method based on multi-mode fusion
CN110826336A (en) * 2019-09-18 2020-02-21 华南师范大学 Emotion classification method, system, storage medium and equipment
CN111881291A (en) * 2020-06-19 2020-11-03 山东师范大学 Text emotion classification method and system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Zhang Ge: "Research on Multimodal Continuous Dimensional Emotion Recognition" (多模态连续维度情感识别研究), China Master's Theses Full-text Database, Information Science and Technology *
Xu Zhidong et al.: "Research on Aspect-Level Sentiment Classification Based on Capsule Networks" (基于胶囊网络的方面级情感分类研究), Chinese Journal of Intelligent Science and Technology (智能科学与技术学报) *

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113673567B (en) * 2021-07-20 2023-07-21 华南理工大学 Panorama emotion recognition method and system based on multi-angle sub-region self-adaption
CN113673567A (en) * 2021-07-20 2021-11-19 华南理工大学 Panorama emotion recognition method and system based on multi-angle subregion self-adaption
CN113688938A (en) * 2021-09-07 2021-11-23 北京百度网讯科技有限公司 Method for determining object emotion and method and device for training emotion classification model
CN113688938B (en) * 2021-09-07 2023-07-28 北京百度网讯科技有限公司 Method for determining emotion of object, method and device for training emotion classification model
CN113988201A (en) * 2021-11-03 2022-01-28 哈尔滨工程大学 Multi-mode emotion classification method based on neural network
CN113988201B (en) * 2021-11-03 2024-04-26 哈尔滨工程大学 Multi-mode emotion classification method based on neural network
CN114218380A (en) * 2021-12-03 2022-03-22 淮阴工学院 Multi-mode-based cold chain loading user portrait label extraction method and device
CN114218380B (en) * 2021-12-03 2022-07-29 淮阴工学院 Multi-mode-based cold chain loading user portrait label extraction method and device
CN114330454A (en) * 2022-01-05 2022-04-12 东北农业大学 Live pig cough sound identification method based on DS evidence theory fusion characteristics
CN115034257A (en) * 2022-05-09 2022-09-09 西北工业大学 Cross-modal information target identification method and device based on feature fusion
CN116580436A (en) * 2023-05-08 2023-08-11 长春理工大学 Lightweight convolutional network facial emotion recognition method with auxiliary classifier
CN116383426A (en) * 2023-05-30 2023-07-04 深圳大学 Visual emotion recognition method, device, equipment and storage medium based on attribute
CN116383426B (en) * 2023-05-30 2023-08-22 深圳大学 Visual emotion recognition method, device, equipment and storage medium based on attribute
CN116543283A (en) * 2023-07-05 2023-08-04 合肥工业大学 Multimode target detection method considering modal uncertainty
CN116543283B (en) * 2023-07-05 2023-09-15 合肥工业大学 Multimode target detection method considering modal uncertainty
CN116994069A (en) * 2023-09-22 2023-11-03 武汉纺织大学 Image analysis method and system based on multi-mode information
CN116994069B (en) * 2023-09-22 2023-12-22 武汉纺织大学 Image analysis method and system based on multi-mode information
CN117235605A (en) * 2023-11-10 2023-12-15 湖南马栏山视频先进技术研究院有限公司 Sensitive information classification method and device based on multi-mode attention fusion
CN117235605B (en) * 2023-11-10 2024-02-02 湖南马栏山视频先进技术研究院有限公司 Sensitive information classification method and device based on multi-mode attention fusion

Similar Documents

Publication Publication Date Title
CN112800875A (en) Multi-mode emotion recognition method based on mixed feature fusion and decision fusion
CN108596039B (en) Bimodal emotion recognition method and system based on 3D convolutional neural network
CN110046656B (en) Multi-mode scene recognition method based on deep learning
Mino et al. Logan: Generating logos with a generative adversarial neural network conditioned on color
CN111292765B (en) Bimodal emotion recognition method integrating multiple deep learning models
CN112818861A (en) Emotion classification method and system based on multi-mode context semantic features
CN112131383A (en) Specific target emotion polarity classification method
CN109829499B (en) Image-text data fusion emotion classification method and device based on same feature space
CN113343974B (en) Multi-modal fusion classification optimization method considering inter-modal semantic distance measurement
CN111506732A (en) Text multi-level label classification method
CN112580555B (en) Spontaneous micro-expression recognition method
CN114662497A (en) False news detection method based on cooperative neural network
CN114092742A (en) Small sample image classification device and method based on multiple angles
CN112183465A (en) Social relationship identification method based on character attributes and context
CN114283482A (en) Facial expression recognition model of double-branch generation countermeasure network based on self-attention feature filtering classifier
CN113128284A (en) Multi-mode emotion recognition method and device
CN111859925B (en) Emotion analysis system and method based on probability emotion dictionary
Ruan et al. Facial expression recognition in facial occlusion scenarios: A path selection multi-network
Sun et al. Weak supervised learning based abnormal behavior detection
CN116758451A (en) Audio-visual emotion recognition method and system based on multi-scale and global cross attention
CN112541469B (en) Crowd counting method and system based on self-adaptive classification
CN112613405B (en) Method for recognizing actions at any visual angle
Majumder et al. Variational fusion for multimodal sentiment analysis
Soysal et al. Facial action unit recognition using data mining integrated deep learning
Almana et al. Real-time Arabic Sign Language Recognition using CNN and OpenCV

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
WD01: Invention patent application deemed withdrawn after publication (application publication date: 20210514)