CN112800875A - Multi-mode emotion recognition method based on mixed feature fusion and decision fusion - Google Patents
- Publication number
- CN112800875A (application CN202110048664.5A)
- Authority
- CN
- China
- Prior art keywords
- image
- text
- fusion
- emotion
- size
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/174—Facial expression recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/254—Fusion techniques of classification results, e.g. of results related to same input data
- G06F18/256—Fusion techniques of classification results, e.g. of results related to same input data of results relating to different input data, e.g. multimodal recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/168—Feature extraction; Face representation
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- General Engineering & Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Oral & Maxillofacial Surgery (AREA)
- Evolutionary Computation (AREA)
- Life Sciences & Earth Sciences (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Multimedia (AREA)
- Computational Linguistics (AREA)
- Human Computer Interaction (AREA)
- Biomedical Technology (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Computing Systems (AREA)
- Molecular Biology (AREA)
- Biophysics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Image Analysis (AREA)
Abstract
A multi-modal emotion recognition method based on mixed feature fusion and decision fusion, belonging to the fields of pattern recognition and emotion recognition. The method comprises the following steps: first, an image emotion recognition network is constructed with a convolutional neural network framework to obtain image features and an image emotion state; second, a text emotion recognition network is constructed with a recurrent neural network framework to obtain text features and a text emotion state; third, a multi-modal information fusion emotion recognition network is constructed, in which a main classifier fuses the image and text emotion states to obtain a main emotion classification, an auxiliary classifier fuses the image and text features to obtain an auxiliary emotion classification, and the two classifications are fused to obtain the final emotion classification. By exploiting the complementarity among multi-modal information, the invention avoids the low recognition accuracy that single-modal information suffers when information is blurred or missing, and provides a new approach to multi-modal data fusion and emotion recognition.
Description
Technical Field
The invention relates to the fields of data fusion, neural networks, and emotion recognition, and in particular to a multi-modal information fusion emotion recognition method based on hybrid fusion.
Background
Humans express emotional information through multiple modalities such as facial expressions, posture, voice, and language, and emotional behavior is an important indicator of human satisfaction. With the development of artificial intelligence, emotion recognition has become an important means of achieving natural human-computer interaction. Emotion recognition extracts features from emotional signals to learn the mapping between the outward appearance of an emotion and the inner emotional state, thereby identifying the emotion category of the recognized subject. It has broad application prospects in machine services, healthcare, distance education, autonomous driving, and other fields.
A modality is a way of representing information, such as images, text, or sound. Multi-modality refers to a combination of two or more modalities. The same object can be expressed in different modalities, and the information carried by each modality is both independent of and potentially correlated with the others. At present, emotion recognition mainly acquires and analyzes single-modal emotional information to infer the subject's emotional state. Because single-modal information has weak anti-interference capability and is easily contaminated by redundant signals or degraded by missing information, classification accuracy is low and misclassification can occur.
Human cognition is inherently multi-modal: an individual perceives a scene through vision, hearing, and even touch, and obtains high-level information such as emotion by fusing and semantically interpreting these signals. Multi-modal information fusion aims to emulate this perceptual process by building models that can process, associate, and reason over information from multiple modalities, exploiting the complementarity among modalities to remove intra-modal redundancy, supplement missing information in a given modality, and capture the latent associations between different modalities.
By fusion level, multi-modal fusion is mainly divided into data-level, feature-level, and decision-level fusion. Data-level fusion is suitable only for signals of similar type and cannot handle signals that differ greatly, such as images and sound. Feature-level fusion converts the data of each modality into a high-dimensional feature representation, combines the high-order features of the different modalities into a new feature vector, and can thereby capture complementary information across modalities. Decision-level fusion feeds each modality into its own trained classifier and combines the individual classification results into a final decision vector; it fully accounts for the differences between modalities, and since its errors come from different classifiers whose errors are usually uncorrelated, the errors do not accumulate.
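To make the fusion levels concrete, the contrast between feature-level and decision-level fusion can be sketched in a few lines of plain Python (an illustrative toy outside the patent text; all vectors and scores below are invented):

```python
# Toy contrast between the two fusion levels described above.

def feature_level_fusion(image_feat, text_feat):
    # Combine high-order features of both modalities into one new
    # feature vector (here by simple concatenation), which a single
    # classifier would then consume.
    return image_feat + text_feat

def decision_level_fusion(image_probs, text_probs):
    # Each modality has already been classified separately; fuse the
    # per-class results (here by averaging) into a final decision vector.
    return [(p + q) / 2 for p, q in zip(image_probs, text_probs)]

image_feat = [0.2, 0.9]            # invented image feature vector
text_feat = [0.5, 0.1, 0.7]        # invented text feature vector
fused_feat = feature_level_fusion(image_feat, text_feat)

image_probs = [0.7, 0.2, 0.1]      # invented image classifier scores
text_probs = [0.5, 0.4, 0.1]       # invented text classifier scores
fused_probs = decision_level_fusion(image_probs, text_probs)
```

Data-level fusion has no analogue in this sketch because raw images and raw text cannot be meaningfully combined sample-by-sample, which is exactly the limitation noted above.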
Disclosure of Invention
The invention aims to overcome the weak anti-interference capability of existing single-modal emotion recognition methods and to provide a high-accuracy multi-modal emotion recognition method that exploits the complementarity among multi-modal information. The invention adopts an information fusion scheme that mixes feature-level fusion with decision-level fusion, constructing a multi-modal emotion recognition method based on mixed feature fusion and decision fusion.
The purpose of the invention is realized by the following technical scheme.
The invention discloses a multi-modal information fusion emotion recognition method based on hybrid fusion, which comprises the following steps:
Step 1: construct an image emotion recognition network based on a convolutional neural network (CNN) framework; extract features from the image information through a stacked convolutional structure, obtain the image features by capturing high-dimensional characteristics, and classify them to obtain the image emotion state;
Step 2: construct a text emotion recognition network based on a recurrent neural network (RNN) framework; the RNN takes the output of the previous node as the input of the next node, realizing a memory function that allows the model to extract features from long text information and recognize the text emotion state.
Step 3: construct a hybrid multi-modal information fusion network. A main classifier performs decision-level fusion of the image emotion label and the text emotion label to obtain a fused main classification result. An auxiliary classifier performs feature-level fusion of the image features and the text features to obtain an auxiliary classification result. The main and auxiliary classification results are then fused to obtain the final emotion state. A feature fusion layer and a decision fusion layer are constructed, comprehensively exploiting the correlation and complementarity between the two modalities to accomplish the final emotion recognition and classification task.
The implementation method of the step 1 comprises the following steps:
An image emotion recognition network is constructed with a convolutional neural network (CNN) to extract image features and obtain emotion classifications. This part can employ various image feature extraction networks, such as VGGNet or ResNet. Image data are input to the network in the format (B, C, H, W), where B is the batch size, i.e., the number of images input at the same time; C is the number of image channels (three for RGB color images, one for grayscale images); and H and W are the image height and width, respectively. The network extracts image features I1, which are fed into a fully connected layer to obtain the final image emotion state I, a vector of dimension [batch_size, num_class], where num_class is the number of predicted categories.
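The shape flow of this image branch can be sketched as follows (a NumPy stand-in, not the patent's network: global average pooling and a random linear layer replace the stacked convolutions, and all dimensions are example values):

```python
import numpy as np

rng = np.random.default_rng(0)

B, C, H, W = 4, 3, 32, 32          # batch size, channels, height, width
num_class = 5                      # number of predicted categories

images = rng.standard_normal((B, C, H, W))   # a batch in (B, C, H, W) format

# Stand-in for the stacked convolutional feature extractor: global
# average pooling over H and W yields image features I1 of shape (B, C).
I1 = images.mean(axis=(2, 3))

# Fully connected layer mapping I1 to the image emotion state I,
# a [batch_size, num_class] array.
fc_w = rng.standard_normal((C, num_class))
fc_b = np.zeros(num_class)
I = I1 @ fc_w + fc_b
```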
The step 2 is realized by the following steps:
A text emotion recognition network is constructed with a recurrent neural network (RNN) to extract text features and obtain emotion classifications. This part can adopt various mainstream text feature extraction frameworks, such as LSTM or BiLSTM. For text data, each word is input to a word embedding layer and encoded into a word vector; the network input has dimension [batch_size, seq_len], where batch_size is the batch size and seq_len is the sentence length. After the word embedding layer is randomly initialized, the word vectors have dimension [batch_size, seq_len, embed_size], where embed_size is the word vector dimension. The word vectors are input to the RNN to obtain the hidden vectors [batch_size, seq_len, hidden_size * 2] of all time steps, where hidden_size is the hidden layer size. The network extracts text features T1, which are fed into a fully connected layer to obtain the final text emotion state T, a vector of dimension [batch_size, num_class], where num_class is the number of predicted categories.
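The text branch's dimension bookkeeping can be sketched similarly (again a NumPy stand-in with example sizes: a random projection imitates the bidirectional RNN's [batch_size, seq_len, hidden_size * 2] output, and mean pooling over time stands in for the recurrent feature T1):

```python
import numpy as np

rng = np.random.default_rng(1)

batch_size, seq_len = 2, 6
vocab_size, embed_size, hidden_size = 100, 8, 16
num_class = 5

token_ids = rng.integers(0, vocab_size, size=(batch_size, seq_len))

# Randomly initialized word embedding layer:
# [batch_size, seq_len] -> [batch_size, seq_len, embed_size].
embedding = rng.standard_normal((vocab_size, embed_size))
word_vecs = embedding[token_ids]

# Stand-in for the bidirectional RNN: each time step is projected to
# hidden_size * 2, mimicking the hidden vectors of all time steps.
proj = rng.standard_normal((embed_size, hidden_size * 2))
hidden = word_vecs @ proj          # [batch_size, seq_len, hidden_size * 2]

# Pool over time for text features T1, then a fully connected layer
# yields the text emotion state T of shape [batch_size, num_class].
T1 = hidden.mean(axis=1)
fc = rng.standard_normal((hidden_size * 2, num_class))
T = T1 @ fc
```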
The implementation method of the step 3 is as follows:
Step 3.1: construct a main classifier for multi-modal information fusion. The image emotion state I and the text emotion state T are concatenated and fed into the main classifier to obtain a main classification result (Class) of dimension 1 × 4;
Step 3.2: obtain the image feature weights and text feature weights for feature fusion, and perform a concatenation operation on the image features and text features along the batch dimension. For the image data, the feature weight is:
where B is the batch size and C is the number of image data channels. For the text data, the feature weight is:
where B is the text batch size and S is the text length. Both are mapped to the interval [0, 1] by normalization to obtain the new feature Fused_feature:
The new feature is taken as the input of an auxiliary classifier to obtain the auxiliary classification result (Auxiliary).
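The patent's weight formulas appear only as figures and are not reproduced in this text, so the sketch below substitutes a placeholder weight (per-sample mean magnitude); only the surrounding steps of weighting, normalization to the 0-1 interval, and concatenation into Fused_feature follow the description:

```python
import numpy as np

rng = np.random.default_rng(2)

B = 4                                        # batch size
image_feat = rng.standard_normal((B, 16))    # invented image features
text_feat = rng.standard_normal((B, 32))     # invented text features

def min_max(x):
    # Map all values into the 0-1 interval, as the normalization
    # step above describes.
    return (x - x.min()) / (x.max() - x.min())

# Placeholder feature weights: the patent defines weights in terms of
# batch size and channel/sequence length (formulas elided); per-sample
# mean magnitude is used here purely for illustration.
img_w = np.abs(image_feat).mean(axis=1, keepdims=True)
txt_w = np.abs(text_feat).mean(axis=1, keepdims=True)

# Weight, normalize, and concatenate into the new feature Fused_feature,
# which would be fed to the auxiliary classifier.
fused_feature = np.concatenate(
    [min_max(img_w * image_feat), min_max(txt_w * text_feat)], axis=1)
```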
Step 3.3: the fusion layer routes the input vectors to a plurality of nodes by adopting a dynamic routing mode, and generates final fusion vectors through vector compression and splicing. Firstly, the input feature vector passes through a hidden layer:
u1=W1v1,u2=W2v2,
wherein v is1And v2W is the weight for the feature vector of the input text and image. Adopting dynamic route mode to make last oneThe feature vectors obtained in the step are routed to three nodes:
s_1 = c_11 u_1 + c_12 u_2,
s_2 = c_21 u_1 + c_22 u_2,
s_3 = c_31 u_1 + c_32 u_2,
An auxiliary classification vector of dimension 1 × 4 is generated by compressing and splicing the vectors:
v = Concat(Squash(s_i)),
Step 3.4: the main classification result and the auxiliary classification result are fused by a decision-level fusion method, and the final classification result is obtained with a softmax function:
Finally_class = softmax(Auxiliary + Class).
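Steps 3.3 and 3.4 can be sketched numerically as follows. This is an illustrative reading, not the patent's implementation: the routing coefficients c_ij are fixed at 0.5 instead of being updated iteratively, Squash is assumed to be the standard capsule-network squashing function, and all dimensions and scores are invented:

```python
import numpy as np

rng = np.random.default_rng(3)

def squash(s):
    # Capsule-style squash: rescales a vector so its norm lies in [0, 1).
    norm_sq = np.dot(s, s)
    return (norm_sq / (1.0 + norm_sq)) * (s / np.sqrt(norm_sq))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

d = 8
v1 = rng.standard_normal(d)        # text feature vector v_1
v2 = rng.standard_normal(d)        # image feature vector v_2

W1 = rng.standard_normal((d, d))   # hidden-layer weight matrices
W2 = rng.standard_normal((d, d))
u1, u2 = W1 @ v1, W2 @ v2          # u_1 = W_1 v_1, u_2 = W_2 v_2

# Route to three nodes; fixed coefficients stand in for dynamic routing.
c = np.full((3, 2), 0.5)
s_nodes = [c[i, 0] * u1 + c[i, 1] * u2 for i in range(3)]

# Compress each node and splice them: v = Concat(Squash(s_i)).
v = np.concatenate([squash(s) for s in s_nodes])

# Step 3.4: decision-level fusion of the two classification results.
main_class = np.array([1.2, 0.3, -0.5, 0.1])    # invented main scores
auxiliary = np.array([0.4, 0.9, -0.2, 0.0])     # invented auxiliary scores
finally_class = softmax(auxiliary + main_class)
```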
compared with the prior art, the invention has the following advantages:
1. The mixed feature fusion and decision fusion multi-modal emotion recognition method disclosed by the invention extracts features from image and text information and recognizes their emotion classification results, constructs a main classifier based on decision fusion and an auxiliary classifier based on feature fusion, and obtains the final classification result by weighting the outputs of the two classifiers. This mitigates the poor performance caused by missing or ambiguous information in the single-modal case and achieves a good recognition effect;
2. The method constructs a fusion layer for the features of multiple modalities. The feature fusion layer is built with dynamic routing: the input vectors are routed to multiple nodes, and the fusion vector is generated through vector compression and splicing, fully accounting for both the correlation and the differences among the modal information;
3. In each modality, the method can substitute any network framework with good feature extraction capability, and therefore has good flexibility and extensibility.
Drawings
The invention is further described with reference to the following drawings and embodiments:
FIG. 1 is a flowchart of a mixed feature fusion and decision fusion multimodal emotion recognition method in an embodiment of the present invention;
FIG. 2 is a block diagram of a mixed feature fusion and decision fusion multimodal emotion recognition method according to an embodiment of the present invention;
FIG. 3 is a fusion layer framework diagram of a multi-modal emotion recognition method with hybrid feature fusion and decision fusion according to an embodiment of the present invention;
Detailed Description
The invention will be further described with reference to the accompanying drawings and specific embodiments, which are given by way of illustration only and are not intended to limit the scope of the invention.
Fig. 1 is a flowchart, Fig. 2 is a framework diagram, and Fig. 3 is a fusion layer framework diagram of the mixed feature fusion and decision fusion multi-modal emotion recognition method in an embodiment of the present invention. The specific implementation steps are as follows:
Step 1: synthesis of a multi-modal data set. The training data of the network model are divided into an image data set and a text data set, used to train the model and to verify the feasibility and superiority of the algorithm. The text data set is derived from yf_amazon and contains 720,000 shopping review/rating records from 140,000 users with 5 emotion classes; the data are cleaned to remove texts that are empty, garbled, or meaningless. The image data set is derived from original data sets such as the Kaggle FER2013 data set with 6 emotion classes; to match the text data set, 5 emotion classes are retained: angry, sad, neutral, happy, and surprised. The text and image data are placed in one-to-one correspondence to construct a triple data set with the structure <label, image, text> for training the multi-modal emotion recognition model.
Step 2: generation of the main classifier. As shown in Fig. 2, the embedding layer of the image adopts the ResNet50 residual network architecture; the original image vector is M × N, and a feature I of size S × T is obtained after embedding-layer encoding. The embedding network of the text adopts the BiLSTM long short-term memory architecture; the original text vector has length 128 × 1, padded with zeros when shorter and truncated when longer than 128, and a 256 × 1 feature T is obtained after embedding-layer encoding. The features I and T are dimensionally integrated through separate fully connected layers to produce features of the same dimension B1, which are finally spliced to generate the main classifier array of dimension B2.
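The fixed-length text handling described above (pad with zeros when shorter than 128, truncate when longer) can be sketched directly; the function name and pad value are illustrative, not from the patent:

```python
def pad_or_truncate(token_ids, length=128, pad_id=0):
    # Sentences shorter than `length` are padded with zeros; longer
    # ones are cut off, as in the embodiment above.
    if len(token_ids) >= length:
        return token_ids[:length]
    return token_ids + [pad_id] * (length - len(token_ids))

short = pad_or_truncate([5, 9, 2])          # padded up to 128
long = pad_or_truncate(list(range(200)))    # truncated down to 128
```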
Step 3: generation of the auxiliary classifier. As shown in Fig. 1, the feature vectors A and B of the image and text are input to a weighting layer to obtain shallow feature vectors; to retain as much semantic information as possible, a dot product with the original feature vectors is performed, and the two resulting vectors are used as the input of the fusion layer. As shown in Fig. 2, the fusion layer routes the input vectors to N nodes by dynamic routing, where N = 3, and then generates the final fusion vector through vector compression and splicing, fusing the image and text features as fully as possible.
Step 4: decision-level fusion of the main classifier and the auxiliary classifier, with a Softmax regression model identifying the emotion features to obtain the emotion category. There are 5 expression categories: angry, sad, neutral, happy, and surprised. The main and auxiliary classifiers are fused by three methods (mean fusion, DS evidence theory fusion, and dynamic weight fusion), and the Softmax regression model then yields the 5 emotion probabilities; the class with the maximum probability is the expression recognition result.
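Of the three fusion methods named above, mean fusion is the simplest to sketch (plain Python; the scores are invented, and DS evidence theory and dynamic weight fusion are not shown):

```python
import math

EMOTIONS = ["angry", "sad", "neutral", "happy", "surprised"]

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def mean_fusion(main_scores, aux_scores):
    # Average the main and auxiliary classifier scores per class, then
    # apply softmax to obtain the 5 emotion probabilities.
    averaged = [(m + a) / 2 for m, a in zip(main_scores, aux_scores)]
    return softmax(averaged)

main_scores = [2.0, 0.1, 0.5, 3.0, 0.2]   # invented main classifier output
aux_scores = [1.5, 0.3, 0.2, 2.5, 0.4]    # invented auxiliary output
probs = mean_fusion(main_scores, aux_scores)
predicted = EMOTIONS[probs.index(max(probs))]   # class of maximum probability
```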
Through the above steps, experiments are conducted on the pre-synthesized multi-modal data set, which is randomly divided into a training set (70% of the total), a validation set (15%), and a test set (15%). Three comparison experiments are performed. Experiment 1: an LSTM network is trained on the text data set alone to obtain the text emotion recognition accuracy. Experiment 2: a ResNet50 network is trained on the image data set alone to obtain the image emotion recognition accuracy. Experiment 3: the multi-modal emotion recognition model is trained on the image-text data set, with mean fusion, DS evidence theory fusion, and dynamic weight fusion as the data fusion methods, to obtain the accuracy of the multi-modal model. Compared with the results of Experiments 1 and 2, the results of Experiment 3 are improved by 3.22%, 3.68%, and 10.54%, respectively.
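The 70/15/15 random split used in the experiments can be sketched as follows (illustrative helper, not from the patent; the seed is arbitrary):

```python
import random

def split_dataset(samples, train=0.70, val=0.15, seed=42):
    # Shuffle once, then carve out 70% training, 15% validation,
    # and the remaining 15% test, mirroring the experiment setup.
    items = list(samples)
    random.Random(seed).shuffle(items)
    n = len(items)
    n_train = round(n * train)
    n_val = round(n * val)
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])

train_set, val_set, test_set = split_dataset(range(1000))
```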
The above embodiments are preferred embodiments of the present invention, but the present invention is not limited to them; various changes can be made within the knowledge of those skilled in the art without departing from the spirit of the invention. Any modification, equivalent replacement, or improvement made without departing from the spirit and principle of the present invention shall fall within its protection scope.
Claims (4)
1. A multi-modal emotion recognition method based on mixed feature fusion and decision fusion, characterized by comprising the following steps:
step 1: an image emotion recognition network is constructed based on a convolutional neural network (CNN) framework, and features are extracted from the image information through a stacked convolutional structure, capturing multi-dimensional characteristics to obtain the image features and to classify the image emotion state;
step 2: a text emotion recognition network is constructed based on a recurrent neural network (RNN) framework; the RNN takes the output of the previous node as the input of the next node, realizing a memory function that allows the model to better extract features from long text information and recognize the text emotion state;
step 3: a hybrid multi-modal information fusion network is constructed. A main classifier performs decision-level fusion of the image emotion label and the text emotion label to obtain a fused main classification result. An auxiliary classifier performs feature-level fusion of the image features and the text features to obtain an auxiliary classification result. The main and auxiliary classification results are fused to obtain the final emotion state. A feature fusion layer and a decision fusion layer are constructed, comprehensively exploiting the correlation and complementarity between the two modalities to accomplish the final emotion recognition and classification task.
2. The method according to claim 1, characterized in that step 1 is implemented as follows:
An image emotion recognition network is constructed with a convolutional neural network (CNN) to extract image features and obtain emotion classifications. This part can employ various image feature extraction networks, such as VGGNet or ResNet. Image data are input to the network in the format (B, C, H, W), where B is the batch size, i.e., the number of images input at the same time; C is the number of image channels (three for RGB color images, one for grayscale images); and H and W are the image height and width, respectively. The network extracts image features I1, which are fed into a fully connected layer to obtain the final image emotion state I, a vector of dimension [batch_size, num_class], where num_class is the number of predicted categories.
3. The method according to claim 1, characterized in that step 2 is implemented as follows:
A text emotion recognition network is constructed with a recurrent neural network (RNN) to extract text features and obtain emotion classifications. This part can adopt various mainstream text feature extraction frameworks, such as LSTM or BiLSTM. For text data, each word is input to a word embedding layer and encoded into a word vector; the network input has dimension [batch_size, seq_len], where batch_size is the batch size and seq_len is the sentence length. After the word embedding layer is randomly initialized, the word vectors have dimension [batch_size, seq_len, embed_size], where embed_size is the word vector dimension. The word vectors are input to the RNN to obtain the hidden vectors [batch_size, seq_len, hidden_size * 2] of all time steps, where hidden_size is the hidden layer size. The network extracts text features T1, which are fed into a fully connected layer to obtain the final text emotion state T, a vector of dimension [batch_size, num_class], where num_class is the number of predicted categories.
4. The method according to claim 1, characterized in that step 3 is implemented as follows:
step 3.1: a main classifier for multi-modal information fusion is constructed. The image emotion state I and the text emotion state T are concatenated and fed into the main classifier to obtain a main classification result (Class) of dimension 1 × 4;
step 3.2: the image feature weights and text feature weights for feature fusion are obtained, and a concatenation operation is performed on the image features and text features along the batch dimension. For the image data, the feature weight is:
where B is the image batch size and C is the number of image data channels. For the text data, the feature weight is:
where B is the text batch size and S is the text length. Both are mapped to the interval [0, 1] by normalization to obtain the new feature Fused_feature:
The new feature is taken as the input of an auxiliary classifier to obtain the auxiliary classification result (Auxiliary).
step 3.3: the fusion layer routes the input vectors to a plurality of nodes by dynamic routing and generates the final fusion vector through vector compression and splicing. First, the input feature vectors pass through a hidden layer:
u_1 = W_1 v_1, u_2 = W_2 v_2,
where v_1 and v_2 are the feature vectors of the input text and image, and W_1 and W_2 are weight matrices. The feature vectors obtained in the previous step are routed to three nodes by dynamic routing:
s_1 = c_11 u_1 + c_12 u_2,
s_2 = c_21 u_1 + c_22 u_2,
s_3 = c_31 u_1 + c_32 u_2,
An auxiliary classification vector of dimension 1 × 4 is generated by compressing and splicing the vectors:
v = Concat(Squash(s_i)),
step 3.4: the main classification result and the auxiliary classification result are fused by a decision-level fusion method, and the final classification result is obtained with a softmax function:
Finally_class = softmax(Auxiliary + Class).
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110048664.5A CN112800875A (en) | 2021-01-14 | 2021-01-14 | Multi-mode emotion recognition method based on mixed feature fusion and decision fusion |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110048664.5A CN112800875A (en) | 2021-01-14 | 2021-01-14 | Multi-mode emotion recognition method based on mixed feature fusion and decision fusion |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112800875A true CN112800875A (en) | 2021-05-14 |
Family
ID=75810844
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110048664.5A Pending CN112800875A (en) | 2021-01-14 | 2021-01-14 | Multi-mode emotion recognition method based on mixed feature fusion and decision fusion |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112800875A (en) |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113673567A (en) * | 2021-07-20 | 2021-11-19 | 华南理工大学 | Panorama emotion recognition method and system based on multi-angle subregion self-adaption |
CN113688938A (en) * | 2021-09-07 | 2021-11-23 | 北京百度网讯科技有限公司 | Method for determining object emotion and method and device for training emotion classification model |
CN113988201A (en) * | 2021-11-03 | 2022-01-28 | 哈尔滨工程大学 | Multi-mode emotion classification method based on neural network |
CN114218380A (en) * | 2021-12-03 | 2022-03-22 | 淮阴工学院 | Multi-mode-based cold chain loading user portrait label extraction method and device |
CN114330454A (en) * | 2022-01-05 | 2022-04-12 | 东北农业大学 | Live pig cough sound identification method based on DS evidence theory fusion characteristics |
CN115034257A (en) * | 2022-05-09 | 2022-09-09 | 西北工业大学 | Cross-modal information target identification method and device based on feature fusion |
CN116383426A (en) * | 2023-05-30 | 2023-07-04 | 深圳大学 | Visual emotion recognition method, device, equipment and storage medium based on attribute |
CN116543283A (en) * | 2023-07-05 | 2023-08-04 | 合肥工业大学 | Multimode target detection method considering modal uncertainty |
CN116580436A (en) * | 2023-05-08 | 2023-08-11 | 长春理工大学 | Lightweight convolutional network facial emotion recognition method with auxiliary classifier |
CN116994069A (en) * | 2023-09-22 | 2023-11-03 | 武汉纺织大学 | Image analysis method and system based on multi-mode information |
CN117235605A (en) * | 2023-11-10 | 2023-12-15 | 湖南马栏山视频先进技术研究院有限公司 | Sensitive information classification method and device based on multi-mode attention fusion |
- 2021-01-14: CN application CN202110048664.5A filed; published as CN112800875A (en); status: Pending
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109508640A (en) * | 2018-10-12 | 2019-03-22 | MIGU Culture Technology Co., Ltd. | Crowd sentiment analysis method, apparatus and storage medium |
US20190311188A1 (en) * | 2018-12-05 | 2019-10-10 | Sichuan University | Face emotion recognition method based on dual-stream convolutional neural network |
CN109934260A (en) * | 2019-01-31 | 2019-06-25 | Institute of Information Engineering, Chinese Academy of Sciences | Image, text and data fusion sentiment classification method and device based on random forest |
CN110674339A (en) * | 2019-09-18 | 2020-01-10 | Beijing University of Technology | Chinese song emotion classification method based on multi-modal fusion |
CN110826336A (en) * | 2019-09-18 | 2020-02-21 | South China Normal University | Emotion classification method, system, storage medium and device |
CN111881291A (en) * | 2020-06-19 | 2020-11-03 | Shandong Normal University | Text emotion classification method and system |
Non-Patent Citations (2)
Title |
---|
Zhang Ge: "Research on Multimodal Continuous Dimensional Emotion Recognition", China Master's Theses Full-text Database, Information Science and Technology series * |
Xu Zhidong et al.: "Research on Aspect-level Sentiment Classification Based on Capsule Networks", Chinese Journal of Intelligent Science and Technology * |
Cited By (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113673567B (en) * | 2021-07-20 | 2023-07-21 | South China University of Technology | Panorama emotion recognition method and system based on multi-angle sub-region self-adaption |
CN113673567A (en) * | 2021-07-20 | 2021-11-19 | South China University of Technology | Panorama emotion recognition method and system based on multi-angle sub-region self-adaption |
CN113688938A (en) * | 2021-09-07 | 2021-11-23 | Beijing Baidu Netcom Science and Technology Co., Ltd. | Method for determining object emotion, and method and device for training emotion classification model |
CN113688938B (en) * | 2021-09-07 | 2023-07-28 | Beijing Baidu Netcom Science and Technology Co., Ltd. | Method for determining object emotion, and method and device for training emotion classification model |
CN113988201A (en) * | 2021-11-03 | 2022-01-28 | Harbin Engineering University | Multi-modal emotion classification method based on neural network |
CN113988201B (en) * | 2021-11-03 | 2024-04-26 | Harbin Engineering University | Multi-modal emotion classification method based on neural network |
CN114218380A (en) * | 2021-12-03 | 2022-03-22 | Huaiyin Institute of Technology | Multi-modal cold chain loading user portrait label extraction method and device |
CN114218380B (en) * | 2021-12-03 | 2022-07-29 | Huaiyin Institute of Technology | Multi-modal cold chain loading user portrait label extraction method and device |
CN114330454A (en) * | 2022-01-05 | 2022-04-12 | Northeast Agricultural University | Live pig cough sound identification method based on DS evidence theory fusion features |
CN115034257A (en) * | 2022-05-09 | 2022-09-09 | Northwestern Polytechnical University | Cross-modal information target identification method and device based on feature fusion |
CN116580436A (en) * | 2023-05-08 | 2023-08-11 | Changchun University of Science and Technology | Lightweight convolutional network facial emotion recognition method with auxiliary classifier |
CN116383426A (en) * | 2023-05-30 | 2023-07-04 | Shenzhen University | Attribute-based visual emotion recognition method, device, equipment and storage medium |
CN116383426B (en) * | 2023-05-30 | 2023-08-22 | Shenzhen University | Attribute-based visual emotion recognition method, device, equipment and storage medium |
CN116543283A (en) * | 2023-07-05 | 2023-08-04 | Hefei University of Technology | Multimodal target detection method accounting for modal uncertainty |
CN116543283B (en) * | 2023-07-05 | 2023-09-15 | Hefei University of Technology | Multimodal target detection method accounting for modal uncertainty |
CN116994069A (en) * | 2023-09-22 | 2023-11-03 | Wuhan Textile University | Image analysis method and system based on multi-modal information |
CN116994069B (en) * | 2023-09-22 | 2023-12-22 | Wuhan Textile University | Image analysis method and system based on multi-modal information |
CN117235605A (en) * | 2023-11-10 | 2023-12-15 | Hunan Malanshan Video Advanced Technology Research Institute Co., Ltd. | Sensitive information classification method and device based on multi-modal attention fusion |
CN117235605B (en) * | 2023-11-10 | 2024-02-02 | Hunan Malanshan Video Advanced Technology Research Institute Co., Ltd. | Sensitive information classification method and device based on multi-modal attention fusion |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112800875A (en) | Multi-mode emotion recognition method based on mixed feature fusion and decision fusion | |
CN108596039B (en) | Bimodal emotion recognition method and system based on 3D convolutional neural network | |
CN110046656B (en) | Multi-mode scene recognition method based on deep learning | |
Mino et al. | LoGAN: Generating logos with a generative adversarial neural network conditioned on color | |
CN111292765B (en) | Bimodal emotion recognition method integrating multiple deep learning models | |
CN112818861A (en) | Emotion classification method and system based on multi-mode context semantic features | |
CN112131383A (en) | Specific target emotion polarity classification method | |
CN109829499B (en) | Image-text data fusion emotion classification method and device based on same feature space | |
CN113343974B (en) | Multi-modal fusion classification optimization method considering inter-modal semantic distance measurement | |
CN111506732A (en) | Text multi-level label classification method | |
CN112580555B (en) | Spontaneous micro-expression recognition method | |
CN114662497A (en) | False news detection method based on cooperative neural network | |
CN114092742A (en) | Small sample image classification device and method based on multiple angles | |
CN112183465A (en) | Social relationship identification method based on character attributes and context | |
CN114283482A (en) | Facial expression recognition model of double-branch generation countermeasure network based on self-attention feature filtering classifier | |
CN113128284A (en) | Multi-mode emotion recognition method and device | |
CN111859925B (en) | Emotion analysis system and method based on probability emotion dictionary | |
Ruan et al. | Facial expression recognition in facial occlusion scenarios: A path selection multi-network | |
Sun et al. | Weak supervised learning based abnormal behavior detection | |
CN116758451A (en) | Audio-visual emotion recognition method and system based on multi-scale and global cross attention | |
CN112541469B (en) | Crowd counting method and system based on self-adaptive classification | |
CN112613405B (en) | Method for recognizing actions at any visual angle | |
Majumder et al. | Variational fusion for multimodal sentiment analysis | |
Soysal et al. | Facial action unit recognition using data mining integrated deep learning | |
Almana et al. | Real-time Arabic Sign Language Recognition using CNN and OpenCV |
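The patent title refers to a hybrid of feature-level (early) fusion and decision-level (late) fusion for multimodal emotion recognition. As a rough, illustrative sketch only (not the claimed method; the feature sizes, class count, fusion weights, and random stand-in classifiers below are all invented placeholders), early fusion concatenates per-modality feature vectors into one joint representation, while late fusion combines the probability outputs of separate classifiers:

```python
import math
import random

random.seed(0)

def softmax(scores):
    """Turn raw scores into a probability distribution."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Illustrative stand-ins for per-modality features
# (e.g. an image embedding and a text embedding).
image_feat = [random.gauss(0, 1) for _ in range(128)]
text_feat = [random.gauss(0, 1) for _ in range(64)]

# Feature-level (early) fusion: concatenate modality features
# into a single joint vector for a shared classifier.
fused_feat = image_feat + text_feat  # length 128 + 64 = 192

# Hypothetical classifier outputs over 3 emotion classes,
# one classifier per modality plus one on the fused features.
p_image = softmax([random.gauss(0, 1) for _ in range(3)])
p_text = softmax([random.gauss(0, 1) for _ in range(3)])
p_fused = softmax([random.gauss(0, 1) for _ in range(3)])

# Decision-level (late) fusion: a weighted average of the
# classifiers' probability distributions (weights sum to 1).
weights = [0.3, 0.3, 0.4]
p_final = [
    weights[0] * a + weights[1] * b + weights[2] * c
    for a, b, c in zip(p_image, p_text, p_fused)
]
predicted = max(range(3), key=lambda k: p_final[k])
```

Because each per-classifier distribution sums to 1 and the weights sum to 1, the fused distribution `p_final` is itself a valid probability distribution, which is why weighted averaging is a common baseline for decision-level fusion.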
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20210514 |