CN113822340A - Image-text emotion recognition method based on attention mechanism - Google Patents

Image-text emotion recognition method based on attention mechanism

Info

Publication number
CN113822340A
CN113822340A (application CN202110992751.6A)
Authority
CN
China
Prior art keywords
text
features
picture
layer
attention
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110992751.6A
Other languages
Chinese (zh)
Inventor
刘博
徐毓笑
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Technology
Original Assignee
Beijing University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Technology filed Critical Beijing University of Technology
Priority to CN202110992751.6A priority Critical patent/CN113822340A/en
Publication of CN113822340A publication Critical patent/CN113822340A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3344Query execution using natural language analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biophysics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an image-text emotion recognition method based on an attention mechanism. The recently popular attention mechanism is introduced to mine information within each modality and to learn the interaction between modalities, and, because each modality contributes differently to emotion classification, a decision-level fusion rule is designed to integrate the classification results of all modalities into a final emotion recognition result. Decision-level fusion with a designed fusion rule integrates the classification probabilities of the individual classifiers and improves the final emotion recognition accuracy. The method supplements and optimizes multi-modal feature extraction, feature fusion and related aspects of image-text comment emotion recognition; it effectively mines intra-modal information, builds interaction between modalities and improves the accuracy of image-text emotion recognition.

Description

Image-text emotion recognition method based on attention mechanism
Technical Field
The invention belongs to the fields of computer vision and natural language processing and is mainly used for emotion recognition of image-text comments on Internet social media.
Background
With the rapid development of social media, users tend to express opinions and share experiences on platforms such as Twitter, Facebook and Sina Weibo. The content they publish is becoming increasingly diverse in both substance and form: unlike conventional plain-text comments, users increasingly pair text comments with pictures. Traditional text-based sentiment analysis has therefore evolved into multi-modal sentiment analysis, whose goal is to automatically identify the basic attitude in a comment, extract the user's emotion and understand user behaviour; this has important practical applications.
How to effectively use the information in both the visual and textual content of image-text comments is a challenging problem in multi-modal sentiment analysis: compared with single-modality sentiment analysis, a multi-modal method must effectively fuse information across modalities. Current multi-modal sentiment analysis faces three problems. First, the information within each modality is not fully extracted: the emotion of a picture cannot be abstracted from its low-level and mid-level features alone, and comment text is casual and short, so traditional text representations fail to mine important semantic information. Second, the information of the modalities must be fused effectively, removing redundant information while letting the modalities complement each other. Third, the modalities contribute differently to emotion classification, and how to assign the weight of each modality is also a problem.
The attention mechanism imitates the focusing ability of the human eye and attends to the more important and valuable information. Introducing an attention mechanism allows reasonable weights to be assigned to information of different dimensions within the same modality, so that context information is processed accurately, and assigning weights to the different modalities addresses the unequal contribution of pictures and text to emotion classification. Existing multi-modal feature fusion methods can be divided into data-level fusion, feature-level fusion and decision-level fusion. Data-level fusion unifies the collected data sets into one whole according to some rule; it is complex to implement and the resulting data often contain too much redundant information. Feature-level fusion extracts features from each modality, constructs a joint vector and feeds it into a classifier for emotion classification; common operations are concatenation, element-wise addition and element-wise multiplication. Decision-level fusion builds a classifier for each modality and integrates the resulting classification results according to some rule to obtain the final emotion recognition result. Decision-level fusion is comparatively simple, and a properly designed decision-level fusion formula can achieve considerable recognition accuracy.
Disclosure of Invention
Aimed at network comments on Internet social media, the invention provides an image-text emotion recognition method based on an attention mechanism. The recently popular attention mechanism is introduced to better mine intra-modal information and learn inter-modal interactions, and, because the modalities contribute differently to emotion classification, a decision-level fusion rule is designed to integrate the per-modality classification results into a final emotion recognition result.
A self-attention mechanism is introduced to better mine the emotional information inside each modality, and a cross-attention mechanism is introduced to build interaction between the modalities. The rationale is that an attention mechanism lets the model devote more attention resources to the parts it focuses on, so as to obtain more detailed information while weakening attention to other, relatively unimportant parts; it extracts higher-value information from a large amount of information and improves processing efficiency. In the image-text comment emotion recognition task, text features and picture features are obtained by preliminary feature extraction. Because the information of the modalities is related, a cross-modal encoding layer is added in which the picture and the text serve as auxiliary information for each other: masked features can be inferred from aligned elements of the other modality, and the relations between the modalities are discovered and constructed so that information from different modalities can interact. The text features, picture features and the multi-modal features obtained through cross-attention are then each fed into a self-encoding layer, where further feature selection is performed by self-attention. With a careful design and combination of these self-attention and cross-attention layers, the method extracts high-quality text features, picture features and multi-modal features from the input data.
The method adopts decision-level fusion and designs a fusion rule to integrate the classification probabilities of the individual classifiers, improving the final emotion recognition accuracy. Traditional feature-level fusion simply combines text features and picture features, ignores the structural coupling between text and picture, and is poorly interpretable. In real social-media comment data the contributions of picture and text to emotion classification are not equal, and different data affect the classification result to different degrees. The advantage of decision-level fusion is that an independent classifier can be built for each modality and the final decision obtained by giving different weights to the result of each classifier. The method analyses the features of each modality independently, sets a fusion rule and assigns a weight to the classification result of each modality, which solves the problem of unequal modality contributions and improves recognition accuracy.
The image-text comment emotion recognition method supplements and optimizes multi-modal feature extraction, feature fusion and related aspects; it effectively mines intra-modal information, builds interaction between the modalities and improves the accuracy of image-text emotion recognition.
The method comprises the following steps:
Step 1: preprocess the image-text comment data and convert it into the data format required by the model input.
Step 2: perform preliminary feature extraction on the preprocessed text data and picture data with pre-trained models to obtain text features and picture features.
Step 3: input the text features and picture features obtained in step 2, as auxiliary information for each other, to the cross-modal encoding layer, and use a cross-attention mechanism to learn the interaction between the modalities.
Step 4: input the text features, picture features and multi-modal features obtained in step 3 into a self-attention encoding layer to assign reasonable weights to information of different dimensions within the features and perform further feature selection.
Step 5: input the text features, picture features and multi-modal features obtained in step 4 into their respective multilayer perceptrons to obtain emotion recognition results.
Step 6: give respective weights to the emotion classification probabilities obtained by the classifiers and perform decision-level fusion by weighting to obtain the final emotion classification result.
Drawings
FIG. 1 is a flow chart of the method of operation.
FIG. 2 is a model diagram of the method.
Fig. 3 shows a sample image-text comment.
Detailed Description
The present invention is described in detail below with reference to examples and the accompanying drawings.
The embodiment of the invention takes image-text comments only as an example, but the algorithm can be extended to any multi-modal emotion classification problem. For the image-text comment sample shown in Fig. 3, the emotional tendency is 'happy'. A model is designed for this task; after the model is trained to optimality, a new image-text comment sample can be input and its emotional tendency output. The steps are described in detail below.
Step 1: the image-text comment data are preprocessed and converted into the data format required by the model input.
Data preprocessing is an important step of the method, especially for user comments from social-media platforms, where the data are raw and unstructured. The main preprocessing steps are as follows (a code sketch follows the list):
Deleting special symbols: on social-media platforms, user-generated content usually contains special symbols such as the "@" symbol that points to another user; the information after this symbol usually concerns user privacy and is not useful for sentiment analysis, so the tokens following "@" are deleted.
Word segmentation: the comment text is split into words with a common word-segmentation tool; words become the basic unit of further text processing.
Removing stop words: in natural language processing, certain words are filtered out because they carry little value (so-called "stop words"); common stop words are therefore deleted from the comment text.
Adjusting the picture size: each picture is resized to 224 × 224 pixels.
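A minimal preprocessing sketch along these lines. The regular expression, the jieba segmenter and the stop-word list are illustrative choices rather than anything prescribed by the patent:

```python
import re
import jieba                      # a common Chinese word-segmentation tool (assumed choice)
from PIL import Image

STOP_WORDS = {"的", "了", "是", "我"}   # illustrative stop-word list

def preprocess_text(comment: str) -> list[str]:
    # delete "@user" mentions and the tokens that follow them
    comment = re.sub(r"@\S+", "", comment)
    # word segmentation
    words = jieba.lcut(comment)
    # remove stop words and whitespace tokens
    return [w for w in words if w.strip() and w not in STOP_WORDS]

def preprocess_image(path: str) -> Image.Image:
    # resize the picture to 224 x 224 pixels
    return Image.open(path).convert("RGB").resize((224, 224))
```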
Step 2: preliminary feature extraction is performed on the preprocessed text data and picture data with pre-trained models to obtain text features and picture features.
(1) Text feature extraction
For the word sequence $w_1, \ldots, w_m$ of a text comment obtained in step 1, the special token [CLS] is added to the beginning of the sequence and the special token [SEP] to its end, and each word $w_i$ is mapped by a pre-trained RoBERTa model to a 768-dimensional vector:

$t_i' = \mathrm{RoBERTa}(w_i), \quad t_i' \in \mathbb{R}^{768}$
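A sketch of this step with the HuggingFace transformers library. The checkpoint name is an assumption — the patent only specifies a pre-trained RoBERTa model with 768-dimensional outputs:

```python
import torch
from transformers import AutoTokenizer, AutoModel

# assumed checkpoint; any pre-trained (Chinese) RoBERTa with 768-dim hidden states would fit
tokenizer = AutoTokenizer.from_pretrained("hfl/chinese-roberta-wwm-ext")
roberta = AutoModel.from_pretrained("hfl/chinese-roberta-wwm-ext")

def extract_text_features(words: list[str]) -> torch.Tensor:
    # the tokenizer adds [CLS] at the start and [SEP] at the end automatically
    inputs = tokenizer(" ".join(words), return_tensors="pt",
                       truncation=True, max_length=64)
    with torch.no_grad():
        out = roberta(**inputs)
    return out.last_hidden_state.squeeze(0)   # [sequence_length, 768]
```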
(2) picture feature extraction
Picture features are extracted with the advanced pre-trained model ResNet152, which has been pre-trained on roughly 14 million images. The pictures attached to user comments have already been resized to 224 × 224 pixels in step 1. The ResNet152 model then average-pools the picture over a 7 × 7 grid, producing 49 output vectors per picture, each of dimension 2048:

$\mathrm{ResNet}(I) = \{\, r_i' \in \mathbb{R}^{2048} \,\}$
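A sketch of the picture branch with torchvision, keeping the convolutional backbone and dropping the final pooling and classification layers so that the 7 × 7 grid of 2048-dimensional region vectors is exposed (the weights enum assumes torchvision ≥ 0.13; older versions use `pretrained=True`):

```python
import torch
import torch.nn as nn
from torchvision import models, transforms

resnet = models.resnet152(weights=models.ResNet152_Weights.IMAGENET1K_V1)
backbone = nn.Sequential(*list(resnet.children())[:-2])   # output: [B, 2048, 7, 7]
backbone.eval()

to_tensor = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def extract_image_features(img) -> torch.Tensor:
    x = to_tensor(img).unsqueeze(0)            # [1, 3, 224, 224]
    with torch.no_grad():
        fmap = backbone(x)                     # [1, 2048, 7, 7]
    # flatten the 7 x 7 grid into 49 vectors of dimension 2048
    return fmap.flatten(2).transpose(1, 2).squeeze(0)
```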
and 3, inputting the text features and the image features obtained in the step 2 as auxiliary information to a cross-modal coding layer, and learning the interaction between different modes by using a cross attention mechanism.
The attention mechanism mines information in the data from a set of context vectors $\{y_j\}$ that are related to a query vector $x$. An attention layer first computes a matching score $a_j$ between the query vector $x$ and each context vector $y_j$; the scores are then normalized by the softmax function, and the output of the attention layer is the sum of the context vectors weighted by the normalized scores:

$\alpha_j = \mathrm{softmax}_j(a_j) = \dfrac{\exp(a_j)}{\sum_k \exp(a_k)}$

$\mathrm{Att}\big(x, \{y_j\}\big) = \sum_j \alpha_j\, y_j$

In the transformer formulation, $q_i$, $k_i$ and $v_i$ denote the query, key and value vectors, which are computed as linear mappings of the input sequence, and

$A = \mathrm{softmax}\!\left(\dfrac{Q K^{\top}}{\sqrt{d_k}}\right)$

is the attention map, which predicts how the different elements of the input sequence affect each other.
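A small sketch of this computation written as standard scaled dot-product attention; the exact scoring function in the original filing is given only as an equation image, so this form is an assumption consistent with the surrounding description:

```python
import torch
import torch.nn.functional as F

def attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor):
    """q: [B, Lq, d]; k, v: [B, Lk, d] — queries, keys and values are
    linear mappings of the input sequence(s)."""
    scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)  # matching scores
    attn_map = F.softmax(scores, dim=-1)                    # normalized scores
    return attn_map @ v, attn_map                           # weighted sum, attention map
```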
The invention adopts cross-modal transformer encoding layers, which use the text features to mine the emotional regions in the picture and use the picture features to mine the emotional words in the text description that are associated with the picture. Each layer of the cross-modal encoder consists of a bidirectional cross-attention sublayer and two feed-forward sublayers; $N_c$ such layers are stacked in the cross-modal encoder, with the output of the $k$-th layer serving as the input of the $(k{+}1)$-th layer. Inside the $k$-th layer the bidirectional cross-attention sublayer is applied first; it contains two unidirectional cross-attention sublayers, one from language to vision and one from vision to language:

$\hat{h}_i^{k} = \mathrm{CrossAtt}_{L \to V}\big(h_i^{k-1},\ \{v_1^{k-1}, \ldots, v_n^{k-1}\}\big)$

$\hat{v}_j^{k} = \mathrm{CrossAtt}_{V \to L}\big(v_j^{k-1},\ \{h_1^{k-1}, \ldots, h_m^{k-1}\}\big)$

The cross-attention layers exchange information between the two modalities and align their entities, fully mining the relevance and complementarity of the image-text data.
Step 4: the text features, picture features and multi-modal features obtained in step 3 are each input to a self-attention encoding layer, which assigns reasonable weights to information of different dimensions within the features and performs further feature selection.
The picture features obtained in step 3 are passed through global average pooling and concatenated with the text features obtained in step 2 to give the text feature $T \in \mathbb{R}^{65 \times 768}$; the text features obtained in step 3 are passed through global average pooling and concatenated with the picture features obtained in step 2 to give the picture feature $V \in \mathbb{R}^{17 \times 768}$; and the text features and picture features obtained in step 3 are concatenated to give the multi-modal joint feature $M \in \mathbb{R}^{32 \times 768}$.

The invention adopts transformer encoding layers to further encode the text feature $T$, the picture feature $V$ and the multi-modal joint feature $M$ separately. Each layer of the encoder contains a self-attention sublayer and a feed-forward sublayer; the feed-forward sublayer consists of two fully connected layers, and a residual connection and a normalization layer follow each sublayer. The text encoder and the picture encoder have $N_t$ and $N_p$ layers respectively. The self-attention layers are:

$T' = \mathrm{text\text{-}attention}(T, T, T), \quad T' \in \mathbb{R}^{65 \times 768}$

$V' = \mathrm{vision\text{-}attention}(V, V, V), \quad V' \in \mathbb{R}^{17 \times 768}$

$M' = \mathrm{multimodal\text{-}attention}(M, M, M), \quad M' \in \mathbb{R}^{32 \times 768}$
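A sketch of how the three feature matrices might be assembled and then further encoded with standard transformer encoder layers. The pooling/concatenation details, layer counts and sequence lengths are assumptions made for illustration; only the stated output shapes and the self-attention/feed-forward structure come from the text:

```python
import torch
import torch.nn as nn

d_model, n_heads = 768, 12

def build_joint_features(txt_x, img_x, txt_p, img_p):
    """txt_x / img_x: cross-modal outputs from step 3 ([Lt, 768] / [Lv, 768]);
    txt_p / img_p: preliminary features from step 2.
    Returns the text, picture and multi-modal feature matrices T, V, M."""
    T = torch.cat([txt_p, img_x.mean(dim=0, keepdim=True)], dim=0)  # GAP of picture appended to text
    V = torch.cat([img_p, txt_x.mean(dim=0, keepdim=True)], dim=0)  # GAP of text appended to picture
    M = torch.cat([txt_x, img_x], dim=0)                            # multi-modal joint feature
    return T, V, M

def make_encoder(num_layers: int) -> nn.TransformerEncoder:
    # self-attention sublayer + feed-forward sublayer, residual + normalization after each
    layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads,
                                       dim_feedforward=4 * d_model, batch_first=True)
    return nn.TransformerEncoder(layer, num_layers=num_layers)

text_encoder = make_encoder(num_layers=2)   # N_t layers (value assumed)
image_encoder = make_encoder(num_layers=2)  # N_p layers (value assumed)
mm_encoder = make_encoder(num_layers=2)

# T_prime = text_encoder(T.unsqueeze(0)); V_prime = image_encoder(V.unsqueeze(0)); ...
```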
and 5, respectively inputting the text features, the picture features and the multi-mode features obtained in the step 4 into respective multilayer perceptrons to obtain emotion recognition results.
Step 4 yields three outputs: the text feature $T'$, the picture feature $V'$ and the multi-modal feature $M'$. Each output is fed into its own multilayer perceptron to obtain the probability of each category:

$P_1(y \mid T') = \mathrm{MLP}(T')$

$P_2(y \mid V') = \mathrm{MLP}(V')$

$P_3(y \mid M') = \mathrm{MLP}(M')$
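A sketch of the per-modality classification heads. The hidden width, the number of emotion classes and the mean-pooling of the sequence dimension are assumptions; the patent only specifies one multilayer perceptron per modality producing class probabilities:

```python
import torch.nn as nn

class EmotionHead(nn.Module):
    """Multilayer perceptron mapping one feature matrix to class probabilities."""
    def __init__(self, d_model: int = 768, n_classes: int = 3, hidden: int = 256):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(d_model, hidden), nn.ReLU(),
                                 nn.Dropout(0.1), nn.Linear(hidden, n_classes))

    def forward(self, feats):                     # feats: [B, L, 768]
        pooled = feats.mean(dim=1)                # pool over the sequence dimension
        return self.mlp(pooled).softmax(dim=-1)   # P_i(y | .)

text_head, image_head, mm_head = EmotionHead(), EmotionHead(), EmotionHead()
```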
and 6, giving respective weights of the emotion classification probabilities obtained by the classifiers, and performing decision-level fusion in a weighting mode to obtain a final emotion classification result.
Step 5 gives the emotion recognition probability vectors of the three classifiers (text, picture and multi-modal):

$P_i = (p_{i1}, p_{i2}, \ldots, p_{ic})^{\top}, \quad 1 \le i \le 3$

where $p_{ij}$ is the recognition rate of the $i$-th modality for the $j$-th emotional state, $c$ is the number of emotion classes, $\lvert P_i \rvert = 1$ and $p_{ij} \in [0, 1]$ ($1 \le i \le 3$, $1 \le j \le c$). From these, a multi-modal emotion recognition weighting matrix $W_i$ is obtained (its definition appears as an equation image in the original filing).
The classification probabilities of the modalities are fused by linear weighting:

$P = \sum_{i=1}^{3} W_i\, P_i$
The class with the highest probability is selected as the final recognition result according to the maximum rule:

$\hat{y} = \arg\max_{j}\; P_j$
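A sketch of the decision-level fusion under one simple reading of the rule: one weight per modality, linear weighting and a maximum rule. The exact construction of the weighting matrix $W_i$ is given only as an equation image in the original filing, so the weights below are placeholders:

```python
import numpy as np

def fuse_decisions(p_text, p_image, p_mm, weights=(0.3, 0.2, 0.5)):
    """p_*: class-probability vectors from the three classifiers (each sums to 1);
    weights: per-modality fusion weights (placeholder values, not from the patent)."""
    probs = np.stack([p_text, p_image, p_mm])    # shape [3, c]
    w = np.asarray(weights).reshape(-1, 1)       # shape [3, 1]
    fused = (w * probs).sum(axis=0)              # linear weighted fusion
    return int(fused.argmax()), fused            # maximum rule

# example usage
label, fused = fuse_decisions(np.array([0.2, 0.7, 0.1]),
                              np.array([0.3, 0.4, 0.3]),
                              np.array([0.1, 0.8, 0.1]))
```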
the learning rate is set to be 5e-5, the dropout rate is 0.1, the multi-head attention amount is 12, the whole model is trained for 12 epochs, the model provided by the method is optimally trained by using cross-classification cross entropy calculation based on mass Loss of back propagation, and the weight and the deviation are continuously adjusted, so that the Loss function achieves the convergence effect. The method is used for experiments on the image-text emotion recognition data set, and the accuracy is improved. And inputting the new image-text social comment sample into the trained model to obtain the emotion recognition result of the sample.
The above embodiments are only exemplary embodiments of the present invention, and are not intended to limit the present invention, and the scope of the present invention is defined by the claims. Various modifications and equivalents may be made by those skilled in the art within the spirit and scope of the present invention, and such modifications and equivalents should also be considered as falling within the scope of the present invention.

Claims (8)

1. An image-text emotion recognition method based on an attention mechanism, characterized by comprising the following steps:
step 1, preprocessing image-text comment data and converting the image-text comment data into a data format required by an input model;
step 2, performing preliminary feature extraction on the preprocessed text data and picture data by using pre-trained models to obtain text features and picture features;
step 3, inputting the text characteristics and the picture characteristics obtained in the step 2 as auxiliary information to a cross-modal coding layer, and learning the interaction between different modes by using a cross attention mechanism;
step 4, respectively inputting the text features, the picture features and the multi-modal features obtained in the step 3 into a self-attention coding layer to distribute reasonable weights for information of different dimensions in the features, and further selecting the features;
step 5, respectively inputting the text features, the picture features and the multi-mode features obtained in the step 4 into respective multilayer perceptrons to obtain emotion recognition results;
and 6, giving respective weights of the emotion classification probabilities obtained by the classifiers, and performing decision-level fusion in a weighting mode to obtain a final emotion classification result.
2. The attention-mechanism-based image-text emotion recognition method as claimed in claim 1, wherein:
the data preprocessing comprises: deleting special symbols: on a social-media platform, user-generated content usually contains special symbols such as an "@" symbol pointing to another user; the information after this symbol often concerns user privacy and is not useful for the emotion analysis task, so the words after "@" are deleted;
word segmentation: the comment text is split into words with a common word-segmentation tool, and words become the basic unit of further text processing; removing stop words: common stop words in the comment text are deleted.
3. The attention-mechanism-based image-text emotion recognition method as claimed in claim 1, wherein:
word sequence of text comments w obtained in step 1i,...wmWill specially mark [ CLS ]]Added to the beginning of a word sequence, special marks [ SEP ]]Added to the end of a word sequence, the word w is transformed by a pre-trained Roberta modeliMapping into 768-dimensional vector: the picture extraction employs an advanced pre-trained model Resnet 152.
4. The attention-mechanism-based image-text emotion recognition method as claimed in claim 1, wherein: the text features and picture features obtained in step 2 are input, as auxiliary information for each other, to the cross-modal encoding layer, and a cross-attention mechanism is used to learn the interaction between the modalities; the attention mechanism mines information in the data from a set of context vectors $\{y_j\}$ related to a query vector $x$; an attention layer first computes a matching score between the query vector $x$ and each context vector $y_j$; the scores are then normalized by the softmax function, and the output of the attention layer is the sum of the context vectors weighted by the normalized scores.
5. The attention-mechanism-based image-text emotion recognition method as claimed in claim 1, wherein:
cross-modal transformer encoding layers use the text features to mine the emotional regions in the picture and use the picture features to mine the emotional words in the text description that are associated with the picture; each layer of the cross-modal encoder consists of a bidirectional cross-attention sublayer and two feed-forward sublayers; $N_c$ layers are stacked in the cross-modal encoder, with the output of the $k$-th layer serving as the input of the $(k{+}1)$-th layer; inside the $k$-th layer the bidirectional cross-attention sublayer is applied first, and it contains two unidirectional cross-attention sublayers, one from language to vision and one from vision to language:

$\hat{h}_i^{k} = \mathrm{CrossAtt}_{L \to V}\big(h_i^{k-1},\ \{v_1^{k-1}, \ldots, v_n^{k-1}\}\big)$

$\hat{v}_j^{k} = \mathrm{CrossAtt}_{V \to L}\big(v_j^{k-1},\ \{h_1^{k-1}, \ldots, h_m^{k-1}\}\big)$

the cross-attention layers exchange information between the two modalities and align their entities, fully mining the relevance and complementarity of the image-text data.
6. The attention-mechanism-based image-text emotion recognition method as claimed in claim 1, wherein:
the text features, picture features and multi-modal features obtained in step 3 are each input to a self-attention encoding layer, which assigns reasonable weights to information of different dimensions within the features and performs further feature selection;
the picture features obtained in step 3 are passed through global average pooling and concatenated with the text features obtained in step 2 to give the text feature $T \in \mathbb{R}^{65 \times 768}$; the text features obtained in step 3 are passed through global average pooling and concatenated with the picture features obtained in step 2 to give the picture feature $V \in \mathbb{R}^{17 \times 768}$; the text features and picture features obtained in step 3 are concatenated to give the multi-modal joint feature $M \in \mathbb{R}^{32 \times 768}$;
transformer encoding layers further encode the text feature $T$, the picture feature $V$ and the multi-modal joint feature $M$ separately; each layer of the encoder contains a self-attention sublayer and a feed-forward sublayer, the feed-forward sublayer consists of two fully connected layers, and a residual connection and a normalization layer follow each sublayer; the text encoder and the picture encoder have $N_t$ and $N_p$ layers respectively.
7. The attention-mechanism-based image-text emotion recognition method as claimed in claim 1, wherein:
the text features, picture features and multi-modal features obtained in step 4 are input to their respective multilayer perceptrons to obtain emotion recognition results;
the three outputs obtained in step 4 are the text feature T', the picture feature V' and the multi-modal feature M'; each output is input to its own multilayer perceptron to obtain the probability of each category.
8. The attention-mechanism-based image-text emotion recognition method as claimed in claim 1, wherein:
the emotion classification probabilities obtained by the classifiers are given respective weights, and decision-level fusion is performed by weighting to obtain the final emotion classification result;
the emotion recognition rates of the three classifiers (text, picture and multi-modal) are obtained in step 5, the classification probabilities of the modalities are fused by linear weighting, and the class with the highest probability is selected as the final recognition result according to the maximum rule.
CN202110992751.6A 2021-08-27 2021-08-27 Image-text emotion recognition method based on attention mechanism Pending CN113822340A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110992751.6A CN113822340A (en) 2021-08-27 2021-08-27 Image-text emotion recognition method based on attention mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110992751.6A CN113822340A (en) 2021-08-27 2021-08-27 Image-text emotion recognition method based on attention mechanism

Publications (1)

Publication Number Publication Date
CN113822340A true CN113822340A (en) 2021-12-21

Family

ID=78913663

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110992751.6A Pending CN113822340A (en) 2021-08-27 2021-08-27 Image-text emotion recognition method based on attention mechanism

Country Status (1)

Country Link
CN (1) CN113822340A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114140673A (en) * 2022-02-07 2022-03-04 人民中科(济南)智能技术有限公司 Illegal image identification method, system and equipment
CN114343670A (en) * 2022-01-07 2022-04-15 北京师范大学 Interpretation information generation method and electronic equipment
CN115423050A (en) * 2022-11-04 2022-12-02 暨南大学 False news detection method and device, electronic equipment and storage medium
CN116049397A (en) * 2022-12-29 2023-05-02 北京霍因科技有限公司 Sensitive information discovery and automatic classification method based on multi-mode fusion
WO2024082891A1 (en) * 2022-10-20 2024-04-25 华为技术有限公司 Data processing method and related device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080249764A1 (en) * 2007-03-01 2008-10-09 Microsoft Corporation Smart Sentiment Classifier for Product Reviews
CN112132075A (en) * 2020-09-28 2020-12-25 腾讯科技(深圳)有限公司 Method and medium for processing image-text content
CN112598067A (en) * 2020-12-25 2021-04-02 中国联合网络通信集团有限公司 Emotion classification method and device for event, electronic equipment and storage medium
CN113065577A (en) * 2021-03-09 2021-07-02 北京工业大学 Multi-modal emotion classification method for targets
CN113158875A (en) * 2021-04-16 2021-07-23 重庆邮电大学 Image-text emotion analysis method and system based on multi-mode interactive fusion network

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080249764A1 (en) * 2007-03-01 2008-10-09 Microsoft Corporation Smart Sentiment Classifier for Product Reviews
CN112132075A (en) * 2020-09-28 2020-12-25 腾讯科技(深圳)有限公司 Method and medium for processing image-text content
CN112598067A (en) * 2020-12-25 2021-04-02 中国联合网络通信集团有限公司 Emotion classification method and device for event, electronic equipment and storage medium
CN113065577A (en) * 2021-03-09 2021-07-02 北京工业大学 Multi-modal emotion classification method for targets
CN113158875A (en) * 2021-04-16 2021-07-23 重庆邮电大学 Image-text emotion analysis method and system based on multi-mode interactive fusion network

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114343670A (en) * 2022-01-07 2022-04-15 北京师范大学 Interpretation information generation method and electronic equipment
CN114140673A (en) * 2022-02-07 2022-03-04 人民中科(济南)智能技术有限公司 Illegal image identification method, system and equipment
WO2024082891A1 (en) * 2022-10-20 2024-04-25 华为技术有限公司 Data processing method and related device
CN115423050A (en) * 2022-11-04 2022-12-02 暨南大学 False news detection method and device, electronic equipment and storage medium
CN116049397A (en) * 2022-12-29 2023-05-02 北京霍因科技有限公司 Sensitive information discovery and automatic classification method based on multi-mode fusion
CN116049397B (en) * 2022-12-29 2024-01-02 北京霍因科技有限公司 Sensitive information discovery and automatic classification method based on multi-mode fusion

Similar Documents

Publication Publication Date Title
Yang et al. Image-text multimodal emotion classification via multi-view attentional network
CN110717047B (en) Web service classification method based on graph convolution neural network
Gong et al. Hashtag recommendation using attention-based convolutional neural network.
CN113065577A (en) Multi-modal emotion classification method for targets
CN113822340A (en) Image-text emotion recognition method based on attention mechanism
CN115033670A (en) Cross-modal image-text retrieval method with multi-granularity feature fusion
CN113378989B (en) Multi-mode data fusion method based on compound cooperative structure characteristic recombination network
CN107818084B (en) Emotion analysis method fused with comment matching diagram
Sharma et al. A survey of methods, datasets and evaluation metrics for visual question answering
Wang et al. Docstruct: A multimodal method to extract hierarchy structure in document for general form understanding
CN109712108B (en) Visual positioning method for generating network based on diversity discrimination candidate frame
CN110287323A (en) A kind of object-oriented sensibility classification method
CN115034224A (en) News event detection method and system integrating representation of multiple text semantic structure diagrams
CN111597341B (en) Document-level relation extraction method, device, equipment and storage medium
CN115455970A (en) Image-text combined named entity recognition method for multi-modal semantic collaborative interaction
CN114004220A (en) Text emotion reason identification method based on CPC-ANN
CN114201605A (en) Image emotion analysis method based on joint attribute modeling
CN113535949B (en) Multi-modal combined event detection method based on pictures and sentences
Yang et al. CLIP-KD: An Empirical Study of Distilling CLIP Models
Mahima et al. A text-based hybrid approach for multiple emotion detection using contextual and semantic analysis
US20240119716A1 (en) Method for multimodal emotion classification based on modal space assimilation and contrastive learning
CN117671460A (en) Cross-modal image-text emotion analysis method based on hybrid fusion
CN117539999A (en) Cross-modal joint coding-based multi-modal emotion analysis method
CN116822513A (en) Named entity identification method integrating entity types and keyword features
CN116662924A (en) Aspect-level multi-mode emotion analysis method based on dual-channel and attention mechanism

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination