CN113822340A - Image-text emotion recognition method based on attention mechanism - Google Patents
- Publication number
- CN113822340A (application CN202110992751.6A)
- Authority
- CN
- China
- Prior art keywords
- text
- features
- picture
- layer
- attention
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3344—Query execution using natural language analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
Abstract
The invention discloses an image-text emotion recognition method based on an attention mechanism. It introduces the attention mechanism, which has attracted wide interest in recent years, to better mine intra-modal information and learn the interactions between modalities, and it designs a decision-level fusion rule, addressing the fact that each modality contributes differently to emotion classification, to integrate the classification results of all modalities into a final emotion recognition result. Decision-level fusion is adopted, with a fusion rule designed to integrate the classification probabilities of the individual classifiers, which improves the final emotion recognition accuracy. The method supplements and optimizes image-text comment emotion recognition in aspects such as multi-modal feature extraction and feature fusion, effectively mines intra-modal information, constructs interactions between the modalities, and improves the accuracy of image-text emotion recognition.
Description
Technical Field
The invention belongs to the field of computer vision and natural language processing, and is mainly used for emotion recognition of image-text comments on internet social media.
Background
With the rapid development of social media, users tend to express opinions and share experiences on platforms such as Twitter, Facebook and Sina Weibo. The content they publish is becoming more diverse in both substance and form: unlike conventional text-only comments, users increasingly pair their written comments with pictures. Traditional text-based emotion analysis has therefore evolved into multi-modal emotion analysis, whose purpose is to automatically identify the underlying attitude in a comment, extract the user's emotion and understand the user's behavior. This has important practical applications.
How to effectively use the information in the visual and textual content of image-text comments is a challenging problem in multi-modal emotion analysis. Compared with single-modality emotion analysis, a multi-modal method must effectively fuse information from different modalities. At present, multi-modal emotion analysis faces three problems. First, the information within each modality cannot be fully extracted: the emotion of a picture is hard to abstract from its low-level and mid-level features, and comment texts are casual and short, so traditional text representation methods cannot effectively mine their important semantic information. Second, the information of the modalities needs to be fused effectively, removing redundant information while adding complementary information. Third, the modalities contribute to emotion classification to different degrees, and how to allocate the weight of each modality remains an open question.
The attention mechanism simulates the focusing ability of the human eye and attends to the more important and valuable information. Introducing it makes it possible to assign reasonable weights to information of different dimensions within the same modality, so that context information is processed accurately; assigning weights to the different modalities addresses the problem that pictures and texts do not contribute equally to emotion classification. Existing multi-modal fusion methods fall mainly into data-level fusion, feature-level fusion and decision-level fusion. Data-level fusion unifies the collected data sets into one overall data set according to some rule; it is complex to implement, and the resulting data often contains too much redundant information. Feature-level fusion extracts features from each modality, builds a joint vector and feeds it to a classifier for emotion classification; common methods are concatenation, element-wise addition and element-wise multiplication. Decision-level fusion builds a separate classifier for each modality and integrates the classification results according to some rule to obtain the final emotion recognition result. Decision-level fusion is comparatively simple, and a properly designed fusion formula can achieve considerable recognition accuracy.
Disclosure of Invention
Aiming at comments on Internet social media, the invention provides an image-text emotion recognition method based on an attention mechanism. By introducing the attention mechanism, which has attracted wide interest in recent years, the method better mines intra-modal information and learns the interactions between modalities. Because the modalities contribute differently to emotion classification, a decision-level fusion rule is designed to integrate the classification results of the modalities into a final emotion recognition result.
A self-attention mechanism is introduced to better mine the emotional information inside each modality, and a cross-attention mechanism is introduced to construct the interactions between modalities. The rationale is that the attention mechanism lets the model devote more attention resources to the parts it focuses on, obtaining more detailed information there while weakening attention to relatively unimportant parts; it thus extracts high-value information from a large amount of input and improves processing efficiency. In image-text comment emotion recognition, text features and picture features are first obtained by preliminary feature extraction. Because the information in the modalities is related, a cross-modal encoding layer is added in which the picture and the text each serve as auxiliary information for the other. Masked features can be inferred from the aligned elements of the other modality, and the relations between the modalities are discovered and constructed so that information from different modalities can interact. The text features, the picture features and the multi-modal features obtained through the cross-attention mechanism are then each fed into a self-attention encoding layer for further feature selection. With careful design and combination of these self-attention and cross-attention layers, the method can extract high-quality text features, picture features and multi-modal features from the input data.
The method adopts decision-level fusion and designs a fusion rule to integrate the classification probabilities of the individual classifiers, improving the final emotion recognition accuracy. Traditional feature-level fusion simply combines text features and picture features; it ignores the structural coupling between text and picture and has poor interpretability. In real comment data from Internet social media, pictures and texts do not contribute equally to emotion classification, and different data affect the classification result to very different degrees. The advantage of decision-level fusion is that an independent classifier can be built for each modality, and the final decision is obtained by giving a different weight to the result of each classifier. The method analyzes the features of each modality independently, sets a fusion rule and assigns a weight to the classification result of each modality, which addresses the unequal contribution of the modalities to emotion classification and improves recognition accuracy.
The method supplements and optimizes image-text comment emotion recognition in aspects such as multi-modal feature extraction and feature fusion, effectively mines intra-modal information, constructs interactions between the modalities, and improves the accuracy of image-text emotion recognition.
The method comprises the following steps:
Step 1, preprocessing the image-text comment data and converting it into the data format required as model input.
Step 2, performing preliminary feature extraction on the preprocessed text data and picture data with pre-trained models to obtain text features and picture features.
Step 3, inputting the text features and picture features obtained in step 2 into a cross-modal encoding layer, each serving as auxiliary information for the other, and learning the interactions between the modalities with a cross-attention mechanism.
Step 4, inputting the text features, picture features and multi-modal features obtained in step 3 into respective self-attention encoding layers, which assign reasonable weights to information of different dimensions in the features and perform further feature selection.
Step 5, inputting the text features, picture features and multi-modal features obtained in step 4 into respective multilayer perceptrons to obtain emotion recognition results.
Step 6, assigning a weight to the emotion classification probabilities produced by each classifier and performing decision-level fusion by weighting to obtain the final emotion classification result.
Drawings
FIG. 1 is a flow chart of the method of operation.
FIG. 2 is a model diagram of the method.
FIG. 3 shows a sample image-text comment.
Detailed Description
The present invention is described in detail below with reference to examples and the accompanying drawings.
The embodiment takes image-text comments only as an example; the algorithm can be extended to any multi-modal emotion classification problem. For the image-text comment sample shown in FIG. 3, the emotional tendency is "happy". A model is designed for this task; once the model has been trained, a new image-text comment sample can be input and its emotional tendency is output. The steps are described in detail below.
Step 1, preprocessing the image-text comment data and converting it into the data format required as model input.
Data preprocessing is an important step in the method, especially for user comments from social media platforms, whose data is raw and unstructured. It mainly comprises the following preprocessing steps:
Deleting special symbols: on a social media platform, the content published by a user usually contains special symbols, such as the "@" symbol that points to other users. The information after this symbol often involves user privacy and is of no use in the emotion analysis task, so the words following the "@" are deleted.
Word segmentation: the comment text is divided into words using common segmentation tools, which become the basic unit for further text processing.
Removing stop words: in natural language processing, certain words are filtered out because they are of little value (called "stop words"), and therefore, common stop words in text reviews are deleted.
Adjusting the pixel size of the picture: the picture is adjusted to 224 x 224 pixels.
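The text preprocessing steps above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the stop-word list and the function name `preprocess_comment` are hypothetical, and the regular-expression tokenizer stands in for the "common segmentation tools" mentioned above (Chinese text would need a dedicated word segmenter). Picture resizing to 224 x 224 pixels is left out because it requires an image library.

```python
import re

# Hypothetical stop-word list for illustration only; a real system would
# load a language-specific list.
STOP_WORDS = {"the", "a", "an", "is", "to", "of", "and"}


def preprocess_comment(text, stop_words=STOP_WORDS):
    """Delete @-mentions, split the text into words, and drop stop words."""
    # Delete the "@" symbol together with the user name that follows it.
    text = re.sub(r"@\w+", " ", text)
    # Naive word segmentation: lower-cased runs of letters, digits, apostrophes.
    words = re.findall(r"[a-z0-9']+", text.lower())
    return [w for w in words if w not in stop_words]


print(preprocess_comment("@alice The view is amazing and the food is great!"))
```

The remaining words become the basic units passed to the text feature extractor in step 2.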
Step 2, performing preliminary feature extraction on the preprocessed text data and picture data with pre-trained models to obtain text features and picture features.
(1) Text feature extraction
For the word sequence $w_1, \ldots, w_m$ of the text comment obtained in step 1, the special token [CLS] is added to the beginning of the word sequence and the special token [SEP] is added to the end. Each word $w_i$ is then mapped to a 768-dimensional vector by a pre-trained RoBERTa model:
$$t_i' = \mathrm{RoBERTa}(w_i), \quad t_i' \in \mathbb{R}^{768}$$
(2) picture feature extraction
Picture feature extraction uses the pre-trained model ResNet152, which has been pre-trained on about 14 million images. The pictures accompanying the user comments have already been resized to 224 × 224 pixels in step 1. ResNet152 then average-pools the picture over a 7 × 7 grid of regions, generating 49 output vectors per picture, each of 2048 dimensions:
$$\mathrm{ResNet}(I) = \{\, r_i' \in \mathbb{R}^{2048} \,\}_{i=1}^{49}$$
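To make the shape bookkeeping concrete, the sketch below turns a channels × 7 × 7 feature map into 49 region vectors. It is an illustration only: the function name `grid_to_region_vectors` is hypothetical, the toy map uses 4 channels instead of 2048, and it assumes the region vectors are read from the final 7 × 7 feature map of the network.

```python
def grid_to_region_vectors(feature_map):
    """Turn a [channels][height][width] map into height*width vectors."""
    channels = len(feature_map)
    height = len(feature_map[0])
    width = len(feature_map[0][0])
    return [
        [feature_map[c][y][x] for c in range(channels)]
        for y in range(height)
        for x in range(width)
    ]


# Toy map with 4 channels instead of 2048 to keep the example small.
toy_map = [[[float(c) for _ in range(7)] for _ in range(7)] for c in range(4)]
regions = grid_to_region_vectors(toy_map)
print(len(regions), len(regions[0]))
```

Each of the 49 resulting vectors describes one grid cell of the picture, which is what lets the cross-attention layer in step 3 point at individual picture regions.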
Step 3, inputting the text features and picture features obtained in step 2 into a cross-modal encoding layer, each serving as auxiliary information for the other, and learning the interactions between the modalities with a cross-attention mechanism.
The attention mechanism is intended to mine information from a set of context vectors $y_i$ associated with a query vector $x$. An attention layer first computes the matching score between the query vector $x$ and each context vector $y_i$. The scores are then normalized by the softmax function, and the output of the attention layer is the sum of the context vectors weighted by the normalized scores. The formula is as follows:
In the formula, $q_i$, $k_i$ and $v_i$ denote the query, key and value vectors, respectively, which are computed as linear mappings of the input sequence; the attention map predicts how the different elements of the input sequence affect each other.
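The formula itself is not reproduced in this text. As a sketch only, assuming the standard scaled dot-product attention used in transformer models (the symbol $\alpha_{ij}$ for the attention map and the scaling by the key dimension $d_k$ are assumptions, not taken from the source), the computation described above can be written as:

```latex
\alpha_{ij} = \frac{\exp\left(q_i^{\top} k_j / \sqrt{d_k}\right)}
                   {\sum_{l} \exp\left(q_i^{\top} k_l / \sqrt{d_k}\right)},
\qquad
\mathrm{Att}(q_i) = \sum_{j} \alpha_{ij}\, v_j
```

Here $q_i^{\top} k_j$ is the matching score, the softmax over $j$ gives the normalized scores, and the output is the weighted sum of the value vectors.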
The invention adopts a cross-modal transformer encoding layer that uses the text features to mine emotion regions in the picture and uses the picture features to mine the emotion words in the text that relate to the picture. Each layer of the cross-modal encoder consists of a bidirectional cross-attention sublayer and two feed-forward sublayers. $N_c$ such layers are stacked in the cross-modal encoder, with the output of the k-th layer used as the input of the (k+1)-th layer. Inside the k-th layer, a bidirectional cross-attention sublayer is applied first, which contains two unidirectional cross-attention sublayers, one from language to vision and one from vision to language:
The cross-attention layer exchanges information between the two modalities and aligns entities, fully mining the relevance and complementarity of the image-text data.
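A minimal sketch of one bidirectional cross-attention step is given below. It rests on several simplifying assumptions that are not in the source: queries, keys and values are the raw feature vectors (no learned linear projections, no multiple heads), the scores are scaled dot products, and both modalities have already been projected to the same dimension. The function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, contexts):
    """For each query, return the softmax-weighted sum of the context vectors."""
    dim = len(contexts[0])
    outputs = []
    for q in queries:
        # Scaled dot-product matching score between the query and each context.
        scores = [sum(a * b for a, b in zip(q, c)) / math.sqrt(dim) for c in contexts]
        weights = softmax(scores)
        outputs.append(
            [sum(w * c[d] for w, c in zip(weights, contexts)) for d in range(dim)]
        )
    return outputs


def bidirectional_cross_attention(text, image):
    """Run language-to-vision and vision-to-language attention side by side."""
    text_out = attend(text, image)   # text tokens query the picture regions
    image_out = attend(image, text)  # picture regions query the text tokens
    return text_out, image_out


text_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 toy text tokens
image_feats = [[0.5, 0.5], [2.0, 0.0]]             # 2 toy picture regions
text_out, image_out = bidirectional_cross_attention(text_feats, image_feats)
print(len(text_out), len(image_out))
```

Each output sequence keeps the length of its own modality, but every vector in it is now a mixture of features from the other modality, which is how the two modalities exchange information.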
Step 4, inputting the text features, picture features and multi-modal features obtained in step 3 into respective self-attention encoding layers, which assign reasonable weights to information of different dimensions in the features and perform further feature selection.
The picture features obtained in step 3 are concatenated, through global average pooling, with the text features obtained in step 2 to obtain the text feature $T \in \mathbb{R}^{65 \times 768}$. The text features obtained in step 3 are concatenated, through global average pooling, with the picture features obtained in step 2 to obtain the picture feature $V \in \mathbb{R}^{17 \times 768}$. The text features and the picture features obtained in step 3 are concatenated to obtain the multi-modal joint feature $M \in \mathbb{R}^{32 \times 768}$.
The invention adopts transformer encoding layers to further encode the text feature T, the picture feature V and the multi-modal joint feature M. Each layer of an encoder comprises a self-attention sublayer and a feed-forward sublayer; the feed-forward sublayer consists of two fully connected layers, and a residual connection and a normalization layer are added after each sublayer. The text encoder and the picture encoder have $N_t$ and $N_p$ layers, respectively. The self-attention layers are given by:
$$T' = \text{text-attention}(T, T, T), \quad T' \in \mathbb{R}^{65 \times 768}$$
$$V' = \text{vision-attention}(V, V, V), \quad V' \in \mathbb{R}^{17 \times 768}$$
$$M' = \text{multimodal-attention}(M, M, M), \quad M' \in \mathbb{R}^{32 \times 768}$$
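The "residual connection and normalization layer after each sublayer" pattern can be sketched as below. This is an illustration under stated assumptions: the normalization is taken to be layer normalization without learned scale and shift, and `double` is a hypothetical stand-in for a real self-attention or feed-forward sublayer.

```python
import math


def layer_norm(vector, eps=1e-5):
    """Normalize a vector to zero mean and unit variance (no learned scale)."""
    mean = sum(vector) / len(vector)
    var = sum((v - mean) ** 2 for v in vector) / len(vector)
    return [(v - mean) / math.sqrt(var + eps) for v in vector]


def residual_sublayer(vector, sublayer):
    """Apply a sublayer, add the residual connection, then normalize."""
    transformed = sublayer(vector)
    return layer_norm([x + t for x, t in zip(vector, transformed)])


def double(vector):
    """Hypothetical stand-in for a self-attention or feed-forward sublayer."""
    return [2.0 * v for v in vector]


out = residual_sublayer([1.0, 2.0, 3.0], double)
print([round(v, 3) for v in out])
```

In an encoder layer this wrapper would be applied twice per token, once around the self-attention sublayer and once around the feed-forward sublayer.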
Step 5, inputting the text features, picture features and multi-modal features obtained in step 4 into respective multilayer perceptrons to obtain emotion recognition results.
From step 4, three outputs can be obtained, namely a text feature T ', a picture feature V ', and a multimodal feature M '. And respectively inputting the obtained three outputs into respective multilayer perceptrons to obtain the probability of each category:
$$P_1(y \mid T') = \mathrm{MLP}(T')$$
$$P_2(y \mid V') = \mathrm{MLP}(V')$$
$$P_3(y \mid M') = \mathrm{MLP}(M')$$
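One of the three classifiers can be sketched as a small multilayer perceptron followed by a softmax. The source does not say how the feature sequence is reduced to a single vector or how many layers the perceptron has, so the mean pooling, the single ReLU hidden layer and the toy weights below are all assumptions for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def linear(vector, weights, bias):
    """Dense layer: one output per row of the weight matrix."""
    return [sum(w * v for w, v in zip(row, vector)) + b for row, b in zip(weights, bias)]


def mlp_classify(features, w1, b1, w2, b2):
    """Mean-pool the sequence, apply one ReLU hidden layer, return class probabilities."""
    dim = len(features[0])
    pooled = [sum(row[d] for row in features) / len(features) for d in range(dim)]
    hidden = [max(0.0, h) for h in linear(pooled, w1, b1)]
    return softmax(linear(hidden, w2, b2))


# Toy sizes: 2 tokens of dimension 2, 2 hidden units, 3 emotion classes.
features = [[1.0, 0.0], [0.0, 1.0]]
w1, b1 = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]
w2, b2 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [0.0, 0.0, 0.0]
probs = mlp_classify(features, w1, b1, w2, b2)
print(probs)
```

The returned list plays the role of one probability vector such as $P_1(y \mid T')$; the same structure with separate weights would give the picture and multi-modal classifiers.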
and 6, giving respective weights of the emotion classification probabilities obtained by the classifiers, and performing decision-level fusion in a weighting mode to obtain a final emotion classification result.
The emotion recognition rate expressions of the three classifiers of text, picture and multi-mode are obtained in the step 5:
Pi=(pi1,pi2,...pic)T(1≤i≤3)
pijis the recognition rate of the ith mode to the jth emotional state, c is the number of categories of emotion classification, | Pi|=1,pijE to {0,1} (i is more than or equal to 1 and less than or equal to 3, and j is more than or equal to 1 and less than or equal to c), and obtaining a multi-modal emotion recognition weighting matrix WiComprises the following steps:
linearly weighted fusion of the classification probability of each modality, the formula is as follows:
and selecting the category with the highest probability as a final recognition result according to a maximum rule, wherein the formula is as follows:
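The weighting matrix and the fusion formulas are not reproduced in this text. The sketch below therefore assumes the simplest form consistent with the description: one scalar weight per classifier, a fused score $P = \sum_i w_i P_i$, and an arg max over the classes. The weights and probabilities are made-up toy values, and `fuse_decisions` is a hypothetical name.

```python
def fuse_decisions(probabilities, weights):
    """Linearly weight per-classifier class probabilities and pick the arg max."""
    num_classes = len(probabilities[0])
    fused = [
        sum(w * p[j] for w, p in zip(weights, probabilities))
        for j in range(num_classes)
    ]
    best = max(range(num_classes), key=lambda j: fused[j])
    return fused, best


# Toy outputs of the text, picture and multi-modal classifiers (3 classes).
p_text = [0.7, 0.2, 0.1]
p_picture = [0.3, 0.5, 0.2]
p_multimodal = [0.6, 0.3, 0.1]
# Hypothetical weights; the patent's weighting matrix is not given in this text.
weights = [0.4, 0.2, 0.4]

fused, label = fuse_decisions([p_text, p_picture, p_multimodal], weights)
print(fused, label)
```

Because the weights sum to 1, the fused scores remain a probability distribution, and a modality with a low weight (here the picture) cannot overturn a decision that the other two classifiers agree on.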
The learning rate is set to 5e-5, the dropout rate to 0.1 and the number of attention heads to 12, and the whole model is trained for 12 epochs. The model is trained with a categorical cross-entropy loss optimized by backpropagation, continuously adjusting the weights and biases until the loss function converges. In experiments on an image-text emotion recognition data set, the method improves accuracy. A new image-text social comment sample is input into the trained model to obtain the emotion recognition result of that sample.
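The loss used for training can be sketched as follows. The hyperparameter values are the ones stated above; the `config` dictionary, the function names and the toy probabilities are illustrative assumptions, and the optimizer and backpropagation step are omitted.

```python
import math


def cross_entropy(probabilities, label):
    """Categorical cross-entropy for one sample with an integer class label."""
    return -math.log(probabilities[label])


def mean_cross_entropy(batch_probabilities, labels):
    """Average the per-sample loss over a batch."""
    losses = [cross_entropy(p, y) for p, y in zip(batch_probabilities, labels)]
    return sum(losses) / len(losses)


# Hyperparameters as stated in the text; the dictionary itself is illustrative.
config = {"learning_rate": 5e-5, "dropout": 0.1, "attention_heads": 12, "epochs": 12}

loss = mean_cross_entropy([[0.5, 0.25, 0.25], [0.1, 0.8, 0.1]], [0, 1])
print(round(loss, 4))
```

The loss falls toward zero as the probability assigned to the correct class approaches 1, which is the convergence behavior the training procedure aims for.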
The above embodiments are only exemplary embodiments of the present invention, and are not intended to limit the present invention, and the scope of the present invention is defined by the claims. Various modifications and equivalents may be made by those skilled in the art within the spirit and scope of the present invention, and such modifications and equivalents should also be considered as falling within the scope of the present invention.
Claims (8)
1. An image-text emotion recognition method based on an attention mechanism, characterized in that the method comprises the following steps:
step 1, preprocessing the image-text comment data and converting it into the data format required as model input;
step 2, performing preliminary feature extraction on the preprocessed text data and picture data with pre-trained models to obtain text features and picture features;
step 3, inputting the text features and picture features obtained in step 2 into a cross-modal encoding layer, each serving as auxiliary information for the other, and learning the interactions between the modalities with a cross-attention mechanism;
step 4, inputting the text features, picture features and multi-modal features obtained in step 3 into respective self-attention encoding layers, which assign reasonable weights to information of different dimensions in the features and perform further feature selection;
step 5, inputting the text features, picture features and multi-modal features obtained in step 4 into respective multilayer perceptrons to obtain emotion recognition results;
step 6, assigning a weight to the emotion classification probabilities produced by each classifier and performing decision-level fusion by weighting to obtain the final emotion classification result.
2. The image-text emotion recognition method based on an attention mechanism as claimed in claim 1, wherein:
the data preprocessing comprises: deleting special symbols: on a social media platform, the content published by a user usually contains special symbols, such as the "@" symbol that points to other users; the information after this symbol often involves user privacy and is of no use in the emotion analysis task, so the words following the "@" are deleted;
word segmentation: the comment text is divided into words with a common word segmentation tool, and these words become the basic units for further text processing; removing stop words: common stop words in the text comments are deleted, as is usual in natural language processing.
3. The image-text emotion recognition method based on an attention mechanism as claimed in claim 1, wherein:
for the word sequence $w_1, \ldots, w_m$ of the text comment obtained in step 1, the special token [CLS] is added to the beginning of the word sequence and the special token [SEP] is added to the end, and each word $w_i$ is mapped to a 768-dimensional vector by a pre-trained RoBERTa model; picture feature extraction uses the pre-trained model ResNet152.
4. The image-text emotion recognition method based on an attention mechanism as claimed in claim 1, wherein: the text features and picture features obtained in step 2 are input into a cross-modal encoding layer, each serving as auxiliary information for the other, and the interactions between the modalities are learned with a cross-attention mechanism; the attention mechanism is intended to mine information from a set of context vectors $y_i$ associated with a query vector $x$; an attention layer first computes the matching score between the query vector $x$ and each context vector $y_i$; the scores are then normalized by the softmax function, and the output of the attention layer is the sum of the context vectors weighted by the normalized scores.
5. The image-text emotion recognition method based on an attention mechanism as claimed in claim 1, wherein:
a cross-modal transformer encoding layer uses the text features to mine emotion regions in the picture and uses the picture features to mine the emotion words in the text that relate to the picture; each layer of the cross-modal encoder consists of a bidirectional cross-attention sublayer and two feed-forward sublayers, and $N_c$ such layers are stacked in the cross-modal encoder, with the output of the k-th layer used as the input of the (k+1)-th layer; inside the k-th layer, a bidirectional cross-attention sublayer is applied first, which contains two unidirectional cross-attention sublayers, one from language to vision and one from vision to language:
the cross-attention layer exchanges information between the two modalities and aligns entities, fully mining the relevance and complementarity of the image-text data.
6. The image-text emotion recognition method based on an attention mechanism as claimed in claim 1, wherein:
the text features, picture features and multi-modal features obtained in step 3 are input into respective self-attention encoding layers, which assign reasonable weights to information of different dimensions in the features and perform further feature selection;
the picture features obtained in step 3 are concatenated, through global average pooling, with the text features obtained in step 2 to obtain the text feature $T \in \mathbb{R}^{65 \times 768}$; the text features obtained in step 3 are concatenated, through global average pooling, with the picture features obtained in step 2 to obtain the picture feature $V \in \mathbb{R}^{17 \times 768}$; the text features and the picture features obtained in step 3 are concatenated to obtain the multi-modal joint feature $M \in \mathbb{R}^{32 \times 768}$;
transformer encoding layers further encode the text feature T, the picture feature V and the multi-modal joint feature M, wherein each layer of an encoder comprises a self-attention sublayer and a feed-forward sublayer, the feed-forward sublayer consists of two fully connected layers, and a residual connection and a normalization layer are added after each sublayer; the text encoder and the picture encoder have $N_t$ and $N_p$ layers, respectively.
7. The image-text emotion recognition method based on an attention mechanism as claimed in claim 1, wherein:
the text features, picture features and multi-modal features obtained in step 4 are input into respective multilayer perceptrons to obtain emotion recognition results;
the three outputs obtained in step 4 are the text feature T', the picture feature V' and the multi-modal feature M'; the three outputs are input into their respective multilayer perceptrons to obtain the probability of each category.
8. The image-text emotion recognition method based on an attention mechanism as claimed in claim 1, wherein:
a weight is assigned to the emotion classification probabilities produced by each classifier, and decision-level fusion is performed by weighting to obtain the final emotion classification result;
the emotion recognition rates of the three classifiers (text, picture and multi-modal) are obtained in step 5, and the classification probabilities of the modalities are fused by linear weighting; the category with the highest probability is selected as the final recognition result according to the maximum rule.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110992751.6A CN113822340A (en) | 2021-08-27 | 2021-08-27 | Image-text emotion recognition method based on attention mechanism |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113822340A (en) | 2021-12-21 |
Family
ID=78913663
Family Applications (1)
Application Number | Status | Priority Date | Filing Date | Title |
---|---|---|---|---|
CN202110992751.6A | Pending, published as CN113822340A (en) | 2021-08-27 | 2021-08-27 | Image-text emotion recognition method based on attention mechanism |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113822340A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080249764A1 (en) * | 2007-03-01 | 2008-10-09 | Microsoft Corporation | Smart Sentiment Classifier for Product Reviews |
CN112132075A (en) * | 2020-09-28 | 2020-12-25 | 腾讯科技(深圳)有限公司 | Method and medium for processing image-text content |
CN112598067A (en) * | 2020-12-25 | 2021-04-02 | 中国联合网络通信集团有限公司 | Emotion classification method and device for event, electronic equipment and storage medium |
CN113065577A (en) * | 2021-03-09 | 2021-07-02 | 北京工业大学 | Multi-modal emotion classification method for targets |
CN113158875A (en) * | 2021-04-16 | 2021-07-23 | 重庆邮电大学 | Image-text emotion analysis method and system based on multi-mode interactive fusion network |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114343670A (en) * | 2022-01-07 | 2022-04-15 | 北京师范大学 | Interpretation information generation method and electronic equipment |
CN114140673A (en) * | 2022-02-07 | 2022-03-04 | 人民中科(济南)智能技术有限公司 | Illegal image identification method, system and equipment |
WO2024082891A1 (en) * | 2022-10-20 | 2024-04-25 | 华为技术有限公司 | Data processing method and related device |
CN115423050A (en) * | 2022-11-04 | 2022-12-02 | 暨南大学 | False news detection method and device, electronic equipment and storage medium |
CN116049397A (en) * | 2022-12-29 | 2023-05-02 | 北京霍因科技有限公司 | Sensitive information discovery and automatic classification method based on multi-mode fusion |
CN116049397B (en) * | 2022-12-29 | 2024-01-02 | 北京霍因科技有限公司 | Sensitive information discovery and automatic classification method based on multi-mode fusion |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Yang et al. | Image-text multimodal emotion classification via multi-view attentional network | |
CN110717047B (en) | Web service classification method based on graph convolution neural network | |
Gong et al. | Hashtag recommendation using attention-based convolutional neural network. | |
CN113065577A (en) | Multi-modal emotion classification method for targets | |
CN113822340A (en) | Image-text emotion recognition method based on attention mechanism | |
CN115033670A (en) | Cross-modal image-text retrieval method with multi-granularity feature fusion | |
CN113378989B (en) | Multi-mode data fusion method based on compound cooperative structure characteristic recombination network | |
CN107818084B (en) | Emotion analysis method fused with comment matching diagram | |
Sharma et al. | A survey of methods, datasets and evaluation metrics for visual question answering | |
Wang et al. | Docstruct: A multimodal method to extract hierarchy structure in document for general form understanding | |
CN109712108B (en) | Visual positioning method for generating network based on diversity discrimination candidate frame | |
CN110287323A (en) | A kind of object-oriented sensibility classification method | |
CN115034224A (en) | News event detection method and system integrating representation of multiple text semantic structure diagrams | |
CN111597341B (en) | Document-level relation extraction method, device, equipment and storage medium | |
CN115455970A (en) | Image-text combined named entity recognition method for multi-modal semantic collaborative interaction | |
CN114004220A (en) | Text emotion reason identification method based on CPC-ANN | |
CN114201605A (en) | Image emotion analysis method based on joint attribute modeling | |
CN113535949B (en) | Multi-modal combined event detection method based on pictures and sentences | |
Yang et al. | CLIP-KD: An Empirical Study of Distilling CLIP Models | |
Mahima et al. | A text-based hybrid approach for multiple emotion detection using contextual and semantic analysis | |
US20240119716A1 (en) | Method for multimodal emotion classification based on modal space assimilation and contrastive learning | |
CN117671460A (en) | Cross-modal image-text emotion analysis method based on hybrid fusion | |
CN117539999A (en) | Cross-modal joint coding-based multi-modal emotion analysis method | |
CN116822513A (en) | Named entity identification method integrating entity types and keyword features | |
CN116662924A (en) | Aspect-level multi-mode emotion analysis method based on dual-channel and attention mechanism |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||