CN115631504A - Emotion identification method based on bimodal graph network information bottleneck - Google Patents
- Publication number
- CN115631504A CN115631504A CN202211645853.1A CN202211645853A CN115631504A CN 115631504 A CN115631504 A CN 115631504A CN 202211645853 A CN202211645853 A CN 202211645853A CN 115631504 A CN115631504 A CN 115631504A
- Authority
- CN
- China
- Prior art keywords
- graph
- bimodal
- text
- image
- information
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Links
- 230000002902 bimodal effect Effects 0.000 title claims abstract description 69
- 238000000034 method Methods 0.000 title claims abstract description 40
- 230000008451 emotion Effects 0.000 title description 4
- 230000000007 visual effect Effects 0.000 claims abstract description 34
- 230000008909 emotion recognition Effects 0.000 claims abstract description 29
- 230000003993 interaction Effects 0.000 claims abstract description 18
- 238000012549 training Methods 0.000 claims abstract description 16
- 238000005516 engineering process Methods 0.000 claims abstract description 9
- 230000015654 memory Effects 0.000 claims abstract description 8
- 238000007781 pre-processing Methods 0.000 claims abstract description 8
- 238000011176 pooling Methods 0.000 claims abstract description 6
- 238000013528 artificial neural network Methods 0.000 claims abstract description 5
- 239000013598 vector Substances 0.000 claims description 18
- 238000004364 calculation method Methods 0.000 claims description 13
- 239000011159 matrix material Substances 0.000 claims description 13
- 230000006870 function Effects 0.000 claims description 12
- 230000004927 fusion Effects 0.000 claims description 12
- 230000007246 mechanism Effects 0.000 claims description 6
- 238000012545 processing Methods 0.000 claims description 6
- 230000002996 emotional effect Effects 0.000 claims description 5
- 230000002776 aggregation Effects 0.000 claims description 3
- 238000004220 aggregation Methods 0.000 claims description 3
- 230000002457 bidirectional effect Effects 0.000 claims description 3
- 230000005540 biological transmission Effects 0.000 claims description 3
- 238000005457 optimization Methods 0.000 claims description 3
- 230000008569 process Effects 0.000 description 4
- 230000007547 defect Effects 0.000 description 3
- 238000004458 analytical method Methods 0.000 description 2
- 238000012512 characterization method Methods 0.000 description 2
- 238000000605 extraction Methods 0.000 description 2
- 238000013507 mapping Methods 0.000 description 2
- 238000003058 natural language processing Methods 0.000 description 2
- 238000012876 topography Methods 0.000 description 2
- 230000009286 beneficial effect Effects 0.000 description 1
- 238000013461 design Methods 0.000 description 1
- 238000011161 development Methods 0.000 description 1
- 238000010586 diagram Methods 0.000 description 1
- 239000013604 expression vector Substances 0.000 description 1
- 230000007787 long-term memory Effects 0.000 description 1
- 238000005065 mining Methods 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 238000012552 review Methods 0.000 description 1
- 230000006403 short-term memory Effects 0.000 description 1
- 238000012546 transfer Methods 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/40—Document-oriented image-based pattern recognition
- G06V30/41—Analysis of document content
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/253—Grammatical analysis; Style critique
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/18—Extraction of features or characteristics of the image
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/19—Recognition using electronic means
- G06V30/191—Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
- G06V30/19147—Obtaining sets of training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/19—Recognition using electronic means
- G06V30/191—Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
- G06V30/19173—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/19—Recognition using electronic means
- G06V30/191—Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
- G06V30/1918—Fusion techniques, i.e. combining data from various sources, e.g. sensor fusion
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Multimedia (AREA)
- Artificial Intelligence (AREA)
- General Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Biophysics (AREA)
- Biomedical Technology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Image Analysis (AREA)
Abstract
The invention provides an emotion recognition method based on a bimodal graph network information bottleneck. The method preprocesses the data and encodes the images and texts through corresponding pre-training models; extracts features of the text and the image with a long short-term memory network and a feedforward neural network, respectively; constructs intra-modality topological graphs based on the grammatical dependency relations of the text and the adjacency relations of the visual blocks, and constructs a bimodal topological graph based on a complete bipartite graph; designs a modal interaction module based on the bimodal graph network, realizing information interaction within and between the modalities with graph convolution networks; converts the node representations of the bimodal topological graph into a graph representation through a graph pooling technique; and finally performs bimodal emotion recognition with a multilayer perceptron. In addition, an information bottleneck module is established to improve the generalization capability of the method. The emotion recognition method based on the bimodal graph network information bottleneck can effectively fuse modal information to guide emotion recognition.
Description
Technical Field
The invention belongs to the field of bimodal emotion recognition at the intersection of natural language processing and computer vision, and particularly relates to an emotion recognition method based on a bimodal graph network information bottleneck.
Background
Emotion recognition aims to mine the subjective information in data using natural language processing technology, and is widely applied in fields such as financial market forecasting and business review analysis. With the rapid development of Internet technology, information on the Internet has gradually shifted from plain text to bimodal content, presenting both new challenges and new opportunities for existing emotion analysis methods. How to effectively extract and fuse features from bimodal data is the key to bimodal emotion characterization.
General bimodal emotion recognition can be realized by concatenating, adding, or taking the Hadamard product of the unimodal features, but such operations cannot capture the correlation between the modalities. Recently, cross-attention mechanisms have been introduced to enhance the feature fusion of bimodal data; however, cross-attention merely associates the global semantics of one modality with the local features of another, which is insufficient to reflect the alignment between modalities at the level of local features, and using a global feature representation of a modality for semantic alignment may introduce considerable noise. Furthermore, attention-based methods have another drawback: they usually require carefully designed attention patterns, such as multi-layer or multi-pass attention, and multi-layer attention introduces more parameters, increasing the likelihood of overfitting.
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing an emotion recognition method based on a bimodal graph network information bottleneck. The method decomposes the data of each modality into fine-grained semantic units, namely text words and image visual blocks, and establishes relations between the bimodal fine-grained semantic units using the correlations within and between the modalities, so that bimodal feature fusion is performed directly between fine-grained semantic units. That is, a mapping relation is established between the representations of the modalities in a local-aligns-local manner, allowing the semantic information of the text and the local information of the image to be fully fused. In addition, an information bottleneck mechanism is added, which effectively improves the generalization capability of the method.
In order to achieve the purpose, the invention adopts the following technical scheme:
s1: preprocessing data, processing the text by adopting a word embedding technology Glove to obtain a text embedding matrix(ii) a The image is processed using an image processing technique ResNet152, where the image is cut into pieces prior to processingA visual block for obtaining an image representation matrix(ii) a Wherein,indicating the number of visual blocks.
S2: extracting the features of the preprocessed embedded expression, and extracting the text features by using a bidirectional long-short term memory networkExtracting image features using feed-forward neural networks。
S3: and constructing a topological graph by using the grammar dependency relationship in the text and the spatial position relationship in the image. The specific operation is as follows:
S31: A topological graph within the text modality is constructed with the words of the text as nodes and the grammatical dependency relations of the dependency tree as undirected edges.
S32: A topological graph within the image modality is constructed with the visual blocks of the image as nodes and the spatial position relations between the visual blocks as undirected edges.
S33: With the words of the text and the visual blocks of the image as two groups of nodes, every word node is joined to every visual-block node by an undirected edge, constructing a complete bipartite graph as the bimodal topological graph.
S4: and designing a modal interaction module based on a bimodal graph network, and performing representation learning by using a message transmission mechanism of the graph convolution network to realize information interaction and feature fusion in and among the modes. The specific operation is as follows:
s41: topological graph in text modeThe extracted text features are word node feature vectors, the expression learning of the word nodes is carried out through a graph convolution network, the information interaction in the text mode is realized, and the calculation formula is as follows:
in the above formula, the first and second carbon atoms are,in order to train the parameters, the user may,the function is activated for sigmoid.
S42: in topological graph in image modeThe image features extracted in S2 are visual block node feature vectors, the representation learning of the visual block nodes is carried out through a graph convolution network,the information interaction in the image modality is realized, and the calculation formula is as follows:
in the above formula, the first and second carbon atoms are,in order to train the parameters, the user may,the function is activated for sigmoid.
S43: in a bimodal topologyAs an adjacency matrix, splicing the text and image features extracted by S2 into a node feature vectorInformation aggregation is carried out through a graph convolution network, information fusion between modes is achieved, and a calculation formula is as follows:
in the above formula, the first and second carbon atoms are,in order to train the parameters, the user may,the function is activated for sigmoid.
S44: loops S41-S43 are set according to the specific parameters of the model.
S5: and an information bottleneck module is established, and the generalization capability of the method is improved. The specific operation is as follows:
s51: splicing the text embedding and the image embedding after the S1 data preprocessing to obtain the input characteristics of the information bottleneck module。
S52: splicing the text features and the image features extracted in the step S2 to obtain intermediate features of the information bottleneck module。
S53: splicing the text representation and the image representation after the modal interaction based on the bimodal graph network S4 to serve as the output characteristic of the information bottleneck module。
S54: the goal of the information bottleneck is to reduceAnd withMutual information between, increaseAndthe calculation formula is as follows:
in the above formula, the first and second carbon atoms are,the goal of the optimization required for the information bottleneck module,for the parameters of the emotion recognition method based on the bimodal graph network information bottleneck,is composed ofAnd withThe mutual information between the two groups is obtained,is composed ofAnd withThe mutual information between the two groups of the information,is an adjustable factor.
S6: obtaining a graph representation vector by adopting a graph pooling technology represented by all nodes in the spliced bimodal topological graph, wherein a calculation formula is as follows:
in the above formula, the first and second carbon atoms are,a graph representation vector representing the merged text and all node representations of the visual block,for all of the nodes in the bimodal topology map,as nodes after S4Is shown.
S7: and identifying bimodal emotional tendency by using a multi-layer perceptron as a classifier.
S8: the model is trained through bimodal data, a cross entropy loss function and an information bottleneck objective function are used as model training targets, and an Adam optimizer with hot start is used for training the model. The training goals for the model are as follows:
in the above formula, the first and second carbon atoms are,in order to train one sample in the set,for the set of all the training samples,is a coefficient which can be adjusted,for the parameters of the emotion recognition method based on the bimodal graph network information bottleneck,is the true value of the sample or samples,is a predicted value.
S9: and classifying the bimodal data to be classified through the trained model to obtain an emotion recognition result.
Compared with the existing bimodal emotion recognition method, the emotion recognition method based on the bimodal graph network information bottleneck has the following beneficial effects:
1. the text words and the visual blocks form a bimodal topological graph, utilizing the grammatical information of the text and the spatial position information of the image;
2. the bimodal topological graph establishes relations between the bimodal fine-grained semantic units, so that multimodal feature fusion is performed directly between fine-grained semantic units; the semantic information of the text and the local information of the image can thus be fully fused, remedying the defects of existing methods;
3. by utilizing an information bottleneck mechanism, the generalization capability of the method is effectively improved.
Drawings
FIG. 1 is an overall flow chart of the present invention;
FIG. 2 is a diagram of a system model of the present invention;
FIG. 3 is a module for constructing a bimodal topology of the present invention.
Detailed Description
In order that the public may better understand the present invention, specific embodiments thereof will be described below with reference to the accompanying drawings. Wherein the showings are for the purpose of illustration only and not for the purpose of limiting the invention; to better illustrate the embodiments of the present invention, some parts of the drawings may be omitted, enlarged or reduced, and do not represent the size of an actual product; it will be understood by those skilled in the art that certain well-known structures in the drawings and descriptions thereof may be omitted.
The invention provides an emotion recognition method based on bimodal graph network information bottleneck, which comprises the following steps:
s1: and (3) data preprocessing, namely preprocessing the text and the image respectively through corresponding pre-training models.
As shown in FIG. 1, the text and the image in the bimodal data are separated and then preprocessed separately. For the text, the representation of each word is looked up in pre-trained GloVe embeddings, mapping each word to a 300-dimensional vector to obtain the text embedding matrix. For the image, it is first cut into visual blocks, and each visual block is then processed by the image processing technique ResNet152 into a 1024-dimensional representation vector, finally yielding the image embedding matrix, whose number of rows equals the number of visual blocks.
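The preprocessing shapes described above can be sketched as follows. The 300-dimensional GloVe vectors and 1024-dimensional ResNet152 vectors come from the description; the `cut_into_blocks` helper, the block count, and the random features standing in for the real pre-trained models are illustrative assumptions.

```python
import numpy as np

def cut_into_blocks(image, n_side):
    """Cut an H x W x C image into n_side * n_side visual blocks (hypothetical helper)."""
    h, w, c = image.shape
    bh, bw = h // n_side, w // n_side
    return [image[i * bh:(i + 1) * bh, j * bw:(j + 1) * bw]
            for i in range(n_side) for j in range(n_side)]

rng = np.random.default_rng(0)
words = ["the", "food", "looks", "great"]
text_emb = rng.normal(size=(len(words), 300))   # GloVe: one 300-d vector per word

image = rng.normal(size=(224, 224, 3))
blocks = cut_into_blocks(image, 7)              # 49 visual blocks
# Stand-in for ResNet152: each visual block becomes one 1024-d vector
image_emb = np.stack([rng.normal(size=1024) for _ in blocks])

print(text_emb.shape, image_emb.shape)          # (4, 300) (49, 1024)
```

The resulting matrices are the text embedding matrix and the image representation matrix of S1, one row per word and per visual block respectively.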
S2: and performing feature extraction on the preprocessed embedded representation.
As shown in fig. 1, the text embedding and the image embedding obtained in S1 are subjected to feature extraction, respectively.
Because the text has a sequential order, a bidirectional long short-term memory network is adopted to learn contextual semantic dependencies and extract the text features, so that more context information is integrated into the word embeddings. The specific calculation formula is as follows:
In the above formulas, the quantities are, respectively, the forget gate, the input gate, the output gate, the candidate value vector, the memory cell at the previous time step, the memory cell at the current time step, the hidden state at the previous time step, and the hidden state at the current time step; the remaining symbols denote the trainable weight matrices and biases of the long short-term memory network, and the subscript denotes the position index of the current word in the text.
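For reference, the gates and states listed above follow the standard long short-term memory cell; in conventional notation (an assumption, since the patent's own symbols are not legible in this source) the equations read:

```latex
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
```

The bidirectional network runs one such cell forward and one backward over the word sequence and concatenates the two hidden states.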
Because there are no sequential features among the visual blocks of the image, a feedforward neural network is adopted to extract the image features. The specific calculation formula is as follows:
In the above formula, the weights denote the trainable parameters of the feedforward neural network.
To facilitate the subsequent feature fusion, the dimensions of the text features and of the image features are both set to 128.
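The dimensionality alignment just described can be sketched with a single-layer feedforward projection into the shared 128-dimensional space; using the same sigmoid layer for both modalities is a simplification here, since the real method extracts text features with a BiLSTM (elided in this sketch).

```python
import numpy as np

def feed_forward(x, w, b):
    """One-layer feedforward network: sigmoid(x @ w + b)."""
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))

rng = np.random.default_rng(0)
text_emb = rng.normal(size=(4, 300))     # 4 words, 300-d GloVe embeddings
image_emb = rng.normal(size=(49, 1024))  # 49 visual blocks, 1024-d ResNet152 vectors

# Project both modalities to the common 128-d feature space of S2
H_text = feed_forward(text_emb, rng.normal(size=(300, 128)) * 0.05, np.zeros(128))
H_image = feed_forward(image_emb, rng.normal(size=(1024, 128)) * 0.05, np.zeros(128))
print(H_text.shape, H_image.shape)       # (4, 128) (49, 128)
```

After this step every word and every visual block carries a 128-dimensional feature vector, ready to be attached to the graph nodes of S3.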
S3: and constructing a topological graph by using the grammar dependency relationship in the text and the spatial position relationship in the image.
To remedy the defects of the prior art, the alignment between modalities at the level of local features must be reflected. As shown in FIG. 3, this step constructs three topological graphs, namely two intra-modality topological graphs and one bimodal topological graph; the operations are as follows.
S31: for the text modality, complex grammatical dependencies exist between words, and modeling grammatical dependencies facilitate learning of text information. Therefore, a topological graph in a text mode is constructed by taking words in the text as nodes and grammar dependency relationship in a dependency tree as undirected edges。
S32: constructing a topological graph in an image mode by taking visual blocks in an image as nodes and taking spatial position relations among the visual blocks as undirected edges。
S33: establishing the relation between bimodal fine-grained semantic units, so that bimodal feature fusion can be directly carried out between the fine-grained semantic units, namely: and establishing a mapping relation for the representation information of each mode by adopting a local alignment local mode, so that the semantic information of the text and the local information of the image are fully fused. Therefore, a word in the text and a visual block in the image are used as two groups of nodes, any node in the word and each node in the visual block form an undirected edge, and a complete bipartite graph is constructed to serve as a bimodal topological graph。
S4: and designing a modal interaction module based on a bimodal graph network, and performing representation learning by using a message transfer mechanism of the graph convolution network to realize information interaction and feature fusion in and among the modes.
As shown in fig. 2, the text features and the image features extracted in S2 are fed into the bimodal graph network, and information interaction and feature fusion are performed through graph convolution networks on the basis of the topological graphs constructed in S3.
S41: topological graph in text modeIn the form of a contiguous matrix, the matrix,the expression learning of the word nodes is carried out for the word node feature vectors through a graph convolution network, each word node transmits information to a neighbor word node with a grammar dependency relationship, and the information interaction in a text mode is realized, wherein the calculation formula is as follows:
in the above-mentioned formula, the compound has the following structure,in order to train the parameters, the user may,the function is activated for sigmoid.
S42: in topological graph in image modeIn the form of a contiguous matrix, the matrix,for the feature vectors of the visual block nodes, the representation learning of the visual block nodes is carried out through a graph convolution network, and the information transmission is carried out between the adjacent visual blocks, so as to realize the information interaction in the image modality, and the calculation formula is as follows:
in the above formula, the first and second carbon atoms are,in order to train the parameters, the user may,the function is activated for sigmoid.
S43: in a bimodal topologyAs an adjacent matrix, splicing the text and image features extracted by S2 into a node feature vectorInformation aggregation is carried out through a graph convolution network, all neighbor nodes of each node belong to another mode node, so that information fusion between modes is realized, and a calculation formula is as follows:
in the above formula, the first and second carbon atoms are,in order to train the parameters in a trainable manner,the function is activated for sigmoid.
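One graph-convolution step of the kind S41 to S43 describe can be sketched as H' = sigmoid(A_norm H W). The symmetric degree normalisation with self-loops used below is a common choice and an assumption here, since the patent's exact formula is not legible in this source.

```python
import numpy as np

def gcn_layer(A, H, W):
    """Graph convolution: sigmoid(D^-1/2 (A + I) D^-1/2 @ H @ W)."""
    A_hat = A + np.eye(A.shape[0])                    # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return 1.0 / (1.0 + np.exp(-(A_norm @ H @ W)))

rng = np.random.default_rng(0)
n_words, n_blocks, d = 4, 4, 128
# Bimodal adjacency: complete bipartite between words and visual blocks (S33)
A = np.zeros((n_words + n_blocks, n_words + n_blocks))
A[:n_words, n_words:] = A[n_words:, :n_words] = 1
H = rng.normal(size=(n_words + n_blocks, d))          # spliced node features from S2
W = rng.normal(size=(d, d)) * 0.05                    # trainable parameters

H_next = gcn_layer(A, H, W)
print(H_next.shape)                                   # (8, 128)
```

The same routine serves S41 and S42 by swapping in the intra-modality adjacency matrices, and cycling it corresponds to the repeated blocks of S44.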
S44: as shown in fig. 2, S41 to S43 form a convolutional network block, and after the model is parametrized, a better parameter value of the number of layers of the convolutional network block is obtained, and S41 to S43 are cycled according to the specific parameter value.
S5: and an information bottleneck module is established, and the generalization capability of the method is improved.
The information bottleneck module runs through the whole process of the method, and the specific operation is as follows.
S51: splicing the text embedding and the image embedding after the S1 data preprocessingObtaining input characteristics of information bottleneck module。
S52: splicing the text features and the image features extracted in the step S2 to obtain intermediate features of the information bottleneck module。
S53: s4, splicing the text representation and the image representation after modal interaction based on the bimodal graph network, wherein the text representation and the image representation are used as the output characteristics of the information bottleneck module。
S54: the goal of information bottlenecks is to reduceAnd withMutual information between, increaseAnd withThe mutual information between the two is calculated according to the following formula:
in the above-mentioned formula, the compound has the following structure,for the goal of the information bottleneck module requiring optimization,parameters of emotion recognition method based on bimodal graph network information bottleneckThe number of the first and second groups is,is composed ofAndthe mutual information between the two groups is obtained,is composed ofAndthe mutual information between the two groups is obtained,is an adjustable factor.
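For reference, the classical information-bottleneck objective that this trade-off follows can be written as below, with T denoting the bottleneck representation and β the adjustable factor; the exact assignment of the module's three feature sets to X, T, and Y is an assumption, since the patent's symbols are not legible in this source:

```latex
\min_{\theta}\ \mathcal{L}_{IB}(\theta) \;=\; I(X;\, T) \;-\; \beta\, I(T;\, Y)
```

Minimising the first term compresses away input information irrelevant to the task, while the second term, weighted by β, preserves the information the downstream representation needs.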
S6: a graph pooling technique is employed to convert the node representation of the bimodal topology graph into a graph representation.
Bimodal emotion recognition classifies the overall emotional tendency of the data and must combine the feature information of all nodes in the bimodal topological graph. Therefore, a graph pooling technique that combines the representations of all nodes in the bimodal topological graph is adopted to obtain the graph representation vector; the calculation formula is as follows:
In the above formula, the graph representation vector merges the representations of all text-word and visual-block nodes of the bimodal topological graph, each node contributing its representation obtained after S4.
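The graph readout of S6 can be sketched as a mean over all node representations; mean pooling is an assumption here, since the source only states that the node representations are combined into one graph vector.

```python
import numpy as np

def graph_readout(node_repr):
    """Pool all node representations of the bimodal graph into one graph vector."""
    return node_repr.mean(axis=0)

rng = np.random.default_rng(0)
node_repr = rng.normal(size=(8, 128))  # 8 nodes (words + visual blocks) after S4
g = graph_readout(node_repr)
print(g.shape)                         # (128,)
```

The resulting 128-dimensional vector is the graph representation that the multilayer perceptron of S7 classifies.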
S7: and expressing the vector through the graph obtained in the S6, and identifying the bimodal emotional tendency by using a multilayer perceptron as a classifier, wherein a calculation formula is as follows:
in the above formula, the first and second carbon atoms are,for the bimodal characterization to be finally learned,the emotional tendency predicted for the model,andrepresenting a trainable weight that is to be weighted,andis a trainable bias.
S8: the model is trained through the bimodal data.
During training, the cross-entropy loss function and the information-bottleneck objective function are used as the training targets of the model, and an Adam optimizer with warm-up is used to train the model. The training target of the model is as follows:
In the above formula, the sum runs over all samples of the training set; an adjustable coefficient weights the information-bottleneck objective, the parameters are those of the emotion recognition method based on the bimodal graph network information bottleneck, and each sample contributes its true value and its predicted value to the cross-entropy term.
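The combined training target of S8, cross-entropy plus a weighted information-bottleneck term, can be sketched as follows; the information-bottleneck objective is treated as an opaque scalar here, since its estimation procedure is not detailed in the source.

```python
import numpy as np

def training_loss(y_true, y_prob, ib_objective, lam):
    """Binary cross-entropy over the training set plus lam * information-bottleneck term."""
    eps = 1e-12  # numerical guard against log(0)
    ce = -np.mean(y_true * np.log(y_prob + eps)
                  + (1 - y_true) * np.log(1 - y_prob + eps))
    return ce + lam * ib_objective

y_true = np.array([1.0, 0.0, 1.0])   # sample labels
y_prob = np.array([0.9, 0.2, 0.8])   # model predictions
loss = training_loss(y_true, y_prob, ib_objective=0.5, lam=0.1)
print(round(loss, 4))                # 0.2339
```

In the full method this scalar would be minimised with an Adam optimizer using a warm-up schedule, as the description states.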
S9: and classifying the bimodal data to be classified through the trained model to obtain an emotion recognition result.
The described embodiments only illustrate the preferred embodiments of the present invention and do not limit its concept and scope. Various modifications and improvements made to the technical solutions of the present invention by those skilled in the art, without departing from its design concept, shall fall within its protection scope; the claimed technical content of the present invention is fully set forth in the claims.
Claims (7)
1. An emotion recognition method based on bimodal graph network information bottleneck is characterized by comprising the following steps:
s1: data preprocessing, namely preprocessing a text and an image respectively through corresponding pre-training models;
s2: extracting the features of the preprocessed embedded representation, and extracting the text features by using a bidirectional long-short term memory networkExtracting image features using feed-forward neural networks;
S3: constructing a topological graph by using a syntax dependency relationship in a text and a spatial position relationship in an image;
s4: designing a modal interaction module based on a bimodal graph network, and utilizing a message transmission mechanism of the graph convolution network to carry out representation learning so as to realize information interaction and feature fusion in and among the modes;
s5: an information bottleneck module is established, and the generalization capability of the method is improved;
s6: converting the node representation of the bimodal topological graph into a graph representation by adopting a graph pooling technology;
s7: identifying bimodal emotional tendency by taking a multilayer perceptron as a classifier;
s8: training the model through the bimodal data;
s9: and classifying the bimodal data to be classified through the trained model to obtain an emotion recognition result.
2. The emotion recognition method based on the bimodal graph network information bottleneck according to claim 1, wherein S1 specifically is: the text is processed with the word-embedding technique GloVe to obtain a text embedding matrix; the image is processed with the image processing technique ResNet152, the image first being cut into visual blocks, to obtain an image representation matrix whose number of rows equals the number of visual blocks.
3. The emotion recognition method based on a bimodal graph network information bottleneck according to claim 1, wherein S3 specifically comprises:
S31: constructing the topological graph of the text modality with the words in the text as nodes and the syntactic dependency relations in the dependency tree as undirected edges;
S32: constructing the topological graph of the image modality with the visual blocks of the image as nodes and the spatial position relations between visual blocks as undirected edges.
4. The emotion recognition method based on a bimodal graph network information bottleneck according to claim 1, wherein S4 specifically comprises:
S41: in the topological graph of the text modality, taking the text features extracted in S2 as the word-node feature vectors and performing representation learning of the word nodes through a graph convolutional network, realizing information interaction within the text modality;
S42: in the topological graph of the image modality, taking the image features extracted in S2 as the visual-block node feature vectors and performing representation learning of the visual-block nodes through a graph convolutional network, realizing information interaction within the image modality;
S43: taking the bimodal topological graph as the adjacency matrix, splicing the text and image features extracted in S2 into the node feature vectors, and aggregating information through a graph convolutional network, realizing information fusion between the modalities;
S44: repeating S41-S43 for a number of rounds set by the specific parameters of the model.
5. The emotion recognition method based on a bimodal graph network information bottleneck according to claim 1, wherein S5 specifically comprises:
S51: splicing the text embedding and the image embedding obtained by the data preprocessing in S1 as the input features X of the information bottleneck module;
S52: splicing the text features and the image features extracted in S2 as the intermediate features Z of the information bottleneck module;
S53: splicing the text representation and the image representation after the bimodal-graph-network modal interaction in S4 as the output features H of the information bottleneck module;
S54: the goal of the information bottleneck is to reduce the mutual information between X and H and to increase the mutual information between Z and H, with the calculation formula:

L_IB(θ) = I(X; H) − β · I(Z; H)

where L_IB(θ) is the optimization objective of the information bottleneck module, θ denotes the parameters of the emotion recognition method based on the bimodal graph network information bottleneck, I(X; H) is the mutual information between X and H, I(Z; H) is the mutual information between Z and H, and β is an adjustable coefficient.
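The claim does not specify how the mutual-information terms are estimated in practice (deep models typically use a variational bound). As a toy, non-claimed illustration only, mutual information between two *discrete* variables can be computed exactly from their joint histogram, and the bottleneck coefficient β then trades the two terms off; the arrays and β = 0.5 below are arbitrary:

```python
import numpy as np

def mutual_information(x, y):
    """Plug-in estimate of I(X;Y) in nats for two discrete
    integer-valued arrays, via the joint histogram."""
    xs, ys = np.unique(x), np.unique(y)
    joint = np.zeros((len(xs), len(ys)))
    for a, b in zip(x, y):
        joint[np.searchsorted(xs, a), np.searchsorted(ys, b)] += 1
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float((joint[nz] * np.log(joint[nz] / (px @ py)[nz])).sum())

beta = 0.5                                    # adjustable coefficient
x_in  = np.array([0, 0, 1, 1, 2, 2, 3, 3])    # toy "input features"
h_out = x_in % 2                              # toy "output representation"
# bottleneck trade-off: compress I(input;output), keep I(intermediate;output)
ib = mutual_information(x_in, h_out) - beta * mutual_information(h_out, h_out)
```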
6. The emotion recognition method based on a bimodal graph network information bottleneck according to claim 1, wherein S6 specifically is: splicing all node representations in the bimodal topological graph and applying a graph pooling technique to obtain the graph representation vector g, with the calculation formula:

g = Pool(h_1, h_2, ..., h_n)

where h_i denotes the representation of the i-th node and Pool(·) denotes the graph pooling operation.
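The claim names a graph pooling technique without fixing it; mean pooling is one common readout and is used below purely as an illustrative sketch, not as the patent's formula:

```python
import numpy as np

def mean_pool(node_reprs):
    """Readout: collapse the node-representation matrix of the
    spliced bimodal graph into a single graph-level vector by
    averaging over the node axis."""
    return node_reprs.mean(axis=0)

H = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])   # 3 nodes, 2-dim representations
g = mean_pool(H)
```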
7. The emotion recognition method based on a bimodal graph network information bottleneck according to claim 1, wherein S8 specifically is: using the cross-entropy loss function together with the information bottleneck objective function as the model training objective, and training the model with an Adam optimizer with hot start; the training objective of the model is as follows:

L(θ) = − Σ_{s∈D} y_s · log ŷ_s + λ · L_IB(θ)

where s is one sample in the training set, D is the set of all training samples, λ is an adjustable coefficient weighting the information bottleneck objective L_IB(θ) of S5, θ denotes the parameters of the emotion recognition method based on the bimodal graph network information bottleneck, y_s is the true value of the sample, and ŷ_s is the predicted value.
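The claim names "Adam with hot start" but gives no schedule; a linear learning-rate warm-up is one common interpretation and is sketched below together with the supervised cross-entropy half of the objective. The base rate, warm-up length, and toy labels/predictions are all assumptions:

```python
import numpy as np

def warmup_lr(step, base_lr=1e-3, warmup_steps=100):
    """Hot-start (warm-up) schedule: ramp the learning rate linearly
    from ~0 to base_lr over the first warmup_steps optimizer steps,
    then hold it constant."""
    return base_lr * min(1.0, (step + 1) / warmup_steps)

def cross_entropy(y_true_onehot, y_pred_probs, eps=1e-12):
    """Mean per-sample cross-entropy, the supervised part of the
    training objective; the information bottleneck term would be
    added on top with coefficient lambda."""
    return float(-(y_true_onehot * np.log(y_pred_probs + eps)).sum(axis=1).mean())

y_true = np.array([[1.0, 0.0], [0.0, 1.0]])   # toy one-hot labels
y_pred = np.array([[0.9, 0.1], [0.2, 0.8]])   # toy classifier outputs
ce = cross_entropy(y_true, y_pred)
lr0, lr99 = warmup_lr(0), warmup_lr(99)
```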
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211645853.1A CN115631504B (en) | 2022-12-21 | 2022-12-21 | Emotion identification method based on bimodal graph network information bottleneck |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115631504A | 2023-01-20 |
CN115631504B | 2023-04-07 |
Family
ID=84910557
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116304984A (en) * | 2023-03-14 | 2023-06-23 | 烟台大学 | Multi-modal intention recognition method and system based on contrast learning |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150379336A1 (en) * | 2014-06-27 | 2015-12-31 | Fujitsu Limited | Handwriting input conversion apparatus, computer-readable medium, and conversion method |
CN112860888A (en) * | 2021-01-26 | 2021-05-28 | 中山大学 | Attention mechanism-based bimodal emotion analysis method |
CN114511906A (en) * | 2022-01-20 | 2022-05-17 | 重庆邮电大学 | Cross-modal dynamic convolution-based video multi-modal emotion recognition method and device and computer equipment |
CN115363531A (en) * | 2022-08-22 | 2022-11-22 | 山东师范大学 | Epilepsy detection system based on bimodal electroencephalogram signal information bottleneck |
Non-Patent Citations (1)
Title |
---|
Fan Xijian et al., "A bimodal emotion recognition algorithm fusing visual and auditory information" |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||