CN115631504A - Emotion identification method based on bimodal graph network information bottleneck - Google Patents

Publication number: CN115631504A
Application number: CN202211645853.1A
Authority: CN (China)
Prior art keywords: graph, bimodal, text, image, information
Legal status: Granted; Active (the legal status is an assumption and is not a legal conclusion)
Other languages: Chinese (zh)
Other versions: CN115631504B (en)
Inventors: 李丽 (Li Li), 李平 (Li Ping), 苟丽 (Gou Li)
Assignee: Southwest Petroleum University
Priority/filing date: 2022-12-21
Publication date (CN115631504A): 2023-01-20
Grant publication date (CN115631504B): 2023-04-07

Classifications

    • G06V 30/41: Character recognition; document-oriented image-based pattern recognition; analysis of document content
    • G06F 40/253: Handling natural language data; grammatical analysis; style critique
    • G06F 40/30: Handling natural language data; semantic analysis
    • G06N 3/02: Computing arrangements based on biological models; neural networks
    • G06N 3/08: Neural networks; learning methods
    • G06V 30/18: Character recognition; extraction of features or characteristics of the image
    • G06V 30/19147: Recognition using electronic means; obtaining sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06V 30/19173: Recognition using electronic means; classification techniques
    • G06V 30/1918: Recognition using electronic means; fusion techniques, i.e. combining data from various sources, e.g. sensor fusion
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention provides an emotion recognition method based on a bimodal graph network information bottleneck. The method preprocesses the data, encoding images and texts with their corresponding pre-trained models; extracts text and image features with a long short-term memory network and a feed-forward neural network, respectively; constructs intra-modal topological graphs from the grammatical dependency relations of the text and the adjacency relations of the visual blocks, and constructs a bimodal topological graph based on a complete bipartite graph; designs a modal interaction module based on the bimodal graph network, using graph convolution networks to realize information interaction within and across modalities; converts the node representations of the bimodal topological graph into a graph representation by graph pooling; and performs bimodal emotion recognition with a multilayer perceptron. In addition, an information bottleneck module is established, improving the generalization ability of the method. The emotion recognition method based on the bimodal graph network information bottleneck can effectively fuse modal information to guide emotion recognition.

Description

Emotion identification method based on bimodal graph network information bottleneck
Technical Field
The invention belongs to the field of bimodal emotion recognition at the intersection of natural language processing and computer vision, and particularly relates to an emotion recognition method based on a bimodal graph network information bottleneck.
Background
Emotion recognition aims to mine the subjective information in data using natural language processing techniques, and is widely applied in fields such as financial market forecasting and business review analysis. With the rapid development of Internet technology, information on the Internet has gradually shifted from plain text to bimodal content, presenting existing emotion analysis methods with new challenges and opportunities. How to effectively extract and fuse features from bimodal data is the key to bimodal emotion characterization.
In general, bimodal emotion recognition can be realized by concatenating, adding, or taking the Hadamard product of the unimodal features, but such schemes cannot capture the correlation between modalities. Recently, cross-attention mechanisms have been introduced to enhance the feature fusion of bimodal data; however, cross-attention merely associates the global semantics of one modality with the local features of the other, which is insufficient to reflect the alignment of the modalities at the level of local features, and using a global feature representation of one modality for semantic alignment may introduce considerable noise. Furthermore, attention-based methods have another drawback: they usually require carefully designed attention patterns, such as multi-layer or multi-pass attention, and multi-layer attention introduces more parameters, increasing the likelihood of overfitting.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides an emotion recognition method based on a bimodal graph network information bottleneck. The data of each modality are decomposed into fine-grained semantic units (text words and image visual blocks), and relations between the bimodal fine-grained semantic units are established using the correlations within and across modalities, so that bimodal feature fusion is carried out directly between fine-grained semantic units; that is, a local-to-local alignment establishes a mapping between the representations of the two modalities, allowing the semantic information of the text and the local information of the image to be fully fused. In addition, an information bottleneck mechanism is added, which effectively improves the generalization ability of the method.
To achieve the above purpose, the invention adopts the following technical scheme:
S1: Data preprocessing. The text is processed with the word-embedding technique GloVe to obtain a text embedding matrix \(E^t\); the image is processed with the image processing technique ResNet152 after first being cut into \(m\) visual blocks, yielding an image representation matrix \(E^v\), where \(m\) denotes the number of visual blocks.
S2: extracting the features of the preprocessed embedded expression, and extracting the text features by using a bidirectional long-short term memory network
Figure 80203DEST_PATH_IMAGE004
Extracting image features using feed-forward neural networks
Figure 373781DEST_PATH_IMAGE005
S3: and constructing a topological graph by using the grammar dependency relationship in the text and the spatial position relationship in the image. The specific operation is as follows:
s31, constructing a topological graph in a text mode by taking words in the text as nodes and grammatical dependency relationship in a dependency tree as undirected edges
Figure 961889DEST_PATH_IMAGE006
S32: the visual blocks in the image are taken as nodes, and the spatial position relation between the visual blocks is taken as an undirected edgeConstructing a topology map within an image modality
Figure 213879DEST_PATH_IMAGE007
S33: taking words in a text and a visual block in an image as two groups of nodes, forming a non-directional edge by any node in the words and each node in the visual block, and constructing a complete bipartite graph as a dual-mode topological graph
Figure 35073DEST_PATH_IMAGE008
S4: and designing a modal interaction module based on a bimodal graph network, and performing representation learning by using a message transmission mechanism of the graph convolution network to realize information interaction and feature fusion in and among the modes. The specific operation is as follows:
s41: topological graph in text mode
Figure 538867DEST_PATH_IMAGE009
The extracted text features are word node feature vectors, the expression learning of the word nodes is carried out through a graph convolution network, the information interaction in the text mode is realized, and the calculation formula is as follows:
Figure 732213DEST_PATH_IMAGE010
in the above formula, the first and second carbon atoms are,
Figure 155104DEST_PATH_IMAGE011
in order to train the parameters, the user may,
Figure 948747DEST_PATH_IMAGE012
the function is activated for sigmoid.
S42: in topological graph in image mode
Figure 584128DEST_PATH_IMAGE013
The image features extracted in S2 are visual block node feature vectors, the representation learning of the visual block nodes is carried out through a graph convolution network,the information interaction in the image modality is realized, and the calculation formula is as follows:
Figure 661674DEST_PATH_IMAGE014
in the above formula, the first and second carbon atoms are,
Figure 865254DEST_PATH_IMAGE015
in order to train the parameters, the user may,
Figure 5248DEST_PATH_IMAGE012
the function is activated for sigmoid.
S43: in a bimodal topology
Figure 444320DEST_PATH_IMAGE008
As an adjacency matrix, splicing the text and image features extracted by S2 into a node feature vector
Figure 612258DEST_PATH_IMAGE016
Information aggregation is carried out through a graph convolution network, information fusion between modes is achieved, and a calculation formula is as follows:
Figure 111373DEST_PATH_IMAGE017
in the above formula, the first and second carbon atoms are,
Figure 348450DEST_PATH_IMAGE018
in order to train the parameters, the user may,
Figure 591213DEST_PATH_IMAGE012
the function is activated for sigmoid.
S44: loops S41-S43 are set according to the specific parameters of the model.
S5: and an information bottleneck module is established, and the generalization capability of the method is improved. The specific operation is as follows:
s51: splicing the text embedding and the image embedding after the S1 data preprocessing to obtain the input characteristics of the information bottleneck module
Figure 377772DEST_PATH_IMAGE019
S52: splicing the text features and the image features extracted in the step S2 to obtain intermediate features of the information bottleneck module
Figure 188733DEST_PATH_IMAGE020
S53: splicing the text representation and the image representation after the modal interaction based on the bimodal graph network S4 to serve as the output characteristic of the information bottleneck module
Figure 568899DEST_PATH_IMAGE021
S54: the goal of the information bottleneck is to reduce
Figure 913555DEST_PATH_IMAGE022
And with
Figure 85779DEST_PATH_IMAGE020
Mutual information between, increase
Figure 552795DEST_PATH_IMAGE020
And
Figure 30043DEST_PATH_IMAGE021
the calculation formula is as follows:
Figure 614609DEST_PATH_IMAGE023
in the above formula, the first and second carbon atoms are,
Figure 110181DEST_PATH_IMAGE024
the goal of the optimization required for the information bottleneck module,
Figure 262945DEST_PATH_IMAGE025
for the parameters of the emotion recognition method based on the bimodal graph network information bottleneck,
Figure 86544DEST_PATH_IMAGE026
is composed of
Figure 209221DEST_PATH_IMAGE020
And with
Figure 60764DEST_PATH_IMAGE021
The mutual information between the two groups is obtained,
Figure 977905DEST_PATH_IMAGE027
is composed of
Figure 429746DEST_PATH_IMAGE019
And with
Figure 356114DEST_PATH_IMAGE020
The mutual information between the two groups of the information,
Figure 560699DEST_PATH_IMAGE028
is an adjustable factor.
S6: obtaining a graph representation vector by adopting a graph pooling technology represented by all nodes in the spliced bimodal topological graph, wherein a calculation formula is as follows:
Figure DEST_PATH_IMAGE029
in the above formula, the first and second carbon atoms are,
Figure 586424DEST_PATH_IMAGE030
a graph representation vector representing the merged text and all node representations of the visual block,
Figure 119036DEST_PATH_IMAGE031
for all of the nodes in the bimodal topology map,
Figure 583516DEST_PATH_IMAGE032
as nodes after S4
Figure 397933DEST_PATH_IMAGE031
Is shown.
S7: and identifying bimodal emotional tendency by using a multi-layer perceptron as a classifier.
S8: the model is trained through bimodal data, a cross entropy loss function and an information bottleneck objective function are used as model training targets, and an Adam optimizer with hot start is used for training the model. The training goals for the model are as follows:
Figure 922455DEST_PATH_IMAGE033
in the above formula, the first and second carbon atoms are,
Figure 817730DEST_PATH_IMAGE034
in order to train one sample in the set,
Figure 85900DEST_PATH_IMAGE035
for the set of all the training samples,
Figure 530657DEST_PATH_IMAGE028
is a coefficient which can be adjusted,
Figure 226080DEST_PATH_IMAGE036
for the parameters of the emotion recognition method based on the bimodal graph network information bottleneck,
Figure 874230DEST_PATH_IMAGE037
is the true value of the sample or samples,
Figure 680512DEST_PATH_IMAGE038
is a predicted value.
S9: and classifying the bimodal data to be classified through the trained model to obtain an emotion recognition result.
Compared with existing bimodal emotion recognition methods, the emotion recognition method based on the bimodal graph network information bottleneck has the following beneficial effects:
1. The text words and visual blocks form a bimodal topological graph, exploiting the grammatical information of the text and the spatial position information of the image;
2. The bimodal topological graph establishes relations between the bimodal fine-grained semantic units, so that multimodal feature fusion is carried out directly between fine-grained semantic units; the semantic information of the text and the local information of the image can be fully fused, remedying the defects of existing methods;
3. The information bottleneck mechanism effectively improves the generalization ability of the method.
Drawings
FIG. 1 is an overall flow chart of the present invention;
FIG. 2 is a diagram of a system model of the present invention;
FIG. 3 illustrates the bimodal topological graph construction module of the present invention.
Detailed Description
In order that the public may better understand the present invention, specific embodiments are described below with reference to the accompanying drawings. The drawings are for illustration only and do not limit the invention. To better illustrate the embodiments, some parts of the drawings may be omitted, enlarged, or reduced and do not represent the size of an actual product; certain well-known structures and their descriptions may also be omitted, as will be understood by those skilled in the art.
The invention provides an emotion recognition method based on the bimodal graph network information bottleneck, which comprises the following steps:
S1: Data preprocessing: the text and the image are preprocessed through corresponding pre-trained models.
As shown in FIG. 1, the text and the image in the bimodal data are separated and then preprocessed individually. For the text, the representation of each word is looked up in pre-trained GloVe, mapping each word to a 300-dimensional vector and yielding the text embedding matrix \(E^t\). For the image, it is first cut into \(m\) visual blocks, and each visual block is then processed with the image processing technique ResNet152 into a 1024-dimensional representation vector, finally yielding the image embedding matrix \(E^v\); here \(m\) denotes the number of visual blocks.
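For concreteness, a minimal preprocessing sketch follows. The helper names, the k-by-k patch grid, and the GloVe lookup table are illustrative assumptions, not the patent's reference implementation; note that ResNet152's pooled features are 2048-dimensional, so reaching the 1024-dimensional block vectors stated above would require an assumed projection or intermediate layer, flagged in a comment.

```python
import torch
import torchvision.models as models
import torchvision.transforms as T

def embed_text(tokens, glove, dim=300):
    # Map each word to its pretrained 300-d GloVe vector; unknown words -> zeros.
    rows = [torch.as_tensor(glove[w], dtype=torch.float)
            if w in glove else torch.zeros(dim) for w in tokens]
    return torch.stack(rows)                          # E_t: (num_words, 300)

def embed_image(img, k=4):
    # Cut an image tensor (C, H, W) into m = k*k visual blocks and encode each
    # block with ResNet152 (final pooled feature, fc head removed).
    resnet = models.resnet152(weights=models.ResNet152_Weights.DEFAULT)
    encoder = torch.nn.Sequential(*list(resnet.children())[:-1]).eval()
    resize = T.Resize((224, 224))
    c, h, w = img.shape
    ph, pw = h // k, w // k
    patches = [resize(img[:, i*ph:(i+1)*ph, j*pw:(j+1)*pw].unsqueeze(0))
               for i in range(k) for j in range(k)]
    with torch.no_grad():
        feats = encoder(torch.cat(patches)).flatten(1)  # (m, 2048)
    # Assumption: a linear 2048 -> 1024 projection (or an intermediate ResNet
    # layer) would bridge the gap to the 1024-d vectors stated in the text.
    return feats
```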
S2: and performing feature extraction on the preprocessed embedded representation.
As shown in fig. 1, the text embedding and the image embedding obtained in S1 are subjected to feature extraction, respectively.
Because the text has a sequential order, and in order to integrate more contextual information into the word embeddings, a bidirectional long short-term memory network is adopted for contextual semantic dependency learning, extracting the text features \(H^t\). The specific calculation, for the word at position \(i\) with embedding \(e_i\), is:

\[ f_i = \sigma\left(W_f [h_{i-1}; e_i] + b_f\right) \]
\[ u_i = \sigma\left(W_u [h_{i-1}; e_i] + b_u\right) \]
\[ o_i = \sigma\left(W_o [h_{i-1}; e_i] + b_o\right) \]
\[ \tilde{c}_i = \tanh\left(W_c [h_{i-1}; e_i] + b_c\right) \]
\[ c_i = f_i \odot c_{i-1} + u_i \odot \tilde{c}_i \]
\[ h_i = o_i \odot \tanh(c_i) \]

In the above formulas, \(f_i\) is the forget gate, \(u_i\) is the input gate, \(o_i\) is the output gate, \(\tilde{c}_i\) is the candidate value vector, \(c_{i-1}\) is the memory cell at the previous moment, \(c_i\) is the memory cell at the current moment, \(h_{i-1}\) is the hidden state representation at the previous moment, \(h_i\) is the hidden state representation at the current moment, \(W_f\), \(W_u\), \(W_o\), \(W_c\) and \(b_f\), \(b_u\), \(b_o\), \(b_c\) denote the trainable parameters of the long short-term memory network, and the subscript \(i\) is the position index of the current word in the text.
Because no sequential relation exists among the visual blocks of the image, a feed-forward neural network is adopted to extract the image features \(H^v\):

\[ H^v = W_p E^v + b_p \]

where \(W_p\) and \(b_p\) are the trainable parameters of the feed-forward neural network.
To facilitate subsequent feature fusion, the dimensionality of the text features \(H^t\) and the image features \(H^v\) is set to 128.
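A minimal sketch of these two extractors is given below; the module and argument names are illustrative assumptions, and the 64-units-per-direction LSTM size is chosen only so that both feature sets come out 128-dimensional, as the text specifies.

```python
import torch.nn as nn

class FeatureExtractors(nn.Module):
    """BiLSTM over word embeddings; feed-forward net over visual-block embeddings."""
    def __init__(self, text_in=300, img_in=1024, hidden=128):
        super().__init__()
        # Bidirectional LSTM: 64 units per direction, so H_t is 128-dimensional.
        self.bilstm = nn.LSTM(text_in, hidden // 2,
                              batch_first=True, bidirectional=True)
        # No order among visual blocks, so a single feed-forward layer suffices.
        self.ffn = nn.Linear(img_in, hidden)

    def forward(self, E_t, E_v):
        H_t, _ = self.bilstm(E_t)    # (batch, num_words, 128)
        H_v = self.ffn(E_v)          # (batch, num_blocks, 128)
        return H_t, H_v
```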
S3: and constructing a topological graph by using the grammar dependency relationship in the text and the spatial position relationship in the image.
In order to solve the defects of the prior art, the alignment relation of each modality on local features is reflected. As shown in fig. 3, this step will construct three topologies, namely: two intra-modal topographies and one bi-modal topographies, the operation is as follows.
S31: for the text modality, complex grammatical dependencies exist between words, and modeling grammatical dependencies facilitate learning of text information. Therefore, a topological graph in a text mode is constructed by taking words in the text as nodes and grammar dependency relationship in a dependency tree as undirected edges
Figure 270412DEST_PATH_IMAGE006
S32: constructing a topological graph in an image mode by taking visual blocks in an image as nodes and taking spatial position relations among the visual blocks as undirected edges
Figure 333046DEST_PATH_IMAGE007
S33: establishing the relation between bimodal fine-grained semantic units, so that bimodal feature fusion can be directly carried out between the fine-grained semantic units, namely: and establishing a mapping relation for the representation information of each mode by adopting a local alignment local mode, so that the semantic information of the text and the local information of the image are fully fused. Therefore, a word in the text and a visual block in the image are used as two groups of nodes, any node in the word and each node in the visual block form an undirected edge, and a complete bipartite graph is constructed to serve as a bimodal topological graph
Figure 597674DEST_PATH_IMAGE008
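Below is a sketch of the three adjacency constructions. The dependency edges are assumed to come from an external parser (e.g. a spaCy dependency tree); the 4-neighbour grid adjacency is an assumption, since the patent only states "spatial position relations"; self-loops are added as a common graph-convolution convention.

```python
import numpy as np

def text_adjacency(dep_edges, n_words):
    # G_t: words are nodes; dependency-tree edges become undirected edges.
    A = np.eye(n_words)                      # self-loops
    for head, dep in dep_edges:              # (head_index, dependent_index) pairs
        A[head, dep] = A[dep, head] = 1.0
    return A

def image_adjacency(k):
    # G_v: a k*k grid of visual blocks; edges between spatially adjacent blocks.
    m = k * k
    A = np.eye(m)
    for i in range(k):
        for j in range(k):
            u = i * k + j
            if j + 1 < k:
                A[u, u + 1] = A[u + 1, u] = 1.0   # right neighbour
            if i + 1 < k:
                A[u, u + k] = A[u + k, u] = 1.0   # lower neighbour
    return A

def bimodal_adjacency(n_words, m_blocks):
    # G_tv: complete bipartite graph between word nodes and visual-block nodes.
    n = n_words + m_blocks
    A = np.zeros((n, n))
    A[:n_words, n_words:] = 1.0
    A[n_words:, :n_words] = 1.0
    return A
```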
S4: and designing a modal interaction module based on a bimodal graph network, and performing representation learning by using a message transfer mechanism of the graph convolution network to realize information interaction and feature fusion in and among the modes.
As shown in FIG. 2, the text features \(H^t\) and the image features \(H^v\) extracted in S2 are fed into the bimodal graph network, and information interaction and feature fusion are carried out through graph convolution networks on the basis of the topological graphs constructed in S3.
S41: topological graph in text mode
Figure 687487DEST_PATH_IMAGE009
In the form of a contiguous matrix, the matrix,
Figure 49198DEST_PATH_IMAGE039
the expression learning of the word nodes is carried out for the word node feature vectors through a graph convolution network, each word node transmits information to a neighbor word node with a grammar dependency relationship, and the information interaction in a text mode is realized, wherein the calculation formula is as follows:
Figure 918060DEST_PATH_IMAGE010
in the above-mentioned formula, the compound has the following structure,
Figure 579985DEST_PATH_IMAGE011
in order to train the parameters, the user may,
Figure 125367DEST_PATH_IMAGE012
the function is activated for sigmoid.
S42: in topological graph in image mode
Figure 974374DEST_PATH_IMAGE013
In the form of a contiguous matrix, the matrix,
Figure 145462DEST_PATH_IMAGE005
for the feature vectors of the visual block nodes, the representation learning of the visual block nodes is carried out through a graph convolution network, and the information transmission is carried out between the adjacent visual blocks, so as to realize the information interaction in the image modality, and the calculation formula is as follows:
Figure 396314DEST_PATH_IMAGE014
in the above formula, the first and second carbon atoms are,
Figure 237231DEST_PATH_IMAGE015
in order to train the parameters, the user may,
Figure 448901DEST_PATH_IMAGE012
the function is activated for sigmoid.
S43: in a bimodal topology
Figure 33466DEST_PATH_IMAGE008
As an adjacent matrix, splicing the text and image features extracted by S2 into a node feature vector
Figure 30503DEST_PATH_IMAGE016
Information aggregation is carried out through a graph convolution network, all neighbor nodes of each node belong to another mode node, so that information fusion between modes is realized, and a calculation formula is as follows:
Figure 776742DEST_PATH_IMAGE017
in the above formula, the first and second carbon atoms are,
Figure 741287DEST_PATH_IMAGE018
in order to train the parameters in a trainable manner,
Figure 598385DEST_PATH_IMAGE012
the function is activated for sigmoid.
S44: as shown in fig. 2, S41 to S43 form a convolutional network block, and after the model is parametrized, a better parameter value of the number of layers of the convolutional network block is obtained, and S41 to S43 are cycled according to the specific parameter value.
S5: and an information bottleneck module is established, and the generalization capability of the method is improved.
The information bottleneck module runs through the whole process of the method, and the specific operation is as follows.
S51: splicing the text embedding and the image embedding after the S1 data preprocessingObtaining input characteristics of information bottleneck module
Figure 89409DEST_PATH_IMAGE019
S52: splicing the text features and the image features extracted in the step S2 to obtain intermediate features of the information bottleneck module
Figure 131183DEST_PATH_IMAGE020
S53: s4, splicing the text representation and the image representation after modal interaction based on the bimodal graph network, wherein the text representation and the image representation are used as the output characteristics of the information bottleneck module
Figure 442079DEST_PATH_IMAGE021
S54: the goal of information bottlenecks is to reduce
Figure 243813DEST_PATH_IMAGE022
And with
Figure 323764DEST_PATH_IMAGE020
Mutual information between, increase
Figure 677385DEST_PATH_IMAGE020
And with
Figure 101676DEST_PATH_IMAGE021
The mutual information between the two is calculated according to the following formula:
Figure 566155DEST_PATH_IMAGE023
in the above-mentioned formula, the compound has the following structure,
Figure 641558DEST_PATH_IMAGE024
for the goal of the information bottleneck module requiring optimization,
Figure 900501DEST_PATH_IMAGE025
parameters of emotion recognition method based on bimodal graph network information bottleneckThe number of the first and second groups is,
Figure 185989DEST_PATH_IMAGE026
is composed of
Figure 844373DEST_PATH_IMAGE020
And
Figure 633337DEST_PATH_IMAGE021
the mutual information between the two groups is obtained,
Figure 204127DEST_PATH_IMAGE027
is composed of
Figure 976911DEST_PATH_IMAGE019
And
Figure 783193DEST_PATH_IMAGE020
the mutual information between the two groups is obtained,
Figure 330060DEST_PATH_IMAGE028
is an adjustable factor.
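The patent does not spell out how the two mutual-information terms are estimated. The sketch below is one plausible realization under stated assumptions: per-sample pooled feature vectors for \(X\), \(Z\), and \(O\); small projection heads into a shared space; and an InfoNCE contrastive bound standing in for both terms (InfoNCE is only a lower bound, so using it on the term to be minimized is a simplification).

```python
import torch
import torch.nn.functional as F

def infonce_mi(a, b, temperature=0.1):
    # Contrastive (InfoNCE) estimate of the information shared by paired rows
    # of a and b: larger when matching rows are mutually predictable.
    a, b = F.normalize(a, dim=-1), F.normalize(b, dim=-1)
    logits = a @ b.t() / temperature
    labels = torch.arange(a.size(0), device=a.device)
    return -F.cross_entropy(logits, labels)

class InformationBottleneck(torch.nn.Module):
    # L_IB(theta) = I(Z, O) - beta * I(X, Z): reward the intermediate feature Z
    # for agreeing with the fused output O, penalize what it keeps of input X.
    def __init__(self, d_x, d_z, d_o, d=128, beta=0.1):
        super().__init__()
        self.beta = beta
        self.proj_x = torch.nn.Linear(d_x, d)
        self.proj_z = torch.nn.Linear(d_z, d)
        self.proj_o = torch.nn.Linear(d_o, d)

    def forward(self, X, Z, O):           # each: (batch, feature_dim)
        i_zo = infonce_mi(self.proj_z(Z), self.proj_o(O))
        i_xz = infonce_mi(self.proj_x(X), self.proj_z(Z))
        return i_zo - self.beta * i_xz    # quantity to be maximized
```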
S6: a graph pooling technique is employed to convert the node representation of the bimodal topology graph into a graph representation.
Bimodal emotion recognition classifies the overall emotional tendency of the data and therefore needs to combine the feature information of all nodes in the bimodal topological graph. A graph pooling technique that splices the representations of all nodes of the bimodal topological graph is thus adopted to obtain the graph representation vector:

\[ g = \big\Vert_{i \in V} \tilde{h}_i \]

where \(g\) is the graph representation vector merging all node representations of the text words and visual blocks, \(V\) is the set of all nodes in the bimodal topological graph, \(\tilde{h}_i\) is the representation of node \(i\) after S4, and \(\Vert\) denotes concatenation.
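A sketch of this readout follows; it assumes fixed word and block counts per sample (e.g. via padding), since concatenating node vectors makes the size of \(g\) depend on the node count.

```python
import torch

def graph_readout(H_t, H_v):
    # Splice the representations of all word and visual-block nodes into one
    # graph representation vector g.
    nodes = torch.cat([H_t, H_v], dim=0)   # (n_words + m_blocks, dim)
    return nodes.flatten()                 # g: ((n_words + m_blocks) * dim,)
```

Concatenation keeps every node's information intact at the cost of a fixed graph size; mean- or max-pooling would be size-agnostic alternatives.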
S7: and expressing the vector through the graph obtained in the S6, and identifying the bimodal emotional tendency by using a multilayer perceptron as a classifier, wherein a calculation formula is as follows:
Figure 755542DEST_PATH_IMAGE065
Figure 169206DEST_PATH_IMAGE066
in the above formula, the first and second carbon atoms are,
Figure 413368DEST_PATH_IMAGE067
for the bimodal characterization to be finally learned,
Figure 887075DEST_PATH_IMAGE068
the emotional tendency predicted for the model,
Figure DEST_PATH_IMAGE069
and
Figure 262692DEST_PATH_IMAGE070
representing a trainable weight that is to be weighted,
Figure DEST_PATH_IMAGE071
and
Figure 870260DEST_PATH_IMAGE072
is a trainable bias.
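A sketch of the classifier head is given below; the layer sizes and the three-class output are assumptions. The head returns logits so they can feed the cross-entropy loss of S8, with softmax applied at prediction time.

```python
import torch
import torch.nn as nn

class EmotionClassifier(nn.Module):
    """MLP head: r = sigmoid(W1 g + b1), y_hat = softmax(W2 r + b2)."""
    def __init__(self, in_dim, hidden=256, n_classes=3):
        super().__init__()
        self.fc1 = nn.Linear(in_dim, hidden)
        self.fc2 = nn.Linear(hidden, n_classes)

    def forward(self, g):
        r = torch.sigmoid(self.fc1(g))   # final bimodal representation r
        return self.fc2(r)               # class logits; softmax gives y_hat

    def predict(self, g):
        return torch.softmax(self.forward(g), dim=-1)
```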
S8: the model is trained through the bimodal data.
During training, the cross-entropy loss function and the information bottleneck objective are used as the model training targets, and an Adam optimizer with warm start is used to train the model. The training target of the model is:

\[ \mathcal{L}(\theta) = -\sum_{s \in D} y_s \log \hat{y}_s - \big( I(Z, O; \theta) - \beta\, I(X, Z; \theta) \big) \]

where \(s\) is one sample of the training set, \(D\) is the set of all training samples, \(\beta\) is an adjustable coefficient, \(\theta\) denotes the parameters of the emotion recognition method based on the bimodal graph network information bottleneck, \(y_s\) is the true value of the sample, and \(\hat{y}_s\) is the predicted value.
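A compact training-loop sketch under assumed interfaces: the model is assumed to return the class logits together with the three information-bottleneck features, `ib` stands for the bottleneck term (e.g. the `InformationBottleneck` module sketched in S5), and the linear warm-up schedule is one common reading of "Adam optimizer with warm start".

```python
import torch
import torch.nn.functional as F

def train(model, ib, loader, epochs=20, lr=1e-3, warmup_steps=500):
    params = list(model.parameters()) + list(ib.parameters())
    opt = torch.optim.Adam(params, lr=lr)
    # Warm start: linearly ramp the learning rate over the first steps.
    sched = torch.optim.lr_scheduler.LambdaLR(
        opt, lambda step: min(1.0, (step + 1) / warmup_steps))
    for _ in range(epochs):
        for batch in loader:
            logits, X, Z, O = model(batch)           # assumed model interface
            ce = F.cross_entropy(logits, batch["label"])
            loss = ce - ib(X, Z, O)                  # minimize CE, maximize L_IB
            opt.zero_grad()
            loss.backward()
            opt.step()
            sched.step()
```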
S9: and classifying the bimodal data to be classified through the trained model to obtain an emotion recognition result.
The described embodiments only illustrate preferred implementations of the present invention and do not limit its concept or scope. Various modifications and improvements made to the technical solutions of the present invention by those skilled in the art, without departing from its design concept, fall within the protection scope of the present invention; the claimed technical content is set out in full in the claims.

Claims (7)

1. An emotion recognition method based on a bimodal graph network information bottleneck, characterized by comprising the following steps:
S1: data preprocessing, wherein a text and an image are preprocessed through corresponding pre-trained models;
S2: performing feature extraction on the preprocessed embedded representations, extracting text features \(H^t\) with a bidirectional long short-term memory network and image features \(H^v\) with a feed-forward neural network;
S3: constructing topological graphs from the grammatical dependency relations in the text and the spatial position relations in the image;
S4: designing a modal interaction module based on the bimodal graph network and performing representation learning with the message-passing mechanism of graph convolution networks, realizing information interaction and feature fusion within and across modalities;
S5: establishing an information bottleneck module to improve the generalization ability of the method;
S6: converting the node representations of the bimodal topological graph into a graph representation by graph pooling;
S7: identifying bimodal emotional tendency with a multilayer perceptron as classifier;
S8: training the model on bimodal data;
S9: classifying the bimodal data to be classified with the trained model to obtain the emotion recognition result.
2. The emotion recognition method based on the bimodal graph network information bottleneck according to claim 1, wherein S1 specifically is: the text is processed with the word-embedding technique GloVe to obtain a text embedding matrix \(E^t\); the image is processed with ResNet152 after first being cut into \(m\) visual blocks, yielding an image representation matrix \(E^v\), where \(m\) denotes the number of visual blocks.
3. The emotion recognition method based on the bimodal graph network information bottleneck according to claim 1, wherein S3 specifically comprises:
S31: taking the words of the text as nodes and the grammatical dependency relations of the dependency tree as undirected edges, constructing the intra-text topological graph \(G^t\);
S32: taking the visual blocks of the image as nodes and the spatial position relations between visual blocks as undirected edges, constructing the intra-image topological graph \(G^v\);
S33: taking the words of the text and the visual blocks of the image as two groups of nodes, with an undirected edge between every word node and every visual-block node, constructing a complete bipartite graph as the bimodal topological graph \(G^{tv}\).
4. The emotion recognition method based on the bimodal graph network information bottleneck according to claim 1, wherein S4 specifically comprises:
S41: with the intra-text topological graph \(G^t\) as adjacency matrix and the extracted text features as word-node feature vectors, performing representation learning of the word nodes through a graph convolution network, realizing information interaction within the text modality;
S42: with the intra-image topological graph \(G^v\) as adjacency matrix and the image features extracted in S2 as visual-block-node feature vectors, performing representation learning of the visual-block nodes through a graph convolution network, realizing information interaction within the image modality;
S43: with the bimodal topological graph \(G^{tv}\) as adjacency matrix, splicing the text and image features extracted in S2 into the node feature matrix \(H = [H^t; H^v]\) and performing information aggregation through a graph convolution network, realizing information fusion across modalities;
S44: cycling S41-S43 according to the specific hyperparameters of the model.
5. The emotion recognition method based on the bimodal graph network information bottleneck according to claim 1, wherein S5 specifically comprises:
S51: splicing the text embedding and the image embedding produced by the S1 preprocessing to obtain the input feature of the information bottleneck module, \(X\);
S52: splicing the text features and the image features extracted in S2 to obtain the intermediate feature of the information bottleneck module, \(Z\);
S53: splicing the text representation and the image representation after the S4 bimodal-graph-network modal interaction as the output feature of the information bottleneck module, \(O\);
S54: reducing the mutual information between \(X\) and \(Z\) while increasing the mutual information between \(Z\) and \(O\):

\[ \mathcal{L}_{IB}(\theta) = I(Z, O; \theta) - \beta\, I(X, Z; \theta) \]

where \(\mathcal{L}_{IB}\) is the objective to be optimized by the information bottleneck module, \(\theta\) denotes the parameters of the emotion recognition method based on the bimodal graph network information bottleneck, \(I(Z, O; \theta)\) is the mutual information between \(Z\) and \(O\), \(I(X, Z; \theta)\) is the mutual information between \(X\) and \(Z\), and \(\beta\) is an adjustable coefficient.
6. The emotion recognition method based on the bimodal graph network information bottleneck according to claim 1, wherein S6 specifically is: obtaining the graph representation vector with a graph pooling technique that splices the representations of all nodes in the bimodal topological graph:

\[ g = \big\Vert_{i \in V} \tilde{h}_i \]

where \(g\) is the graph representation vector merging all node representations of the text words and visual blocks, \(V\) is the set of all nodes in the bimodal topological graph, \(\tilde{h}_i\) is the representation of node \(i\) after S4, and \(\Vert\) denotes concatenation.
7. The emotion recognition method based on the bimodal graph network information bottleneck according to claim 1, wherein S8 specifically is: using the cross-entropy loss function and the information bottleneck objective as the model training targets, and training the model with an Adam optimizer with warm start; the training target of the model is:

\[ \mathcal{L}(\theta) = -\sum_{s \in D} y_s \log \hat{y}_s - \big( I(Z, O; \theta) - \beta\, I(X, Z; \theta) \big) \]

where \(s\) is one sample of the training set, \(D\) is the set of all training samples, \(\beta\) is an adjustable coefficient, \(\theta\) denotes the parameters of the emotion recognition method based on the bimodal graph network information bottleneck, \(y_s\) is the true value of the sample, and \(\hat{y}_s\) is the predicted value.
CN202211645853.1A, filed 2022-12-21 (priority 2022-12-21): Emotion identification method based on bimodal graph network information bottleneck. Status: Active. Granted as CN115631504B.

Priority Applications (1)

CN202211645853.1A (CN115631504B) | Priority date: 2022-12-21 | Filing date: 2022-12-21 | Emotion identification method based on bimodal graph network information bottleneck

Publications (2)

CN115631504A | 2023-01-20
CN115631504B | 2023-04-07

Family ID: 84910557

Family Applications (1)

CN202211645853.1A | Active | CN115631504B | Priority date: 2022-12-21 | Filing date: 2022-12-21 | Emotion identification method based on bimodal graph network information bottleneck

Country Status (1)

CN | CN115631504B (en)

Patent Citations (4)

US20150379336A1 * | 2014-06-27 / 2015-12-31 | Fujitsu Limited | Handwriting input conversion apparatus, computer-readable medium, and conversion method
CN112860888A * | 2021-01-26 / 2021-05-28 | 中山大学 (Sun Yat-sen University) | Attention mechanism-based bimodal emotion analysis method
CN114511906A * | 2022-01-20 / 2022-05-17 | 重庆邮电大学 (Chongqing University of Posts and Telecommunications) | Cross-modal dynamic convolution-based video multi-modal emotion recognition method and device and computer equipment
CN115363531A * | 2022-08-22 / 2022-11-22 | 山东师范大学 (Shandong Normal University) | Epilepsy detection system based on bimodal electroencephalogram signal information bottleneck

Non-Patent Citations (1)

范习健等 (Fan Xijian et al.), "A bimodal emotion recognition algorithm fusing visual and auditory information" *

Cited By (1)

CN116304984A * | 2023-03-14 / 2023-06-23 | 烟台大学 (Yantai University) | Multi-modal intention recognition method and system based on contrast learning

* Cited by examiner, † Cited by third party

Also Published As

CN115631504B | 2023-04-07


Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant