CN114648635A - Multi-label image classification method fusing strong correlation among labels - Google Patents


Info

Publication number
CN114648635A
Authority
CN
China
Prior art keywords
label
matrix
node
layer
occurrence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210250180.3A
Other languages
Chinese (zh)
Other versions
CN114648635B (en)
Inventor
张辉宜
夏媛龙
黄俊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui University of Technology AHUT
Original Assignee
Anhui University of Technology AHUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Anhui University of Technology AHUT filed Critical Anhui University of Technology AHUT
Priority to CN202210250180.3A priority Critical patent/CN114648635B/en
Publication of CN114648635A publication Critical patent/CN114648635A/en
Application granted granted Critical
Publication of CN114648635B publication Critical patent/CN114648635B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches (G PHYSICS; G06F ELECTRIC DIGITAL DATA PROCESSING; G06F18/00 Pattern recognition; G06F18/20 Analysing; G06F18/24 Classification techniques)
    • G06F18/2415 Classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06F18/25 Fusion techniques
    • G06N3/045 Combinations of networks (G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS; G06N3/00 Computing arrangements based on biological models; G06N3/02 Neural networks; G06N3/04 Architecture, e.g. interconnection topology)
    • G06N3/047 Probabilistic or stochastic networks
    • G06N3/048 Activation functions
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a multi-label image classification method fusing strong correlation among labels, which comprises the following steps: clustering the labels in the data set into M communities and splitting the traditional label co-occurrence matrix into an M-slice label co-occurrence tensor; sending the picture to be trained into a general convolutional neural network and applying M-fold generic pooling after the last level to obtain multiple generic feature maps; sending the label co-occurrence tensor and the label embedding matrix into a multi-graph convolutional neural network, and fusing the M label representation matrices into one label representation matrix with an attention fusion mechanism after the last multi-graph convolutional layer; fusing the sub-label semantic relations into the intermediate levels of the convolutional neural network; integrating community coding information into the label representation matrix and performing label-level multiplication with the multiple generic feature maps; and constructing a global objective function. The invention learns both the strong correlations inside label communities and the correlations among labels and fuses them into the feature maps, thereby improving the performance of the multi-label classification task.

Description

Multi-label image classification method fusing strong correlation among labels
Technical Field
The invention relates to the technical field of multi-label image classification, in particular to a multi-label image classification method fusing strong correlation among labels.
Background
Image classification is an important topic in the field of machine learning and the core of computer vision. The traditional approach extracts image features with a convolutional neural network, for example AlexNet, VGG or ResNet, and achieves high accuracy in single-label image classification. In multi-label image recognition, however, label correlation is difficult for such networks to represent, so in recent years researchers have tried to incorporate the correlation between labels into convolutional neural networks by different methods.
To deal with multi-label image classification, previous researchers often adopted recurrent neural networks and graph convolutional neural networks to model the correlation between labels. However, since recurrent neural networks are designed for serialized data, they struggle to express the complex internal co-occurrence relations when modeling label correlation. Graph convolutional neural networks, by contrast, have strong modeling capacity for non-Euclidean data (such as graph data), so the co-occurrence relations between labels can be learned well from a pre-constructed label co-occurrence graph. Researchers typically build a label node representation matrix with a word embedding model, construct a label co-occurrence adjacency matrix from the co-occurrence relations in the data set, feed both into a graph convolutional network to learn the correlations among labels, and finally fuse the learned label correlations into the last-level feature map of the convolutional neural network.
Conventional methods that construct label correlation with a graph convolutional network and fuse it into the last layer of the convolutional neural network, such as ML-GCN, have two disadvantages. First, only the last convolutional layer is fused with label correlation, while convolutional neural networks, especially ResNet-family models, stack very many convolutional layers, so the fusion of label correlation is insufficient. Second, conventional label correlation adjacency matrices simply establish co-occurrence relations between labels over the whole data set, ignoring the strong relations inside label groups. For example, {"person", "cup", "bowl", "table", "umbrella", "chair", "car"} may all be somewhat related, but {"cup", "bowl", "table", "chair"} and {"person", "umbrella", "car"} have stronger internal correlations.
The existing MGTN model fuses the strong connections inside labels into the convolutional neural network with the Graph Transformer method: it divides the label nodes into different communities with a community detection algorithm and fuses the label nodes of different communities into different feature maps using Multiple CNNs (a set of several convolutional neural networks). However, Multiple CNNs require a very large amount of computation to learn even a small number of communities, so the number of communities is difficult to scale up.
Disclosure of Invention
1. Technical problem to be solved by the invention
In view of the defects of the prior art, the invention provides a multi-label image classification method fusing strong correlation among labels; according to the invention, the traditional label co-occurrence adjacency matrix is converted into a plurality of sub-label co-occurrence adjacency matrixes, and the strong correlation inside the sub-labels is learned through the multi-graph convolutional neural network, so that the co-occurrence information of the sub-labels is more fully fused into the convolutional neural network, and the image classification performance of the convolutional neural network is improved.
2. Technical scheme
In order to achieve the purpose, the technical scheme provided by the invention is as follows:
the invention discloses a multi-label image classification method fusing strong correlation among labels, which comprises the following steps:
s1, constructing a label co-occurrence matrix by using the label co-occurrence relation in the data set; dividing the label nodes into M communities by using a community detection algorithm; setting a threshold value to divide the label co-occurrence matrix into M co-occurrence subgraphs;
s2, obtaining an image file of training data, sending the image file into a convolutional neural network to extract image features to obtain a feature map, converting the feature map into M two-dimensional generic feature maps through multiple generic pooling operations, and splicing the two-dimensional generic feature maps in parallel;
s3, acquiring a label embedding matrix, acquiring a label co-occurrence tensor, sending the M label co-occurrence tensors and the label embedding matrix into a multi-graph convolutional neural network to obtain an M-fold label node representation tensor, and fusing the M-fold label node representation tensors into a label node representation matrix by utilizing a label level attention fusion mechanism;
s4, fusing an M-ply label node expression tensor generated by the multi-ply graph convolutional layer into a feature graph output by a middle level of the general convolutional neural network by using an attention mechanism;
s5, building a community coding M matrix according to M communities divided by the label nodes by a community detection algorithm, multiplying the M matrix by a label node expression matrix fused by a fusion mechanism to obtain a label node expression matrix with community coding, multiplying the matrix by a multiple generic feature graph spliced together in the column direction, and using the obtained prediction label for classification;
and S6, constructing a loss function, wherein the loss function comprises a multi-label classification loss function and an objective function for learning a parameter matrix in the multiple generic pooling layer.
3. Advantageous effects
Compared with the prior art, the technical scheme provided by the invention has the following remarkable effects:
(1) the traditional multi-label deep learning algorithm usually uses a convolutional neural network to learn the correlations between labels but ignores the strong connections inside sub-label groups; the multi-label image classification method fusing strong correlation among labels converts the traditional label co-occurrence adjacency matrix into several sub-label co-occurrence adjacency matrices and learns the strong correlations inside the sub-labels through the multi-graph convolutional neural network, which is more conducive to improving multi-label image classification performance.
(2) In the multi-label image classification method fusing strong correlation among labels, the label-level attention fusion mechanism can learn the correlations between label nodes, not only the correlations between subgraphs as in the Graph Transformer algorithm. The subgraph label node representations are fused with an attention mechanism, so the co-occurrence information of the sub-labels is fused more fully into the convolutional neural network, improving its image classification performance.
Drawings
FIG. 1 is a flow chart of a multi-label image classification method fusing strong correlation between labels according to the present invention;
FIG. 2 is a schematic diagram of Resnet-101 extracting image features according to the present invention;
FIG. 3 is a schematic diagram of generic pooling of the present invention;
FIG. 4 is a schematic diagram of the multiple generic pooling of the present invention;
FIG. 5 is a schematic diagram of the fusion of multiple convolutional layers and a sub-graph tag representation matrix according to the present invention;
FIG. 6 is a schematic diagram of an intermediate layer of the present invention for fusing sub-graph label correlations to a convolutional neural network;
FIG. 7 is an internal structure diagram of the M matrix according to the present invention;
FIG. 8 is a schematic diagram of the tag level multiplication of the present invention.
Detailed Description
For a further understanding of the invention, reference should be made to the following detailed description taken in conjunction with the accompanying drawings and examples.
Example 1
The multi-label image classification method fusing strong correlation among labels of this embodiment learns both the relations between labels and the strong relations inside label groups so as to better perform the multi-label classification task, and comprises the following steps:
s1, constructing a label co-occurrence matrix by using the label co-occurrence relation in the data set; dividing the label nodes into M communities by utilizing a community detection algorithm; setting a threshold value to divide the label co-occurrence matrix into M co-occurrence subgraphs, namely a label co-occurrence tensor; the method specifically comprises the following steps:
(1-1) Obtain the image files of the training data and the labels they carry, and establish a co-occurrence adjacency matrix $A \in \mathbb{R}^{C\times C}$ for the labels, where $A_{ij}=1$ denotes that when label $L_i$ appears, label $L_j$ also appears with a certain probability, and $A_{ij}=0$ otherwise. Using the basic conditional probability formula and the co-occurrence count matrix $Z$, the conditional probability matrix $P \in \mathbb{R}^{C\times C}$ is constructed, where $P_{ij}=P(L_j\mid L_i)$ denotes the probability that label $L_j$ appears when label $L_i$ appears. A threshold $\tau$ is set to binarize the $P$ matrix and construct the label co-occurrence matrix $A \in \mathbb{R}^{C\times C}$ as follows:

$$A_{ij}=\begin{cases}1, & P_{ij}\ge\tau\\ 0, & P_{ij}<\tau\end{cases}$$
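For illustration, a minimal NumPy sketch of this construction (the toy `labels` indicator matrix, the variable names, and the threshold value are assumptions for illustration, not values from the patent):

```python
import numpy as np

# labels: (N_images, C) binary indicator matrix; a hypothetical toy input
labels = np.array([[1, 1, 0],
                   [1, 0, 1],
                   [1, 1, 1]])

Z = labels.T @ labels                  # Z_ij: co-occurrence count of labels i and j
np.fill_diagonal(Z, 0)                 # ignore self co-occurrence
occur = labels.sum(axis=0)             # number of images containing each label
P = Z / np.maximum(occur[:, None], 1)  # conditional probability P_ij = P(L_j | L_i)

tau = 0.4                              # binarization threshold (assumed value)
A = (P >= tau).astype(np.float32)      # binarized label co-occurrence matrix
```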
(1-2) Obtain the label co-occurrence adjacency matrix and divide it into M communities and M subgraphs (the label sub-relation adjacency tensor). The label nodes are divided into M communities with a community detection algorithm. First, the modularity of the adjacency matrix $A$ is computed:

$$Q=\frac{1}{2m}\sum_{i,j}\left[A_{ij}-\frac{d_i d_j}{2m}\right]\delta(c_i,c_j)$$

where $m=\frac{1}{2}\sum_{i,j}A_{ij}$, $d_i=\sum_k A_{i,k}$ is the degree of the $i$-th node, and $\delta(c_i,c_j)$ detects whether nodes $i$ and $j$ are in the same community: if so, $\delta(c_i,c_j)=1$; otherwise $\delta(c_i,c_j)=0$.

When the community detection algorithm starts, each node forms its own community; to maximize the modularity, other label nodes are merged into the communities. When the modularity no longer changes, the M communities have been detected.
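A sketch of the community division using greedy modularity maximization from NetworkX; the patent does not name a specific community detection algorithm, so this particular choice is an assumption:

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# Build a label graph from the binarized co-occurrence matrix A of step (1-1)
G_label = nx.from_numpy_array(A)

# Greedily merge nodes into communities until the modularity Q stops improving
communities = list(greedy_modularity_communities(G_label))
M = len(communities)

# S[i] = community number p in {1, ..., M} of label node i (used later in step S5)
S = {node: p + 1 for p, comm in enumerate(communities) for node in comm}
```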
(1-3) The label sub-relation adjacency tensor is constructed by setting thresholds, as follows: set a threshold vector $T=[t_1,\dots,t_M]$ on the conditional probability matrix $P$, where $t_i\in[0,1]$, and construct the label sub-relation adjacency tensor $\mathcal{A}\in\mathbb{R}^{M\times C\times C}$ as follows:

$$\mathcal{A}^k_{ij}=\begin{cases}1, & P_{ij}\ge t_k\\ 0, & P_{ij}<t_k\end{cases}$$

where $\mathcal{A}$ is composed of the label co-occurrence adjacency matrices $\mathcal{A}^k\in\mathbb{R}^{C\times C}$, $k\in\{1,\dots,M\}$, and $\mathcal{A}^k$ represents the $k$-th subgraph, in which label $L_j$ also appears with a certain probability when label $L_i$ appears.

In this embodiment, the number of communities is equal to the number of subgraphs.
S2, obtaining an image file of training data, sending the image file into a convolutional neural network to extract image features to obtain a feature map, converting the feature map into M two-dimensional generic feature maps through multiple generic pooling operations, and splicing the two-dimensional generic feature maps in parallel; the method specifically comprises the following steps:
(2-1) Obtain the image files of the training data, crop the images to a uniform size (3 × 448 × 448) and input them into the Resnet-101 neural network. In the network, the feature map passes through a 7 × 7 convolutional layer, a pooling layer, and four intermediate levels: layer1, layer2, layer3 and layer4. The feature map output by the four intermediate levels is $F^s\in\mathbb{R}^{C_s\times W_s\times H_s}$, where $s$ is the level number with value range {1, 2, 3, 4}, and $W_s$, $H_s$ and $C_s$ are respectively the width, height and channel number of the output feature map of the $s$-th level. Specifically, the feature map dimensions of the four level outputs are $F^1\in\mathbb{R}^{256\times112\times112}$, $F^2\in\mathbb{R}^{512\times56\times56}$, $F^3\in\mathbb{R}^{1024\times28\times28}$ and $F^4\in\mathbb{R}^{2048\times14\times14}$. A multiple generic pooling layer is then attached to obtain the multiple generic feature map, as shown in FIG. 2.
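A sketch of extracting the four intermediate-level feature maps from a torchvision Resnet-101 with forward hooks (one possible extraction method; the patent does not prescribe it):

```python
import torch
from torchvision.models import resnet101

backbone = resnet101(weights=None)
feats = {}

def save(name):
    def hook(module, inputs, output):
        feats[name] = output
    return hook

for name in ["layer1", "layer2", "layer3", "layer4"]:
    getattr(backbone, name).register_forward_hook(save(name))

x = torch.randn(1, 3, 448, 448)  # image cropped to the uniform 3 x 448 x 448 size
backbone(x)
# feats["layer1"]: (1, 256, 112, 112), ..., feats["layer4"]: (1, 2048, 14, 14)
```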
(2-2) Obtain the last-level feature map $F^4\in\mathbb{R}^{2048\times14\times14}$ and denote it $X_{conv}\in\mathbb{R}^{D_c\times H\times W}$, where $H$, $W$ and $D_c$ are the height, width and channel number of the feature map output by the last level of Resnet-101. The generic pooling method for extracting picture features is:

$$X_{lsp}=W_{lsp}\times R(X_{conv})$$

The dimension transformation operation $R(\cdot)$ maps $X_{conv}$ from a three-dimensional tensor (dimension $D_c\times H\times W$) to a two-dimensional matrix (dimension $HW\times D_c$), that is, $R(X_{conv})\in\mathbb{R}^{HW\times D_c}$; left-multiplying by the parameter matrix $W_{lsp}\in\mathbb{R}^{C\times HW}$ yields the pooled matrix $X_{lsp}\in\mathbb{R}^{C\times D_c}$, where $C$ is the number of label categories. The generic pooling method is shown in FIG. 3.
(2-3) Finally, the generic pooling operation is performed M times in the same way, specifically:

$$X^i_{lsp}=W^i_{lsp}\times R(X_{conv}),\quad i\in\{1,\dots,M\}$$

where $i$ is a positive integer from 1 to M and M is the number of subgraphs (or communities). The M generic feature maps $X^i_{lsp}\in\mathbb{R}^{C\times D_c}$ are spliced in the column direction to obtain $X_{mullsp}\in\mathbb{R}^{C\times MD_c}$. The multiple generic pooling layer is shown in FIG. 4.
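A minimal PyTorch module sketching the multiple generic pooling layer; the class name and initialization scale are hypothetical:

```python
import torch
import torch.nn as nn

class MultipleGenericPooling(nn.Module):
    """M learnable C x HW matrices, each pooling the reshaped feature map."""
    def __init__(self, num_labels, hw, num_subgraphs):
        super().__init__()
        self.W_lsp = nn.Parameter(torch.randn(num_subgraphs, num_labels, hw) * 0.01)

    def forward(self, x_conv):            # x_conv: (Dc, H, W), e.g. (2048, 14, 14)
        dc = x_conv.shape[0]
        r = x_conv.reshape(dc, -1).T      # R(X_conv): (HW, Dc)
        pooled = self.W_lsp @ r           # (M, C, Dc): the M generic feature maps
        c = pooled.shape[1]
        return pooled.permute(1, 0, 2).reshape(c, -1)  # column splice: (C, M*Dc)
```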
(2-4) Finally, the parameter matrices $W^i_{lsp}$ are learned with the help of the subgraph label adjacency matrices, where $i$ is a positive integer from 1 to M and M is the number of subgraphs. The guiding objective $L_{lsp}(\theta)$, given in step S6, applies an $l_2$ norm to keep each parameter matrix $W^i_{lsp}$ sparse; learning the parameter matrices $W^i_{lsp}$ with the label co-occurrence tensor $\mathcal{A}^i$ accelerates their learning and fuses the different subgraphs differentially so that the generic feature maps learn the label semantic relations. Through multiple generic pooling, generic feature maps $X^i_{lsp}$ containing different features are obtained.
S3, acquiring a label embedding matrix, acquiring a label co-occurrence tensor, sending the M label co-occurrence tensors and the label embedding matrix into a multi-graph convolutional neural network to obtain an M-fold label node representation tensor, and fusing the M-fold label node representation tensors into a label node representation matrix by utilizing a label level attention fusion mechanism; the method specifically comprises the following steps:
(3-1) Send the subgraph label relation adjacency tensor $\mathcal{A}$ and the label embedding matrix $E\in\mathbb{R}^{C\times D}$ into the multi-graph convolutional neural network as follows:

$$H^{l+1}_k=h\left(\hat{\mathcal{A}}^k H^l_k W^l\right)$$

The first-layer input of the multi-graph convolutional layer is the label embedding matrix $E\in\mathbb{R}^{C\times D}$ and the normalized multiple adjacency subgraphs $\hat{\mathcal{A}}^k$; the subgraphs are sent into different graph convolutional layers, and after the multiple graph convolutional layers the label representation tensor $H^l\in\mathbb{R}^{M\times C\times L_l}$ is obtained, where $L_l$ is the node dimension of the label representation tensor input to the $l$-th graph convolutional layer, $l\ge2$, and $h(\cdot)$ is a nonlinear activation function. The formula above is the expression of the multi-graph convolutional layer beyond the first layer, where $W^l$ is the parameter matrix of the graph convolutional layer and the node representation dimension of each layer is determined by $W^l$. Similarly, the first layer is expressed as $H^2_k=h(\hat{\mathcal{A}}^k E W^1)$.
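A sketch of one multi-graph convolution layer: the standard GCN propagation applied per normalized subgraph. Sharing the layer weight $W^l$ across subgraphs and using LeakyReLU as $h(\cdot)$ are assumptions here:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def normalize_adj(A):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2 applied to each subgraph."""
    A_hat = A + torch.eye(A.shape[-1])
    d = A_hat.sum(-1)                          # degrees, >= 1 due to self-loops
    d_inv_sqrt = torch.diag_embed(d.pow(-0.5))
    return d_inv_sqrt @ A_hat @ d_inv_sqrt

class MultiGraphConv(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.W = nn.Parameter(torch.randn(in_dim, out_dim) * 0.01)

    def forward(self, A_tensor, H):  # A_tensor: (M, C, C), H: (M, C, L_in)
        A_hat = normalize_adj(torch.as_tensor(A_tensor, dtype=torch.float32))
        return F.leaky_relu(A_hat @ H @ self.W)  # H^{l+1}: (M, C, out_dim)
```

For the first layer, H is the label embedding matrix E expanded along the subgraph axis, e.g. `E.unsqueeze(0).expand(M, -1, -1)`.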
(3-2) At this point the label-level attention fusion mechanism fuses the M-fold label node representation tensor $H=[G^1,\dots,G^M]$ into a single label node representation matrix $G\in\mathbb{R}^{C\times L}$, where $G^k\in\mathbb{R}^{C\times L}$ denotes the $k$-th node representation matrix and $k$ is an integer between 1 and M.

First, the importance of each subgraph's node representation matrix is learned, as follows:

$$w_k=\frac{1}{C}\sum_{i=1}^{C}q\cdot\tanh\left(W\cdot\left(g^k_i\right)^T+b\right)$$

where $W$ is a parameter vector $W\in\mathbb{R}^{1\times C}$, $b$ is an offset vector $b\in\mathbb{R}^{1\times C}$, $\tanh(\cdot)$ is an activation function, and $q$ is a parameter vector $q\in\mathbb{R}^{1\times C}$ whose purpose is to convert the vector generated by $\tanh(\cdot)$ into a scalar, so that both sides of the equation are scalar values. $g^k_i$ is the label node representation of the $i$-th node in the $k$-th representation matrix, $w_k$ represents the importance weight of the label embedding matrix and the $k$-th subgraph when learned through the graph convolutional neural network, and $i$ is an integer between 1 and C.

After computing the node weight $w_k$ on each subgraph, normalization is done through the softmax function:

$$\beta_k=\text{softmax}(w_k)=\frac{\exp(w_k)}{\sum_{j=1}^{M}\exp(w_j)}$$

This yields the importance scores $(\beta_1,\dots,\beta_M)$ of the nodes in each subgraph. The importance scores are multiplied by the corresponding label node representations, summed, and then multiplied by the parameter matrix $\tilde{W}$ for dimension transformation, giving the fused label node representation matrix $G$:

$$G=\left[\sum_{k=1}^{M}\beta_k G^k\right]\tilde{W}$$

where $G\in\mathbb{R}^{C\times L}$, $C$ is the number of label nodes, and the label node representation dimension $L$ is determined by the parameter matrix $\tilde{W}$. The multi-graph convolutional layer and label-level attention fusion network structure is shown in FIG. 5.
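A sketch of the label-level attention fusion following the formulas above; the parameter shapes are taken from the text as stated and treated as assumptions:

```python
import torch
import torch.nn as nn

class LabelAttentionFusion(nn.Module):
    def __init__(self, num_labels, node_dim, out_dim):
        super().__init__()
        self.W = nn.Parameter(torch.randn(num_labels, node_dim) * 0.01)
        self.b = nn.Parameter(torch.zeros(num_labels))
        self.q = nn.Parameter(torch.randn(num_labels) * 0.01)
        self.W_out = nn.Parameter(torch.randn(node_dim, out_dim) * 0.01)  # W~

    def forward(self, G_tensor):                           # G_tensor: (M, C, L)
        # w_k = (1/C) sum_i q . tanh(W (g_i^k)^T + b), one scalar per subgraph
        scores = torch.tanh(G_tensor @ self.W.T + self.b)  # (M, C, C)
        w = (scores @ self.q).mean(dim=1)                  # (M,)
        beta = torch.softmax(w, dim=0)                     # importance scores
        fused = (beta[:, None, None] * G_tensor).sum(0)    # (C, L)
        return fused @ self.W_out                          # G: (C, out_dim)
```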
S4, fusing the M-fold label node representation tensor generated by the multi-graph convolutional layers into the feature map output by an intermediate level of the general convolutional neural network by using an attention mechanism; comprising the following steps:
(4-1) Fuse $H^l\in\mathbb{R}^{M\times C\times L_l}$ into the feature map output by an intermediate level of Resnet-101 with an attention mechanism, as follows:

The feature maps output by the first three levels of Resnet-101 are $F^s\in\mathbb{R}^{C_s\times W_s\times H_s}$, where $1\le s\le3$ and $C_s$, $W_s$ and $H_s$ are the channel number, width and height of the $s$-th intermediate level. $F^s$ is denoted $X_l\in\mathbb{R}^{C_l\times W\times H}$, where $C_l=C_s$, $W=W_s$, $H=H_s$, and $l$ indicates that $F^s$ is fused with the label node correlations output by the $l$-th graph convolutional layer.

Since $L_l$ in $H^l\in\mathbb{R}^{M\times C\times L_l}$ is determined by the parameter matrix $W^{l-1}$, one can let $L_l=C_l=P$, i.e., $X_l\in\mathbb{R}^{P\times W\times H}$ and $H^l\in\mathbb{R}^{M\times C\times P}$. Using the dimension transformation operation $R(\cdot)$, $X_l$ and $H^l$ are transformed into $X_R\in\mathbb{R}^{WH\times P}$ and $H_R\in\mathbb{R}^{MC\times P}$. Taking $X_R\in\mathbb{R}^{WH\times P}$ as key and value and $H_R\in\mathbb{R}^{MC\times P}$ as query, they are fed into the Transformer model decoder, and the Multi-Head Attention mechanism in the Transformer decoder fuses the label correlations as follows:

$$\text{Attention}(Q,K,V)=\text{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V$$

$$\text{MultiHead}(Q,K,V)=\text{Concat}(\text{head}_1,\dots,\text{head}_h)W^O$$

$$\text{head}_i=\text{Attention}\left(QW^Q_i,KW^K_i,VW^V_i\right)$$

This embodiment uses the standard Transformer structure; the difference from the conventional Transformer is that the inputs come from different sources. Here, key and value in the MultiHead structure come from the feature maps output by the intermediate levels of the convolutional neural network Resnet-101, and query comes from the label node representation tensor output by the multi-graph convolutional network. The correlation between the image features and the sub-labels is learned through query and key: $QK^T$ computes the degree of correlation between the Q and K matrices, the softmax function normalizes it into correlation scores, and multiplying the scores into value fuses the label correlations into the feature map. The multi-head attention mechanism is consistent with the conventional Transformer, i.e. $W^Q_i,W^K_i,W^V_i\in\mathbb{R}^{P\times d_k}$ and $W^O\in\mathbb{R}^{hd_k\times P}$, with $h=8$ and $P=hd_k$.

The obtained feature $X_R$ fused with the label correlations undergoes the inverse of the dimension transformation $R(\cdot)$, converting $X_R\in\mathbb{R}^{WH\times P}$ into $X_M\in\mathbb{R}^{P\times W\times H}$. The newly obtained $X_M\in\mathbb{R}^{P\times W\times H}$ and the original feature map $X_l\in\mathbb{R}^{P\times W\times H}$ are sent to the Add & Norm layer of the Transformer decoder for normalization and residual connection, and the feature map $X_M\in\mathbb{R}^{P\times W\times H}$ fused with the label correlations is then fed into the FFN layer of the Transformer decoder:

$$X_F=\text{FFN}(X_M)$$

where $\text{FFN}(x)=\max(0,xW_1+b_1)W_2+b_2$, consisting of two linear transformations with a ReLU activation in between.

The FFN layer in the conventional Transformer consists of two linear layers with a ReLU activation in the middle; the difference in this embodiment is that the parameters $W_1$ and $W_2$ are three-dimensional tensors, where $d_{ff}$ is the output dimension of the first linear layer.

The newly obtained $X_F\in\mathbb{R}^{P\times W\times H}$ and the feature map $X_M\in\mathbb{R}^{P\times W\times H}$ already fused with the label correlations are sent to the Add & Norm layer of the Transformer decoder for normalization and residual connection. The final feature map $X_l\in\mathbb{R}^{P\times W\times H}$ fused with the label correlations is sent to the next layer of the convolutional neural network for training; the network structure in which the convolutional neural network attention fuses the subgraph label correlations is shown in FIG. 6.
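A sketch of this decoder-style fusion with PyTorch's standard multi-head attention. The patent assigns query to the label tensor and key/value to the feature map; to keep the output aligned with the $WH$ feature-map tokens for the residual connection, this sketch lets the feature-map tokens attend to the label representations, so the exact wiring should be treated as an interpretive assumption (as should $d_{ff}=4P$ and the values of M and C):

```python
import torch
import torch.nn as nn

M, C, P = 3, 80, 256                   # subgraphs, labels, shared dim (assumed)
mha = nn.MultiheadAttention(embed_dim=P, num_heads=8, batch_first=True)
norm1, norm2 = nn.LayerNorm(P), nn.LayerNorm(P)
ffn = nn.Sequential(nn.Linear(P, 4 * P), nn.ReLU(), nn.Linear(4 * P, P))

X_l = torch.randn(1, P, 112, 112)      # intermediate feature map (layer1-sized)
H_l = torch.randn(1, M * C, P)         # label representation tensor, flattened

X_R = X_l.flatten(2).transpose(1, 2)   # R(X_l): (1, WH, P) feature tokens
attn, _ = mha(query=X_R, key=H_l, value=H_l)         # fuse label correlations
X_M = norm1(X_R + attn)                # Add & Norm: residual + normalization
X_F = norm2(X_M + ffn(X_M))            # FFN, then Add & Norm again
X_out = X_F.transpose(1, 2).reshape(1, P, 112, 112)  # back to (P, W, H)
```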
S5, constructing a community coding M matrix according to the M communities into which the community detection algorithm divides the label nodes, multiplying (Hadamard product) the M matrix by the label node representation matrix produced by the fusion mechanism to obtain a label node representation matrix with community coding, multiplying (label-level multiplication) this matrix by the multiple generic feature maps spliced together in the column direction, and using the obtained predicted labels for classification; the method comprises the following steps:
(5-1) Divide the label nodes in the data set into M communities with the community detection method, keeping M consistent with the number of subgraphs. Since the community detection algorithm assigns every label node to a community, each label node has a community it belongs to; the communities are numbered $p$ in the order $\{1,\dots,M\}$, where M is the number of communities, and the community number of each label node is stored in $S\in\mathbb{Z}^C$, where $C$ is the number of label categories and $S_i=p$, $i\in\{1,\dots,C\}$.

The multiple label generic feature map obtained through the multiple generic pooling layer is $X_{mullsp}\in\mathbb{R}^{C\times MD_c}$, which can be expressed as the combination of generic feature maps $X_{mullsp}=[X^1_{lsp},\dots,X^M_{lsp}]$, with $i$ a positive integer between 1 and M and $X^i_{lsp}\in\mathbb{R}^{C\times D_c}$. The maps $X^i_{lsp}$ are used to distinguish and match the different communities, and the community code $p$ determines which generic label map the $p$-th community matches.

The community coding M matrix is constructed as follows: for the $i$-th label, $S_i=p$ means the community it belongs to is $p$, and the node needs to match the generic feature map $X^p_{lsp}$ belonging to community $p$; the positions in the M matrix corresponding to that feature map are therefore set to $\tau$, and the remaining positions are set to the complementary value $1-\tau$, where $\tau\in[0,1]$. In this way the M matrix achieves the purpose of differentially fusing different label nodes with different generic features. The internal structure of the community coding M matrix is shown in FIG. 7.
(5-2) Take the Hadamard product of the community coding M matrix and the label node representation matrix $G\in\mathbb{R}^{C\times L}$, and then perform label-level multiplication with $X_{mullsp}$, so that the model fuses different generic features with the label nodes of different communities, as follows:

Since $L$ in $G\in\mathbb{R}^{C\times L}$ is determined by the parameter matrix $\tilde{W}$, $L$ can be made equal to $MD_c$. The community coding matrix $M\in\mathbb{R}^{C\times MD_c}$ and the label node representation $G\in\mathbb{R}^{C\times MD_c}$ are multiplied element-wise (Hadamard product), so that the label nodes are fused with the community coding information, yielding the matrix $W\in\mathbb{R}^{C\times MD_c}$:

$$W=M\odot G$$
(5-3) As shown in FIG. 8, $W\in\mathbb{R}^{C\times MD_c}$ and $X_{mullsp}\in\mathbb{R}^{C\times MD_c}$ are multiplied at the label level for the final classification. $W$ can be written as $W=[w_1,\dots,w_C]^T$ and $X_{mullsp}$ as $X_{mullsp}=[x_1,\dots,x_C]$, where $C$ is the number of label categories; the prediction function is:

$$\hat{y}_i=w_i^T x_i,\quad i\in\{1,\dots,C\}$$
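A sketch of the Hadamard product and the label-level multiplication, producing one prediction score per label (`Mmat`, `G` and `X_mullsp` come from the previous steps):

```python
import torch

# Mmat: (C, M*Dc) community coding; G: (C, M*Dc) fused label representations;
# X_mullsp: (C, M*Dc) multiple generic feature map
W = Mmat * G                       # Hadamard product: inject community coding
y_hat = (W * X_mullsp).sum(dim=1)  # label-level multiplication: y_hat_i = w_i^T x_i
```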
and S6, constructing a loss function, wherein the loss function comprises a multi-label classification loss function and an objective function for learning a parameter matrix in the multiple generic pooling layer. The method comprises the following steps:
(6-1) The objective function is defined as:

$$\min_\theta\;L(\hat{y},y)+L_{lsp}(\theta)$$

where $L(\cdot)$ is the loss function of the multi-label classification, $L_{lsp}$ is used to accelerate the learning of the parameters $W^i_{lsp}$ and to distinguish the different subgraphs, and $\theta$ is the global parameter to be learned.
(6-2) A binary classification loss is used as the multi-label classification loss, as follows:

$$L(\hat{y},y)=-\sum_{c=1}^{C}\left[y^c\log\left(\sigma(\hat{y}^c)\right)+(1-y^c)\log\left(1-\sigma(\hat{y}^c)\right)\right]$$

where $\sigma(\cdot)$ is the sigmoid function. The objective function $L_{lsp}(\theta)$ that guides the generic features imposes an $l_2$-norm term on each parameter matrix $W^i_{lsp}$, guided by the corresponding subgraph $\mathcal{A}^i$ and summed over $i=1,\dots,M$.
(6-3) The overall loss function is therefore the sum of the two terms above:

$$\mathcal{L}(\theta)=L(\hat{y},y)+L_{lsp}(\theta)$$

where the parameters $W^i_{lsp}$ are part of the parameter $\theta$.

Through the above steps, label semantic correlations of different strengths are fused into the convolutional neural network to obtain the predicted labels $\hat{y}$; the loss between the true labels and the predicted labels is minimized and optimized with SGD, achieving the goal of improving multi-label image classification performance.
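A sketch of the overall optimization with binary cross-entropy and SGD. Since the exact closed form of $L_{lsp}$ is not recoverable from the text, the regularization term below is a stand-in $l_2$ penalty, and `model`, `loader` and the `pooling.W_lsp` attribute are hypothetical:

```python
import torch
import torch.nn as nn

criterion = nn.BCEWithLogitsLoss()        # multi-label binary classification loss
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

for images, targets in loader:            # targets: (B, C) multi-hot labels
    logits = model(images)                # predicted label scores (B, C)
    l_lsp = sum(w.norm(p=2) for w in model.pooling.W_lsp)  # stand-in l2 term
    loss = criterion(logits, targets.float()) + l_lsp
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```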
In this embodiment, a ResNet-family network is used as the base network to extract image features, and the traditional average or maximum pooling layer after the last level is replaced with a multiple generic pooling layer that transforms the feature map dimensions. The parameters are learned under the label sub-relation adjacency tensor to obtain generic feature maps with different features. This replaces the way MGTN matches different communities with Multiple CNNs, so the amount of computation is greatly reduced, more communities can be learned, and the parameters learned under the different sub-label adjacency matrices distinguish the label features of different subgraphs.

In addition, this embodiment trains the sub-label co-occurrence adjacency tensor and the label node representation matrix with a multi-graph convolutional neural network, which differs from the method in MGTN: this embodiment can learn the semantic relations of the label nodes under the different sub-label co-occurrence adjacency matrices, whereas the Graph Transformer method only learns the correlations between the different sub-label co-occurrence adjacency matrices, i.e., the correlations between meta-paths.

This embodiment uses an attention mechanism, the Transformer decoder structure, to fuse the label correlations into the intermediate levels of the convolutional neural network. The Multi-Head Attention mechanism takes the feature maps output by the convolutional neural network as key and value and the label node representation tensor output by the multi-graph convolutional network as query, so that the sub-label correlations are fused into the intermediate-level feature maps of the convolutional neural network, improving the degree of fusion between the label correlations and the convolutional neural network.

In summary, this embodiment can learn both the correlations between labels and the correlations inside label groups and fuse them more fully into the convolutional neural network; the method scales more easily in the number of communities and subgraphs, which is more conducive to improving multi-label image classification performance.
It is worth noting that the number of multi-graph convolutional layers is not limited to three. If the multi-graph convolutional network has three layers, they must correspond one-to-one with the first three intermediate levels of the convolutional neural network, and the sub-label correlations are fused into those intermediate levels; if there are fewer than three layers, the sub-label correlations are selectively fused into some of the first three intermediate levels; the number of multi-graph convolutional layers, however, cannot exceed three.
If the number of multi-graph convolutional layers is three, the label representation tensors output by the multi-graph convolutional layers must correspond one-to-one with the feature maps output by the first three intermediate levels of the convolutional neural network, and the Transformer decoder fuses the sub-label correlations into the intermediate levels. The dimensions of the feature maps output by the first three intermediate levels are $F^1\in\mathbb{R}^{256\times112\times112}$, $F^2\in\mathbb{R}^{512\times56\times56}$ and $F^3\in\mathbb{R}^{1024\times28\times28}$, and the label representation tensors output by the multi-graph convolutional layers have the matching dimensions $\mathbb{R}^{M\times C\times256}$, $\mathbb{R}^{M\times C\times512}$ and $\mathbb{R}^{M\times C\times1024}$, respectively. If the number of multi-graph convolutional layers is less than three, a similar method keeps the feature map and label representation tensor dimensions consistent.
Compared with matching different communities by the Multiple CNNs method, this method can expand the number of subgraphs or communities at a much smaller computational cost and learns the label correlations and the strong correlations inside the sub-labels more easily, thereby improving multi-label image classification performance.
The invention and its embodiments have been described above schematically, and the description is not limiting; what is shown in the drawings is only one embodiment of the invention, and the actual structure is not limited to it. Therefore, similar structures and embodiments designed by a person skilled in the art in light of this teaching without inventive effort and without departing from the spirit of the invention shall fall within the protection scope of the invention.

Claims (7)

1. A multi-label image classification method fusing strong correlation among labels is characterized by comprising the following steps:
s1, constructing a label co-occurrence matrix by using the label co-occurrence relation in the data set; dividing the label nodes into M communities by using a community detection algorithm; setting a threshold value to divide the label co-occurrence matrix into M co-occurrence subgraphs;
s2, obtaining an image file of training data, sending the image file into a convolutional neural network to extract image features to obtain a feature map, converting the feature map into M two-dimensional generic feature maps through multiple generic pooling operations, and splicing the two-dimensional generic feature maps in parallel;
s3, acquiring a label embedding matrix, acquiring a label co-occurrence tensor, sending the M label co-occurrence tensors and the label embedding matrix into a multi-graph convolutional neural network to obtain an M-fold label node representation tensor, and fusing the M-fold label node representation tensors into a label node representation matrix by utilizing a label level attention fusion mechanism;
s4, fusing the M-ply label node expression tensor generated by the multi-ply convolutional layer into a feature graph output by the middle layer of the general convolutional neural network by using an attention mechanism;
s5, constructing a community coding M matrix according to M communities divided by the label nodes by a community detection algorithm, multiplying the M matrix by a label node expression matrix fused by a fusion mechanism to obtain a label node expression matrix with community coding, multiplying the matrix by multiple generic feature graphs spliced together in the column direction, and using the obtained prediction label for classification;
and S6, constructing a loss function, wherein the loss function comprises a multi-label classification loss function and an objective function for learning a parameter matrix in the multiple generic pooling layer.
2. The multi-label image classification method fusing strong correlation between labels as claimed in claim 1, wherein step S1 specifically includes the following steps:
(1-1) obtaining the image files of the training data and the labels in the training data, and establishing a co-occurrence relation adjacency matrix $A\in\mathbb{R}^{C\times C}$ for the labels, wherein $A_{ij}=1$ denotes that when label $L_i$ appears, label $L_j$ may also appear, and $A_{ij}=0$ otherwise, as follows:

firstly, counting the co-occurrence relations of the labels in the data set and constructing the matrix $Z\in\mathbb{R}^{C\times C}$, wherein $Z_{ij}$ indicates the number of times labels $L_i$ and $L_j$ appear together in the data set; constructing from the Z matrix and the basic conditional probability formula the conditional probability matrix $P\in\mathbb{R}^{C\times C}$, $P_{ij}=P(L_j\mid L_i)$, denoting the probability that label $L_j$ appears when label $L_i$ appears; and setting a threshold $\tau$ to binarize the P matrix and construct the label co-occurrence matrix $A\in\mathbb{R}^{C\times C}$ as follows:

$$A_{ij}=\begin{cases}1,&P_{ij}\ge\tau\\0,&P_{ij}<\tau\end{cases}$$
(1-2) dividing the label nodes into M communities by using a community detection algorithm, wherein the modularity of the adjacency matrix A is defined as follows:

$$Q=\frac{1}{2m}\sum_{i,j}\left[A_{ij}-\frac{d_i d_j}{2m}\right]\delta(c_i,c_j)$$

wherein $m=\frac{1}{2}\sum_{i,j}A_{ij}$, $d_i=\sum_k A_{i,k}$ is the degree of the $i$-th node, and $\delta(c_i,c_j)$ detects whether nodes $i$ and $j$ are in the same community: if so, $\delta(c_i,c_j)=1$; otherwise $\delta(c_i,c_j)=0$;

each node initially forms its own community, and, to maximize the modularity, other label nodes are merged into the communities;

when the modularity no longer changes, the M communities have been detected, i.e., all the label nodes in the data set are divided into M communities;
(1-3) setting thresholds to construct the label co-occurrence tensor as follows: setting a threshold vector $T=[t_1,\dots,t_M]$ on the conditional probability matrix P, wherein $t_i\in[0,1]$, and constructing the label co-occurrence tensor $\mathcal{A}\in\mathbb{R}^{M\times C\times C}$ as follows:

$$\mathcal{A}^k_{ij}=\begin{cases}1,&P_{ij}\ge t_k\\0,&P_{ij}<t_k\end{cases}$$

wherein $\mathcal{A}$ is composed of the label co-occurrence matrices $\mathcal{A}^k\in\mathbb{R}^{C\times C}$, $k\in\{1,\dots,M\}$, and $\mathcal{A}^k$ represents the $k$-th subgraph, in which label $L_j$ may also appear when label $L_i$ appears.
3. The multi-label image classification method fusing the strong correlation between labels as claimed in claim 2, wherein: step S2 specifically includes the following steps:
(2-1) acquiring the image files of the training data and inputting them into the Resnet-101 convolutional neural network; obtaining the feature map output by the last level of the Resnet-101 feature extraction and denoting it $X_{conv}\in\mathbb{R}^{D_c\times H\times W}$, wherein $H$, $W$ and $D_c$ are respectively the height, width and channel number of the feature map;

(2-2) changing the dimensions of the original feature map $X_{conv}$:

$$X_{lsp}=W_{lsp}\times R(X_{conv})$$

transforming $X_{conv}$ from a three-dimensional tensor into a two-dimensional matrix through the dimension transformation operation $R(\cdot)$, and left-multiplying by the parameter matrix $W_{lsp}\in\mathbb{R}^{C\times HW}$, wherein $C$ is the number of label categories, to obtain the dimension-transformed generic feature map $X_{lsp}\in\mathbb{R}^{C\times D_c}$;

(2-3) repeating the same method M times, specifically as follows:

$$X^i_{lsp}=W^i_{lsp}\times R(X_{conv}),\quad i\in\{1,\dots,M\}$$

wherein $i$ is a positive integer between 1 and M and M is the number of label co-occurrence subgraphs (or communities); splicing the M generic feature maps $X^i_{lsp}\in\mathbb{R}^{C\times D_c}$ in the column direction to obtain the multiple generic feature map $X_{mullsp}\in\mathbb{R}^{C\times MD_c}$;

(2-4) learning the parameter matrices $W^i_{lsp}$ by using the label co-occurrence relations after splitting the subgraphs, wherein $\mathcal{A}^i$ is the $i$-th subgraph in the label co-occurrence tensor.
4. The multi-label image classification method fusing strong correlation between labels as claimed in claim 3, wherein step S3 specifically comprises the following steps:
(3-1) sending the label co-occurrence tensor $\mathcal{A}$ and the label embedding matrix $E\in\mathbb{R}^{C\times D}$ into the multi-graph convolutional neural network as follows:

$$H^{l+1}_k=h\left(\hat{\mathcal{A}}^k H^l_k W^l\right)$$

the first-layer input of the multi-graph convolutional layer is the label embedding matrix $E\in\mathbb{R}^{C\times D}$ and the normalized multiple adjacency subgraphs $\hat{\mathcal{A}}^k$; the subgraphs are sent into different graph convolutional layers, and after the multiple graph convolutional layers the label representation tensor $H^l\in\mathbb{R}^{M\times C\times L_l}$ is obtained, wherein $L_l$ is the node dimension of the label representation tensor input to the $l$-th graph convolutional layer, $l\ge2$, and $h(\cdot)$ is a nonlinear activation function; the above formula is the expression of the multi-graph convolutional layer beyond the first layer, wherein $W^l$ is the parameter matrix of the graph convolutional layer and the node representation dimension of each layer is determined by $W^l$; similarly, the first layer is expressed as $H^2_k=h(\hat{\mathcal{A}}^k E W^1)$;

(3-2) fusing, by the label-level attention fusion mechanism, the label node representation tensor $H=[G^1,\dots,G^M]$ learned by the multi-graph convolutional layers into a single label node representation matrix $G\in\mathbb{R}^{C\times L}$, wherein $G^k\in\mathbb{R}^{C\times L}$ denotes the $k$-th node representation matrix and $k$ is an integer between 1 and M; the importance of the label node representation matrix learned on each subgraph is:

$$w_k=\frac{1}{C}\sum_{i=1}^{C}q\cdot\tanh\left(W\cdot\left(g^k_i\right)^T+b\right)$$

wherein $W$ is a parameter vector $W\in\mathbb{R}^{1\times C}$, $b$ is an offset vector $b\in\mathbb{R}^{1\times C}$, $\tanh(\cdot)$ is an activation function, $q$ is a parameter vector $q\in\mathbb{R}^{1\times C}$, $g^k_i$ is the label node representation of the $i$-th node in the $k$-th representation matrix, $w_k$ represents the importance weight of the label embedding matrix and the $k$-th subgraph when learned through the graph convolutional neural network, and $i$ is an integer between 1 and C;

after computing the node weight $w_k$ on each subgraph, normalizing through the softmax function to calculate the importance score of the nodes on the subgraph:

$$\beta_k=\text{softmax}(w_k)=\frac{\exp(w_k)}{\sum_{j=1}^{M}\exp(w_j)}$$

thereby obtaining the importance scores $(\beta_1,\dots,\beta_M)$ of the nodes on each subgraph, multiplying the importance scores by the corresponding label node representation matrices respectively, summing, and then multiplying by the parameter matrix $\tilde{W}$ for dimension transformation to obtain the fused label node representation matrix G:

$$G=\left[\sum_{k=1}^{M}\beta_k G^k\right]\tilde{W}$$

wherein $G\in\mathbb{R}^{C\times L}$, $C$ is the number of label nodes, and the label node representation dimension $L$ is determined by the parameter matrix $\tilde{W}$.
5. The multi-label image classification method fusing strong correlation between labels as claimed in claim 4, wherein step S4 specifically comprises the following steps:
(4-1) the multi-graph convolutional neural network outputs the multiple label node representation tensor $H^l\in\mathbb{R}^{M\times C\times L_l}$, wherein M is the number of subgraphs, C is the number of label categories, and $L_l$ is the label node representation dimension output by the $l$-th graph convolutional layer; $H^l$ is fused with the feature map output by an intermediate level of Resnet-101 through the attention mechanism as follows:

the feature maps output by the first three intermediate levels of Resnet-101 are $F^s\in\mathbb{R}^{C_s\times W_s\times H_s}$, $1\le s\le3$, wherein $C_s$, $W_s$ and $H_s$ are the channel number, width and height of the $s$-th intermediate level; $F^s$ is denoted $X_l\in\mathbb{R}^{C_l\times W\times H}$, wherein $C_l=C_s$, $W=W_s$, $H=H_s$, and $l$ indicates that $F^s$ is fused with the label node correlations output by the $l$-th graph convolutional layer;

since $L_l$ in $H^l\in\mathbb{R}^{M\times C\times L_l}$ is determined by the parameter matrix $W^{l-1}$, $L_l=C_l=P$ can be made to hold, i.e., $X_l\in\mathbb{R}^{P\times W\times H}$ and $H^l\in\mathbb{R}^{M\times C\times P}$; $X_l$ and $H^l$ are transformed through the dimension transformation operation $R(\cdot)$ into $X_R\in\mathbb{R}^{WH\times P}$ and $H_R\in\mathbb{R}^{MC\times P}$, which are fed into the decoder of the Transformer model, and the label correlations are fused using the Multi-Head Attention mechanism in the Transformer decoder as follows:

$$\text{Attention}(Q,K,V)=\text{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V$$

wherein $\text{MultiHead}(Q,K,V)=\text{Concat}(\text{head}_1,\dots,\text{head}_h)W^O$,

$$\text{head}_i=\text{Attention}\left(QW^Q_i,KW^K_i,VW^V_i\right)$$

wherein $W^Q_i,W^K_i,W^V_i\in\mathbb{R}^{P\times d_k}$ and $W^O\in\mathbb{R}^{hd_k\times P}$;

the obtained feature $X_R$ fused with the label correlations undergoes the inverse of the dimension transformation $R(\cdot)$, converting $X_R\in\mathbb{R}^{WH\times P}$ into $X_M\in\mathbb{R}^{P\times W\times H}$; the newly obtained $X_M\in\mathbb{R}^{P\times W\times H}$ and the original feature map $X_l\in\mathbb{R}^{P\times W\times H}$ are sent to the Add & Norm layer of the Transformer decoder for normalization and residual connection, and the feature map $X_M\in\mathbb{R}^{P\times W\times H}$ fused with the label correlations is then fed into the FFN layer of the Transformer decoder as follows:

$$X_F=\text{FFN}(X_M)$$

wherein $\text{FFN}(x)=\max(0,xW_1+b_1)W_2+b_2$, consisting of two linear transformations with a ReLU activation in between;

the newly obtained $X_F\in\mathbb{R}^{P\times W\times H}$ and the feature map $X_M\in\mathbb{R}^{P\times W\times H}$ fused with the label correlations are sent to the Add & Norm layer of the Transformer decoder for normalization and residual connection; the final feature map $X_l\in\mathbb{R}^{P\times W\times H}$ fused with the label correlations is sent to the next layer of the convolutional neural network for training.
6. The method for classifying multi-label images fused with strong correlation between labels as claimed in claim 5, wherein step S5 comprises the following steps:
(5-1) dividing the labels in the data set into M communities by using the community detection algorithm so that each label node has its own community, numbering the communities $p$ in the order $\{1,\dots,M\}$, wherein M is the number of communities, and storing the community number of each label node in $S\in\mathbb{Z}^C$, wherein C is the number of label categories and $S_i=p$, $i\in\{1,\dots,C\}$;

obtaining the multiple label generic feature map $X_{mullsp}\in\mathbb{R}^{C\times MD_c}$, which can be expressed as the combination of generic feature maps $X_{mullsp}=[X^1_{lsp},\dots,X^M_{lsp}]$; using $X^i_{lsp}$ to distinguish and match the different communities, determining from the community code $p$ which generic label map the $p$-th community matches, and constructing the community coding M matrix, wherein the positions corresponding to the matched generic feature map $X^p_{lsp}$ are set to $\tau$ and the remaining positions are set to the complementary value $1-\tau$, wherein $\tau\in[0,1]$;

(5-2) taking the Hadamard product of the community coding M matrix and the label node representation matrix $G\in\mathbb{R}^{C\times L}$ and performing label-level multiplication with the multiple generic feature map $X_{mullsp}$, so that the model fuses different generic features with the label nodes of different communities, as follows:

since $L$ in $G\in\mathbb{R}^{C\times L}$ is determined by the parameter matrix $\tilde{W}$, $L$ can be made equal to $MD_c$; the community coding matrix $M\in\mathbb{R}^{C\times MD_c}$ and the label node representation $G\in\mathbb{R}^{C\times MD_c}$ are multiplied element-wise, so that the label nodes are fused with the community coding information, obtaining the matrix $W\in\mathbb{R}^{C\times MD_c}$ as follows:

$$W=M\odot G$$

(5-3) performing label-level multiplication of $W\in\mathbb{R}^{C\times MD_c}$ and $X_{mullsp}\in\mathbb{R}^{C\times MD_c}$ and using the obtained predicted labels for the final classification, wherein $W$ can be expressed as $W=[w_1,\dots,w_C]^T$ and the multiple generic feature map as $X_{mullsp}=[x_1,\dots,x_C]$, wherein C is the number of label categories, and the prediction function is:

$$\hat{y}_i=w_i^T x_i,\quad i\in\{1,\dots,C\}$$
7. the method for classifying multi-label images fused with strong correlation between labels as claimed in claim 6, wherein step S6 comprises the following steps:
the objective function is defined as:

$$\min_\theta\;L(\hat{y},y)+L_{lsp}(\theta)$$

wherein $L(\cdot)$ is the loss function of the multi-label classification, $L_{lsp}(\theta)$ is the objective function for learning the parameter matrices of the multiple generic pooling layer, and $\theta$ is the global parameter to be learned;

a binary classification loss is used as the multi-label classification loss as follows:

$$L(\hat{y},y)=-\sum_{c=1}^{C}\left[y^c\log\left(\sigma(\hat{y}^c)\right)+(1-y^c)\log\left(1-\sigma(\hat{y}^c)\right)\right]$$

the objective function $L_{lsp}(\theta)$ for guiding the generic features imposes an $l_2$-norm term on each parameter matrix $W^i_{lsp}$, guided by the corresponding subgraph $\mathcal{A}^i$ and summed over $i=1,\dots,M$;

therefore, the overall loss function is the sum of the two terms above, wherein the parameters $W^i_{lsp}$ are part of the parameter $\theta$;

through the above steps, the label semantic correlations of different strengths are fused into the convolutional neural network to obtain the predicted labels $\hat{y}$, the loss function is used to minimize the difference between the true labels and the predicted labels, and SGD is adopted for optimization, achieving the purpose of improving multi-label image classification performance.
CN202210250180.3A 2022-03-15 2022-03-15 Multi-label image classification method fusing strong correlation among labels Active CN114648635B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210250180.3A CN114648635B (en) 2022-03-15 2022-03-15 Multi-label image classification method fusing strong correlation among labels


Publications (2)

Publication Number Publication Date
CN114648635A true CN114648635A (en) 2022-06-21
CN114648635B CN114648635B (en) 2024-07-09

Family

ID=81993189

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210250180.3A Active CN114648635B (en) 2022-03-15 2022-03-15 Multi-label image classification method fusing strong correlation among labels

Country Status (1)

Country Link
CN (1) CN114648635B (en)



Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019100724A1 (en) * 2017-11-24 2019-05-31 华为技术有限公司 Method and device for training multi-label classification model
WO2019100723A1 (en) * 2017-11-24 2019-05-31 华为技术有限公司 Method and device for training multi-label classification model
US20200210773A1 (en) * 2019-01-02 2020-07-02 Boe Technology Group Co., Ltd. Neural network for image multi-label identification, related method, medium and device
CN111476315A (en) * 2020-04-27 2020-07-31 中国科学院合肥物质科学研究院 Image multi-label identification method based on statistical correlation and graph convolution technology
CN112308115A (en) * 2020-09-25 2021-02-02 安徽工业大学 Multi-label image deep learning classification method and equipment
CN112906720A (en) * 2021-03-19 2021-06-04 河北工业大学 Multi-label image identification method based on graph attention network
CN113657425A (en) * 2021-06-28 2021-11-16 华南师范大学 Multi-label image classification method based on multi-scale and cross-modal attention mechanism

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
张辉宜 et al., "Multi-label image classification model based on graph attention network" (基于图注意力网络的多标签图像分类模型), Journal of Chongqing Technology and Business University, vol. 39, no. 1, 28 February 2022 (2022-02-28), pages 34-41 *
陈科峻; 张叶, "Multi-label aerial image classification with recurrent neural networks" (循环神经网络多标签航空图像分类), Optics and Precision Engineering, no. 06, 9 June 2020 (2020-06-09) *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115964626A (en) * 2022-10-27 2023-04-14 河南大学 Community detection method based on dynamic multi-scale feature fusion network
CN117893839A (en) * 2024-03-15 2024-04-16 华东交通大学 Multi-label classification method and system based on graph attention mechanism
CN117893839B (en) * 2024-03-15 2024-06-07 华东交通大学 Multi-label classification method and system based on graph attention mechanism

Also Published As

Publication number Publication date
CN114648635B (en) 2024-07-09

Similar Documents

Publication Publication Date Title
CN110222140B (en) Cross-modal retrieval method based on counterstudy and asymmetric hash
CN108960073B (en) Cross-modal image mode identification method for biomedical literature
CN112085012B (en) Project name and category identification method and device
Sharma et al. A survey of methods, datasets and evaluation metrics for visual question answering
CN112380435A (en) Literature recommendation method and recommendation system based on heterogeneous graph neural network
CN114648635A (en) Multi-label image classification method fusing strong correlation among labels
CN112733866A (en) Network construction method for improving text description correctness of controllable image
CN112199520A (en) Cross-modal Hash retrieval algorithm based on fine-grained similarity matrix
CN111324765A (en) Fine-grained sketch image retrieval method based on depth cascade cross-modal correlation
CN110210534B (en) Multi-packet fusion-based high-resolution remote sensing image scene multi-label classification method
CN114398491A (en) Semantic segmentation image entity relation reasoning method based on knowledge graph
CN114386534A (en) Image augmentation model training method and image classification method based on variational self-encoder and countermeasure generation network
CN115145551A (en) Intelligent auxiliary system for machine learning application low-code development
CN113095314B (en) Formula identification method, device, storage medium and equipment
Menaga et al. Deep learning: a recent computing platform for multimedia information retrieval
CN113159067A (en) Fine-grained image identification method and device based on multi-grained local feature soft association aggregation
CN114863407A (en) Multi-task cold start target detection method based on visual language depth fusion
CN109766918A (en) Conspicuousness object detecting method based on the fusion of multi-level contextual information
CN114328934A (en) Attention mechanism-based multi-label text classification method and system
CN113240033B (en) Visual relation detection method and device based on scene graph high-order semantic structure
CN116187349A (en) Visual question-answering method based on scene graph relation information enhancement
CN117371523A (en) Education knowledge graph construction method and system based on man-machine hybrid enhancement
CN117807232A (en) Commodity classification method, commodity classification model construction method and device
CN115329210A (en) False news detection method based on interactive graph layered pooling
CN114943990A (en) Continuous sign language recognition method and device based on ResNet34 network-attention mechanism

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant