CN116935100A - Multi-label image classification method based on feature fusion and self-attention mechanism


Info

Publication number: CN116935100A
Authority: CN (China)
Prior art keywords: image, matrix, feature, global, label
Legal status: Pending
Application number: CN202310728668.7A
Other languages: Chinese (zh)
Inventors: 高世杰 (Gao Shijie), 韩立新 (Han Lixin)
Current Assignee: Hohai University (HHU)
Original Assignee: Hohai University (HHU)
Application filed by Hohai University (HHU)
Priority to CN202310728668.7A
Publication of CN116935100A

Classifications

    • G06V 10/764: Arrangements for image or video recognition or understanding using pattern recognition or machine learning, using classification, e.g. of video objects
    • G06N 3/0464: Convolutional networks [CNN, ConvNet]
    • G06N 3/048: Activation functions
    • G06N 3/0499: Feedforward networks
    • G06V 10/42: Global feature extraction by analysis of the whole pattern, e.g. using frequency domain transformations or autocorrelation
    • G06V 10/44: Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; connectivity analysis, e.g. of connected components
    • G06V 10/806: Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level, of extracted features
    • G06V 10/82: Arrangements for image or video recognition or understanding using pattern recognition or machine learning, using neural networks
    • G06V 20/70: Labelling scene content, e.g. deriving syntactic or semantic representations
    • Y02T 10/40: Engine management systems (climate change mitigation technologies related to transportation)


Abstract

The application discloses a multi-label image classification method based on feature fusion and a self-attention mechanism, comprising the following main steps: extracting the global features of the image with a deep convolutional neural network; extracting the local features of the image by applying a 1×1 convolution to a feature map produced by an intermediate layer of the deep convolutional neural network; fusing the extracted global and local features through a self-attention mechanism to generate a feature expression for each category; and performing multi-label classification by generating image labels from the fused feature expressions through a fully connected layer and a sigmoid activation function. The proposed method fuses the local and global features of the image, can effectively model the visual characteristics of small targets in the image, takes the semantic correlation among labels into account, and can thereby improve multi-label classification accuracy.

Description

Multi-label image classification method based on feature fusion and self-attention mechanism
Technical Field
The application belongs to the field of image recognition, and particularly relates to a multi-label image classification method based on fusing the global and local features of an image and introducing a self-attention mechanism.
Background
In the information age, images have become a medium and carrier for conveying information and are widely used in many fields. Classifying massive numbers of digital images quickly and accurately is a central research topic in current image applications. Although convolutional neural networks (CNNs) perform well in single-label image classification tasks, most real-world images contain more than one scene or object, and a single image may carry multiple labels corresponding to different objects, scenes, actions, and attributes.
Extracting the rich semantic information in an image requires multi-label generation techniques that identify all categories present in the image as accurately as possible. Traditional classification is usually hard classification: each sample is assigned to exactly one category, which is exclusive by nature. For image labeling this means each image receives only one label, which is a clear limitation. Moreover, in a typical multi-label image, objects of different categories appear at different positions with different scales and poses, and occlusion, overlap, and illumination effects among objects make multi-label images harder to recognize and classify. Multi-label image classification is therefore the more general and practical problem: it models the rich semantic information in images and the dependencies among those semantics, and completing multi-label classification and recognition efficiently and accurately has become an important research direction (see Ji Zhong, Li Huihui, He Yuqing. Zero-shot multi-label image classification based on deep instance differentiation [J]. Computer Science and Exploration, 2019, 13(1): 9), with wide applications in image retrieval, portrait grouping, medical image recognition, scene understanding, and many other fields.
The success of CNNs in single-label image classification offers insight into the multi-label problem. Convolution is translation-invariant: wherever an object appears in the image, the same features are detected and the same response is produced, including when multiple objects appear in the image. The vector output by the fully connected layer of a CNN model can therefore simply be passed through a sigmoid function to obtain probabilities between 0 and 1, giving the probability that the sample belongs to each category. The per-class probabilities output by such a model are independent; that is, the multi-label problem is decomposed into several independent binary classification problems. However, this approach ignores the semantic correlation between labels: when an image carries one label, the probability that it simultaneously contains the content of a related label can be high. Sky and clouds, for example, commonly appear together, whereas water and cars almost never do. Furthermore, although the repeated convolution and pooling operations in a deep CNN reduce the number of model parameters through weight sharing and downsampling, they also continually enlarge the neurons' receptive fields, so the deep feature maps of the model reflect mostly global image features. This is beneficial in single-label classification tasks, where the image has a single target, but in multi-label classification the image contains small targets of varying size, position, and shape, and the local features of those small targets are easily ignored or diluted under the large receptive fields of the deep layers. Extracting global features directly from the whole image therefore tends to lose the visual features of small targets during feature extraction, which hurts multi-label classification accuracy.
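To make the decomposition concrete, the following minimal PyTorch sketch (shapes and names are illustrative, not taken from the patent) shows how independent sigmoid outputs turn one multi-label prediction into separate binary decisions:

```python
import torch

# A minimal sketch of the naive decomposition described above: the fully
# connected layer's raw outputs are squashed independently by a sigmoid,
# turning the multi-label problem into independent binary problems.
num_classes = 5
logits = torch.randn(2, num_classes)   # a batch of 2 images, raw FC outputs
probs = torch.sigmoid(logits)          # independent per-class probabilities in (0, 1)
pred_labels = (probs > 0.5).int()      # threshold each class to get the label set
```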
Xiao Lin et al. (see Xiao Lin, Chen Boli, Huang Xin, et al. Multi-label text classification based on label semantic attention [J]. Journal of Software, 2020, 31(4): 11) proposed a multi-label text classification method based on label semantic attention that relies on the document text and its corresponding labels: a bidirectional long short-term memory network obtains a hidden representation of each word, a label semantic attention mechanism weights each word in the document, and labels are treated as interrelated in the semantic space. Zhang Yong et al. (see Zhang Yong, Liu Haoke, Zhang Jie. Multi-label classification algorithm based on generic features and instance correlations [J]. Pattern Recognition and Artificial Intelligence, 2020, 33(5): 10) proposed a multi-label classification algorithm based on generic features and instance correlations that considers not only label correlations but also correlations among instance features, learning similarity in the instance feature space by constructing a similarity graph. Mou Jiapeng et al. (see Mou Jiapeng, Cai Jian, Yu Mengchi, Xu Jian. Generic attribute multi-label classification algorithm based on label correlation [J]. Application Research of Computers, 2020, 37(9): 4) proposed a label-correlation-based multi-label classification algorithm over generic attributes that measures the correlation between labels by the distance between them and attaches correlated labels to the generic attribute space, thereby improving classification performance. Chen et al. (see Chen Z M, Wei X S, Wang P, et al. Multi-label image recognition with graph convolutional networks [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019: 5177-5186) proposed using a graph convolutional network (GCN) to explicitly model the correlations between class labels, learning interdependent per-class classifiers through the GCN's mapping function; the generated classifiers can be applied to the image features learned by any CNN model, giving high extensibility and flexibility. Lanchantin et al. (see Lanchantin J, Wang T, Ordonez V, et al. General multi-label image classification with transformers [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021: 16478-16488) proposed a Transformer model with a Label Mask Training strategy that randomly masks part of the ground-truth labels during training and lets the model predict the masked labels, thereby exploring the complex dependencies between image features and labels and within the label set.
Common methods thus lose the visual features of some small targets while extracting the global features of the image, and labels in the multi-label setting exhibit dependency relationships. It is therefore necessary to design an efficient multi-label image classification model that effectively models both the local features of small targets in the image and the dependencies among multiple labels.
Disclosure of Invention
In order to overcome the defects of the prior art, the application provides a multi-label image classification method based on feature fusion and a self-attention mechanism.
In order to achieve the above purpose, the application adopts the following technical scheme:
step 1: the ResNet50 model structure and parameters are initialized and 1*1 convolution operations are performed on the feature map output by the third convolution block of ResNet50 to extract image local features. The number of channels of the 1*1 convolution kernel should be consistent with the total number of categories of the current multi-tag classification task.
Step 2: the feature map output by the original ResNet50 model continues to pass through a subsequent convolution block and is subjected to Average Pooling (Average Pooling) to obtain a global feature matrix of the image.
Step 3: in order to discover the dependency relationship between the labels, the feature vectors are fused through a self-attention mechanism, and specifically comprises the following steps:
(1) Flattening the image local feature matrix obtained in the step 1 into a one-dimensional vector in each dimension of the channel dimension respectively; flattening the global feature matrix of the image obtained in the step 2 into a one-dimensional vector; splicing the vectors into a matrix E according to rows, wherein the local eigenvectors are subjected to linear transformation to ensure that the dimensions of the local eigenvectors are consistent with those of the global eigenvectors;
(2) Initializing a weight matrix W Q 、W K 、W V
(3) Respectively combining (1) the feature matrix E with (2) the weight vector W Q 、W K 、W V Multiplying to obtain a Query matrix, a Key matrix and a Value matrix, wherein each row of the matrixThe vectors are all associated with the aforementioned one-dimensional global feature vector or local feature vector.
(4) An attention score is calculated. Multiplying the Query matrix by the transpose of the Key matrix to obtain the attention Score matrix Score, and dividing the attention Score matrix Score by the value(d k The number of columns of the Key matrix) and then normalize each row in the matrix using the Softmax function. At this time, the numerical value of each element in the Score matrix represents the attention Score between every two feature vectors in the feature matrix E;
(5) The attention Score matrix Score is multiplied by the Value matrix, so that each row of vectors in the Value matrix is obtained by weighted summation of the attention Score and other rows of vectors.
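Written compactly, steps (2) through (5) amount to standard scaled dot-product self-attention over the stacked feature matrix E; the following summary in LaTeX notation is a restatement, not text from the original filing:

```latex
Q = E\,W_Q, \qquad K = E\,W_K, \qquad V = E\,W_V
\qquad
\mathrm{Score} = \mathrm{Softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right),
\qquad
F_{\mathrm{mixed}} = \mathrm{Score}\cdot V
```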
Step 4: and (3) inputting the Value matrix into a fully-connected neural network for calculation, and finally generating a vector with the dimension equal to the category number for each image through a Sigmoid activation function, wherein the numerical Value of each dimension of the vector represents the probability that the image belongs to the corresponding category.
The beneficial effects of the application are as follows:
(1) The application takes both the global and the local features of the image into account, alleviating to some extent the loss of small-target feature information in traditional image feature extraction networks;
(2) By performing the convolution with a 1×1 kernel whose channel count equals the total number of categories, the method computes a separate feature map for each category, improving classification accuracy over common methods in which all categories share one feature map;
(3) The method models the dependency relationships among labels with a self-attention mechanism, exploiting the semantic relevance of labels in the multi-label setting to markedly improve the model's classification performance;
(4) The method has good anti-interference capability and strong robustness, and can meet practical multi-label image classification requirements.
Drawings
FIG. 1 is a flow chart of a multi-label image classification method based on feature fusion and self-attention mechanism.
Fig. 2 is a flow chart of a feature fusion process based on a self-attention mechanism.
FIG. 3 is a schematic diagram of a neural network model of a multi-label image classification method based on feature fusion and self-attention mechanisms.
Detailed Description
The application is further illustrated by the following figures and specific examples, which are intended to illustrate the application rather than to limit its scope; after reading the application, various equivalent modifications by those skilled in the art fall within the scope defined by the appended claims.
The following example uses a classification task with a total of 5 classes.
S1: the ResNet50 model structure and parameters are initialized, where the parameters refer to weight data from the ResNet50 pre-training on the ImageNet large-scale visual recognition dataset. Thereafter, a 1*1 convolution operation is performed on the feature map output by the third convolution block of ResNet50 to extract image local features. Specifically, the feature map with the shape of 512 x 28 is subjected to convolution operation by using the convolution check with the shape of 5 x 1, so as to obtain the feature map with the shape of 5 x 28, wherein each channel corresponds to a corresponding category.
S2: the feature map output by the original ResNet50 model continues to pass through a subsequent convolution block and is subjected to Average Pooling (Average Pooling) to obtain a global feature map of the image, wherein the shape of the global feature map is 1 x 2048.
S3: in order to discover the dependency relationship between the labels, the feature vectors are fused through a self-attention mechanism, and specifically comprises the following steps:
S31: Flatten each channel of the local feature matrix obtained in S1 into a one-dimensional vector, i.e., reshape the 5×28×28 feature map into a 5×784 matrix, denoted F_regional; flatten the global feature matrix obtained in S2 into a 2048-dimensional vector, denoted F_global; stack these vectors row-wise into a matrix E, where the local feature vectors are first linearly transformed so that their dimension matches that of the global feature vector. The resulting E matrix has shape 6×2048.
A 784×2048 parameter matrix θ is defined to linearly transform the feature matrix F_regional:

F′ = F_regional · θ (9)

E = concat(F′; F_global) (10)
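A sketch of S31 under the shapes stated above; the stand-in tensors and the random initialization of θ are illustrative assumptions:

```python
import torch

num_classes = 5
local_feat = torch.randn(1, num_classes, 28, 28)  # stand-in for the S1 output
global_feat = torch.randn(1, 2048)                # stand-in for the S2 output

F_regional = local_feat.flatten(2).squeeze(0)     # 5 x 784: each channel flattened
theta = torch.randn(784, 2048) * 0.02             # learnable 784 x 2048 projection θ
F_prime = F_regional @ theta                      # equation (9): 5 x 2048
E = torch.cat([F_prime, global_feat], dim=0)      # equation (10): 6 x 2048
```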
S32: Initialize 2048×512-dimensional weight matrices W_Q, W_K, and W_V.
S33: Multiply the feature matrix E from S31 by each of the weight matrices W_Q, W_K, and W_V from S32 to obtain the Query, Key, and Value matrices; each row vector of these matrices is associated with one of the aforementioned one-dimensional global or local feature vectors. Each of these matrices has shape 6×512.
The specific calculation is:

Query = E · W_Q (11)

Key = E · W_K (12)

Value = E · W_V (13)
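A sketch of S32 and S33; the random initialization of the weight matrices and the stand-in for E are illustrative assumptions:

```python
import torch

E = torch.randn(6, 2048)             # stand-in for the stacked feature matrix of S31
d_k = 512
W_Q = torch.randn(2048, d_k) * 0.02  # S32: weight matrices, randomly initialized here
W_K = torch.randn(2048, d_k) * 0.02
W_V = torch.randn(2048, d_k) * 0.02

Query = E @ W_Q                      # equation (11): 6 x 512
Key = E @ W_K                        # equation (12): 6 x 512
Value = E @ W_V                      # equation (13): 6 x 512
```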
S34: Compute the attention scores. Multiply the Query matrix by the transpose of the Key matrix to obtain the attention score matrix Score (of shape 6×6), divide by √d_k (where d_k is the number of columns of the Key matrix), and then normalize each row with the Softmax function. Each element of the Score matrix then represents the attention score between a pair of feature vectors in E.
The specific calculation is:

Score = Softmax(Query · Key^T / √d_k) (14)
S35: Multiply the attention score matrix Score by the Value matrix; each row vector of the resulting matrix F_mixed is thus the weighted sum, by attention score, of the corresponding row vectors of the Value matrix.

F_mixed = Score · Value (15)
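A sketch of S34 and S35 with stand-in projections; torch.softmax applies the row-wise normalization described above:

```python
import math
import torch

Query = torch.randn(6, 512)  # stand-ins for the S33 projections
Key = torch.randn(6, 512)
Value = torch.randn(6, 512)
d_k = Key.shape[1]

# S34 / equation (14): scaled dot-product scores, row-normalized by Softmax.
Score = torch.softmax(Query @ Key.T / math.sqrt(d_k), dim=-1)  # 6 x 6

# S35 / equation (15): each output row is an attention-weighted sum of Value rows.
F_mixed = Score @ Value  # 6 x 512
```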
S4: will F mixed The matrix is input into a fully-connected neural network for calculation, finally, a vector with the dimension equal to the category number is generated for each image through a Sigmoid activation function, and the numerical value of each dimension of the vector represents the probability that the image belongs to the corresponding category. The fully-connected neural network is specifically defined as follows, wherein the parameter matrix ω c Is 512 x 256, ω b Is 256 x 64, omega a Is 64 x 5:
out=Sigmoid(((F mixed ·ω c )x elu ·ω b ) relu ·ω a ) (16)。
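A sketch of S4 under the dimensions stated above, using bias-free linear layers to stand in for the matrices ω_c, ω_b, and ω_a; note that equation (16) yields one 5-dimensional row per row of F_mixed, and the patent does not spell out how these rows collapse into a single per-image prediction:

```python
import torch
import torch.nn as nn

F_mixed = torch.randn(6, 512)  # stand-in for the fused matrix of S35

# Equation (16): three fully connected layers with ReLU between them, then a
# sigmoid. Bias-free nn.Linear layers stand in for ω_c, ω_b, ω_a.
omega_c = nn.Linear(512, 256, bias=False)
omega_b = nn.Linear(256, 64, bias=False)
omega_a = nn.Linear(64, 5, bias=False)

hidden = torch.relu(omega_b(torch.relu(omega_c(F_mixed))))
out = torch.sigmoid(omega_a(hidden))  # 6 x 5, per-class probabilities per row

# Pooling over the six rows is one possible way to reduce to a single
# 5-dimensional per-image probability vector (an assumption, not stated above).
```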

Claims (5)

1. A multi-label image classification method based on feature fusion and a self-attention mechanism, characterized by comprising the following steps:
step 1: initializing a model and extracting local features of an image;
step 2: extracting global features of the image;
step 3: fusing global features and local features of the image through a self-attention mechanism;
step 4: and based on the fused characteristics, performing image multi-label classification by using a fully connected neural network.
2. The multi-label image classification method based on feature fusion and self-attention mechanism according to claim 1, wherein in step 1 the model is initialized and local image features are extracted as follows:
Initialize the ResNet50 model structure and parameters, where the parameters are the weights obtained by pre-training ResNet50 on the ImageNet large-scale visual recognition dataset; then apply a 1×1 convolution to the feature map output by the third convolution block of ResNet50 to extract local image features.
3. The multi-label image classification method based on feature fusion and self-attention mechanism according to claim 1, wherein step 2 extracts the global features of the image as follows:
Pass the feature map output by the original ResNet50 model in step 1 through the subsequent convolution blocks and apply average pooling (Average Pooling) to obtain the global feature matrix of the image.
4. The multi-label image classification method based on feature fusion and self-attention mechanism according to claim 1, wherein step 3 fuses the global and local features of the image through a self-attention mechanism as follows:
(1) Flatten each channel of the local feature matrix obtained in step 1 into a one-dimensional vector, denoted F_regional; flatten the global feature matrix obtained in step 2 into a one-dimensional vector, denoted F_global; stack these vectors row-wise into a matrix E, where the local feature vectors are first linearly transformed so that their dimension matches that of the global feature vector; a parameter matrix θ is defined to linearly transform the feature matrix F_regional:

F′ = F_regional · θ (1)

E = concat(F′; F_global) (2)
(2) Initialize weight matrices W_Q, W_K, and W_V;
(3) Multiply the feature matrix E from (1) by each of the weight matrices W_Q, W_K, and W_V from (2) to obtain the Query, Key, and Value matrices; each row vector of these matrices is associated with one of the aforementioned one-dimensional global or local feature vectors.
The specific calculation is:

Query = E · W_Q (3)

Key = E · W_K (4)

Value = E · W_V (5)
(4) Compute the attention scores. Multiply the Query matrix by the transpose of the Key matrix to obtain the attention score matrix Score, divide by √d_k (where d_k is the number of columns of the Key matrix), and then normalize each row with the Softmax function. Each element of the Score matrix then represents the attention score between a pair of feature vectors in E.
The specific calculation is:

Score = Softmax(Query · Key^T / √d_k) (6)
(5) Multiply the attention score matrix Score by the Value matrix, so that each row vector of the resulting matrix F_mixed is the weighted sum, by attention score, of the corresponding row vectors of the Value matrix.

F_mixed = Score · Value (7)
5. The multi-label image classification method based on feature fusion and self-attention mechanism according to claim 1, wherein step 4 performs multi-label image classification with a fully connected neural network based on the fused features, as follows:
Input the F_mixed matrix into a fully connected neural network, and finally generate, through a Sigmoid activation function, a vector whose dimension equals the number of categories for each image; the value of each dimension represents the probability that the image belongs to the corresponding category. The fully connected network is defined as:

out = Sigmoid(((F_mixed · ω_c)_relu · ω_b)_relu · ω_a) (8)
CN202310728668.7A 2023-06-19 2023-06-19 Multi-label image classification method based on feature fusion and self-attention mechanism Pending CN116935100A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310728668.7A CN116935100A (en) 2023-06-19 2023-06-19 Multi-label image classification method based on feature fusion and self-attention mechanism


Publications (1)

Publication Number Publication Date
CN116935100A true CN116935100A (en) 2023-10-24

Family

ID=88388585


Country Status (1)

Country Link
CN (1) CN116935100A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117876797A (en) * 2024-03-11 2024-04-12 中国地质大学(武汉) Image multi-label classification method, device and storage medium
CN117876797B (en) * 2024-03-11 2024-06-04 中国地质大学(武汉) Image multi-label classification method, device and storage medium

Similar Documents

Publication Publication Date Title
Han et al. A unified metric learning-based framework for co-saliency detection
CN110866140B (en) Image feature extraction model training method, image searching method and computer equipment
Zhang et al. Lightweight deep network for traffic sign classification
Zhao et al. Recurrent attention model for pedestrian attribute recognition
CN111639544B (en) Expression recognition method based on multi-branch cross-connection convolutional neural network
Xue et al. DIOD: Fast and efficient weakly semi-supervised deep complex ISAR object detection
Lin et al. RSCM: Region selection and concurrency model for multi-class weather recognition
CN111783831B (en) Complex image accurate classification method based on multi-source multi-label shared subspace learning
CN111582409A (en) Training method of image label classification network, image label classification method and device
Ji et al. Combining multilevel features for remote sensing image scene classification with attention model
Rad et al. Image annotation using multi-view non-negative matrix factorization with different number of basis vectors
Ajmal et al. Convolutional neural network based image segmentation: a review
Xia et al. Weakly supervised multimodal kernel for categorizing aerial photographs
Sun et al. Scene categorization using deeply learned gaze shifting kernel
CN113868448A (en) Fine-grained scene level sketch-based image retrieval method and system
Zhang et al. Bioinspired scene classification by deep active learning with remote sensing applications
Bouchakwa et al. A review on visual content-based and users’ tags-based image annotation: methods and techniques
CN116935100A (en) Multi-label image classification method based on feature fusion and self-attention mechanism
CN113642602B (en) Multi-label image classification method based on global and local label relation
Kumar et al. Logo detection using weakly supervised saliency map
Nie et al. Multi-label image recognition with attentive transformer-localizer module
Juyal et al. Multilabel image classification using the CNN and DC-CNN model on Pascal VOC 2012 dataset
Sun et al. A novel semantics-preserving hashing for fine-grained image retrieval
Liu et al. Iterative deep neighborhood: a deep learning model which involves both input data points and their neighbors
Oluwasanmi et al. Attentively conditioned generative adversarial network for semantic segmentation

Legal Events

Date Code Title Description
PB01 Publication