CN115830402A - Fine-grained image recognition classification model training method, device and equipment - Google Patents

Fine-grained image recognition classification model training method, device and equipment

Info

Publication number
CN115830402A
Authority
CN
China
Prior art keywords: attention, classification, fine, self, vector
Legal status
Granted
Application number
CN202310140142.7A
Other languages
Chinese (zh)
Other versions
CN115830402B (en)
Inventor
余鹰
王景辉
Current Assignee
East China Jiaotong University
Original Assignee
East China Jiaotong University
Priority date
Filing date
Publication date
Application filed by East China Jiaotong University
Priority to CN202310140142.7A
Publication of CN115830402A
Application granted
Publication of CN115830402B

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Image Analysis (AREA)

Abstract

The invention provides a method, an apparatus, and a device for training a fine-grained image recognition classification model. The method comprises the following steps: inputting a fine-grained image into a preset network model for training, the preset network model comprising a plurality of self-attention layers; acquiring the classification vectors that a preset number of target self-attention layers learn from the fine-grained image; inputting the classification vector of each target self-attention layer into a preset classifier, outputting a classification label for each target self-attention layer, and performing a loss calculation between the classification label of each target self-attention layer and a preset real label; and updating the network parameters through a back propagation mechanism separately according to the loss value of each target self-attention layer. By introducing a progressive training mechanism, the method helps mine the complementary information in classification vectors at different levels and use it for classification. A multi-scale module is also provided, which realizes complementary exchange between global and local information and improves the fine-grained image classification effect.

Description

Fine-grained image recognition classification model training method, device and equipment
Technical Field
The invention relates to the technical field of model training, in particular to a method, a device and equipment for training a fine-grained image recognition classification model.
Background
Fine-grained image classification aims at identifying sub-categories within the same parent category: for example, Mercedes-Benz and Audi belong to the same vehicle category, the blue crow and the parrot belong to the same bird category, and the Labrador Retriever and the Golden Retriever belong to the same dog category. Fine-grained image classification has attracted wide attention because of its many practical applications in face recognition, traffic vehicle recognition, intelligent retail, agricultural disease recognition, endangered animal protection, and the like.
However, unlike conventional image classification, the images in a fine-grained classification training set are often discriminative only in small local regions. Existing fine-grained image classification models fall roughly into two types: strongly supervised models and weakly supervised models. Strongly supervised models depend on fine image annotations (such as manually labeled bounding boxes, key-point information, and the like). Such accurate, fine annotations mostly have to be produced by experts in different domains, and because the sample data sets are large, the annotation work consumes a great deal of time and effort. Furthermore, annotation information can be affected by subjectivity and is prone to error. Recently, weakly supervised work has attracted the attention of researchers; this approach requires no additional image annotation, that is, image-level labels are used as the supervision signal. For example, the Vision Transformer (visual self-attention model, ViT for short) recently proposed by Google has performed impressively in the field of computer vision, and a plain ViT alone can already achieve good results in fine-grained image classification, but there is still room for improvement.
Therefore, many researchers have proposed a variety of ViT-based variants with some success. However, most existing ViT-based work simply migrates ideas from convolutional neural networks and lacks reflection on the multi-head attention mechanism that is unique to the ViT structure. Most recent ViT work focuses on the picture vectors (patch tokens) and the multi-head attention mechanism, but neglects the importance of the classification vector (class token) in classification. The existing ViT and some of its variants consider only the beneficial information learned by the last attention layer for classification, while ignoring the complementary information learned by other layers; this causes a certain loss of information and leaves the model's fine-grained classification accuracy lacking.
Disclosure of Invention
Based on this, the present invention provides a method, an apparatus, and a device for training a fine-grained image recognition classification model to solve at least one technical problem in the prior art.
The invention provides a fine-grained image recognition classification model training method, which comprises the following steps:
obtaining a fine-grained image for model training, and inputting the fine-grained image into a preset network model for training, wherein the preset network model comprises a plurality of self-attention layers, and the fine-grained image sequentially passes through each self-attention layer so as to perform classification vector learning on the fine-grained image through the self-attention layers;
obtaining classification vectors obtained by learning the fine-grained images by a preset number of target self-attention layers, wherein the target self-attention layers are positioned at the rear ends of the multiple self-attention layers;
inputting the classification vector of each target self-attention layer into a preset classifier, outputting a classification label of each target self-attention layer, and performing loss calculation on the classification label of each target self-attention layer and a preset real label to obtain a loss value of each target self-attention layer;
and updating network parameters through a back propagation mechanism according to the loss value of each target self-attention layer so as to train the fine-grained image recognition classification model.
In addition, the fine-grained image recognition classification model training method according to the above embodiment of the present invention may further have the following additional technical features:
Further, the method further comprises:
calculating a final attention weight matrix of the self-attention layer after classification vector learning is carried out on the fine-grained image according to a preset calculation rule;
determining the position of a classification target according to the final self-attention weight matrix, and intercepting a classification target area image from the fine-grained image according to the position of the classification target;
and scaling the classified target area image to be the same as the fine-grained image in size, and inputting the image into the preset network model for training so as to intensively train the fine-grained image recognition classification model.
Further, the preset network model further includes a linear projection layer and a position coding layer, and the step of inputting the fine-grained image into the preset network model for training includes:
dividing the fine-grained image into preset sub-images according to a preset division rule, and mapping each sub-image to a high-dimensional feature space through the linear projection layer to obtain a picture vector of each sub-image;
coding the picture vectors of each sub-picture through the position coding layer to add position coding information to each picture vector, and adding an empty classification vector in front of the first picture vector to obtain a vector sequence;
and inputting the vector sequence into the multi-layer self-attention layer for classification vector learning, wherein the classification features learned by each layer of self-attention layer are updated in the classification vectors of the vector sequence to obtain the classification vectors of each layer of self-attention layer.
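The preprocessing described above (patch splitting, linear projection, position encoding, and the empty classification vector prepended to the sequence) can be sketched as follows. This is a minimal illustrative sketch in PyTorch, not the patent's actual implementation; the class name `PatchEmbedding` and the sizes (224-pixel images, 16-pixel patches, 768 dimensions) are assumptions.

```python
import torch
import torch.nn as nn

class PatchEmbedding(nn.Module):
    """Illustrative sketch: split the image into patches, project them to D dimensions,
    prepend an empty classification vector, and add position encoding information."""
    def __init__(self, img_size=224, patch_size=16, in_chans=3, dim=768):
        super().__init__()
        self.num_patches = (img_size // patch_size) ** 2
        # Linear projection of flattened patches (implemented as a strided convolution)
        self.proj = nn.Conv2d(in_chans, dim, kernel_size=patch_size, stride=patch_size)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))                            # empty classification vector
        self.pos_embed = nn.Parameter(torch.zeros(1, self.num_patches + 1, dim))         # position encoding

    def forward(self, x):                      # x: (B, 3, H, W)
        x = self.proj(x)                       # (B, D, H/p, W/p)
        x = x.flatten(2).transpose(1, 2)       # (B, K, D) picture vectors
        cls = self.cls_token.expand(x.size(0), -1, -1)
        x = torch.cat([cls, x], dim=1)         # classification vector in front of the first picture vector
        return x + self.pos_embed              # vector sequence carrying position information
```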
Further, the self-attention layer comprises a plurality of attention heads, and the step of calculating a final attention weight matrix of the self-attention layer after the classification vector learning of the fine-grained image according to a preset calculation rule comprises:
after the classification vector learning is carried out on the fine-grained image, in each attention head, the attention weight of the classification vector and each picture vector in the current layer is respectively calculated, and an attention weight matrix corresponding to each attention head is obtained;
and performing dot product calculation on the attention weight matrixes of all the attention heads to obtain the final attention weight matrix.
Further, the formula for calculating the attention weight is:
$$a_i^l = \operatorname{softmax}\!\left(\frac{QK^{T}}{\sqrt{d_k}}\right)$$

where a_i^l is the attention weight between the classification vector and the i-th picture vector in the l-th attention head, Q is the query vector, K is the key vector, V is the value vector, d_k is the mapping space dimension of the attention head, and T denotes the matrix transpose. The attention weight matrix A is expressed as:

$$A = \begin{bmatrix} a_1^1 & a_2^1 & \cdots & a_K^1 \\ \vdots & \vdots & \ddots & \vdots \\ a_1^L & a_2^L & \cdots & a_K^L \end{bmatrix}$$

where l ∈ {1, 2, …, L}, i ∈ {1, 2, …, K}, L denotes the number of attention heads, and K denotes the number of picture vectors.
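As a concrete illustration of this formula, the following sketch computes the per-head attention weights of the classification vector over the K picture vectors from the query and key projections. It is a hedged sketch in PyTorch; the tensor layout (L heads, sequence length K+1, head dimension d_k) and the function name `class_token_attention` are assumptions, not part of the patent.

```python
import torch

def class_token_attention(q: torch.Tensor, k: torch.Tensor) -> torch.Tensor:
    """q, k: (L, K+1, d_k) per-head query/key vectors for the classification vector plus K picture vectors.
    Returns (L, K): a_i^l, the weight of the classification vector on picture vector i in head l."""
    d_k = q.size(-1)
    q_cls = q[:, 0:1, :]                                  # classification-vector query per head, (L, 1, d_k)
    scores = q_cls @ k.transpose(-2, -1) / d_k ** 0.5     # QK^T / sqrt(d_k), shape (L, 1, K+1)
    weights = scores.softmax(dim=-1)                      # softmax over the sequence
    return weights[:, 0, 1:]                              # drop the classification-vector column, keep the K patches
```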
Further, the step of determining the position of the classification target according to the final self-attention weight matrix includes:
calculating an average value of all attention weights in the final attention weight matrix;
comparing each attention weight in the final attention weight matrix to the average value, with attention weights greater than the average value being flagged as a first threshold and otherwise as a second threshold;
and determining the position of the classification target according to the position coding information of the target picture vector with the attention weight of the classification vector as a first threshold value.
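A hedged sketch of the localization steps above: the per-head weights are combined into the final attention weight matrix, thresholded against their average, and mapped back to the patch grid to obtain the region of the classification target. The function name `locate_target` is an assumption, and the combination across heads is implemented here as an element-wise product of the per-head weight rows, which is one plausible reading of the dot product described above.

```python
import torch

def locate_target(attn_per_head: torch.Tensor, grid: int):
    """attn_per_head: (L, K) classification-vector attention weights, one row per head.
    Returns a patch mask and a bounding box on the patch grid (assumes K = grid * grid,
    patches laid out row-major according to their position encoding)."""
    final_w = attn_per_head.prod(dim=0)             # combine the L heads into the final weights, (K,)
    # Weights above the average are marked with the first threshold (1), otherwise the second (0)
    mask = (final_w > final_w.mean()).float()
    grid_mask = mask.reshape(grid, grid)
    ys, xs = torch.nonzero(grid_mask, as_tuple=True)
    if len(ys) == 0:                                # fall back to the whole image
        return grid_mask, (0, 0, grid - 1, grid - 1)
    box = (ys.min().item(), xs.min().item(), ys.max().item(), xs.max().item())
    return grid_mask, box
```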
Further, the step of respectively performing loss calculation on the classification label of each target self-attention layer and a preset real label to obtain a loss value of each target self-attention layer includes:
respectively carrying out cross entropy loss calculation on the classification label of each target self-attention layer and a preset real label to obtain a loss value of each target self-attention layer;
wherein the formula of the cross entropy loss calculation is as follows:
$$\mathrm{LOSS}_{CE}(y_r, y) = -\sum_{c} y^{(c)} \log\!\left(y_r^{(c)}\right)$$

where y_r is the classification label of the r-th target self-attention layer, y is the preset real label, the superscript (c) indexes the categories, and LOSS_CE(y_r, y) is the cross entropy loss value between the classification label of the r-th target self-attention layer and the preset real label; the preset number is 3, and r ∈ {1, 2, 3}.
The invention provides a fine-grained image recognition classification model training system, which comprises:
the image acquisition module is used for acquiring a fine-grained image for model training and inputting the fine-grained image into a preset network model for training, wherein the preset network model comprises a plurality of self-attention layers, and the fine-grained image sequentially passes through each self-attention layer so as to perform classification vector learning on the fine-grained image through the self-attention layers;
a vector acquisition module, configured to acquire classification vectors obtained by learning the fine-grained images through a preset number of target self-attention layers, where the target self-attention layers are located at the rear end of the multiple self-attention layers;
the loss calculation module is used for inputting the classification vector of each target self-attention layer into a preset classifier, outputting the classification label of each target self-attention layer, and performing loss calculation on the classification label of each target self-attention layer and a preset real label to obtain a loss value of each target self-attention layer;
and the progressive training module is used for updating network parameters through a back propagation mechanism respectively according to the loss value of each target self-attention layer so as to train the fine-grained image recognition classification model.
The present invention also provides a computer readable storage medium, on which a computer program is stored, which when executed by a processor implements the fine-grained image recognition classification model training method described above.
The invention also provides fine-grained image recognition classification model training equipment which comprises a memory, a processor and a computer program which is stored on the memory and can run on the processor, wherein the processor realizes the fine-grained image recognition classification model training method when executing the program.
The invention has the following beneficial effects: by improving the conventional ViT structure and introducing a progressive training mechanism, classification vectors at different levels of the ViT structure are selected, so that attention is paid not merely to the beneficial information learned by the last attention layer but also to the importance of the classification vectors in classification. The learned information can be passed upward effectively, which helps mine the complementary information in classification vectors at different levels and use it for classification, thereby improving the accuracy of fine-grained image classification.
Drawings
FIG. 1 is a photograph of a California gull as provided in an embodiment of the present invention;
FIG. 2 is a photograph of an Arctic gull as provided in an embodiment of the present invention;
FIG. 3 is a flowchart of a fine-grained image recognition classification model training method according to a first embodiment of the present invention;
FIG. 4 is a schematic structural diagram of an improved ViT model provided in an embodiment of the present invention;
fig. 5 is a block diagram of a fine-grained image recognition classification model training system according to a third embodiment of the present invention.
The following detailed description will further illustrate the invention in conjunction with the above-described figures.
Detailed Description
To facilitate an understanding of the invention, the invention will now be described more fully with reference to the accompanying drawings. Several embodiments of the invention are presented in the drawings. This invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete.
It will be understood that when an element is referred to as being "secured to" another element, it can be directly on the other element or intervening elements may also be present. When an element is referred to as being "connected" to another element, it can be directly connected to the other element or intervening elements may also be present. The terms "vertical," "horizontal," "left," "right," and the like as used herein are for illustrative purposes only.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
Fine-grained image classification aims at identifying sub-categories within the same parent category: for example, Mercedes-Benz and Audi belong to the same vehicle category, the blue crow and the parrot belong to the same bird category, and the Labrador Retriever and the Golden Retriever belong to the same dog category. Fine-grained image classification has attracted wide attention because of its many practical applications in face recognition, traffic vehicle recognition, intelligent retail, agricultural disease recognition, endangered animal protection, and the like. However, unlike conventional image classification, the images in a fine-grained classification training set are often discriminative only in small local regions. As shown in fig. 1 and fig. 2, fig. 1 shows a California gull and fig. 2 shows an Arctic gull. Although the two kinds of gulls belong to different species, they look very similar and are difficult for an ordinary person to distinguish with the naked eye. Moreover, gulls of the same kind can be hard to recognize as such because of differences in shooting angle, illumination, flying posture, and so on. Because the intra-class differences are large while the inter-class differences are small, fine-grained image recognition is more difficult and more challenging than conventional image classification.
The Vision Transformer (ViT, visual self-attention model) recently proposed by Google has performed impressively in the field of computer vision, and a plain ViT alone can already achieve good results in fine-grained image classification, but there is still room for improvement. Therefore, many researchers have proposed a variety of ViT-based variants with some success. However, most existing ViT-based work simply migrates ideas from convolutional neural networks and lacks reflection on the multi-head attention mechanism that is unique to the ViT structure. Most recent ViT work focuses on the picture vectors (patch tokens) and the multi-head attention mechanism, but neglects the importance of the classification vector (class token) in classification. Moreover, the existing ViT and some of its variants consider only the beneficial information learned by the last attention layer for classification, while ignoring the complementary information learned by other layers; this causes a certain loss of information and leaves the model's fine-grained classification accuracy lacking.
Based on the above, the invention aims to improve the conventional ViT structure and provide a brand-new training method for fine-grained image classification models, so that the trained fine-grained image classification model achieves better classification accuracy and the classification effect of the model is improved. The embodiments are described in detail below with reference to specific examples.
Example one
Referring to fig. 3, a fine-grained image recognition classification model training method according to a first embodiment of the present invention is shown, where the fine-grained image recognition classification model training method can be implemented by software and/or hardware, and the method includes steps S01 to S04.
Step S01, obtaining a fine-grained image for model training, inputting the fine-grained image into a preset network model for training, wherein the preset network model comprises a plurality of self-attention layers, and the fine-grained image passes through each self-attention layer in sequence so as to perform classification vector learning on the fine-grained image through the self-attention layers.
In this embodiment, the preset network model is specifically an improved ViT model. Referring to fig. 4, the improved ViT model includes multiple self-attention layers (Transformer layers), of which the last three self-attention layers are each connected to an MLP Head classification head; the three MLP Head classification heads in the drawing are labeled MLP1, MLP2, and MLP3, so that the classification vector (class token) learned by each of these self-attention layers can output a corresponding classification result through its MLP Head classification head.
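To make the structure concrete, the following is a minimal PyTorch sketch of a ViT-style backbone whose last three self-attention layers each expose their class token to a separate MLP Head classification head. The class name `MultiHeadViT`, the use of `nn.TransformerEncoderLayer` in place of the exact ViT block, and the depth and dimension values are illustrative assumptions rather than the patent's configuration.

```python
import torch.nn as nn

class MultiHeadViT(nn.Module):
    """Illustrative backbone: the class tokens of the last `num_outputs`
    self-attention layers are each classified by their own MLP Head."""
    def __init__(self, dim=768, depth=12, heads=12, num_classes=200, num_outputs=3):
        super().__init__()
        self.layers = nn.ModuleList([
            nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
            for _ in range(depth)
        ])
        self.num_outputs = num_outputs
        # One MLP Head classification head per target self-attention layer (MLP1, MLP2, MLP3)
        self.mlp_heads = nn.ModuleList([
            nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, num_classes))
            for _ in range(num_outputs)
        ])

    def forward(self, tokens):                     # tokens: (B, K+1, D) from the patch embedding
        cls_tokens = []
        for i, layer in enumerate(self.layers):
            tokens = layer(tokens)
            if i >= len(self.layers) - self.num_outputs:
                cls_tokens.append(tokens[:, 0])    # class token of a target self-attention layer
        return [head(cls) for head, cls in zip(self.mlp_heads, cls_tokens)]
```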
In a specific implementation, a large number of fine-grained pictures of different categories can be collected, pictures of the same category are grouped into one class, and the same real label is preset for every picture of that class. For example, a large number of Arctic gull pictures and a large number of California gull pictures are collected; the Arctic gull pictures are grouped into one class and given a real label representing the Arctic gull characteristics, and the California gull pictures are grouped into another class and given a real label representing the California gull characteristics. Then, the fine-grained pictures of the different categories are used as the training set to train the improved ViT model; during training, each fine-grained picture is input into the self-attention layers of the ViT model in sequence so that the classification vector of the fine-grained picture is learned through the self-attention layers. Preferably, in a practical implementation, the real label may be a number for each category, for example the value of the Arctic gull real label is 1 and the value of the California gull real label is 2; alternatively, the real label may be the name, a fine-grained characteristic, or other identifying information of each category.
Step S02, acquiring classification vectors obtained by learning the fine-grained images by a preset number of target self-attention layers, wherein the target self-attention layers are positioned at the rear ends of the multiple self-attention layers.
Step S03, inputting the classification vector of each target self-attention layer into a preset classifier, outputting the classification label of each target self-attention layer, and performing loss calculation on the classification label of each target self-attention layer and a preset real label to obtain a loss value of each target self-attention layer.
In this embodiment, the predetermined classifier is an MLPHead classification header.
Step S04, updating network parameters through a back propagation mechanism respectively according to the loss value of each target self-attention layer so as to train the fine-grained image recognition classification model.
In a specific implementation, the last three self-attention layers are selected as the target self-attention layers, that is, the preset number is three. The classification vectors learned by the last three self-attention layers are classified and output through the corresponding MLP Head classification heads to obtain the classification label of each target self-attention layer; the loss value between the classification label of each target self-attention layer and the real label is then calculated to obtain the loss values of the last three self-attention layers, and the network parameters of the preceding self-attention layers are iteratively modified by using the loss value of each layer together with a back propagation mechanism, finally training a fine-grained image recognition classification model that can accurately perform fine-grained image recognition classification. Of course, in other embodiments, other numbers and/or other positions of self-attention layers may be used for classification, for example selecting the last four self-attention layers as the target self-attention layers.
Specifically, as a preferred embodiment, the step of performing loss calculation on the classification label and a preset real label of each target self-attention layer respectively to obtain a loss value of each target self-attention layer includes:
respectively carrying out cross entropy loss calculation on the classification label of each target self-attention layer and a preset real label to obtain a loss value of each target self-attention layer;
wherein the formula of the cross entropy loss calculation is as follows:
$$\mathrm{LOSS}_{CE}(y_r, y) = -\sum_{c} y^{(c)} \log\!\left(y_r^{(c)}\right)$$

where y_r is the classification label of the r-th target self-attention layer, y is the preset real label, the superscript (c) indexes the categories, and LOSS_CE(y_r, y) is the cross entropy loss value between the classification label of the r-th target self-attention layer and the preset real label; the preset number is 3, and r ∈ {1, 2, 3}.
That is, on the basis of the conventional ViT structure, the last three self-attention layers are selected and each connected to an MLP Head classification head, so that the beneficial information learned by the last three self-attention layers is used for classification. Accordingly, a progressive, step-by-step training scheme is proposed: the loss values of the last three self-attention layers are each used to update the network parameters through the back propagation mechanism, guiding the model to learn multi-layer complementary information. It should be noted that the training method in this embodiment does not simply sum the losses of different layers and back-propagate them together. Instead, each loss is back-propagated and the parameters are updated separately, which helps the different layers of the model cooperate better. In addition, the information learned by the lower layers is passed to the upper layers progressively, which facilitates model learning and convergence.
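The progressive update described here can be sketched as follows: each target layer's cross entropy loss is back-propagated and the optimizer stepped separately, rather than summing the three losses first. This is an illustrative sketch, not the patent's exact procedure; `model` is assumed to wrap the patch embedding and the backbone sketched above and to return a list of three logits, and `progressive_step` is a hypothetical helper name.

```python
import torch.nn as nn

criterion = nn.CrossEntropyLoss()

def progressive_step(model, optimizer, images, labels):
    """One training step: a separate backward pass and parameter update per target layer,
    ordered from the lower target self-attention layer to the last one."""
    for r in range(model.num_outputs):       # r = 1, 2, 3 in the patent's notation
        optimizer.zero_grad()
        logits = model(images)[r]            # classification output of the r-th target layer
        loss_r = criterion(logits, labels)   # LOSS_CE(y_r, y)
        loss_r.backward()                    # back-propagate this layer's loss alone
        optimizer.step()                     # forward pass is recomputed per head, so each loss owns its own graph
```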
In summary, the fine-grained image recognition classification model training method in the above embodiment of the present invention improves the conventional ViT structure and introduces a progressive training mechanism. Classification vectors at different levels of the ViT structure are selected, so that attention is paid not merely to the beneficial information learned by the last attention layer but also to the importance of the classification vectors in classification. The learned information can be passed upward effectively, which helps mine the complementary information in classification vectors at different levels and use it for classification, thereby improving the accuracy of fine-grained image classification.
Example two
A second embodiment of the present invention also provides a fine-grained image recognition classification model training method, which may be implemented by software and/or hardware. Referring to fig. 4, the improved ViT model further includes a linear projection layer (Linear Projection of Flattened Patches), a position encoding layer (Position Embedding), and a multi-scale module. The fine-grained image is mapped into a high-dimensional feature space by the linear projection layer to obtain the corresponding picture vectors, which are then position-encoded by the position encoding layer and input into the subsequent multiple self-attention layers (Transformer layers). Meanwhile, on the basis of the conventional ViT structure, this embodiment adds a multi-scale module; as shown in fig. 4, the multi-scale module is located on the right side of the multiple self-attention layers, and each self-attention layer is connected to the multi-scale module. It can be understood that the multi-head attention mechanism of ViT makes it inherently pay more attention to global information, whereas in fine-grained image recognition the discriminative regions tend to be tiny local regions. Therefore, to better guide the model to learn salient region information, this embodiment proposes a multi-scale module. Its main function is to combine the attention weights of the multiple attention heads of each self-attention layer and map the result back onto the original image, thereby finding the local discriminative region corresponding to each attention layer; the corresponding local discriminative region is then cropped from the original image, and the cropped region image of each layer is re-input into the model for training. This helps the model find discriminative local regions on the basis of the global information it has learned, thereby realizing complementary exchange between global and local information and improving the fine-grained image classification effect. Specifically, in this embodiment, the fine-grained image recognition classification model training method includes steps S11 to S16.
Step S11, segmenting the fine-grained image into preset sub-images according to a preset segmentation rule, and mapping each sub-image to a high-dimensional feature space through the linear projection layer to obtain a picture vector of each sub-image.
In a specific implementation, each fine-grained image may be divided equally into a preset number of sub-images according to a preset division size; for example, the fine-grained image may be divided into 9 sub-images on a 3 × 3 grid, and each sub-image corresponds to one picture vector.
Step S12, coding the picture vectors of each sub-image through the position coding layer to add position coding information to each picture vector, and adding an empty classification vector in front of the first picture vector to obtain a vector sequence.
In some alternative embodiments, the position coding information may be position coordinate information of the sub-image in the whole fine-grained image, and since the picture segmentation rule is known, the position coordinate information of each sub-image in the whole fine-grained image is also known. Or in other alternative embodiments, the position coding information may be the number of the sub-images, and specifically, each sub-image may be numbered according to the sequence of the segmentation.
Step S13, inputting the vector sequence into the multiple layers of self-attention layers for classification vector learning, wherein the classification features learned by each layer of self-attention layers are updated in the classification vectors of the vector sequence, so as to obtain the classification vector of each layer of self-attention layers.
Step S14, selecting the last three self-attention layers as target self-attention layers, and acquiring a classification vector obtained by each target self-attention layer through learning the fine-grained image.
Step S15, inputting the classification vector of each target self-attention layer into a preset classifier, outputting the classification label of each target self-attention layer, and performing cross entropy loss calculation on the classification label of each target self-attention layer and a preset real label to obtain a loss value of each target self-attention layer.
Step S16, updating network parameters through a back propagation mechanism respectively according to the loss value of each target self-attention layer so as to train the fine-grained image recognition classification model.
In addition, while performing the above model training, the fine-grained image recognition classification model training method of this embodiment further includes:
calculating a final attention weight matrix of the self-attention layer after classification vector learning is carried out on the fine-grained image according to a preset calculation rule;
determining the position of a classification target according to the final self-attention weight matrix, and intercepting a classification target area image from the fine-grained image according to the position of the classification target;
and scaling the classified target area image to be the same as the fine-grained image in size, and inputting the image into the preset network model for training so as to intensively train the fine-grained image recognition classification model.
It should be understood that the above-mentioned intensive training is part of the model training corresponding to the multi-scale module, which can be performed simultaneously during the above-mentioned progressive training process.
The step of calculating a final attention weight matrix of the self-attention layer after the classification vector learning of the fine-grained image according to a preset calculation rule specifically includes:
after the classification vector learning is carried out on the fine-grained image, in each attention head, the attention weight of the classification vector and each picture vector in the current layer is respectively calculated, and an attention weight matrix corresponding to each attention head is obtained;
and performing dot multiplication (i.e., matrix multiplication) on the attention weight matrices of all the attention heads to obtain the final attention weight matrix. The calculation formula of the attention weight is:

$$a_i^l = \operatorname{softmax}\!\left(\frac{QK^{T}}{\sqrt{d_k}}\right)$$

where a_i^l is the attention weight between the classification vector and the i-th picture vector in the l-th attention head, Q is the query vector, K is the key vector, V is the value vector, d_k is the mapping space dimension of the attention head, and T denotes the matrix transpose. In a specific implementation, the classification vector and each picture vector are each projected into three parts: a query vector Q, a key vector K, and a value vector V. The degree of relation between the classification vector and a picture vector is then calculated from their corresponding query, key, and value vectors to obtain the attention weight.

The attention weight matrix A is expressed as:

$$A = \begin{bmatrix} a_1^1 & a_2^1 & \cdots & a_K^1 \\ \vdots & \vdots & \ddots & \vdots \\ a_1^L & a_2^L & \cdots & a_K^L \end{bmatrix}$$

where l ∈ {1, 2, …, L}, i ∈ {1, 2, …, K}, L denotes the number of attention heads, and K denotes the number of picture vectors.
Based on this, the step of determining the position of the classification target according to the final self-attention weight matrix specifically includes:
calculating an average value of all attention weights in the final attention weight matrix;
comparing each attention weight in the final attention weight matrix with the average value, wherein the attention weight larger than the average value is marked as a first threshold, otherwise, the attention weight is marked as a second threshold, and in particular implementation, the first threshold can be set to be 1, and the second threshold can be set to be 0;
and determining the position of the classification target according to the position coding information of the target picture vector with the attention weight of the classification vector as a first threshold.
It should be noted that, because the classification vector is obtained by the self-attention layers performing classification learning over the whole fine-grained image, it pays more attention to the region where the classification target (e.g., an Arctic gull) is located in the image. The target picture vectors whose attention weight with the classification vector equals the first threshold are therefore necessarily pictures close to, or belonging to, the location of the classification target. Consequently, the position of the classification target can be determined from the position encoding information of those target picture vectors and mapped back to the original image to crop out the classification target region image, which is then used for reinforced training, so that the model finds discriminative local regions on the basis of the global information it has learned. This realizes complementary exchange between global and local information and further improves the fine-grained image classification effect.
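A hedged sketch of this multi-scale reinforcement step: the bounding box found on the patch grid is mapped back to pixel coordinates, the classification target region is cropped and rescaled to the original input size, and the crop is fed through the same training step. The names `locate_target`, `progressive_step`, and `patch_size` are the assumed names from the earlier sketches, and the sketch handles a single image for clarity.

```python
import torch.nn.functional as F

def multiscale_step(model, optimizer, image, label, attn_per_head, patch_size=16):
    """Reinforced training on the cropped discriminative region of a single image.
    image: (1, 3, H, W); attn_per_head: (L, K) classification-vector attention weights."""
    grid = image.size(-1) // patch_size
    _, (y0, x0, y1, x1) = locate_target(attn_per_head, grid)
    # Map patch-grid coordinates back to pixel coordinates on the original image
    top, left = y0 * patch_size, x0 * patch_size
    bottom, right = (y1 + 1) * patch_size, (x1 + 1) * patch_size
    crop = image[:, :, top:bottom, left:right]
    # Scale the classification target region image back to the original input size
    crop = F.interpolate(crop, size=image.shape[-2:], mode="bilinear", align_corners=False)
    # Re-input the cropped region for training (same progressive update as before)
    progressive_step(model, optimizer, crop, label)
```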
Compared with the conventional ViT structure and other ViT variants, the model provided by this embodiment has at least the following advantages and can effectively improve the performance and accuracy of fine-grained image classification tasks. Specifically:
1) The embodiment provides a training method of a fine-grained image recognition classification model, which can perform end-to-end training and can perform training only by using picture-level labels;
2) The conventional ViT structure is improved, progressive training is introduced, and classification vectors of different levels in the ViT structure are selected, so that learned information can be well transmitted upwards, and the complementary information in the classification vectors of different levels can be mined and used for classification;
3) The embodiment provides the multi-scale module, which helps the model to learn the global information and find the discriminant local area, so that the complementary communication of the global information and the local information is realized, and the fine-grained image classification effect is improved.
Example three
Another aspect of the present invention further provides a fine-grained image recognition classification model training system, referring to fig. 5, which is a fine-grained image recognition classification model training system according to a third embodiment of the present invention, and the fine-grained image recognition classification model training system includes:
the image acquisition module 11 is configured to acquire a fine-grained image for model training, and input the fine-grained image into a preset network model for training, where the preset network model includes multiple self-attention layers, and the fine-grained image sequentially passes through each self-attention layer to perform classification vector learning on the fine-grained image through the self-attention layers;
a vector obtaining module 12, configured to obtain classification vectors obtained by learning the fine-grained images through a preset number of target self-attention layers, where the target self-attention layers are located at the rear end of the multiple self-attention layers;
the loss calculation module 13 is configured to input the classification vector of each target self-attention layer into a preset classifier, output a classification label of each target self-attention layer, and perform loss calculation on the classification label of each target self-attention layer and a preset real label to obtain a loss value of each target self-attention layer;
and the progressive training module 14 is configured to update network parameters through a back propagation mechanism according to the loss value of each target self-attention layer, so as to train the fine-grained image recognition classification model.
Further, in some optional embodiments of the invention, the system further comprises:
the multi-scale training module is used for calculating a final attention weight matrix of the self-attention layer after classification vector learning is carried out on the fine-grained image according to a preset calculation rule; determining the position of a classification target according to the final self-attention weight matrix, and intercepting a classification target area image from the fine-grained image according to the position of the classification target; and scaling the classified target area image to be the same as the fine-grained image in size, and inputting the image into the preset network model for training so as to intensively train the fine-grained image recognition classification model.
Further, in some optional embodiments of the present invention, the preset network model further includes a linear projection layer and a position coding layer, and the image obtaining module 11 is further configured to segment the fine-grained image into preset sub-images according to a preset segmentation rule, and map each sub-image to a high-dimensional feature space through the linear projection layer, so as to obtain a picture vector of each sub-image; coding the picture vectors of each sub-picture through the position coding layer to add position coding information to each picture vector, and adding an empty classification vector in front of the first picture vector to obtain a vector sequence; and inputting the vector sequence into the multi-layer self-attention layer for classification vector learning, wherein the classification features learned by each layer of self-attention layer are updated in the classification vectors of the vector sequence to obtain the classification vectors of each layer of self-attention layer.
Further, in some optional embodiments of the present invention, the self-attention layer includes a plurality of attention heads, and the multi-scale training module is further configured to, after performing classification vector learning on the fine-grained image, respectively calculate attention weights of a classification vector and each picture vector in the self-attention layer in each of the attention heads, and obtain an attention weight matrix corresponding to each of the attention heads; and performing dot product calculation on the attention weight matrixes of all the attention heads to obtain the final attention weight matrix.
Wherein, the calculation formula of the attention weight is as follows:
$$a_i^l = \operatorname{softmax}\!\left(\frac{QK^{T}}{\sqrt{d_k}}\right)$$

where a_i^l is the attention weight between the classification vector and the i-th picture vector in the l-th attention head, Q is the query vector, K is the key vector, V is the value vector, d_k is the mapping space dimension of the attention head, and T denotes the matrix transpose. The attention weight matrix A is expressed as:

$$A = \begin{bmatrix} a_1^1 & a_2^1 & \cdots & a_K^1 \\ \vdots & \vdots & \ddots & \vdots \\ a_1^L & a_2^L & \cdots & a_K^L \end{bmatrix}$$

where l ∈ {1, 2, …, L}, i ∈ {1, 2, …, K}, L denotes the number of attention heads, and K denotes the number of picture vectors.
Further, in some optional embodiments of the present invention, the multi-scale training module is further configured to calculate an average value of all attention weights in the final attention weight matrix; comparing each attention weight in the final attention weight matrix with the average value, wherein the attention weight larger than the average value is marked as a first threshold value, and otherwise, the attention weight is marked as a second threshold value; and determining the position of the classification target according to the position coding information of the target picture vector with the attention weight of the classification vector as a first threshold.
Further, in some optional embodiments of the present invention, the loss calculating module 13 is further configured to perform cross entropy loss calculation on the classification label of each target self-attention layer and a preset real label, respectively, to obtain a loss value of each target self-attention layer;
wherein the formula of the cross entropy loss calculation is as follows:
$$\mathrm{LOSS}_{CE}(y_r, y) = -\sum_{c} y^{(c)} \log\!\left(y_r^{(c)}\right)$$

where y_r is the classification label of the r-th target self-attention layer, y is the preset real label, the superscript (c) indexes the categories, and LOSS_CE(y_r, y) is the cross entropy loss value between the classification label of the r-th target self-attention layer and the preset real label; the preset number is 3, and r ∈ {1, 2, 3}.
The present invention also provides a computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, implements the fine-grained image recognition classification model training method described above.
The invention further provides a fine-grained image recognition classification model training device, which comprises a processor, a memory and a computer program stored on the memory and capable of running on the processor, wherein the processor executes the computer program to implement the fine-grained image recognition classification model training method.
The fine-grained image recognition classification model training equipment can be a computer, a server, a camera device and the like. The processor may be, in some embodiments, a Central Processing Unit (CPU), controller, microcontroller, microprocessor or other data processing chip for executing program code stored in memory or processing data, such as executing access restriction programs.
Wherein the memory includes at least one type of readable storage medium including flash memory, hard disks, multimedia cards, card-type memory (e.g., SD or DX memory, etc.), magnetic memory, magnetic disks, optical disks, etc. The memory may be an internal storage unit of the fine-grained image recognition classification model training apparatus in some embodiments, for example, a hard disk of the fine-grained image recognition classification model training apparatus. The memory may also be an external storage device of the fine-grained image recognition and classification model training device in other embodiments, such as a plug-in hard disk, a Smart Memory Card (SMC), a Secure Digital (SD) Card, a flash Card (FlashCard), and the like, which are equipped on the fine-grained image recognition and classification model training device. Further, the memory may also include both an internal storage unit and an external storage device of the fine-grained image recognition classification model training apparatus. The memory can be used for storing application software installed in the fine-grained image recognition classification model training equipment and various types of data, and can also be used for temporarily storing data which is output or is to be output.
Those of skill in the art will understand that the logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be viewed as implementing logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
In the description of the specification, reference to the description of "one embodiment," "some embodiments," "an example," "a specific example," or "some examples" or the like means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
The above-mentioned embodiments only express several embodiments of the present invention, and the description thereof is more specific and detailed, but not construed as limiting the scope of the present invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the inventive concept, which falls within the scope of the present invention. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (10)

1. A method for training a fine-grained image recognition classification model is characterized by comprising the following steps:
obtaining a fine-grained image for model training, and inputting the fine-grained image into a preset network model for training, wherein the preset network model comprises a plurality of self-attention layers, and the fine-grained image sequentially passes through each self-attention layer so as to perform classification vector learning on the fine-grained image through the self-attention layers;
acquiring classification vectors obtained by learning the fine-grained images by a preset number of target self-attention layers, wherein the target self-attention layers are positioned at the rear ends of the multiple self-attention layers;
inputting the classification vector of each target self-attention layer into a preset classifier, outputting a classification label of each target self-attention layer, and performing loss calculation on the classification label of each target self-attention layer and a preset real label to obtain a loss value of each target self-attention layer;
and updating network parameters through a back propagation mechanism according to the loss value of each target self-attention layer so as to train the fine-grained image recognition classification model.
2. The fine-grained image recognition classification model training method according to claim 1, further comprising:
calculating a final attention weight matrix of the self-attention layer after classification vector learning is carried out on the fine-grained image according to a preset calculation rule;
determining the position of a classification target according to the final self-attention weight matrix, and intercepting a classification target area image from the fine-grained image according to the position of the classification target;
and scaling the classified target area image to be the same as the fine-grained image in size, and inputting the image into the preset network model for training so as to intensively train the fine-grained image recognition classification model.
3. The fine-grained image recognition classification model training method according to claim 2, wherein the preset network model further comprises a linear projection layer and a position coding layer, and the step of inputting the fine-grained image into the preset network model for training comprises:
dividing the fine-grained image into preset sub-images according to a preset division rule, and mapping each sub-image to a high-dimensional feature space through the linear projection layer to obtain a picture vector of each sub-image;
coding the picture vectors of each sub-picture through the position coding layer to add position coding information to each picture vector, and adding an empty classification vector in front of the first picture vector to obtain a vector sequence;
and inputting the vector sequence into the multi-layer self-attention layer for classification vector learning, wherein the classification features learned by each layer of self-attention layer are updated in the classification vectors of the vector sequence to obtain the classification vectors of each layer of self-attention layer.
4. The fine-grained image recognition classification model training method according to claim 3, wherein the self-attention layer comprises a plurality of attention heads, and the step of calculating a final attention weight matrix of the self-attention layer after classification vector learning on the fine-grained image according to a preset calculation rule comprises the steps of:
after the classification vector learning is carried out on the fine-grained image, in each attention head, the attention weight of the classification vector and each picture vector in the current layer is respectively calculated, and an attention weight matrix corresponding to each attention head is obtained;
and performing dot product calculation on the attention weight matrixes of all the attention heads to obtain the final attention weight matrix.
5. The fine-grained image recognition classification model training method according to claim 4, wherein the calculation formula of the attention weight is as follows:
$$a_i^l = \operatorname{softmax}\!\left(\frac{QK^{T}}{\sqrt{d_k}}\right)$$

where a_i^l is the attention weight between the classification vector and the i-th picture vector in the l-th attention head, Q is the query vector, K is the key vector, V is the value vector, d_k is the mapping space dimension of the attention head, and T denotes the matrix transpose; wherein the attention weight matrix A is expressed as:

$$A = \begin{bmatrix} a_1^1 & a_2^1 & \cdots & a_K^1 \\ \vdots & \vdots & \ddots & \vdots \\ a_1^L & a_2^L & \cdots & a_K^L \end{bmatrix}$$

where l ∈ {1, 2, …, L}, i ∈ {1, 2, …, K}, L denotes the number of attention heads, and K denotes the number of picture vectors.
6. The fine-grained image recognition classification model training method according to claim 3, wherein the step of determining the position of the classification target according to the final self-attention weight matrix comprises the steps of:
calculating an average value of all attention weights in the final attention weight matrix;
comparing each attention weight in the final attention weight matrix with the average value, marking the attention weights greater than the average value as a first threshold value, and otherwise marking them as a second threshold value;
and determining the position of the classification target according to the position coding information of the target picture vectors whose attention weights with the classification vector are marked as the first threshold value.
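A minimal sketch of the mean-thresholding in claim 6, assuming the final attention weights of the classification vector are available as a flat tensor; the values 1 and 0 merely stand in for the unspecified first and second threshold values.

```python
import torch

def locate_target_positions(final_attn: torch.Tensor):
    """final_attn: (K,) final attention weights of the classification vector.
    Marks weights above the mean with a first threshold value (1) and the rest
    with a second threshold value (0); the indices of the marked picture
    vectors, via their position codes, give the target location."""
    avg = final_attn.mean()
    marks = torch.where(final_attn > avg,
                        torch.ones_like(final_attn),      # first threshold value
                        torch.zeros_like(final_attn))     # second threshold value
    return marks, torch.nonzero(marks).squeeze(-1)        # marked map and target indices
```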
7. The fine-grained image recognition classification model training method according to claim 1, wherein the step of performing loss calculation on the classification label of each target self-attention layer and a preset real label to obtain a loss value of each target self-attention layer comprises the steps of:
respectively carrying out cross entropy loss calculation on the classification label of each target self-attention layer and a preset real label to obtain a loss value of each target self-attention layer;
wherein the formula of the cross entropy loss calculation is as follows:
$$\mathrm{LOSS}_{CE}\!\left(y_r,\, y\right) = -\sum y \log\left(y_r\right)$$
where $y_r$ is the classification label of the $r$-th target self-attention layer, $y$ is the preset real label, and $\mathrm{LOSS}_{CE}(y_r, y)$ is the cross entropy loss value between the classification label of the $r$-th target self-attention layer and the preset real label; the preset number is 3, and $r \in \{1, 2, 3\}$.
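As a sketch of the per-layer loss calculation, assuming the preset number of target self-attention layers is 3 as stated above; `cls_logits` is a hypothetical list of classifier outputs, one per target layer.

```python
import torch
import torch.nn.functional as F

def target_layer_losses(cls_logits, target):
    """cls_logits: list of 3 tensors of shape (B, num_classes), the preset
    classifier's output for each target self-attention layer; target: (B,)
    real labels. Returns one cross-entropy loss per target layer (r = 1, 2, 3)."""
    return [F.cross_entropy(logits, target) for logits in cls_logits]
```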
8. a system for training a fine-grained image recognition classification model, the system comprising:
the image acquisition module is used for acquiring a fine-grained image for model training and inputting the fine-grained image into a preset network model for training, wherein the preset network model comprises a plurality of self-attention layers, and the fine-grained image sequentially passes through each self-attention layer so as to perform classification vector learning on the fine-grained image through the self-attention layers;
a vector acquisition module, configured to acquire classification vectors obtained by learning the fine-grained images through a preset number of target self-attention layers, where the target self-attention layers are located at the rear end of the multiple self-attention layers;
the loss calculation module is used for inputting the classification vector of each target self-attention layer into a preset classifier, outputting the classification label of each target self-attention layer, and performing loss calculation on the classification label of each target self-attention layer and a preset real label to obtain a loss value of each target self-attention layer;
and the progressive training module is used for updating network parameters through a back propagation mechanism respectively according to the loss value of each target self-attention layer so as to train the fine-grained image recognition classification model.
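The progressive training module could be sketched as below, under the assumption that the network returns one classification vector per target self-attention layer and that a separate preset classifier head exists for each; all interface names here are hypothetical, and the optimizer is assumed to cover both the network and the classifier parameters.

```python
import torch
import torch.nn.functional as F

def progressive_training_step(model, classifiers, optimizer, images, labels):
    """One progressive training step: the classification vector of each target
    self-attention layer is classified separately, and each loss value updates
    the network parameters through its own back-propagation pass.
    Assumed interface: model(images) -> list of per-layer classification vectors."""
    for r, head in enumerate(classifiers):            # one head per target layer
        cls_vectors = model(images)                   # re-run so each backward pass has a fresh graph
        logits = head(cls_vectors[r])                 # preset classifier of target layer r
        loss = F.cross_entropy(logits, labels)        # loss against the preset real labels
        optimizer.zero_grad()
        loss.backward()                               # back-propagation for this layer's loss only
        optimizer.step()                              # update network parameters
```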
9. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out a fine-grained image recognition classification model training method according to any one of claims 1 to 7.
10. A fine-grained image recognition classification model training apparatus comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the fine-grained image recognition classification model training method according to any one of claims 1 to 7 when executing the program.
CN202310140142.7A 2023-02-21 2023-02-21 Fine-granularity image recognition classification model training method, device and equipment Active CN115830402B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310140142.7A CN115830402B (en) 2023-02-21 2023-02-21 Fine-granularity image recognition classification model training method, device and equipment


Publications (2)

Publication Number Publication Date
CN115830402A true CN115830402A (en) 2023-03-21
CN115830402B CN115830402B (en) 2023-09-12

Family

ID=85521972

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310140142.7A Active CN115830402B (en) 2023-02-21 2023-02-21 Fine-granularity image recognition classification model training method, device and equipment

Country Status (1)

Country Link
CN (1) CN115830402B (en)



Patent Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111325205A (en) * 2020-03-02 2020-06-23 北京三快在线科技有限公司 Document image direction recognition method and device and model training method and device
CN112487229A (en) * 2020-11-27 2021-03-12 北京邮电大学 Fine-grained image classification method and system and prediction model training method
CN114821146A (en) * 2021-01-27 2022-07-29 四川大学 Enhanced weak supervision-based fine-grained Alzheimer's disease classification method
CN113761936A (en) * 2021-08-19 2021-12-07 哈尔滨工业大学(威海) Multi-task chapter-level event extraction method based on multi-head self-attention mechanism
CN114049303A (en) * 2021-10-12 2022-02-15 杭州电子科技大学 Progressive bone age assessment method based on multi-granularity feature fusion
CN114119979A (en) * 2021-12-06 2022-03-01 西安电子科技大学 Fine-grained image classification method based on segmentation mask and self-attention neural network
CN114549970A (en) * 2022-01-13 2022-05-27 山东师范大学 Night small target fruit detection method and system fusing global fine-grained information
CN114386582A (en) * 2022-01-17 2022-04-22 大连理工大学 Human body action prediction method based on confrontation training attention mechanism
CN114565752A (en) * 2022-02-10 2022-05-31 北京交通大学 Image weak supervision target detection method based on class-agnostic foreground mining
CN114580510A (en) * 2022-02-23 2022-06-03 华南理工大学 Bone marrow cell fine-grained classification method, system, computer device and storage medium
CN114564953A (en) * 2022-02-28 2022-05-31 中山大学 Emotion target extraction model based on multiple word embedding fusion and attention mechanism
CN114332544A (en) * 2022-03-14 2022-04-12 之江实验室 Image block scoring-based fine-grained image classification method and device
CN115034496A (en) * 2022-06-27 2022-09-09 北京交通大学 Urban rail transit holiday short-term passenger flow prediction method based on GCN-Transformer
CN115294265A (en) * 2022-06-27 2022-11-04 北京大学深圳研究生院 Method and system for reconstructing three-dimensional human body grid by utilizing two-dimensional human body posture based on attention of graph skeleton
CN115035389A (en) * 2022-08-10 2022-09-09 华东交通大学 Fine-grained image identification method and device based on reliability evaluation and iterative learning
CN115204241A (en) * 2022-08-16 2022-10-18 北京航空航天大学 Deep learning tiny fault diagnosis method and system considering fault time positioning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
SUPRAGYA SONKAR ET AL: "Multi-Head Attention on Image Captioning Model with Bert Embedding", 《2021 INTERNATIONAL CONFERENCE ON COMMUNICATION, CONTROL AND INFORMATION SCIENCES (ICCISC)》, pages 124 - 130 *
Zhu Chenguang (author) *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116071939A (en) * 2023-03-24 2023-05-05 华东交通大学 Traffic signal control model building method and control method
CN116109629A (en) * 2023-04-10 2023-05-12 厦门微图软件科技有限公司 Defect classification method based on fine granularity recognition and attention mechanism
CN116109629B (en) * 2023-04-10 2023-07-25 厦门微图软件科技有限公司 Defect classification method based on fine granularity recognition and attention mechanism
CN116608866A (en) * 2023-07-20 2023-08-18 华南理工大学 Picture navigation method, device and medium based on multi-scale fine granularity feature fusion
CN116608866B (en) * 2023-07-20 2023-09-26 华南理工大学 Picture navigation method, device and medium based on multi-scale fine granularity feature fusion
CN117326557A (en) * 2023-09-28 2024-01-02 连云港市沃鑫高新材料有限公司 Preparation method of silicon carbide high-purity micro powder for reaction sintering ceramic structural part

Also Published As

Publication number Publication date
CN115830402B (en) 2023-09-12

Similar Documents

Publication Publication Date Title
CN115830402A (en) Fine-grained image recognition classification model training method, device and equipment
Huang et al. Point cloud labeling using 3d convolutional neural network
US10410353B2 (en) Multi-label semantic boundary detection system
US8340363B2 (en) System and method for efficient interpretation of images in terms of objects and their parts
Dai et al. Learning to localize detected objects
CN113822209B (en) Hyperspectral image recognition method and device, electronic equipment and readable storage medium
CN114332544B (en) Image block scoring-based fine-grained image classification method and device
CN109740479A Vehicle re-identification method, device, equipment and readable storage medium
CN110210625A (en) Modeling method, device, computer equipment and storage medium based on transfer learning
CN108491848A Image saliency detection method and device based on depth information
CN112801236B (en) Image recognition model migration method, device, equipment and storage medium
CN112052845A (en) Image recognition method, device, equipment and storage medium
CN109522970A (en) Image classification method, apparatus and system
CN110222772B (en) Medical image annotation recommendation method based on block-level active learning
CN117893839B (en) Multi-label classification method and system based on graph attention mechanism
CN109271842A Generic object detection method, system, terminal and storage medium based on key point regression
CN113706551A (en) Image segmentation method, device, equipment and storage medium
CN108509828A (en) A kind of face identification method and face identification device
CN111144466A Image sample self-adaptive deep metric learning method
CN113780066B (en) Pedestrian re-recognition method and device, electronic equipment and readable storage medium
Liu et al. L2-LiteSeg: A Real-Time Semantic Segmentation Method for End-to-End Autonomous Driving
Rubino et al. Semantic multi-body motion segmentation
CN114005005B Dual batch-normalization zero-shot image classification method
Corso Discriminative modeling by boosting on multilevel aggregates
CN117152546B (en) Remote sensing scene classification method, system, storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant