CN113033603A - Weak supervision image multi-label classification method based on meta-learning - Google Patents

Weak supervision image multi-label classification method based on meta-learning

Info

Publication number
CN113033603A
CN113033603A (application CN202110162956.1A; granted as CN113033603B)
Authority
CN
China
Prior art keywords
label
image
meta
loss
learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110162956.1A
Other languages
Chinese (zh)
Other versions
CN113033603B (en)
Inventor
陈刚
陈珂
董合德
寿黎但
骆歆远
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN202110162956.1A priority Critical patent/CN113033603B/en
Publication of CN113033603A publication Critical patent/CN113033603A/en
Application granted granted Critical
Publication of CN113033603B publication Critical patent/CN113033603B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G06N3/08 Learning methods


Abstract

The invention discloses a weakly supervised image multi-label classification method based on meta-learning, belonging to the technical field of image processing. To address the problem that label dependencies cannot be effectively modeled when labels are missing, the method provides an image multi-label classification model based on label information enhancement. To counter the model overfitting caused by insufficient supervision information in a weakly supervised environment, the invention further provides a teacher-student network training method based on meta-learning, which further improves image annotation accuracy.

Description

Weak supervision image multi-label classification method based on meta-learning
Technical Field
The invention belongs to the technical field of image processing, relates to an image multi-label classification method, and particularly relates to a weak supervision image multi-label classification method based on meta-learning.
Background
An image is a record of a real scene and tends to contain rich and complex semantic concepts. Quickly and accurately identifying the multiple distinct semantic concepts contained in an image is the goal of the image multi-label classification task. Image multi-label techniques are also widely applied in fields such as object detection, assistive robotics, and autonomous driving. For example, given FIG. 1 as input, we need to identify the two semantic concepts "airplane" and "sky" contained in the image.
At present, deep learning methods have made remarkable progress on the image multi-label classification task. However, deep learning networks require large amounts of fully labeled supervision data, and acquiring such data is time-consuming and expensive: the semantic concepts contained in an image may be complex and hard to distinguish, the predefined label set may be very large, and concepts may overlap between labels. To alleviate this, the weakly supervised image multi-label classification task was proposed: constructing a well-performing image multi-label prediction model in a weakly supervised environment. A weakly supervised environment means the data set provides only partial labels, or even partially unlabeled data; that is, the training set consists of fully labeled data, data with some labels missing, and unlabeled data. Existing mainstream methods for weakly supervised multi-label classification rely on prior knowledge to construct a graph network that must contain all of the training data, and graph construction becomes a performance bottleneck as the training data scale grows, so these schemes have poor scalability. Therefore, how to improve the generalization capability and universality of the model under limited supervision information, and how to better model label dependencies, are the main challenges of the weakly supervised image multi-label classification task.
Disclosure of Invention
In order to solve the insufficient performance of existing weakly supervised image multi-label classification methods on large-scale data sets, the invention provides a weakly supervised image multi-label classification method based on meta-learning. First, the invention proposes a deep learning model based on label information enhancement that is suitable for weakly supervised scenarios: labels are predicted one by one, following a predefined label order, to obtain the image-relevant labels. A traditional label sequence contains only the image-relevant labels, so in a weakly supervised scenario a missing label is treated as an irrelevant label, which misleads the model. Therefore, irrelevant labels are introduced into the designed label sequence, so that co-occurrence and mutual-exclusion relations among labels can be modeled explicitly, solving the problem of insufficient label-sequence information caused by missing relevant labels. Second, to address the low model performance caused by incomplete training data in weakly supervised scenarios, the invention proposes a teacher-student network training framework based on meta-learning: a more robust teacher network is built through an exponential moving average algorithm and provides additional supervision for unlabeled data during training; in addition, a label masking mechanism is used to construct meta-tasks following the model-agnostic meta-learning paradigm, so that the model learns more diverse tasks under limited supervision information and its generalization is improved.
The technical scheme adopted by the invention is as follows: a weak supervision image multi-label classification method based on meta-learning is realized on a weak supervision image multi-label classification system, wherein the weak supervision image multi-label classification system comprises an image multi-label classification network based on label information enhancement and a teacher-student network training framework based on meta-learning; the multi-label classification network comprises an encoding layer and a decoding layer; the encoder receives an image as input, and a ResNet-152 pre-training model is adopted to obtain a low-dimensional feature matrix and a high-dimensional feature vector of the image; the decoder is an LSTM sequence decoding structure and is used for generating a label labeling sequence; the meta learning based teacher-student network architecture includes a teacher model and a student model. The weak supervision image multi-label classification method comprises the following steps:
(1) the image is input to an encoder, and the outputs of the last and last-but-one layers in the encoder network are selected as the low-dimensional feature matrix and the high-dimensional feature vector of the image.
(2) And taking the coded high-dimensional feature vector as an abstract representation of the image, taking the abstract representation as an initial input of a decoder, and predicting whether a first label in a predefined label sequence is related or not.
(3) And predicting whether the current label in the label sequence is relevant according to the prediction information of the previous label in the label sequence as the current input of the decoder.
(4) And (4) repeating the step (3) until all the labels in the label sequence are predicted.
(5) And comparing the obtained label sequence with a correct label sequence, calculating a loss value by adopting a teacher-student network architecture based on meta learning, and minimizing the loss value by an optimization method to finally obtain the trained multi-label classification network for image annotation.
(6) And acquiring an image to be detected, inputting the image to be detected into a trained multi-label classification network for image labeling, and acquiring a labeling result of the image.
Further: the step (3) comprises the following substeps:
(3.1) Assume the decoder is currently predicting the t-th label in the label sequence, and the predicted probability of the (t-1)-th label is $\hat{y}_{t-1}$. The corresponding label embedding $e_{t-1}$ is obtained from this probability:
$$e_{t-1} = \begin{cases} e_{t-1}^{+}, & \hat{y}_{t-1} \geq \tau \\ e_{t-1}^{-}, & \hat{y}_{t-1} < \tau \end{cases}$$
where $\tau$ denotes a threshold hyperparameter, and $e_{t-1}^{+}$ and $e_{t-1}^{-}$ are the trainable embedding vectors corresponding to the "relevant" and "irrelevant" states of the (t-1)-th label, respectively.
(3.2) The low-dimensional feature matrix $V_{feat} = \{v_1, \dots, v_K\}$ obtained from the encoder interacts with the (t-1)-th hidden state $h_{t-1}$ of the decoder to obtain the image representation $z_t$ with irrelevant features filtered out:
$$\alpha_{i,t} = f_{att}(v_i, h_{t-1}), \qquad \tilde{\alpha}_{i,t} = \frac{\exp(\alpha_{i,t})}{\sum_{j=1}^{K} \exp(\alpha_{j,t})}, \qquad z_t = \sum_{i=1}^{K} \tilde{\alpha}_{i,t}\, v_i$$
where $f_{att}$ denotes an attention network.
(3.3) Concatenate the label embedding $e_{t-1}$ and the image representation $z_t$ to obtain the current decoder input $x_t$; the decoder then produces the t-th hidden state $h_t$, which is input into the t-th label classification layer to obtain the corresponding label prediction $\hat{y}_t$:
$$x_t = [e_{t-1}; z_t], \qquad h_t = f_{LSTM}(x_t, h_{t-1}, c_{t-1}), \qquad \hat{y}_t = \sigma(W_t h_t + b_t)$$
where $f_{LSTM}$ denotes an LSTM cell with cell state $c_{t-1}$, $W_t$ and $b_t$ are the trainable parameters of the t-th label classification layer, $\sigma$ is the sigmoid function, and $[\cdot\,;\cdot]$ denotes vector concatenation.
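One decoding step as described in substeps (3.1)-(3.3) can be sketched as follows. This is a minimal NumPy illustration, not the patent's implementation: the dimensions, the single-layer attention scorer, and the plain tanh recurrence standing in for the LSTM cell are assumptions, and `decode_step` is a hypothetical name.

```python
import numpy as np

rng = np.random.default_rng(0)
d_feat, d_emb, d_hid, n_regions = 8, 4, 6, 5  # illustrative sizes

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def decode_step(y_prev, e_pos, e_neg, V, h_prev, params, tau=0.5):
    # (3.1) pick the label embedding by thresholding the previous prediction
    e_prev = e_pos if y_prev >= tau else e_neg
    # (3.2) attention over region features V (n_regions x d_feat)
    scores = np.array([params["w_att"] @ np.concatenate([v, h_prev]) for v in V])
    alpha = softmax(scores)
    z_t = alpha @ V
    # (3.3) concatenate, recurrent update (tanh stands in for the LSTM cell),
    # then a per-label sigmoid classification layer
    x_t = np.concatenate([e_prev, z_t])
    h_t = np.tanh(params["W_x"] @ x_t + params["W_h"] @ h_prev)
    y_t = 1.0 / (1.0 + np.exp(-(params["w_cls"] @ h_t + params["b_cls"])))
    return y_t, h_t

params = {
    "w_att": rng.normal(size=d_feat + d_hid),
    "W_x": rng.normal(size=(d_hid, d_emb + d_feat)),
    "W_h": rng.normal(size=(d_hid, d_hid)),
    "w_cls": rng.normal(size=d_hid),
    "b_cls": 0.0,
}
V = rng.normal(size=(n_regions, d_feat))
e_pos, e_neg = rng.normal(size=d_emb), rng.normal(size=d_emb)
y_t, h_t = decode_step(0.9, e_pos, e_neg, V, np.zeros(d_hid), params)
print(0.0 < y_t < 1.0)  # the prediction is a probability
```

Repeating this step while feeding each prediction back as `y_prev` walks through the predefined label order, as step (4) describes.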
Further: the step (5) comprises the following substeps:
(5.1) Construct two label-information-enhanced deep learning models, used respectively as the student model $f_{stu}(I; \theta)$ and the teacher model $f_{tea}(I; \tilde{\theta})$, with initial model parameters $\theta_0$ and $\tilde{\theta}_0$.
(5.2) Randomly initialize the student model parameters $\theta_0$ and update them by gradient descent; update the teacher model parameters with an exponential moving average:
$$\tilde{\theta}_t = \beta\, \tilde{\theta}_{t-1} + (1 - \beta)\, \theta_t$$
where $\theta_t$ denotes the student model parameters at the t-th training iteration, $\tilde{\theta}_t$ and $\tilde{\theta}_{t-1}$ denote the teacher model parameters at the t-th and (t-1)-th iterations, and $0 < \beta < 1$ is the EMA weighting hyperparameter. In addition, $\tilde{\theta}_0 = \theta_0$.
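The exponential moving average update of the teacher parameters in step (5.2) can be sketched as below; the parameter-dictionary layout and the `beta` value are illustrative assumptions.

```python
import numpy as np

def ema_update(theta_teacher, theta_student, beta=0.9):
    # teacher_t = beta * teacher_{t-1} + (1 - beta) * student_t
    return {k: beta * theta_teacher[k] + (1.0 - beta) * theta_student[k]
            for k in theta_teacher}

student = {"w": np.array([1.0, 2.0])}
teacher = {k: v.copy() for k, v in student.items()}  # teacher_0 = student_0
student["w"] = np.array([2.0, 3.0])                  # after one gradient step
teacher = ema_update(teacher, student, beta=0.9)
print(teacher["w"])  # approximately [1.1 2.1]
```

Because the teacher lags the student, its predictions are smoother over iterations, which is what makes it a usable extra supervision signal for unlabeled data.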
(5.3) Randomly sample a mini-batch $\mathcal{B}$ from the weakly supervised training data set.
(5.4) Input the images $I$ in $\mathcal{B}$ into the student model $f_{stu}(I; \theta)$ to obtain the predictions $\hat{y}$, and compute the supervised loss $\mathcal{L}_{sup}$ with binary cross-entropy over the observed labels:
$$\mathcal{L}_{sup} = -\frac{1}{|\mathcal{B}|} \sum_{(I, y) \in \mathcal{B}} \sum_{t} \big[ y_t \log \hat{y}_t + (1 - y_t) \log(1 - \hat{y}_t) \big]$$
(5.5) Input the same images into the teacher model $f_{tea}(I; \tilde{\theta})$ to obtain the predictions $\tilde{y}$, and compute the consistency loss $\mathcal{L}_{con}$ with the mean squared error:
$$\mathcal{L}_{con} = \frac{1}{|\mathcal{B}|} \sum_{(I, y) \in \mathcal{B}} \| \hat{y} - \tilde{y} \|_2^2$$
(5.6) Add the supervised loss and the consistency loss to obtain the final supervised objective:
$$\mathcal{L}_{batch} = \mathcal{L}_{sup} + \alpha_3 \mathcal{L}_{con}$$
where $\alpha_3$ is a hyperparameter trade-off weight.
(5.7) According to the final supervised objective, compute the gradient with respect to the student model parameters $\theta$:
$$g = \nabla_{\theta}\, \mathcal{L}_{batch}$$
(5.8) From the mini-batch $\mathcal{B}$, draw $\eta$ samples into a meta-training data set $\mathcal{B}_{train}$ and put the remaining samples into a meta-test data set $\mathcal{B}_{test}$.
(5.9) For $\mathcal{B}_{train}$: flip, crop, and denoise each image $I$ with probability $\lambda$ to obtain $I'$; randomly mask the labels $y$ with probability $\rho$ to obtain $y'$; this finally yields the augmented meta-training data set $\mathcal{B}'_{train}$.
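The meta-task construction of steps (5.8)-(5.9) can be sketched as follows; the batch layout (image, label-vector pairs), the use of `NaN` to mark masked labels, and a horizontal flip standing in for the full flip/crop/denoise augmentation are assumptions of this sketch, and `make_meta_task` is a hypothetical name.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_meta_task(batch, eta, lam=0.5, rho=0.3):
    # (5.8) split the mini-batch: eta samples for meta-training, rest for meta-test
    idx = rng.permutation(len(batch))
    train = [batch[i] for i in idx[:eta]]
    test = [batch[i] for i in idx[eta:]]
    aug = []
    for img, labels in train:
        # (5.9) image augmentation with probability lambda (flip only here)
        if rng.random() < lam:
            img = img[:, ::-1]
        # (5.9) randomly mask labels with probability rho; masked -> missing
        labels = labels.copy().astype(float)
        mask = rng.random(labels.shape) < rho
        labels[mask] = np.nan
        aug.append((img, labels))
    return aug, test

batch = [(rng.normal(size=(4, 4)), np.array([1, 0, 1])) for _ in range(5)]
meta_train, meta_test = make_meta_task(batch, eta=3)
print(len(meta_train), len(meta_test))  # 3 2
```

Masking labels that were actually observed forces the model to practice exactly the label-missing condition it faces at training time, which is the point of the label masking mechanism.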
(5.10) Input the images in $\mathcal{B}'_{train}$ into the student model $f_{stu}(I; \theta)$ to obtain the corresponding predictions, and compute the corresponding supervised loss $\mathcal{L}_{sup}^{tr}$ with binary cross-entropy.
(5.11) Input the same images into the teacher model $f_{tea}(I; \tilde{\theta})$ to obtain the corresponding predictions, and compute the corresponding consistency loss $\mathcal{L}_{con}^{tr}$ with the mean squared error.
(5.12) Add the supervised loss and the consistency loss to obtain the final meta-learning objective:
$$\mathcal{L}_{train} = \mathcal{L}_{sup}^{tr} + \alpha_1 \mathcal{L}_{con}^{tr}$$
where $\alpha_1$ is a hyperparameter trade-off weight.
(5.13) According to the meta-learning objective over $\mathcal{B}'_{train}$, compute one step of gradient descent to obtain the meta-learned parameters $\theta'$:
$$\theta' = \theta - \gamma \nabla_{\theta}\, \mathcal{L}_{train}$$
where $\gamma$ is the hyperparameter inner-loop step size.
(5.14) Input the images in the meta-test data set $\mathcal{B}_{test}$ into the student model $f_{stu}(I; \theta')$ to obtain the corresponding predictions, and compute the corresponding supervised loss $\mathcal{L}_{sup}^{te}$ with binary cross-entropy.
(5.15) Input the same images into the teacher model $f_{tea}(I; \tilde{\theta})$ to obtain the corresponding predictions, and compute the corresponding consistency loss $\mathcal{L}_{con}^{te}$ with the mean squared error.
(5.16) Add the supervised loss and the consistency loss to obtain the final meta-learning loss:
$$\mathcal{L}_{test} = \mathcal{L}_{sup}^{te} + \alpha_1 \mathcal{L}_{con}^{te}$$
where $\alpha_1$ is a hyperparameter trade-off weight.
(5.17) According to the meta-learning loss over $\mathcal{B}_{test}$, compute the gradient $g_{meta} = \nabla_{\theta}\, \mathcal{L}_{test}$, add it to the gradient $g$ obtained in step (5.7) to obtain the final gradient, and update the student model parameters $\theta$:
$$g_{final} = g + g_{meta}, \qquad \theta \leftarrow \theta - \alpha\, g_{final}$$
where $\alpha$ is the hyperparameter learning rate.
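One full training iteration of steps (5.3)-(5.17) can be condensed into a first-order sketch for a toy linear student on scalar labels. Everything here is a simplifying assumption: analytic gradients replace automatic differentiation, the teacher is a frozen copy updated by EMA, and the meta-test gradient is evaluated at θ′ rather than differentiated through the inner update (a first-order approximation of the meta-gradient).

```python
import numpy as np

def predict(theta, X):
    return 1.0 / (1.0 + np.exp(-X @ theta))

def grad_loss(theta, X, y, y_teacher, alpha_c):
    # gradient of BCE for a sigmoid-linear model is X^T (p - y);
    # the MSE consistency term adds 2 (p - y_teacher) p (1 - p) X
    p = predict(theta, X)
    g_sup = X.T @ (p - y) / len(y)
    g_con = X.T @ (2.0 * (p - y_teacher) * p * (1.0 - p)) / len(y)
    return g_sup + alpha_c * g_con

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))
y = (X[:, 0] > 0).astype(float)
theta, theta_t = np.zeros(3), np.zeros(3)          # student / teacher params
gamma, lr, alpha_c = 0.5, 0.1, 0.1                 # illustrative rates

g = grad_loss(theta, X, y, predict(theta_t, X), alpha_c)      # (5.4)-(5.7)
Xtr, ytr, Xte, yte = X[:5], y[:5], X[5:], y[5:]               # (5.8) split
theta_prime = theta - gamma * grad_loss(                      # (5.10)-(5.13)
    theta, Xtr, ytr, predict(theta_t, Xtr), alpha_c)
g_meta = grad_loss(theta_prime, Xte, yte,                     # (5.14)-(5.16)
                   predict(theta_t, Xte), alpha_c)
theta = theta - lr * (g + g_meta)                             # (5.17)
theta_t = 0.9 * theta_t + 0.1 * theta                         # (5.2) EMA
print(theta.shape)
```

The design point the sketch preserves is that the student is updated with the sum of the ordinary batch gradient and the meta-test gradient, so parameters are pushed toward values that also perform well after a simulated adaptation step.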
Compared with the prior art, the invention has the following beneficial effects. The invention provides a new teacher-student multi-label classification network architecture based on meta-learning for weakly supervised image multi-label classification. The multi-label classification network can effectively model the relations among labels in a weakly supervised environment and improves image annotation accuracy; the meta-learning-based teacher-student training architecture improves the generalization of the network in a weakly supervised environment, can in principle be applied to any common image multi-label classification network, and thus has a certain universality. Experiments show that the proposed weakly supervised image multi-label classification method effectively improves image annotation accuracy.
Drawings
FIG. 1 is a sample exemplary diagram of an image multi-label dataset;
FIG. 2 is a diagram of a teacher-student network architecture for employing the present invention in connection with meta-learning;
FIG. 3 is an exemplary diagram of image labeling using the image multi-label classification method according to the present invention.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, the present invention shall be described in further detail with reference to the following detailed description and accompanying drawings.
The invention provides a weakly supervised image multi-label classification method based on meta-learning. The method mainly comprises two parts: the teacher-student network training architecture based on meta-learning proposed by the invention, shown in FIG. 2, and the image multi-label classification network based on label information enhancement proposed by the invention. First, the label-information-enhanced image multi-label classification network is trained through the meta-learning-based teacher-student network training architecture to obtain a trained image multi-label classification network; then the trained image multi-label classification network is used to annotate the image to be detected. The weakly supervised image multi-label classification method based on meta-learning comprises the following steps:
(1) the image is input to an encoder, and the outputs of the last and last-but-one layers in the encoder network are selected as the low-dimensional feature matrix and the high-dimensional feature vector of the image.
(2) And taking the coded high-dimensional feature vector as an abstract representation of the image, taking the abstract representation as an initial input of a decoder, and predicting whether a first label in a predefined label sequence is related or not.
(3) And predicting whether the current label in the label sequence is relevant, using the prediction information of the previous label in the label sequence as the current input of the decoder. The method specifically comprises the following substeps:
(3.1) Assume the decoder is currently predicting the t-th label in the label sequence, and the predicted probability of the (t-1)-th label is $\hat{y}_{t-1}$. The corresponding label embedding $e_{t-1}$ is obtained from this probability:
$$e_{t-1} = \begin{cases} e_{t-1}^{+}, & \hat{y}_{t-1} \geq \tau \\ e_{t-1}^{-}, & \hat{y}_{t-1} < \tau \end{cases}$$
where $\tau$ denotes a threshold hyperparameter, and $e_{t-1}^{+}$ and $e_{t-1}^{-}$ are the trainable embedding vectors corresponding to the "relevant" and "irrelevant" states of the (t-1)-th label, respectively.
(3.2) The low-dimensional feature matrix $V_{feat} = \{v_1, \dots, v_K\}$ obtained from the encoder interacts with the (t-1)-th hidden state $h_{t-1}$ of the decoder to obtain the image representation $z_t$ with irrelevant features filtered out:
$$\alpha_{i,t} = f_{att}(v_i, h_{t-1}), \qquad \tilde{\alpha}_{i,t} = \frac{\exp(\alpha_{i,t})}{\sum_{j=1}^{K} \exp(\alpha_{j,t})}, \qquad z_t = \sum_{i=1}^{K} \tilde{\alpha}_{i,t}\, v_i$$
where $f_{att}$ denotes an attention network.
(3.3) Concatenate the label embedding $e_{t-1}$ and the image representation $z_t$ to obtain the current decoder input $x_t$; the decoder then produces the t-th hidden state $h_t$, which is input into the t-th label classification layer to obtain the corresponding label prediction $\hat{y}_t$:
$$x_t = [e_{t-1}; z_t], \qquad h_t = f_{LSTM}(x_t, h_{t-1}, c_{t-1}), \qquad \hat{y}_t = \sigma(W_t h_t + b_t)$$
where $f_{LSTM}$ denotes an LSTM cell with cell state $c_{t-1}$, $W_t$ and $b_t$ are the trainable parameters of the t-th label classification layer, $\sigma$ is the sigmoid function, and $[\cdot\,;\cdot]$ denotes the concatenation of two vectors.
(4) And (4) repeating the step (3) until all the labels in the label sequence are predicted.
(5) And comparing the obtained label sequence with the correct label sequence, calculating a loss value with the meta-learning-based teacher-student network architecture, and minimizing the loss value by an optimization method, finally obtaining the trained multi-label classification network for image annotation. The method specifically comprises the following substeps:
(5.1) Construct two label-information-enhanced deep learning models, used respectively as the student model $f_{stu}(I; \theta)$ and the teacher model $f_{tea}(I; \tilde{\theta})$, with initial model parameters $\theta_0$ and $\tilde{\theta}_0$.
(5.2) Randomly initialize the student model parameters $\theta_0$ and update them by gradient descent; update the teacher model parameters with an exponential moving average:
$$\tilde{\theta}_t = \beta\, \tilde{\theta}_{t-1} + (1 - \beta)\, \theta_t$$
where $\theta_t$ denotes the student model parameters at the t-th training iteration, $\tilde{\theta}_t$ and $\tilde{\theta}_{t-1}$ denote the teacher model parameters at the t-th and (t-1)-th iterations, and $0 < \beta < 1$ is the EMA weighting hyperparameter. In addition, $\tilde{\theta}_0 = \theta_0$.
(5.3) Randomly sample a mini-batch $\mathcal{B}$ from the weakly supervised training data set.
(5.4) Input the images $I$ in $\mathcal{B}$ into the student model $f_{stu}(I; \theta)$ to obtain the predictions $\hat{y}$, and compute the supervised loss $\mathcal{L}_{sup}$ with binary cross-entropy over the observed labels:
$$\mathcal{L}_{sup} = -\frac{1}{|\mathcal{B}|} \sum_{(I, y) \in \mathcal{B}} \sum_{t} \big[ y_t \log \hat{y}_t + (1 - y_t) \log(1 - \hat{y}_t) \big]$$
(5.5) Input the same images into the teacher model $f_{tea}(I; \tilde{\theta})$ to obtain the predictions $\tilde{y}$, and compute the consistency loss $\mathcal{L}_{con}$ with the mean squared error:
$$\mathcal{L}_{con} = \frac{1}{|\mathcal{B}|} \sum_{(I, y) \in \mathcal{B}} \| \hat{y} - \tilde{y} \|_2^2$$
(5.6) Add the supervised loss and the consistency loss to obtain the final supervised objective:
$$\mathcal{L}_{batch} = \mathcal{L}_{sup} + \alpha_3 \mathcal{L}_{con}$$
where $\alpha_3$ is a hyperparameter trade-off weight.
(5.7) According to the final supervised objective, compute the gradient with respect to the student model parameters $\theta$:
$$g = \nabla_{\theta}\, \mathcal{L}_{batch}$$
(5.8) From the mini-batch $\mathcal{B}$, draw $\eta$ samples into a meta-training data set $\mathcal{B}_{train}$ and put the remaining samples into a meta-test data set $\mathcal{B}_{test}$.
(5.9) For $\mathcal{B}_{train}$: flip, crop, and denoise each image $I$ with probability $\lambda$ to obtain $I'$; randomly mask the labels $y$ with probability $\rho$ to obtain $y'$; this finally yields the augmented meta-training data set $\mathcal{B}'_{train}$.
(5.10) Input the images in $\mathcal{B}'_{train}$ into the student model $f_{stu}(I; \theta)$ to obtain the corresponding predictions, and compute the corresponding supervised loss $\mathcal{L}_{sup}^{tr}$ with binary cross-entropy.
(5.11) Input the same images into the teacher model $f_{tea}(I; \tilde{\theta})$ to obtain the corresponding predictions, and compute the corresponding consistency loss $\mathcal{L}_{con}^{tr}$ with the mean squared error.
(5.12) Add the supervised loss and the consistency loss to obtain the final meta-learning objective:
$$\mathcal{L}_{train} = \mathcal{L}_{sup}^{tr} + \alpha_1 \mathcal{L}_{con}^{tr}$$
where $\alpha_1$ is a hyperparameter trade-off weight.
(5.13) According to the meta-learning objective over $\mathcal{B}'_{train}$, compute one step of gradient descent to obtain the meta-learned parameters $\theta'$:
$$\theta' = \theta - \gamma \nabla_{\theta}\, \mathcal{L}_{train}$$
where $\gamma$ is the hyperparameter inner-loop step size.
(5.14) Input the images in the meta-test data set $\mathcal{B}_{test}$ into the student model $f_{stu}(I; \theta')$ to obtain the corresponding predictions, and compute the corresponding supervised loss $\mathcal{L}_{sup}^{te}$ with binary cross-entropy.
(5.15) Input the same images into the teacher model $f_{tea}(I; \tilde{\theta})$ to obtain the corresponding predictions, and compute the corresponding consistency loss $\mathcal{L}_{con}^{te}$ with the mean squared error.
(5.16) Add the supervised loss and the consistency loss to obtain the final meta-learning loss:
$$\mathcal{L}_{test} = \mathcal{L}_{sup}^{te} + \alpha_1 \mathcal{L}_{con}^{te}$$
where $\alpha_1$ is a hyperparameter trade-off weight.
(5.17) According to the meta-learning loss over $\mathcal{B}_{test}$, compute the gradient $g_{meta} = \nabla_{\theta}\, \mathcal{L}_{test}$, add it to the gradient $g$ obtained in step (5.7) to obtain the final gradient, and update the student model parameters $\theta$:
$$g_{final} = g + g_{meta}, \qquad \theta \leftarrow \theta - \alpha\, g_{final}$$
where $\alpha$ is the hyperparameter learning rate.
(6) And acquiring an image to be detected, inputting the image to be detected into a trained multi-label classification network for image labeling, and acquiring a labeling result of the image.
Examples
The following takes fig. 1 as an example, and refers to fig. 3, which is a supplementary description of the image multi-label classification process. Assuming that the predefined sequence of tags is in order { airplane, train, sky }, the set of related tags of fig. 1 is { airplane, sky }.
(1) Based on the weakly supervised data set, a teacher-student network training architecture based on meta learning is adopted to train an image multi-label classification network based on label information enhancement, and the network comprises an encoder and a decoder. In the above example, the input image is shown in fig. 1, the output tag sequence is { airplane-related, train-unrelated, sky-related }, and the tag sequence to be labeled by the network is { airplane, train, sky }.
(2) The image is input to an encoder, and the outputs of the last and last-but-one layers in the encoder network are selected as the low-dimensional feature matrix and the high-dimensional feature vector of the image.
(3) And taking the coded high-dimensional feature vector as an abstract representation of the image, taking the abstract representation as the initial input of the decoder, and predicting whether the first label { airplane } in the predefined label sequence is relevant; assume the prediction result is { airplane related }.
(4) And predicting whether the current label { train } in the label sequence is relevant, using the prediction information { airplane related } of the previous label as the current input of the decoder; assume the prediction result is { train unrelated }.
(5) And predicting whether the current label { sky } in the label sequence is relevant, using the prediction information { train unrelated } of the previous label as the current input of the decoder; assume the prediction result is { sky related }.
(6) All the labels in the label sequence have now been predicted, so the model finishes prediction; the final predicted label sequence is { airplane related, train unrelated, sky related }, and therefore the relevant labels of FIG. 1 are { airplane, sky }.
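The walk-through above can be mirrored by a toy script in which a stub stands in for the trained decoder; `predict_label` and the hard-coded relevant set are of course illustrative only, the point being that each step conditions on the previous step's decision.

```python
label_order = ["airplane", "train", "sky"]   # predefined label sequence
relevant = {"airplane", "sky"}               # ground truth for the sample image

def predict_label(label, prev_decision):
    # stand-in for the trained decoder: decides relevance of `label`
    # given the previous decision (unused by this stub)
    return label in relevant

decisions = []
prev = None
for label in label_order:
    is_rel = predict_label(label, prev)
    decisions.append((label, "related" if is_rel else "unrelated"))
    prev = is_rel                            # feed the decision forward
print(decisions)
# [('airplane', 'related'), ('train', 'unrelated'), ('sky', 'related')]
```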
The above embodiments are only intended to illustrate the technical solution of the present invention and not to limit the same, and a person skilled in the art can modify the technical solution of the present invention or substitute the same without departing from the principle and scope of the present invention, and the scope of the present invention should be determined by the claims.

Claims (3)

1. A weak supervision image multi-label classification method based on meta-learning is characterized in that the method is implemented on a weak supervision image multi-label classification system, and the weak supervision image multi-label classification system comprises an image multi-label classification network based on label information enhancement and a teacher-student network training framework based on meta-learning; the multi-label classification network comprises an encoding layer and a decoding layer; the encoder receives an image as input, and a ResNet-152 pre-training model is adopted to obtain a low-dimensional feature matrix and a high-dimensional feature vector of the image; the decoder is an LSTM sequence decoding structure and is used for generating a label labeling sequence; the meta learning based teacher-student network architecture includes a teacher model and a student model. The weak supervision image multi-label classification method comprises the following steps:
(1) the image is input to an encoder, and the outputs of the last and last-but-one layers in the encoder network are selected as the low-dimensional feature matrix and the high-dimensional feature vector of the image.
(2) And taking the coded high-dimensional feature vector as an abstract representation of the image, taking the abstract representation as an initial input of a decoder, and predicting whether a first label in a predefined label sequence is related or not.
(3) And predicting whether the current label in the label sequence is relevant according to the prediction information of the previous label in the label sequence as the current input of the decoder.
(4) And (4) repeating the step (3) until all the labels in the label sequence are predicted.
(5) And comparing the obtained label sequence with a correct label sequence, calculating a loss value by adopting a teacher-student network architecture based on meta learning, and minimizing the loss value by an optimization method to finally obtain the trained multi-label classification network for image annotation.
(6) And acquiring an image to be detected, inputting the image to be detected into a trained multi-label classification network for image labeling, and acquiring a labeling result of the image.
2. The weak supervision image multi-label classification method according to claim 1, characterized by: the step (3) comprises the following substeps:
(3.1) Let $\hat{y}_{t-1}$ denote the predicted probability of the (t-1)-th label at the time the t-th label in the sequence is being predicted. The corresponding label representation vector $e_{t-1}$ is obtained from this probability:

$$e_{t-1} = \begin{cases} e^{+}_{t-1}, & \hat{y}_{t-1} \ge \tau \\ e^{-}_{t-1}, & \hat{y}_{t-1} < \tau \end{cases}$$

where $\tau$ denotes a threshold hyperparameter, and $e^{+}_{t-1}$ and $e^{-}_{t-1}$ are trainable representation vectors corresponding to the (t-1)-th label being relevant and irrelevant, respectively.
(3.2) The low-dimensional feature matrix $V_{feat} = \{v_1, \dots, v_K\}$ obtained by the encoder interacts with the (t-1)-th hidden state $h_{t-1}$ of the decoder to obtain an image representation $z_t$ from which irrelevant features are filtered out:

$$\alpha_{i,t} = f_{att}(v_i, h_{t-1}), \qquad \hat{\alpha}_{i,t} = \frac{\exp(\alpha_{i,t})}{\sum_{j} \exp(\alpha_{j,t})}, \qquad z_t = \sum_{i} \hat{\alpha}_{i,t}\, v_i$$

where $f_{att}$ denotes an attention network.
(3.3) The label representation $e_{t-1}$ and the image representation $z_t$ are concatenated to obtain the current decoder input $x_t$; the decoder then produces the corresponding t-th hidden state $h_t$, which is fed into the t-th label classification layer to obtain the corresponding label prediction $\hat{y}_t$:

$$x_t = [\,e_{t-1};\, z_t\,], \qquad h_t = f_{LSTM}(x_t, h_{t-1}, c_{t-1}), \qquad \hat{y}_t = \sigma(W_t h_t + b_t)$$

where $f_{LSTM}$ denotes an LSTM cell, $\sigma$ is the sigmoid function, and $W_t$ and $b_t$ are trainable parameters of the t-th label classification layer.
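Steps (3.1) through (3.3) can be sketched numerically as follows (a minimal NumPy sketch; the dimensions, the additive attention score, the sigmoid classifier, and the simplified recurrent cell standing in for the LSTM are illustrative assumptions, not the claimed architecture):

```python
import numpy as np

rng = np.random.default_rng(0)
d_emb, d_feat, d_hid, k = 4, 6, 8, 5   # illustrative sizes

# (3.1) pick the label embedding by thresholding the previous prediction
tau = 0.5
e_pos = rng.normal(size=d_emb)   # trainable "relevant" embedding
e_neg = rng.normal(size=d_emb)   # trainable "irrelevant" embedding
y_prev = 0.83                    # previous label's predicted probability
e_prev = e_pos if y_prev >= tau else e_neg

# (3.2) attention over the k low-dimensional feature vectors v_i
V = rng.normal(size=(k, d_feat))                 # V_feat from the encoder
h_prev = rng.normal(size=d_hid)
w_att = rng.normal(size=d_feat + d_hid)
scores = V @ w_att[:d_feat] + h_prev @ w_att[d_feat:]   # f_att(v_i, h_{t-1})
alpha = np.exp(scores - scores.max()); alpha /= alpha.sum()  # softmax
z_t = alpha @ V                                  # filtered image representation

# (3.3) concatenate, run one recurrent step, classify
x_t = np.concatenate([e_prev, z_t])
W_h = rng.normal(size=(d_hid, d_emb + d_feat)) * 0.1
h_t = np.tanh(W_h @ x_t + h_prev)                # stand-in for the LSTM cell
W_t = rng.normal(size=d_hid); b_t = 0.0
y_t = 1.0 / (1.0 + np.exp(-(W_t @ h_t + b_t)))   # sigmoid label prediction
print(float(y_t))
```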
3. The weak supervision image multi-label classification method according to claim 1, characterized by: the step (5) comprises the following substeps:
(5.1) Construct two deep learning models based on label-information enhancement, used respectively as the student model $f_{stu}(I; \theta)$ and the teacher model $f_{tea}(I; \tilde{\theta})$, with initial model parameters $\theta_0$ and $\tilde{\theta}_0$.

(5.2) The student model parameters $\theta_0$ are randomly initialized and then updated according to the gradient, while the teacher model parameters are updated with an exponential moving average of the student parameters:

$$\tilde{\theta}_t = \beta\, \tilde{\theta}_{t-1} + (1 - \beta)\, \theta_t$$

where $\theta_t$ denotes the student model parameters at the t-th training iteration, $\tilde{\theta}_t$ and $\tilde{\theta}_{t-1}$ denote the teacher model parameters at the t-th and (t-1)-th training iterations, and $\beta > 0$ is a hyperparameter weight. In addition, $\tilde{\theta}_0 = \theta_0$.
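The exponential-moving-average teacher update in step (5.2) can be sketched as follows (a minimal sketch; the parameter vectors and the value of β are illustrative):

```python
import numpy as np

def ema_update(theta_teacher, theta_student, beta=0.99):
    """Teacher parameters as an exponential moving average of the
    student's: theta~_t = beta * theta~_{t-1} + (1 - beta) * theta_t."""
    return beta * theta_teacher + (1.0 - beta) * theta_student

theta_t = np.array([1.0, 2.0])    # student parameters after a gradient step
theta_tea = np.array([0.0, 0.0])  # teacher (initialized from the student in practice)
theta_tea = ema_update(theta_tea, theta_t, beta=0.9)
print(theta_tea)  # -> [0.1 0.2]
```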
(5.3) Randomly sample a mini-batch dataset $\mathcal{B}$ from the weakly supervised training dataset.
(5.4) The images I in the mini-batch $\mathcal{B}$ are input into the student model $f_{stu}(I; \theta)$ to obtain the corresponding predictions $\hat{y}^{stu}$, and the corresponding supervised loss $\mathcal{L}_{sup}$ is computed with binary cross-entropy:

$$\mathcal{L}_{sup} = -\frac{1}{|\mathcal{B}|} \sum_{(I,\, y) \in \mathcal{B}} \sum_{t=1}^{T} \left[\, y_t \log \hat{y}^{stu}_t + (1 - y_t) \log\!\left(1 - \hat{y}^{stu}_t\right) \right]$$
(5.5) The images I in the mini-batch $\mathcal{B}$ are input into the teacher model $f_{tea}(I; \tilde{\theta})$ to obtain the corresponding predictions $\hat{y}^{tea}$, and the corresponding consistency loss $\mathcal{L}_{con}$ is computed with the mean squared error:

$$\mathcal{L}_{con} = \frac{1}{|\mathcal{B}|} \sum_{I \in \mathcal{B}} \left\| \hat{y}^{stu} - \hat{y}^{tea} \right\|_2^2$$
(5.6) The supervised loss $\mathcal{L}_{sup}$ and the consistency loss $\mathcal{L}_{con}$ are added to obtain the final supervised loss $\mathcal{L}$:

$$\mathcal{L} = \mathcal{L}_{sup} + \alpha_3\, \mathcal{L}_{con}$$

where $\alpha_3$ is a hyperparameter trade-off weight.
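The loss combination of steps (5.4)–(5.6) can be sketched as follows (an illustrative sketch; the prediction vectors and the value of α₃ are made-up example data):

```python
import numpy as np

def bce_loss(y_true, y_pred, eps=1e-12):
    """Binary cross-entropy over the label sequence (supervised loss)."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

def consistency_loss(y_stu, y_tea):
    """Mean squared error between student and teacher predictions."""
    return np.mean((y_stu - y_tea) ** 2)

y = np.array([1.0, 0.0, 1.0])        # ground-truth label sequence
y_stu = np.array([0.9, 0.2, 0.8])    # student predictions
y_tea = np.array([0.85, 0.25, 0.8])  # teacher predictions

alpha3 = 0.5                         # hyperparameter trade-off weight
loss = bce_loss(y, y_stu) + alpha3 * consistency_loss(y_stu, y_tea)
print(float(loss))
```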
(5.7) The gradient $g$ of the student model parameters $\theta$ is computed from the final supervised loss $\mathcal{L}$:

$$g = \nabla_{\theta}\, \mathcal{L}$$
(5.8) η samples are extracted from the mini-batch $\mathcal{B}$ into a subset $\mathcal{B}_{\eta}$, and the remaining samples are placed into the meta-test dataset $\mathcal{D}_{test}$.
(5.9) For each sample $(I, y) \in \mathcal{B}_{\eta}$, the image I is flipped, cropped, and perturbed with noise with a certain probability λ to obtain an augmented image I'; the labels in y are randomly masked with probability ρ to obtain y'; the resulting pairs $(I', y')$ form the meta-training dataset $\mathcal{D}_{train}$.
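The meta-training construction in step (5.9) can be sketched as follows (an illustrative sketch; a horizontal flip stands in for the flip/crop/noise perturbations, masked labels are marked with -1, and all sizes and probabilities are assumptions):

```python
import numpy as np

rng = np.random.default_rng(42)

def build_meta_train_sample(image, labels, lam=0.5, rho=0.3):
    """Perturb the image with probability lam and mask a rho-fraction
    of the labels (masked entries set to -1 = 'unknown')."""
    img = image[:, ::-1].copy() if rng.random() < lam else image.copy()
    lab = labels.copy()
    mask = rng.random(lab.shape) < rho
    lab[mask] = -1
    return img, lab

image = np.arange(12.0).reshape(3, 4)   # toy "image"
labels = np.array([1, 0, 1, 0, 1])      # toy label vector
img_aug, lab_aug = build_meta_train_sample(image, labels)
print(lab_aug)
```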
(5.10) The images I' in the meta-training dataset $\mathcal{D}_{train}$ are input into the student model $f_{stu}(I; \theta)$ to obtain the corresponding predictions, and the corresponding supervised loss $\mathcal{L}^{train}_{sup}$ is computed with binary cross-entropy.
(5.11) The images I' in the meta-training dataset $\mathcal{D}_{train}$ are input into the teacher model $f_{tea}(I; \tilde{\theta})$ to obtain the corresponding predictions, and the corresponding consistency loss $\mathcal{L}^{train}_{con}$ is computed with the mean squared error.
(5.12) The supervised loss $\mathcal{L}^{train}_{sup}$ and the consistency loss $\mathcal{L}^{train}_{con}$ are added to obtain the final meta-learning objective $\mathcal{L}^{train}$:

$$\mathcal{L}^{train} = \mathcal{L}^{train}_{sup} + \alpha_1\, \mathcal{L}^{train}_{con}$$

where $\alpha_1$ is a hyperparameter trade-off weight.
(5.13) One step of gradient descent is computed on the meta-training dataset $\mathcal{D}_{train}$ according to the meta-learning objective $\mathcal{L}^{train}$ to obtain the meta-learned parameters θ':

$$\theta' = \theta - \gamma\, \nabla_{\theta}\, \mathcal{L}^{train}$$

where γ is a hyperparameter step-size weight.
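The one-step inner update of step (5.13) can be sketched as follows (a minimal sketch; the toy quadratic objective replaces the real meta-training loss so the gradient has a closed form):

```python
import numpy as np

def meta_step(theta, grad_train, gamma=0.1):
    """One gradient-descent step on the meta-training objective:
    theta' = theta - gamma * grad(L_train)(theta)."""
    return theta - gamma * grad_train(theta)

# Toy objective L(theta) = ||theta||^2 / 2, so its gradient is theta itself.
grad = lambda th: th
theta = np.array([1.0, -2.0])
theta_prime = meta_step(theta, grad, gamma=0.1)
print(theta_prime)  # -> [ 0.9 -1.8]
```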
(5.14) The images I in the meta-test dataset $\mathcal{D}_{test}$ are input into the student model $f_{stu}(I; \theta')$ to obtain the corresponding predictions, and the corresponding supervised loss $\mathcal{L}^{test}_{sup}$ is computed with binary cross-entropy.
(5.15) The images I in the meta-test dataset $\mathcal{D}_{test}$ are input into the teacher model $f_{tea}(I; \tilde{\theta})$ to obtain the corresponding predictions, and the corresponding consistency loss $\mathcal{L}^{test}_{con}$ is computed with the mean squared error.
(5.16) The supervised loss $\mathcal{L}^{test}_{sup}$ and the consistency loss $\mathcal{L}^{test}_{con}$ are added to obtain the final meta-learning loss $\mathcal{L}^{test}$:

$$\mathcal{L}^{test} = \mathcal{L}^{test}_{sup} + \alpha_1\, \mathcal{L}^{test}_{con}$$

where $\alpha_1$ is a hyperparameter trade-off weight.
(5.17) The gradient $g_{meta} = \nabla_{\theta}\, \mathcal{L}^{test}$ is computed on the meta-test dataset $\mathcal{D}_{test}$ according to the meta-learning loss, added to the gradient $g$ obtained in step (5.7) to obtain the final gradient $g_{final}$, and the student model parameters θ are updated:

$$g_{final} = g + g_{meta}, \qquad \theta \leftarrow \theta - \alpha\, g_{final}$$

where α is the hyperparameter learning rate.
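The final parameter update of step (5.17) can be sketched as follows (an illustrative sketch; the gradient vectors and the learning rate are made-up example data):

```python
import numpy as np

def final_update(theta, grad_batch, grad_meta, lr=0.01):
    """Combine the ordinary supervised gradient with the meta-test
    gradient, then update the student parameters."""
    g_final = grad_batch + grad_meta
    return theta - lr * g_final

theta = np.array([1.0, 1.0])
g = np.array([0.5, -0.5])      # gradient of the final supervised loss
g_meta = np.array([0.1, 0.1])  # gradient of the meta-learning loss
theta = final_update(theta, g, g_meta, lr=0.1)
print(theta)  # -> [0.94 1.04]
```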
CN202110162956.1A 2021-02-05 2021-02-05 Weak supervision image multi-label classification method based on meta-learning Active CN113033603B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110162956.1A CN113033603B (en) 2021-02-05 2021-02-05 Weak supervision image multi-label classification method based on meta-learning


Publications (2)

Publication Number Publication Date
CN113033603A true CN113033603A (en) 2021-06-25
CN113033603B CN113033603B (en) 2022-11-15





Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant