CN113033603A - Weak supervision image multi-label classification method based on meta-learning - Google Patents

Weak supervision image multi-label classification method based on meta-learning

Info

Publication number
CN113033603A
CN113033603A (application CN202110162956.1A; granted as CN113033603B)
Authority
CN
China
Prior art keywords
label
image
meta
loss
learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110162956.1A
Other languages
Chinese (zh)
Other versions
CN113033603B (en)
Inventor
陈刚
陈珂
董合德
寿黎但
骆歆远
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN202110162956.1A priority Critical patent/CN113033603B/en
Publication of CN113033603A publication Critical patent/CN113033603A/en
Application granted granted Critical
Publication of CN113033603B publication Critical patent/CN113033603B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G06N3/08 Learning methods


Abstract

The invention discloses a weakly supervised image multi-label classification method based on meta-learning, belonging to the technical field of image processing. To address the problem that label dependencies cannot be effectively modeled when labels are missing, the method provides an image multi-label classification model based on label information enhancement. To counter the model overfitting caused by insufficient supervision information in a weakly supervised environment, the invention further provides a teacher-student network training method based on meta-learning, which further improves image annotation accuracy.

Description

Weak supervision image multi-label classification method based on meta-learning
Technical Field
The invention belongs to the technical field of image processing, relates to an image multi-label classification method, and particularly relates to a weak supervision image multi-label classification method based on meta-learning.
Background
An image is a record of a real scene and tends to contain rich and complex semantic concepts. Quickly and accurately identifying the multiple distinct semantic concepts contained in an image is the goal of the image multi-label classification task. Image multi-label techniques are also widely applied in fields such as object detection, assistive robotics, and autonomous driving. For example, given FIG. 1 as input, we need to identify the two semantic concepts "airplane" and "sky" contained in the image.
At present, deep learning methods have made remarkable progress on the image multi-label classification task. However, deep learning networks require large amounts of fully labeled supervision data, and acquiring such data is time-consuming and expensive: the semantic concepts contained in an image may be complex and hard to distinguish, the predefined label set may be very large, and concepts may overlap between labels. To alleviate this, the weakly supervised image multi-label classification task was proposed: constructing a well-performing image multi-label prediction model in a weakly supervised environment. A weakly supervised environment means the data set provides only partial labels, or even partially unlabeled data; that is, the training set consists of fully labeled data, data with some labels missing, and unlabeled data. Existing mainstream methods for weakly supervised multi-label classification rely on prior knowledge to construct a graph network that must contain all of the training data, and graph construction becomes a performance bottleneck as the training data scale grows, so these schemes have poor scalability. Therefore, how to improve the generalization capability and universality of the model under limited supervision information, and how to better model label dependencies, are the main challenges of the weakly supervised image multi-label classification task.
Disclosure of Invention
In order to solve the insufficient performance of existing weakly supervised image multi-label classification methods on large-scale data sets, the invention provides a weakly supervised image multi-label classification method based on meta-learning. First, the invention proposes a deep learning model based on label information enhancement that is suitable for weakly supervised scenarios: labels are predicted one by one, following a predefined label order, to obtain the image-relevant labels. A traditional label sequence contains only the image-relevant labels, so in a weakly supervised scenario a missing label is treated as an irrelevant label, which misleads the model. Therefore, irrelevant labels are introduced into the designed label sequence, so that co-occurrence and mutual-exclusion relations among labels can be modeled explicitly, solving the problem of insufficient label-sequence information caused by missing relevant labels. Second, to address the low model performance caused by incomplete training data in weakly supervised scenarios, the invention proposes a teacher-student network training framework based on meta-learning: a more robust teacher network is built through an exponential moving average algorithm and provides additional supervision for unlabeled data during training; in addition, a label masking mechanism is used to construct meta-tasks following the model-agnostic meta-learning paradigm, so that the model learns more diverse tasks under limited supervision information and its generalization is improved.
The technical scheme adopted by the invention is as follows: a weak supervision image multi-label classification method based on meta-learning is realized on a weak supervision image multi-label classification system, wherein the weak supervision image multi-label classification system comprises an image multi-label classification network based on label information enhancement and a teacher-student network training framework based on meta-learning; the multi-label classification network comprises an encoding layer and a decoding layer; the encoder receives an image as input, and a ResNet-152 pre-training model is adopted to obtain a low-dimensional feature matrix and a high-dimensional feature vector of the image; the decoder is an LSTM sequence decoding structure and is used for generating a label labeling sequence; the meta learning based teacher-student network architecture includes a teacher model and a student model. The weak supervision image multi-label classification method comprises the following steps:
(1) the image is input to an encoder, and the outputs of the last and last-but-one layers in the encoder network are selected as the low-dimensional feature matrix and the high-dimensional feature vector of the image.
(2) And taking the coded high-dimensional feature vector as an abstract representation of the image, taking the abstract representation as an initial input of a decoder, and predicting whether a first label in a predefined label sequence is related or not.
(3) And predicting whether the current label in the label sequence is relevant according to the prediction information of the previous label in the label sequence as the current input of the decoder.
(4) And (4) repeating the step (3) until all the labels in the label sequence are predicted.
(5) And comparing the obtained label sequence with a correct label sequence, calculating a loss value by adopting a teacher-student network architecture based on meta learning, and minimizing the loss value by an optimization method to finally obtain the trained multi-label classification network for image annotation.
(6) And acquiring an image to be detected, inputting the image to be detected into a trained multi-label classification network for image labeling, and acquiring a labeling result of the image.
Further: the step (3) comprises the following substeps:
(3.1) Assume the decoder is currently predicting the t-th label in the label sequence, and the predicted probability of the (t-1)-th label is $\hat{y}_{t-1}$. The corresponding label embedding $e_{t-1}$ is obtained from this probability:
$$e_{t-1} = \begin{cases} e_{t-1}^{+}, & \hat{y}_{t-1} \geq \tau \\ e_{t-1}^{-}, & \hat{y}_{t-1} < \tau \end{cases}$$
where $\tau$ denotes a threshold hyperparameter, and $e_{t-1}^{+}$ and $e_{t-1}^{-}$ are the trainable embedding vectors corresponding to the "relevant" and "irrelevant" states of the (t-1)-th label, respectively.
(3.2) The low-dimensional feature matrix $V_{feat} = \{v_1, \dots, v_K\}$ obtained from the encoder interacts with the (t-1)-th hidden state $h_{t-1}$ of the decoder to obtain the image representation $z_t$ with irrelevant features filtered out:
$$\alpha_{i,t} = f_{att}(v_i, h_{t-1}), \qquad \tilde{\alpha}_{i,t} = \frac{\exp(\alpha_{i,t})}{\sum_{j=1}^{K} \exp(\alpha_{j,t})}, \qquad z_t = \sum_{i=1}^{K} \tilde{\alpha}_{i,t}\, v_i$$
where $f_{att}$ denotes an attention network.
(3.3) Concatenate the label embedding $e_{t-1}$ and the image representation $z_t$ to obtain the current decoder input $x_t$; the decoder then produces the t-th hidden state $h_t$, which is input into the t-th label classification layer to obtain the corresponding label prediction $\hat{y}_t$:
$$x_t = [e_{t-1}; z_t], \qquad h_t = f_{LSTM}(x_t, h_{t-1}, c_{t-1}), \qquad \hat{y}_t = \sigma(W_t h_t + b_t)$$
where $f_{LSTM}$ denotes an LSTM cell with cell state $c_{t-1}$, $W_t$ and $b_t$ are the trainable parameters of the t-th label classification layer, $\sigma$ is the sigmoid function, and $[\cdot\,;\cdot]$ denotes vector concatenation.
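One decoding step as described in substeps (3.1)-(3.3) can be sketched as follows. This is a minimal NumPy illustration, not the patent's implementation: the dimensions, the single-layer attention scorer, and the plain tanh recurrence standing in for the LSTM cell are assumptions, and `decode_step` is a hypothetical name.

```python
import numpy as np

rng = np.random.default_rng(0)
d_feat, d_emb, d_hid, n_regions = 8, 4, 6, 5  # illustrative sizes

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def decode_step(y_prev, e_pos, e_neg, V, h_prev, params, tau=0.5):
    # (3.1) pick the label embedding by thresholding the previous prediction
    e_prev = e_pos if y_prev >= tau else e_neg
    # (3.2) attention over region features V (n_regions x d_feat)
    scores = np.array([params["w_att"] @ np.concatenate([v, h_prev]) for v in V])
    alpha = softmax(scores)
    z_t = alpha @ V
    # (3.3) concatenate, recurrent update (tanh stands in for the LSTM cell),
    # then a per-label sigmoid classification layer
    x_t = np.concatenate([e_prev, z_t])
    h_t = np.tanh(params["W_x"] @ x_t + params["W_h"] @ h_prev)
    y_t = 1.0 / (1.0 + np.exp(-(params["w_cls"] @ h_t + params["b_cls"])))
    return y_t, h_t

params = {
    "w_att": rng.normal(size=d_feat + d_hid),
    "W_x": rng.normal(size=(d_hid, d_emb + d_feat)),
    "W_h": rng.normal(size=(d_hid, d_hid)),
    "w_cls": rng.normal(size=d_hid),
    "b_cls": 0.0,
}
V = rng.normal(size=(n_regions, d_feat))
e_pos, e_neg = rng.normal(size=d_emb), rng.normal(size=d_emb)
y_t, h_t = decode_step(0.9, e_pos, e_neg, V, np.zeros(d_hid), params)
print(0.0 < y_t < 1.0)  # the prediction is a probability
```

Repeating this step while feeding each prediction back as `y_prev` walks through the predefined label order, as step (4) describes.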
Further: the step (5) comprises the following substeps:
(5.1) Construct two label-information-enhanced deep learning models, used respectively as the student model $f_{stu}(I; \theta)$ and the teacher model $f_{tea}(I; \tilde{\theta})$, with initial model parameters $\theta_0$ and $\tilde{\theta}_0$.
(5.2) Randomly initialize the student model parameters $\theta_0$ and update them by gradient descent; update the teacher model parameters with an exponential moving average:
$$\tilde{\theta}_t = \beta\, \tilde{\theta}_{t-1} + (1 - \beta)\, \theta_t$$
where $\theta_t$ denotes the student model parameters at the t-th training iteration, $\tilde{\theta}_t$ and $\tilde{\theta}_{t-1}$ denote the teacher model parameters at the t-th and (t-1)-th iterations, and $0 < \beta < 1$ is the EMA weighting hyperparameter. In addition, $\tilde{\theta}_0 = \theta_0$.
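The exponential moving average update of the teacher parameters in step (5.2) can be sketched as below; the parameter-dictionary layout and the `beta` value are illustrative assumptions.

```python
import numpy as np

def ema_update(theta_teacher, theta_student, beta=0.9):
    # teacher_t = beta * teacher_{t-1} + (1 - beta) * student_t
    return {k: beta * theta_teacher[k] + (1.0 - beta) * theta_student[k]
            for k in theta_teacher}

student = {"w": np.array([1.0, 2.0])}
teacher = {k: v.copy() for k, v in student.items()}  # teacher_0 = student_0
student["w"] = np.array([2.0, 3.0])                  # after one gradient step
teacher = ema_update(teacher, student, beta=0.9)
print(teacher["w"])  # approximately [1.1 2.1]
```

Because the teacher lags the student, its predictions are smoother over iterations, which is what makes it a usable extra supervision signal for unlabeled data.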
(5.3) Randomly sample a mini-batch $\mathcal{B}$ from the weakly supervised training data set.
(5.4) Input the images $I$ in $\mathcal{B}$ into the student model $f_{stu}(I; \theta)$ to obtain the predictions $\hat{y}$, and compute the supervised loss $\mathcal{L}_{sup}$ with binary cross-entropy over the observed labels:
$$\mathcal{L}_{sup} = -\frac{1}{|\mathcal{B}|} \sum_{(I, y) \in \mathcal{B}} \sum_{t} \big[ y_t \log \hat{y}_t + (1 - y_t) \log(1 - \hat{y}_t) \big]$$
(5.5) Input the same images into the teacher model $f_{tea}(I; \tilde{\theta})$ to obtain the predictions $\tilde{y}$, and compute the consistency loss $\mathcal{L}_{con}$ with the mean squared error:
$$\mathcal{L}_{con} = \frac{1}{|\mathcal{B}|} \sum_{(I, y) \in \mathcal{B}} \| \hat{y} - \tilde{y} \|_2^2$$
(5.6) Add the supervised loss and the consistency loss to obtain the final supervised objective:
$$\mathcal{L}_{batch} = \mathcal{L}_{sup} + \alpha_3 \mathcal{L}_{con}$$
where $\alpha_3$ is a hyperparameter trade-off weight.
(5.7) According to the final supervised objective, compute the gradient with respect to the student model parameters $\theta$:
$$g = \nabla_{\theta}\, \mathcal{L}_{batch}$$
(5.8) From the mini-batch $\mathcal{B}$, draw $\eta$ samples into a meta-training data set $\mathcal{B}_{train}$ and put the remaining samples into a meta-test data set $\mathcal{B}_{test}$.
(5.9) For $\mathcal{B}_{train}$: flip, crop, and denoise each image $I$ with probability $\lambda$ to obtain $I'$; randomly mask the labels $y$ with probability $\rho$ to obtain $y'$; this finally yields the augmented meta-training data set $\mathcal{B}'_{train}$.
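The meta-task construction of steps (5.8)-(5.9) can be sketched as follows; the batch layout (image, label-vector pairs), the use of `NaN` to mark masked labels, and a horizontal flip standing in for the full flip/crop/denoise augmentation are assumptions of this sketch, and `make_meta_task` is a hypothetical name.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_meta_task(batch, eta, lam=0.5, rho=0.3):
    # (5.8) split the mini-batch: eta samples for meta-training, rest for meta-test
    idx = rng.permutation(len(batch))
    train = [batch[i] for i in idx[:eta]]
    test = [batch[i] for i in idx[eta:]]
    aug = []
    for img, labels in train:
        # (5.9) image augmentation with probability lambda (flip only here)
        if rng.random() < lam:
            img = img[:, ::-1]
        # (5.9) randomly mask labels with probability rho; masked -> missing
        labels = labels.copy().astype(float)
        mask = rng.random(labels.shape) < rho
        labels[mask] = np.nan
        aug.append((img, labels))
    return aug, test

batch = [(rng.normal(size=(4, 4)), np.array([1, 0, 1])) for _ in range(5)]
meta_train, meta_test = make_meta_task(batch, eta=3)
print(len(meta_train), len(meta_test))  # 3 2
```

Masking labels that were actually observed forces the model to practice exactly the label-missing condition it faces at training time, which is the point of the label masking mechanism.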
(5.10) Input the images in $\mathcal{B}'_{train}$ into the student model $f_{stu}(I; \theta)$ to obtain the corresponding predictions, and compute the corresponding supervised loss $\mathcal{L}_{sup}^{tr}$ with binary cross-entropy.
(5.11) Input the same images into the teacher model $f_{tea}(I; \tilde{\theta})$ to obtain the corresponding predictions, and compute the corresponding consistency loss $\mathcal{L}_{con}^{tr}$ with the mean squared error.
(5.12) Add the supervised loss and the consistency loss to obtain the final meta-learning objective:
$$\mathcal{L}_{train} = \mathcal{L}_{sup}^{tr} + \alpha_1 \mathcal{L}_{con}^{tr}$$
where $\alpha_1$ is a hyperparameter trade-off weight.
(5.13) According to the meta-learning objective over $\mathcal{B}'_{train}$, compute one step of gradient descent to obtain the meta-learned parameters $\theta'$:
$$\theta' = \theta - \gamma \nabla_{\theta}\, \mathcal{L}_{train}$$
where $\gamma$ is the hyperparameter inner-loop step size.
(5.14) Input the images in the meta-test data set $\mathcal{B}_{test}$ into the student model $f_{stu}(I; \theta')$ to obtain the corresponding predictions, and compute the corresponding supervised loss $\mathcal{L}_{sup}^{te}$ with binary cross-entropy.
(5.15) Input the same images into the teacher model $f_{tea}(I; \tilde{\theta})$ to obtain the corresponding predictions, and compute the corresponding consistency loss $\mathcal{L}_{con}^{te}$ with the mean squared error.
(5.16) Add the supervised loss and the consistency loss to obtain the final meta-learning loss:
$$\mathcal{L}_{test} = \mathcal{L}_{sup}^{te} + \alpha_1 \mathcal{L}_{con}^{te}$$
where $\alpha_1$ is a hyperparameter trade-off weight.
(5.17) According to the meta-learning loss over $\mathcal{B}_{test}$, compute the gradient $g_{meta} = \nabla_{\theta}\, \mathcal{L}_{test}$, add it to the gradient $g$ obtained in step (5.7) to obtain the final gradient, and update the student model parameters $\theta$:
$$g_{final} = g + g_{meta}, \qquad \theta \leftarrow \theta - \alpha\, g_{final}$$
where $\alpha$ is the hyperparameter learning rate.
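One full training iteration of steps (5.3)-(5.17) can be condensed into a first-order sketch for a toy linear student on scalar labels. Everything here is a simplifying assumption: analytic gradients replace automatic differentiation, the teacher is a frozen copy updated by EMA, and the meta-test gradient is evaluated at θ′ rather than differentiated through the inner update (a first-order approximation of the meta-gradient).

```python
import numpy as np

def predict(theta, X):
    return 1.0 / (1.0 + np.exp(-X @ theta))

def grad_loss(theta, X, y, y_teacher, alpha_c):
    # gradient of BCE for a sigmoid-linear model is X^T (p - y);
    # the MSE consistency term adds 2 (p - y_teacher) p (1 - p) X
    p = predict(theta, X)
    g_sup = X.T @ (p - y) / len(y)
    g_con = X.T @ (2.0 * (p - y_teacher) * p * (1.0 - p)) / len(y)
    return g_sup + alpha_c * g_con

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))
y = (X[:, 0] > 0).astype(float)
theta, theta_t = np.zeros(3), np.zeros(3)          # student / teacher params
gamma, lr, alpha_c = 0.5, 0.1, 0.1                 # illustrative rates

g = grad_loss(theta, X, y, predict(theta_t, X), alpha_c)      # (5.4)-(5.7)
Xtr, ytr, Xte, yte = X[:5], y[:5], X[5:], y[5:]               # (5.8) split
theta_prime = theta - gamma * grad_loss(                      # (5.10)-(5.13)
    theta, Xtr, ytr, predict(theta_t, Xtr), alpha_c)
g_meta = grad_loss(theta_prime, Xte, yte,                     # (5.14)-(5.16)
                   predict(theta_t, Xte), alpha_c)
theta = theta - lr * (g + g_meta)                             # (5.17)
theta_t = 0.9 * theta_t + 0.1 * theta                         # (5.2) EMA
print(theta.shape)
```

The design point the sketch preserves is that the student is updated with the sum of the ordinary batch gradient and the meta-test gradient, so parameters are pushed toward values that also perform well after a simulated adaptation step.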
Compared with the prior art, the invention has the following beneficial effects. The invention provides a new teacher-student multi-label classification network architecture based on meta-learning for weakly supervised image multi-label classification. The multi-label classification network can effectively model the relations among labels in a weakly supervised environment and improves image annotation accuracy; the meta-learning-based teacher-student training architecture improves the generalization of the network in a weakly supervised environment, can in principle be applied to any common image multi-label classification network, and thus has a certain universality. Experiments show that the proposed weakly supervised image multi-label classification method effectively improves image annotation accuracy.
Drawings
FIG. 1 is a sample exemplary diagram of an image multi-label dataset;
FIG. 2 is a diagram of a teacher-student network architecture for employing the present invention in connection with meta-learning;
FIG. 3 is an exemplary diagram of image labeling using the image multi-label classification method according to the present invention.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, the present invention shall be described in further detail with reference to the following detailed description and accompanying drawings.
The invention provides a weakly supervised image multi-label classification method based on meta-learning. The method mainly comprises two parts: the teacher-student network training architecture based on meta-learning proposed by the invention, shown in FIG. 2, and the image multi-label classification network based on label information enhancement proposed by the invention. First, the label-information-enhanced image multi-label classification network is trained through the meta-learning-based teacher-student network training architecture to obtain a trained image multi-label classification network; then the trained image multi-label classification network is used to annotate the image to be detected. The weakly supervised image multi-label classification method based on meta-learning comprises the following steps:
(1) the image is input to an encoder, and the outputs of the last and last-but-one layers in the encoder network are selected as the low-dimensional feature matrix and the high-dimensional feature vector of the image.
(2) And taking the coded high-dimensional feature vector as an abstract representation of the image, taking the abstract representation as an initial input of a decoder, and predicting whether a first label in a predefined label sequence is related or not.
(3) And predicting whether the current label in the label sequence is relevant, using the prediction information of the previous label in the label sequence as the current input of the decoder. The method specifically comprises the following substeps:
(3.1) Assume the decoder is currently predicting the t-th label in the label sequence, and the predicted probability of the (t-1)-th label is $\hat{y}_{t-1}$. The corresponding label embedding $e_{t-1}$ is obtained from this probability:
$$e_{t-1} = \begin{cases} e_{t-1}^{+}, & \hat{y}_{t-1} \geq \tau \\ e_{t-1}^{-}, & \hat{y}_{t-1} < \tau \end{cases}$$
where $\tau$ denotes a threshold hyperparameter, and $e_{t-1}^{+}$ and $e_{t-1}^{-}$ are the trainable embedding vectors corresponding to the "relevant" and "irrelevant" states of the (t-1)-th label, respectively.
(3.2) The low-dimensional feature matrix $V_{feat} = \{v_1, \dots, v_K\}$ obtained from the encoder interacts with the (t-1)-th hidden state $h_{t-1}$ of the decoder to obtain the image representation $z_t$ with irrelevant features filtered out:
$$\alpha_{i,t} = f_{att}(v_i, h_{t-1}), \qquad \tilde{\alpha}_{i,t} = \frac{\exp(\alpha_{i,t})}{\sum_{j=1}^{K} \exp(\alpha_{j,t})}, \qquad z_t = \sum_{i=1}^{K} \tilde{\alpha}_{i,t}\, v_i$$
where $f_{att}$ denotes an attention network.
(3.3) Concatenate the label embedding $e_{t-1}$ and the image representation $z_t$ to obtain the current decoder input $x_t$; the decoder then produces the t-th hidden state $h_t$, which is input into the t-th label classification layer to obtain the corresponding label prediction $\hat{y}_t$:
$$x_t = [e_{t-1}; z_t], \qquad h_t = f_{LSTM}(x_t, h_{t-1}, c_{t-1}), \qquad \hat{y}_t = \sigma(W_t h_t + b_t)$$
where $f_{LSTM}$ denotes an LSTM cell with cell state $c_{t-1}$, $W_t$ and $b_t$ are the trainable parameters of the t-th label classification layer, $\sigma$ is the sigmoid function, and $[\cdot\,;\cdot]$ denotes the concatenation of two vectors.
(4) And (4) repeating the step (3) until all the labels in the label sequence are predicted.
(5) And comparing the obtained label sequence with the correct label sequence, calculating a loss value with the meta-learning-based teacher-student network architecture, and minimizing the loss value by an optimization method, finally obtaining the trained multi-label classification network for image annotation. The method specifically comprises the following substeps:
(5.1) Construct two label-information-enhanced deep learning models, used respectively as the student model $f_{stu}(I; \theta)$ and the teacher model $f_{tea}(I; \tilde{\theta})$, with initial model parameters $\theta_0$ and $\tilde{\theta}_0$.
(5.2) Randomly initialize the student model parameters $\theta_0$ and update them by gradient descent; update the teacher model parameters with an exponential moving average:
$$\tilde{\theta}_t = \beta\, \tilde{\theta}_{t-1} + (1 - \beta)\, \theta_t$$
where $\theta_t$ denotes the student model parameters at the t-th training iteration, $\tilde{\theta}_t$ and $\tilde{\theta}_{t-1}$ denote the teacher model parameters at the t-th and (t-1)-th iterations, and $0 < \beta < 1$ is the EMA weighting hyperparameter. In addition, $\tilde{\theta}_0 = \theta_0$.
(5.3) Randomly sample a mini-batch $\mathcal{B}$ from the weakly supervised training data set.
(5.4) Input the images $I$ in $\mathcal{B}$ into the student model $f_{stu}(I; \theta)$ to obtain the predictions $\hat{y}$, and compute the supervised loss $\mathcal{L}_{sup}$ with binary cross-entropy over the observed labels:
$$\mathcal{L}_{sup} = -\frac{1}{|\mathcal{B}|} \sum_{(I, y) \in \mathcal{B}} \sum_{t} \big[ y_t \log \hat{y}_t + (1 - y_t) \log(1 - \hat{y}_t) \big]$$
(5.5) Input the same images into the teacher model $f_{tea}(I; \tilde{\theta})$ to obtain the predictions $\tilde{y}$, and compute the consistency loss $\mathcal{L}_{con}$ with the mean squared error:
$$\mathcal{L}_{con} = \frac{1}{|\mathcal{B}|} \sum_{(I, y) \in \mathcal{B}} \| \hat{y} - \tilde{y} \|_2^2$$
(5.6) Add the supervised loss and the consistency loss to obtain the final supervised objective:
$$\mathcal{L}_{batch} = \mathcal{L}_{sup} + \alpha_3 \mathcal{L}_{con}$$
where $\alpha_3$ is a hyperparameter trade-off weight.
(5.7) According to the final supervised objective, compute the gradient with respect to the student model parameters $\theta$:
$$g = \nabla_{\theta}\, \mathcal{L}_{batch}$$
(5.8) From the mini-batch $\mathcal{B}$, draw $\eta$ samples into a meta-training data set $\mathcal{B}_{train}$ and put the remaining samples into a meta-test data set $\mathcal{B}_{test}$.
(5.9) For $\mathcal{B}_{train}$: flip, crop, and denoise each image $I$ with probability $\lambda$ to obtain $I'$; randomly mask the labels $y$ with probability $\rho$ to obtain $y'$; this finally yields the augmented meta-training data set $\mathcal{B}'_{train}$.
(5.10) Input the images in $\mathcal{B}'_{train}$ into the student model $f_{stu}(I; \theta)$ to obtain the corresponding predictions, and compute the corresponding supervised loss $\mathcal{L}_{sup}^{tr}$ with binary cross-entropy.
(5.11) Input the same images into the teacher model $f_{tea}(I; \tilde{\theta})$ to obtain the corresponding predictions, and compute the corresponding consistency loss $\mathcal{L}_{con}^{tr}$ with the mean squared error.
(5.12) Add the supervised loss and the consistency loss to obtain the final meta-learning objective:
$$\mathcal{L}_{train} = \mathcal{L}_{sup}^{tr} + \alpha_1 \mathcal{L}_{con}^{tr}$$
where $\alpha_1$ is a hyperparameter trade-off weight.
(5.13) According to the meta-learning objective over $\mathcal{B}'_{train}$, compute one step of gradient descent to obtain the meta-learned parameters $\theta'$:
$$\theta' = \theta - \gamma \nabla_{\theta}\, \mathcal{L}_{train}$$
where $\gamma$ is the hyperparameter inner-loop step size.
(5.14) Input the images in the meta-test data set $\mathcal{B}_{test}$ into the student model $f_{stu}(I; \theta')$ to obtain the corresponding predictions, and compute the corresponding supervised loss $\mathcal{L}_{sup}^{te}$ with binary cross-entropy.
(5.15) Input the same images into the teacher model $f_{tea}(I; \tilde{\theta})$ to obtain the corresponding predictions, and compute the corresponding consistency loss $\mathcal{L}_{con}^{te}$ with the mean squared error.
(5.16) Add the supervised loss and the consistency loss to obtain the final meta-learning loss:
$$\mathcal{L}_{test} = \mathcal{L}_{sup}^{te} + \alpha_1 \mathcal{L}_{con}^{te}$$
where $\alpha_1$ is a hyperparameter trade-off weight.
(5.17) According to the meta-learning loss over $\mathcal{B}_{test}$, compute the gradient $g_{meta} = \nabla_{\theta}\, \mathcal{L}_{test}$, add it to the gradient $g$ obtained in step (5.7) to obtain the final gradient, and update the student model parameters $\theta$:
$$g_{final} = g + g_{meta}, \qquad \theta \leftarrow \theta - \alpha\, g_{final}$$
where $\alpha$ is the hyperparameter learning rate.
(6) And acquiring an image to be detected, inputting the image to be detected into a trained multi-label classification network for image labeling, and acquiring a labeling result of the image.
Examples
The following takes fig. 1 as an example, and refers to fig. 3, which is a supplementary description of the image multi-label classification process. Assuming that the predefined sequence of tags is in order { airplane, train, sky }, the set of related tags of fig. 1 is { airplane, sky }.
(1) Based on the weakly supervised data set, a teacher-student network training architecture based on meta learning is adopted to train an image multi-label classification network based on label information enhancement, and the network comprises an encoder and a decoder. In the above example, the input image is shown in fig. 1, the output tag sequence is { airplane-related, train-unrelated, sky-related }, and the tag sequence to be labeled by the network is { airplane, train, sky }.
(2) The image is input to an encoder, and the outputs of the last and last-but-one layers in the encoder network are selected as the low-dimensional feature matrix and the high-dimensional feature vector of the image.
(3) And taking the coded high-dimensional feature vector as an abstract representation of the image, taking the abstract representation as the initial input of the decoder, and predicting whether the first label { airplane } in the predefined label sequence is relevant; assume the prediction result is { airplane related }.
(4) And predicting whether the current label { train } in the label sequence is relevant, using the prediction information { airplane related } of the previous label as the current input of the decoder; assume the prediction result is { train unrelated }.
(5) And predicting whether the current label { sky } in the label sequence is relevant, using the prediction information { train unrelated } of the previous label as the current input of the decoder; assume the prediction result is { sky related }.
(6) All the labels in the label sequence have now been predicted, so the model finishes prediction; the final predicted label sequence is { airplane related, train unrelated, sky related }, and therefore the relevant labels of FIG. 1 are { airplane, sky }.
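The walk-through above can be mirrored by a toy script in which a stub stands in for the trained decoder; `predict_label` and the hard-coded relevant set are of course illustrative only, the point being that each step conditions on the previous step's decision.

```python
label_order = ["airplane", "train", "sky"]   # predefined label sequence
relevant = {"airplane", "sky"}               # ground truth for the sample image

def predict_label(label, prev_decision):
    # stand-in for the trained decoder: decides relevance of `label`
    # given the previous decision (unused by this stub)
    return label in relevant

decisions = []
prev = None
for label in label_order:
    is_rel = predict_label(label, prev)
    decisions.append((label, "related" if is_rel else "unrelated"))
    prev = is_rel                            # feed the decision forward
print(decisions)
# [('airplane', 'related'), ('train', 'unrelated'), ('sky', 'related')]
```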
The above embodiments are only intended to illustrate the technical solution of the present invention and not to limit the same, and a person skilled in the art can modify the technical solution of the present invention or substitute the same without departing from the principle and scope of the present invention, and the scope of the present invention should be determined by the claims.

Claims (3)

1. A weak supervision image multi-label classification method based on meta-learning is characterized in that the method is implemented on a weak supervision image multi-label classification system, and the weak supervision image multi-label classification system comprises an image multi-label classification network based on label information enhancement and a teacher-student network training framework based on meta-learning; the multi-label classification network comprises an encoding layer and a decoding layer; the encoder receives an image as input, and a ResNet-152 pre-training model is adopted to obtain a low-dimensional feature matrix and a high-dimensional feature vector of the image; the decoder is an LSTM sequence decoding structure and is used for generating a label labeling sequence; the meta learning based teacher-student network architecture includes a teacher model and a student model. The weak supervision image multi-label classification method comprises the following steps:
(1) the image is input to an encoder, and the outputs of the last and last-but-one layers in the encoder network are selected as the low-dimensional feature matrix and the high-dimensional feature vector of the image.
(2) And taking the coded high-dimensional feature vector as an abstract representation of the image, taking the abstract representation as an initial input of a decoder, and predicting whether a first label in a predefined label sequence is related or not.
(3) And predicting whether the current label in the label sequence is relevant according to the prediction information of the previous label in the label sequence as the current input of the decoder.
(4) And (4) repeating the step (3) until all the labels in the label sequence are predicted.
(5) And comparing the obtained label sequence with a correct label sequence, calculating a loss value by adopting a teacher-student network architecture based on meta learning, and minimizing the loss value by an optimization method to finally obtain the trained multi-label classification network for image annotation.
(6) And acquiring an image to be detected, inputting the image to be detected into a trained multi-label classification network for image labeling, and acquiring a labeling result of the image.
2. The weak supervision image multi-label classification method according to claim 1, characterized by: the step (3) comprises the following substeps:
(3.1) Let $\hat{y}_{t-1}$ denote the predicted probability of the (t-1)-th label at the time the t-th label in the sequence is being predicted. The corresponding label representation vector $e_{t-1}$ is obtained from this probability:

$$e_{t-1} = \begin{cases} e^{+}_{t-1}, & \hat{y}_{t-1} \ge \tau \\ e^{-}_{t-1}, & \hat{y}_{t-1} < \tau \end{cases}$$

where $\tau$ denotes a threshold hyperparameter, and $e^{+}_{t-1}$ and $e^{-}_{t-1}$ are trainable representation vectors corresponding to the (t-1)-th label being relevant and irrelevant, respectively.
(3.2) The low-dimensional feature matrix $V_{feat} = \{v_1, \dots, v_K\}$ obtained by the encoder interacts with the (t-1)-th hidden state $h_{t-1}$ of the decoder to obtain an image representation $z_t$ from which irrelevant features are filtered out:

$$\alpha_{i,t} = f_{att}(v_i, h_{t-1}), \qquad \hat{\alpha}_{i,t} = \frac{\exp(\alpha_{i,t})}{\sum_{j} \exp(\alpha_{j,t})}, \qquad z_t = \sum_{i} \hat{\alpha}_{i,t}\, v_i$$

where $f_{att}$ denotes an attention network.
(3.3) The label representation $e_{t-1}$ and the image representation $z_t$ are concatenated to obtain the current decoder input $x_t$; the decoder then produces the corresponding t-th hidden state $h_t$, which is fed into the t-th label classification layer to obtain the corresponding label prediction $\hat{y}_t$:

$$x_t = [\,e_{t-1};\, z_t\,], \qquad h_t = f_{LSTM}(x_t, h_{t-1}, c_{t-1}), \qquad \hat{y}_t = \sigma(W_t h_t + b_t)$$

where $f_{LSTM}$ denotes an LSTM cell, $\sigma$ is the sigmoid function, and $W_t$ and $b_t$ are trainable parameters of the t-th label classification layer.
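Steps (3.1) through (3.3) can be sketched numerically as follows (a minimal NumPy sketch; the dimensions, the additive attention score, the sigmoid classifier, and the simplified recurrent cell standing in for the LSTM are illustrative assumptions, not the claimed architecture):

```python
import numpy as np

rng = np.random.default_rng(0)
d_emb, d_feat, d_hid, k = 4, 6, 8, 5   # illustrative sizes

# (3.1) pick the label embedding by thresholding the previous prediction
tau = 0.5
e_pos = rng.normal(size=d_emb)   # trainable "relevant" embedding
e_neg = rng.normal(size=d_emb)   # trainable "irrelevant" embedding
y_prev = 0.83                    # previous label's predicted probability
e_prev = e_pos if y_prev >= tau else e_neg

# (3.2) attention over the k low-dimensional feature vectors v_i
V = rng.normal(size=(k, d_feat))                 # V_feat from the encoder
h_prev = rng.normal(size=d_hid)
w_att = rng.normal(size=d_feat + d_hid)
scores = V @ w_att[:d_feat] + h_prev @ w_att[d_feat:]   # f_att(v_i, h_{t-1})
alpha = np.exp(scores - scores.max()); alpha /= alpha.sum()  # softmax
z_t = alpha @ V                                  # filtered image representation

# (3.3) concatenate, run one recurrent step, classify
x_t = np.concatenate([e_prev, z_t])
W_h = rng.normal(size=(d_hid, d_emb + d_feat)) * 0.1
h_t = np.tanh(W_h @ x_t + h_prev)                # stand-in for the LSTM cell
W_t = rng.normal(size=d_hid); b_t = 0.0
y_t = 1.0 / (1.0 + np.exp(-(W_t @ h_t + b_t)))   # sigmoid label prediction
print(float(y_t))
```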
3. The weak supervision image multi-label classification method according to claim 1, characterized by: the step (5) comprises the following substeps:
(5.1) Construct two deep learning models based on label-information enhancement, used respectively as the student model $f_{stu}(I; \theta)$ and the teacher model $f_{tea}(I; \tilde{\theta})$, with initial model parameters $\theta_0$ and $\tilde{\theta}_0$.

(5.2) The student model parameters $\theta_0$ are randomly initialized and then updated according to the gradient, while the teacher model parameters are updated with an exponential moving average of the student parameters:

$$\tilde{\theta}_t = \beta\, \tilde{\theta}_{t-1} + (1 - \beta)\, \theta_t$$

where $\theta_t$ denotes the student model parameters at the t-th training iteration, $\tilde{\theta}_t$ and $\tilde{\theta}_{t-1}$ denote the teacher model parameters at the t-th and (t-1)-th training iterations, and $\beta > 0$ is a hyperparameter weight. In addition, $\tilde{\theta}_0 = \theta_0$.
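The exponential-moving-average teacher update in step (5.2) can be sketched as follows (a minimal sketch; the parameter vectors and the value of β are illustrative):

```python
import numpy as np

def ema_update(theta_teacher, theta_student, beta=0.99):
    """Teacher parameters as an exponential moving average of the
    student's: theta~_t = beta * theta~_{t-1} + (1 - beta) * theta_t."""
    return beta * theta_teacher + (1.0 - beta) * theta_student

theta_t = np.array([1.0, 2.0])    # student parameters after a gradient step
theta_tea = np.array([0.0, 0.0])  # teacher (initialized from the student in practice)
theta_tea = ema_update(theta_tea, theta_t, beta=0.9)
print(theta_tea)  # -> [0.1 0.2]
```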
(5.3) Randomly sample a mini-batch dataset $\mathcal{B}$ from the weakly supervised training dataset.
(5.4) The images I in the mini-batch $\mathcal{B}$ are input into the student model $f_{stu}(I; \theta)$ to obtain the corresponding predictions $\hat{y}^{stu}$, and the corresponding supervised loss $\mathcal{L}_{sup}$ is computed with binary cross-entropy:

$$\mathcal{L}_{sup} = -\frac{1}{|\mathcal{B}|} \sum_{(I,\, y) \in \mathcal{B}} \sum_{t=1}^{T} \left[\, y_t \log \hat{y}^{stu}_t + (1 - y_t) \log\!\left(1 - \hat{y}^{stu}_t\right) \right]$$
(5.5) The images I in the mini-batch $\mathcal{B}$ are input into the teacher model $f_{tea}(I; \tilde{\theta})$ to obtain the corresponding predictions $\hat{y}^{tea}$, and the corresponding consistency loss $\mathcal{L}_{con}$ is computed with the mean squared error:

$$\mathcal{L}_{con} = \frac{1}{|\mathcal{B}|} \sum_{I \in \mathcal{B}} \left\| \hat{y}^{stu} - \hat{y}^{tea} \right\|_2^2$$
(5.6) The supervised loss $\mathcal{L}_{sup}$ and the consistency loss $\mathcal{L}_{con}$ are added to obtain the final supervised loss $\mathcal{L}$:

$$\mathcal{L} = \mathcal{L}_{sup} + \alpha_3\, \mathcal{L}_{con}$$

where $\alpha_3$ is a hyperparameter trade-off weight.
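The loss combination of steps (5.4)–(5.6) can be sketched as follows (an illustrative sketch; the prediction vectors and the value of α₃ are made-up example data):

```python
import numpy as np

def bce_loss(y_true, y_pred, eps=1e-12):
    """Binary cross-entropy over the label sequence (supervised loss)."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

def consistency_loss(y_stu, y_tea):
    """Mean squared error between student and teacher predictions."""
    return np.mean((y_stu - y_tea) ** 2)

y = np.array([1.0, 0.0, 1.0])        # ground-truth label sequence
y_stu = np.array([0.9, 0.2, 0.8])    # student predictions
y_tea = np.array([0.85, 0.25, 0.8])  # teacher predictions

alpha3 = 0.5                         # hyperparameter trade-off weight
loss = bce_loss(y, y_stu) + alpha3 * consistency_loss(y_stu, y_tea)
print(float(loss))
```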
(5.7) The gradient $g$ of the student model parameters $\theta$ is computed from the final supervised loss $\mathcal{L}$:

$$g = \nabla_{\theta}\, \mathcal{L}$$
(5.8) η samples are extracted from the mini-batch $\mathcal{B}$ into a subset $\mathcal{B}_{\eta}$, and the remaining samples are placed into the meta-test dataset $\mathcal{D}_{test}$.
(5.9) For each sample $(I, y) \in \mathcal{B}_{\eta}$, the image I is flipped, cropped, and perturbed with noise with a certain probability λ to obtain an augmented image I'; the labels in y are randomly masked with probability ρ to obtain y'; the resulting pairs $(I', y')$ form the meta-training dataset $\mathcal{D}_{train}$.
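The meta-training construction in step (5.9) can be sketched as follows (an illustrative sketch; a horizontal flip stands in for the flip/crop/noise perturbations, masked labels are marked with -1, and all sizes and probabilities are assumptions):

```python
import numpy as np

rng = np.random.default_rng(42)

def build_meta_train_sample(image, labels, lam=0.5, rho=0.3):
    """Perturb the image with probability lam and mask a rho-fraction
    of the labels (masked entries set to -1 = 'unknown')."""
    img = image[:, ::-1].copy() if rng.random() < lam else image.copy()
    lab = labels.copy()
    mask = rng.random(lab.shape) < rho
    lab[mask] = -1
    return img, lab

image = np.arange(12.0).reshape(3, 4)   # toy "image"
labels = np.array([1, 0, 1, 0, 1])      # toy label vector
img_aug, lab_aug = build_meta_train_sample(image, labels)
print(lab_aug)
```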
(5.10) The images I' in the meta-training dataset $\mathcal{D}_{train}$ are input into the student model $f_{stu}(I; \theta)$ to obtain the corresponding predictions, and the corresponding supervised loss $\mathcal{L}^{train}_{sup}$ is computed with binary cross-entropy.
(5.11) The images I' in the meta-training dataset $\mathcal{D}_{train}$ are input into the teacher model $f_{tea}(I; \tilde{\theta})$ to obtain the corresponding predictions, and the corresponding consistency loss $\mathcal{L}^{train}_{con}$ is computed with the mean squared error.
(5.12) The supervised loss $\mathcal{L}^{train}_{sup}$ and the consistency loss $\mathcal{L}^{train}_{con}$ are added to obtain the final meta-learning objective $\mathcal{L}^{train}$:

$$\mathcal{L}^{train} = \mathcal{L}^{train}_{sup} + \alpha_1\, \mathcal{L}^{train}_{con}$$

where $\alpha_1$ is a hyperparameter trade-off weight.
(5.13) One step of gradient descent is computed on the meta-training dataset $\mathcal{D}_{train}$ according to the meta-learning objective $\mathcal{L}^{train}$ to obtain the meta-learned parameters θ':

$$\theta' = \theta - \gamma\, \nabla_{\theta}\, \mathcal{L}^{train}$$

where γ is a hyperparameter step-size weight.
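The one-step inner update of step (5.13) can be sketched as follows (a minimal sketch; the toy quadratic objective replaces the real meta-training loss so the gradient has a closed form):

```python
import numpy as np

def meta_step(theta, grad_train, gamma=0.1):
    """One gradient-descent step on the meta-training objective:
    theta' = theta - gamma * grad(L_train)(theta)."""
    return theta - gamma * grad_train(theta)

# Toy objective L(theta) = ||theta||^2 / 2, so its gradient is theta itself.
grad = lambda th: th
theta = np.array([1.0, -2.0])
theta_prime = meta_step(theta, grad, gamma=0.1)
print(theta_prime)  # -> [ 0.9 -1.8]
```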
(5.14) The images I in the meta-test dataset $\mathcal{D}_{test}$ are input into the student model $f_{stu}(I; \theta')$ to obtain the corresponding predictions, and the corresponding supervised loss $\mathcal{L}^{test}_{sup}$ is computed with binary cross-entropy.
(5.15) The images I in the meta-test dataset $\mathcal{D}_{test}$ are input into the teacher model $f_{tea}(I; \tilde{\theta})$ to obtain the corresponding predictions, and the corresponding consistency loss $\mathcal{L}^{test}_{con}$ is computed with the mean squared error.
(5.16) The supervised loss $\mathcal{L}^{test}_{sup}$ and the consistency loss $\mathcal{L}^{test}_{con}$ are added to obtain the final meta-learning loss $\mathcal{L}^{test}$:

$$\mathcal{L}^{test} = \mathcal{L}^{test}_{sup} + \alpha_1\, \mathcal{L}^{test}_{con}$$

where $\alpha_1$ is a hyperparameter trade-off weight.
(5.17) The gradient $g_{meta} = \nabla_{\theta}\, \mathcal{L}^{test}$ is computed on the meta-test dataset $\mathcal{D}_{test}$ according to the meta-learning loss, added to the gradient $g$ obtained in step (5.7) to obtain the final gradient $g_{final}$, and the student model parameters θ are updated:

$$g_{final} = g + g_{meta}, \qquad \theta \leftarrow \theta - \alpha\, g_{final}$$

where α is the hyperparameter learning rate.
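The final parameter update of step (5.17) can be sketched as follows (an illustrative sketch; the gradient vectors and the learning rate are made-up example data):

```python
import numpy as np

def final_update(theta, grad_batch, grad_meta, lr=0.01):
    """Combine the ordinary supervised gradient with the meta-test
    gradient, then update the student parameters."""
    g_final = grad_batch + grad_meta
    return theta - lr * g_final

theta = np.array([1.0, 1.0])
g = np.array([0.5, -0.5])      # gradient of the final supervised loss
g_meta = np.array([0.1, 0.1])  # gradient of the meta-learning loss
theta = final_update(theta, g, g_meta, lr=0.1)
print(theta)  # -> [0.94 1.04]
```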
CN202110162956.1A 2021-02-05 2021-02-05 Weak supervision image multi-label classification method based on meta-learning Active CN113033603B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110162956.1A CN113033603B (en) 2021-02-05 2021-02-05 Weak supervision image multi-label classification method based on meta-learning


Publications (2)

Publication Number Publication Date
CN113033603A true CN113033603A (en) 2021-06-25
CN113033603B CN113033603B (en) 2022-11-15





Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant