CN114299326A

CN114299326A - Small sample classification method based on conversion network and self-supervision

Info

Publication number: CN114299326A
Application number: CN202111483193.7A
Authority: CN
Inventors: 于云龙; 靳莉莎
Original assignee: Zhejiang University ZJU
Current assignee: Zhejiang University ZJU
Priority date: 2021-12-07
Filing date: 2021-12-07
Publication date: 2022-04-08

Abstract

The invention discloses a small sample classification method based on a conversion network and self-supervision. A conversion network module is added to a general classification model, and different noises are injected for feature enhancement, so that feature embeddings that are both discriminative and diverse are synthesized and the trained model adapts better to downstream small sample tasks. The method comprises the following steps: acquiring an image data set for training a feature extractor and a conversion network module; feeding the image data set into the network, using a feature enhancement method to obtain discriminative and diverse feature embeddings, and training the feature extractor and the conversion network module jointly with self-supervised learning by optimizing the sum of several cross-entropy losses and a KL divergence; and applying the trained feature extractor and conversion network module to a small sample classification task. The invention performs well on four small sample classification benchmarks (miniImageNet, tieredImageNet, CIFAR-FS and Caltech-UCSD), which demonstrates its effectiveness and superior performance.

Description

Small sample classification method based on conversion network and self-supervision

Technical Field

The invention belongs to the field of computer vision, and particularly relates to a small sample classification method incorporating a conversion network and self-supervision.

Background

Small sample learning aims to recognize target classes from only a few samples per class. To this end, many existing methods train a model on base classes, each of which contains a large number of labeled samples, and then apply the trained model to the test task. According to the kind of knowledge transferred from the base classes, existing small sample learning methods fall roughly into three categories: meta-learning-based methods, metric-learning-based methods, and data-augmentation-based methods.

Meta-learning-based methods try to learn a meta-learner that can adjust the optimization algorithm so that it adapts quickly to a small sample task.

Metric-learning-based methods learn a transferable distance metric function to evaluate the similarity between samples.

Data-augmentation-based methods augment the data by using general image transformation techniques or generative adversarial networks. However, the performance of such methods is not always satisfactory, because the augmented data lacks the properties required by small sample tasks.

Classification in small sample learning is mainly formulated as a C-way K-shot problem: in the training stage, C classes are randomly drawn from the training set, K samples of each class (C × K samples in total) are used as the support set of the model, and Q samples are drawn from the remaining data of the C classes as the query set. The model is thus required to learn to distinguish the C classes from the C × K support samples.
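As an illustration only, one C-way K-shot episode could be sampled as in the following Python sketch. The function name `sample_episode`, the toy label list, and the choice of drawing Q query samples per class are assumptions made for the example and are not specified by the method.

```python
import random
from collections import defaultdict

def sample_episode(labels, c_way, k_shot, q_query, rng):
    """Return (support, query) lists of sample indices for one episode.

    labels[i] is the class of sample i. Assumption: q_query samples are
    drawn per class from the data left over after the support samples.
    """
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    # Only classes with enough samples for K support + Q query items qualify.
    eligible = sorted(c for c, idxs in by_class.items()
                      if len(idxs) >= k_shot + q_query)
    classes = rng.sample(eligible, c_way)
    support, query = [], []
    for c in classes:
        picked = rng.sample(by_class[c], k_shot + q_query)
        support.extend(picked[:k_shot])   # K support samples of class c
        query.extend(picked[k_shot:])     # Q query samples of class c
    return support, query

# Toy data: 10 classes with 20 samples each (hypothetical).
labels = [i // 20 for i in range(200)]
support, query = sample_episode(labels, c_way=5, k_shot=1, q_query=3,
                                rng=random.Random(0))
```

With these settings the support set holds C × K = 5 indices and the query set holds 15 indices from the same five classes, with no index shared between the two sets.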

Disclosure of Invention

The invention provides a small sample classification method that incorporates a conversion network and self-supervision and is better suited to downstream small sample tasks. The method adds a conversion network module, which consists of an encoder-decoder pair and outputs a synthesized feature embedding. A simple feature synthesis technique perturbs the feature space to synthesize feature embeddings that are both discriminative and diverse. Discriminability is achieved by classifying each synthesized feature embedding into the class of the original feature embedding, and diversity is achieved by classifying the synthesized feature embeddings into different subclasses according to the perturbation added. Self-supervised learning is used to ensure this diversity. These are precisely the properties desired for small sample tasks.

In order to achieve the purpose, the technical scheme of the invention is as follows:

A small sample classification method incorporating a conversion network and self-supervision comprises the following steps:

S1, acquiring an image data set for training the feature extractor and the conversion network module;

S2, feeding the image data set into the network, using a feature enhancement method to obtain discriminative and diverse feature embeddings, and training the feature extractor and the conversion network module jointly with self-supervised learning;

S3, using the trained model for a small sample classification task.

Further, in step S1, a base-class data set $\{(x_i, y_i)\}_{i=1}^{n}$ is given,

where $n$ is the total number of images in the data set, $x_i$ and $y_i$ denote the $i$-th image and its class label respectively, $y_i \in \{1, \ldots, C\}$, and $C$ is the total number of classes, each class containing multiple images.

Further, step S2 specifically includes:

S21, training the deep neural network in mini-batches: a batch of image samples $B = \{(x_i, y_i)\}_{i=1}^{N_{bs}}$ is randomly sampled from the image data set,

where the batch size $N_{bs}$ is preset;

S22, feeding the batch of image samples in $B$ into a model consisting of a backbone network and a classifier to obtain the predicted probabilities of the batch. The optimization objective of the model under the cross-entropy (CE) loss is

where $f$ and $g$ denote the feature extractor and the classifier respectively, $\Theta$ is the parameter set, $L^{ce}$ denotes the CE loss, $R$ denotes the regularization term on the parameter set, and $\lambda$ is a hyperparameter.
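A minimal Python sketch of an objective of this kind is given below for illustration. It assumes that the CE loss is averaged over the batch and that the regularization term R is the squared L2 norm of the parameters; neither choice is stated in the text, and the names `cross_entropy` and `base_objective` are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, label):
    """CE loss of one sample: negative log-probability of the true class."""
    return -math.log(softmax(logits)[label])

def base_objective(batch_logits, batch_labels, params, lam):
    """Mean CE over the batch plus lam * R(params).

    batch_logits[i] stands for g(f(x_i)). Assumption: R is the squared
    L2 norm of the parameters.
    """
    ce = sum(cross_entropy(z, y) for z, y in zip(batch_logits, batch_labels))
    ce /= len(batch_labels)
    reg = sum(p * p for p in params)
    return ce + lam * reg

# Toy batch of two samples with three classes (hypothetical values).
loss = base_objective([[2.0, 0.0, 0.0], [0.0, 0.0, 0.0]], [0, 1],
                      params=[0.5, -0.5], lam=0.01)
```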

S23, to ensure that the synthesized feature embeddings are discriminative, the synthesized features are fed into the classification network used for the original visual features, and the predicted class is required to be consistent with the class of the original visual feature. The classification objective for the synthesized feature embeddings is

where $t$ is the number of additional synthesized feature embeddings, $c_j$ is the $j$-th Gaussian noise, $T$ is the conversion network module, $y_{ij}$ is the class label of the synthesized feature $T(f(x_i), c_j)$, which is the same as the class label of the original visual feature $f(x_i)$, and $\Theta$ denotes the parameter set of the entire model.
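The following Python sketch illustrates how a loss of this kind might be computed. `toy_transform` and `toy_classifier` are hypothetical stand-ins for the conversion network T and the classifier g (the real T is an encoder-decoder pair), and averaging over all (i, j) pairs is an assumption.

```python
import math
import random

def cross_entropy(logits, label):
    """CE loss of one sample, computed with a stable log-sum-exp."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]

def synthesized_class_loss(features, labels, noises, transform, classifier):
    """Mean CE of classifier(transform(f_i, c_j)) against y_ij = y_i."""
    total, count = 0.0, 0
    for feat, y in zip(features, labels):
        for noise in noises:
            synth = transform(feat, noise)   # stands for T(f(x_i), c_j)
            total += cross_entropy(classifier(synth), y)
            count += 1
    return total / count

def toy_transform(feat, noise):
    """Hypothetical stand-in for T: adds the noise to the feature."""
    return [a + b for a, b in zip(feat, noise)]

def toy_classifier(feat):
    """Hypothetical stand-in for g: uses the feature itself as logits."""
    return list(feat)

rng = random.Random(0)
features = [[3.0, 0.0], [0.0, 3.0]]   # stand for f(x_1), f(x_2)
labels = [0, 1]
noises = [[rng.gauss(0.0, 0.1) for _ in range(2)] for _ in range(3)]  # t = 3
loss = synthesized_class_loss(features, labels, noises,
                              toy_transform, toy_classifier)
```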

S24, to ensure that the synthesized feature embeddings are diverse, features synthesized with different noises are assigned to different subclasses. The original visual features and the synthesized feature embeddings are fed into a classifier different from the one above, which assigns them to different subclasses:

where $l_{ij}$ is a self-supervised class label assigned manually according to the noise distribution, and $h$ denotes the self-supervised classifier.
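The bookkeeping of the two kinds of labels can be illustrated with a short Python sketch. It assumes, for illustration only, that the original feature is given self-supervised subclass 0 and that the feature synthesized with the j-th noise is given subclass j; the function name `build_training_targets` is hypothetical.

```python
def build_training_targets(class_labels, num_noises):
    """Return (i, j, y_ij, l_ij) tuples for every feature used in training.

    j = 0 marks the original feature f(x_i); j = 1..num_noises marks the
    synthesized feature T(f(x_i), c_j). The class label y_ij always equals
    y_i, while the self-supervised label l_ij is assumed to equal j.
    """
    targets = []
    for i, y in enumerate(class_labels):
        for j in range(num_noises + 1):
            targets.append((i, j, y, j))
    return targets

# Two samples of classes 4 and 7, each with t = 2 synthesized features.
targets = build_training_targets(class_labels=[4, 7], num_noises=2)
```

Under this assumption the class classifier g sees the same label for all features derived from one image, while the self-supervised classifier h must tell them apart by the noise used.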

S25, regularizing the synthesized feature embeddings in the label space with real visual features, so that the synthesized feature embeddings preserve the inter-class relations of the real visual features:

where KL denotes the Kullback-Leibler divergence and $x_{ij}$ is a real sample of class $y_i$. $f(x_{ij})$ serves as the supervision signal for the synthesized feature embedding $T(f(x_i), c_j)$ and is not itself optimized.
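A regularizer of this kind could look like the Python sketch below. The direction of the divergence (real prediction as the first argument) and the use of softmax class posteriors are assumptions, and the function names are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0.0)

def consistency_loss(real_logits, synth_logits):
    """KL between the class posterior of a real feature and a synthesized one.

    real_logits stands for g(f(x_ij)) and acts as a fixed target: in a
    framework with automatic differentiation no gradient would flow
    through it. synth_logits stands for g(T(f(x_i), c_j)).
    """
    target = softmax(real_logits)
    pred = softmax(synth_logits)
    return kl_divergence(target, pred)

same = consistency_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])   # identical posteriors
diff = consistency_loss([2.0, 0.0], [0.0, 2.0])             # swapped posteriors
```

The loss vanishes when the synthesized feature yields the same class posterior as the real one and grows as the two posteriors drift apart.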

S26, the overall optimization objective is

L_all＝L₁+L₂+αL₃+βL₄

Where α and β are hyperparameters.

S27, training the deep neural network on the total loss function using a stochastic gradient descent optimizer with momentum and the back-propagation algorithm;

S28, repeating steps S21 to S27 until the model converges.
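The update performed in S27 and repeated in S28 can be sketched in Python as follows. The update rule shown (velocity accumulates the gradient, parameters move against the velocity) is one common formulation of momentum SGD; the learning rate, momentum value, and toy quadratic loss are assumptions chosen for the example.

```python
def sgd_momentum_step(params, grads, velocity, lr=0.1, momentum=0.9):
    """One in-place update: v <- momentum * v + g, then theta <- theta - lr * v."""
    for k in range(len(params)):
        velocity[k] = momentum * velocity[k] + grads[k]
        params[k] -= lr * velocity[k]

# Toy problem (hypothetical): minimize 0.5 * theta^2, whose gradient is theta.
params, velocity = [1.0], [0.0]
for _ in range(200):              # repeat until (approximately) converged
    grads = [params[0]]           # back-propagation would supply this gradient
    sgd_momentum_step(params, grads, velocity)
```

In the method itself the gradient would be that of the total loss L_all with respect to all model parameters, computed by back-propagation on each sampled batch.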

Further, step S3 specifically includes:

S31, given a C-way K-shot classification task with support set $S$, for each support sample $x_u$ a final feature representation is first obtained through the feature extractor and the conversion network module;

S32, calculating the visual prototype of each class,

where $c$ denotes a class, and $S_c$ and $|S_c|$ are the support set of class $c$ and the number of samples in it;

S33, for a test sample $x_u$ in the query set, the probability that it belongs to class $c$ is

where $d$ is a similarity metric function. Finally, the class of the test sample is predicted from its probabilities over the C classes: the class with the highest probability is the predicted class.
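Steps S32 and S33 can be illustrated with the Python sketch below. It takes the prototype of a class to be the mean of its support features and uses cosine similarity for d, as named in the claims. The softmax over scaled similarities, the `scale` value, the class names, and the toy feature vectors are assumptions made for the example.

```python
import math

def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def class_probabilities(query, support_by_class, scale=10.0):
    """Softmax over scaled cosine similarities to the class prototypes.

    support_by_class maps each class to the final feature representations
    of its support samples. Assumption: scale acts as an inverse temperature.
    """
    classes = sorted(support_by_class)
    prototypes = {c: mean_vector(support_by_class[c]) for c in classes}
    scores = [scale * cosine(query, prototypes[c]) for c in classes]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return {c: e / total for c, e in zip(classes, exps)}

# Toy 2-way 2-shot task with two-dimensional features (hypothetical values).
support_by_class = {"cat": [[1.0, 0.1], [0.9, 0.0]],
                    "dog": [[0.0, 1.0], [0.2, 0.9]]}
probs = class_probabilities([0.8, 0.2], support_by_class)
predicted = max(probs, key=probs.get)   # class with the highest probability
```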

The small sample classification method incorporating a conversion network and self-supervision has the following advantages:

First, the method directly synthesizes visual features rather than input data, and ensures that the synthesized feature embeddings are discriminative and diverse by introducing self-supervised learning (SSL) supervision.

Second, the method shows that the synthesized feature embeddings provide an additional modality for feature representation, so that the model adapts better to downstream small sample tasks.

The method performs well on four small sample classification benchmarks (miniImageNet, tieredImageNet, CIFAR-FS and Caltech-UCSD), which demonstrates its effectiveness and superior performance.

Drawings

Fig. 1 is a schematic flow chart of the small sample classification method incorporating a conversion network and self-supervision according to the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

On the contrary, the invention is intended to cover alternatives, modifications and equivalents that may be included within the spirit and scope of the invention as defined by the appended claims.

Referring to Fig. 1, in a preferred embodiment of the present invention, a small sample classification method incorporating a conversion network and self-supervision includes the following steps:

First, an image data set is obtained for training the feature extractor and the conversion network module.

Specifically, a base-class data set $\{(x_i, y_i)\}_{i=1}^{n}$ is given,

where $n$ is the total number of images in the data set, $x_i$ and $y_i$ denote the $i$-th image and its class label respectively, $y_i \in \{1, \ldots, C\}$, and $C$ is the total number of classes, each class containing multiple images.

Then, the image data set is fed into the network, discriminative and diverse feature embeddings are obtained by using a feature enhancement method, and the feature extractor and the conversion network module are trained jointly with self-supervised learning. This specifically comprises the following steps:

In the first step, the deep neural network is trained in mini-batches: a batch of image samples $B = \{(x_i, y_i)\}_{i=1}^{N_{bs}}$ is randomly sampled from the image data set,

where the batch size $N_{bs}$ is given in advance.

In the second step, the batch of image samples in $B$ is fed into a model consisting of a backbone network and a classifier to obtain the predicted probabilities of the batch. The optimization objective of the model under the cross-entropy loss is

In the third step, to ensure that the synthesized feature embeddings are discriminative, the synthesized features are fed into the classification network used for the original visual features, and the predicted class is required to be consistent with the class of the original visual feature. The classification objective for the synthesized feature embeddings is

In the fourth step, to ensure that the synthesized feature embeddings are diverse, features synthesized with different noises are assigned to different subclasses. The original visual features and the synthesized feature embeddings are fed into a classifier different from the one above, which assigns them to different subclasses:

In the fifth step, the synthesized feature embeddings are regularized in the label space with real visual features, so that the synthesized feature embeddings preserve the inter-class relations of the real visual features:

where KL denotes the Kullback-Leibler divergence and $x_{ij}$ is a real sample of class $y_i$. $f(x_{ij})$ serves as the supervision signal for the synthesized feature embedding $T(f(x_i), c_j)$ and is not itself optimized.

In the sixth step, the overall optimization objective is obtained as

L_all＝L₁+L₂+αL₃+βL₄

Where α and β are hyperparameters.

In the seventh step, the deep neural network is trained on the total loss function using a stochastic gradient descent optimizer with momentum and the back-propagation algorithm.

The above steps are then repeated until the model converges.

Finally, the trained model is used for a small sample classification task.

Given a C-way K-shot classification task with support set $S$, for each support sample $x_u$ a final feature representation is first obtained through the feature extractor and the conversion network module

Then, the visual prototype of each class is calculated

Further, for a test sample $x_u$ in the query set, the probability that it belongs to class $c$ is

The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims

1. A small sample classification method incorporating a conversion network and self-supervision, characterized by comprising the following steps:

S3, using the trained model for a small sample classification task.

2. The small sample classification method incorporating a conversion network and self-supervision according to claim 1, wherein in step S1, the base class is given

3. The small sample classification method incorporating a conversion network and self-supervision according to claim 2, wherein step S2 specifically includes:

S21, training in mini-batches: a batch of image samples is first randomly sampled from the image data set

where the batch size $N_{bs}$ is preset;

S22, feeding the batch of image samples in B into a model consisting of a backbone network and a classifier to obtain the predicted probabilities of the batch; the optimization objective of the model under the cross-entropy loss is

where $f$ and $g$ denote the feature extractor and the classifier respectively, $\Theta$ is the parameter set, $L^{ce}$ denotes the CE loss, $R$ denotes the regularization term on the parameter set, and $\lambda$ is a hyperparameter;

S23, to ensure that the synthesized feature embeddings are discriminative, the synthesized features are fed into the classification network used for the original visual features, and the predicted class is required to be consistent with the class of the original visual feature; the classification objective for the synthesized feature embeddings is

where $t$ is the number of additional synthesized feature embeddings, $c_j$ is the $j$-th Gaussian noise, $T$ is the conversion network module, $y_{ij}$ is the class label of the synthesized feature $T(f(x_i), c_j)$, which is the same as the class label of the original visual feature $f(x_i)$, and $\Theta$ denotes the parameter set of the entire model;

S24, to ensure that the synthesized feature embeddings are diverse, features synthesized with different noises are assigned to different subclasses; the original visual features and the synthesized feature embeddings are fed into a classifier different from the one above, which assigns them to different subclasses

where $l_{ij}$ is a self-supervised class label assigned manually according to the noise distribution, and $h$ denotes the self-supervised classifier;

where KL denotes the Kullback-Leibler divergence and $x_{ij}$ is a real sample of class $y_i$; $f(x_{ij})$ serves as the supervision signal for the synthesized feature embedding $T(f(x_i), c_j)$ and is not itself optimized;

S26, the overall optimization objective is

L_all＝L₁+L₂+αL₃+βL₄

Wherein α and β are hyperparameters;

and S28, repeating the steps S21 to S27 until the model converges.

4. The small sample classification method incorporating a conversion network and self-supervision according to any one of claims 1 to 3, wherein step S3 specifically includes:

S32, calculating the visual prototype of each class

where $c$ denotes a class, and $S_c$ and $|S_c|$ are the support set of class $c$ and the number of samples in it;

S33, for a test sample $x_u$ in the query set, the probability that it belongs to class $c$ is

where $d$ is the similarity metric function, for which the cosine similarity function is used in the present invention. Finally, the class of the test sample is predicted from its probabilities over the C classes: the class with the highest probability is the predicted class.