CN114299326A - Small sample classification method based on conversion network and self-supervision

Publication number
CN114299326A
Authority
CN
China
Prior art keywords
class
feature
embedding
small sample
features
Legal status
Pending
Application number
CN202111483193.7A
Other languages
Chinese (zh)
Inventor
于云龙
靳莉莎
Current Assignee
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Application filed by Zhejiang University ZJU
Priority to CN202111483193.7A
Publication of CN114299326A

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses a small sample (few-shot) classification method based on a conversion network and self-supervision. A conversion network module is added to a general classification model, different noises are injected for feature augmentation, and feature embeddings that are both discriminative and diverse are synthesized, so that the trained model adapts better to downstream small sample tasks. The method comprises the following steps: acquiring an image data set for training a feature extractor and a conversion network module; feeding the image data set into the network, obtaining discriminative and diverse feature embeddings by a feature augmentation method, and training the feature extractor and the conversion network module in combination with self-supervised learning by optimizing the sum of several cross-entropy losses and a KL divergence; and applying the trained feature extractor and conversion network module to a small sample classification task. The invention performs well on four small sample classification benchmarks (miniImageNet, tieredImageNet, CIFAR-FS and Caltech-UCSD), which demonstrates its effectiveness and superior performance.

Description

Small sample classification method based on conversion network and self-supervision
Technical Field
The invention belongs to the field of computer vision, and particularly relates to a small sample classification method that incorporates a conversion network and self-supervision.
Background
Small sample learning aims to recognize target classes from only a few samples per class. To this end, many existing methods train a model on base classes, each of which contains a large number of labeled samples, and then apply the trained model to the test task. According to what is transferred from the base classes, existing small sample learning methods fall roughly into three categories: meta-learning-based methods, metric-learning-based methods, and data-augmentation-based methods.
Meta-learning-based methods try to learn a meta-learner that adjusts the optimization algorithm so that the model can quickly adapt to a small sample task;
metric-learning-based methods learn a transferable distance metric function to evaluate the similarity between samples;
data-augmentation-based methods augment the data with general image transformation techniques or with generative adversarial networks. However, the performance of such methods is not always satisfactory, because the augmented data lack the characteristics required by small sample tasks.
The classification problem in small sample learning is mainly the C-way K-shot problem: in the training stage, C classes are randomly drawn from the training set, K samples of each class (C × K samples in total) are used as the support set of the model, and Q samples are drawn from the remaining data of the same C classes as the query set. That is, the model must learn to distinguish the C classes from only C × K samples.
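The C-way K-shot sampling described above can be illustrated with a minimal sketch. The function name `sample_episode`, the toy label list, and the choice to draw the query samples per class (`n_query` for each of the C classes) are assumptions made for illustration; the text only states that Q samples come from the remaining data of the C classes.

```python
import random
from collections import defaultdict


def sample_episode(labels, n_way, k_shot, n_query, rng=None):
    """Sample one C-way K-shot episode from a list of per-sample class labels.

    Returns (support, query): two lists of (sample_index, class_label) pairs.
    n_query is the number of query samples drawn per class (an assumption).
    """
    rng = rng or random.Random()
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Only classes with enough samples for both sets are eligible.
    eligible = [c for c, idxs in by_class.items() if len(idxs) >= k_shot + n_query]
    if len(eligible) < n_way:
        raise ValueError("not enough classes with k_shot + n_query samples")

    support, query = [], []
    for c in rng.sample(eligible, n_way):
        picked = rng.sample(by_class[c], k_shot + n_query)
        support += [(i, c) for i in picked[:k_shot]]   # K support samples
        query += [(i, c) for i in picked[k_shot:]]     # disjoint query samples
    return support, query


# Toy label list: 10 classes with 20 samples each.
toy_labels = [c for c in range(10) for _ in range(20)]
support, query = sample_episode(toy_labels, n_way=5, k_shot=1, n_query=15,
                                rng=random.Random(0))
```

Here a 5-way 1-shot episode yields 5 support samples and 75 query samples, and no sample index appears in both sets.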
Disclosure of Invention
The invention provides a small sample classification method that incorporates a conversion network and self-supervision and adapts better to downstream small sample tasks. The method adds a conversion network module, which consists of an encoder-decoder pair and outputs a synthesized feature embedding. The method uses a simple feature synthesis technique to perturb the feature space and synthesizes feature embeddings that are both discriminative and diverse. This is achieved by classifying each synthesized feature embedding into the class of its original feature embedding while also classifying it into a distinct subclass according to the perturbation that was added. In addition, self-supervised learning is used to ensure diversity. These are exactly the properties desired for small sample tasks.
To achieve this purpose, the technical scheme of the invention is as follows:
A small sample classification method incorporating a conversion network and self-supervision comprises the following steps:
S1, acquiring an image data set for training the feature extractor and the conversion network module;
S2, feeding the image data set into the network, obtaining discriminative and diverse feature embeddings by a feature augmentation method, and training the feature extractor and the conversion network module in combination with self-supervised learning;
S3, using the trained model for a small sample classification task.
Further, in step S1, a base class data set is given
Figure BDA0003396268840000021
where n is the total number of images in the data set, x_i and y_i denote the i-th image and its corresponding class label respectively, y_i ∈ {1, ..., C}, and C is the total number of classes, each class containing multiple images.
Further, step S2 specifically includes:
S21, training the deep neural network in mini-batches: a batch of image samples is randomly sampled from the image data set
Figure BDA0003396268840000022
where the batch size N_bs is preset;
S22, feeding the image samples in batch B into a model consisting of a backbone network and a classifier to obtain their predicted probabilities. Using the cross-entropy (CE) loss, the optimization objective of the model is
Figure BDA0003396268840000031
where f and g denote the feature extractor and the classifier respectively, Θ is the parameter set, L_ce denotes the CE loss, R denotes the regularization term on the parameter set, and λ is a hyper-parameter.
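The equation images for S21 and S22 are not reproduced in this text. A reconstruction consistent with the symbol definitions above might read as follows; the plain sum over the batch (rather than a mean) and the name L_1 (taken from the overall objective in S26) are assumptions, and the exact form of R is not specified in the source.

```latex
B = \{(x_i, y_i)\}_{i=1}^{N_{bs}}, \qquad
\min_{\Theta} L_1, \quad
L_1 = \sum_{(x_i, y_i) \in B} L_{ce}\big(g(f(x_i)),\, y_i\big) + \lambda R(\Theta)
```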
S23, to ensure that the synthesized feature embeddings are discriminative, they are fed into the classification network of the original visual features, and the predicted class is required to match the class of the original visual feature. The classification objective for the synthesized feature embeddings is
Figure BDA0003396268840000032
where t is the number of additional synthesized feature embeddings, c_j is the j-th noise, which follows a Gaussian distribution, T is the conversion network module, y_ij is the class label of the synthesized feature T(f(x_i), c_j), which is the same as the class label of the original visual feature f(x_i), and Θ denotes the parameter set of the entire model.
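The equation image for S23 is likewise missing. Under the definitions given for t, c_j, T and y_ij, and assuming the same classifier g as in S22 and a plain double sum, one plausible form is:

```latex
L_2 = \sum_{i=1}^{N_{bs}} \sum_{j=1}^{t}
      L_{ce}\big(g(T(f(x_i), c_j)),\, y_{ij}\big),
\qquad y_{ij} = y_i
```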
S24, to ensure that the synthesized feature embeddings are diverse, features synthesized with different noises are assigned to different subclasses. The original visual features and the synthesized feature embeddings are fed into a classifier separate from the one above, which assigns them to different categories
Figure BDA0003396268840000033
where l_ij is a self-supervised class label, assigned manually according to the noise distribution used, and h denotes the self-supervised classifier.
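The equation image for S24 is not available either. A sketch of a matching objective is given below; the shorthand z_ij and the convention that index j = 0 denotes the original (unperturbed) feature are assumptions introduced here, motivated by the statement that both original and synthesized features are fed to h.

```latex
L_3 = \sum_{i=1}^{N_{bs}} \sum_{j=0}^{t}
      L_{ce}\big(h(z_{ij}),\, l_{ij}\big),
\qquad z_{i0} = f(x_i), \quad
z_{ij} = T(f(x_i), c_j) \ \text{for } j \ge 1
```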
S25, regularizing the synthesized feature embeddings in the label space with the real visual features, so that the synthesized feature embeddings preserve the inter-class relations of the real visual features
Figure BDA0003396268840000034
where KL denotes the Kullback-Leibler divergence and x_ij is a real sample of class y_i. f(x_ij) acts as the supervisor of the synthesized feature embedding T(f(x_i), c_j) and is not itself optimized.
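The equation image for S25 is missing as well. One reading consistent with the text is shown below, where σ denotes the softmax over the outputs of classifier g; the use of g and σ to reach the "label space", the direction of the divergence, and the plain double sum are all assumptions. The first argument is treated as a constant, matching the statement that f(x_ij) is not optimized.

```latex
L_4 = \sum_{i=1}^{N_{bs}} \sum_{j=1}^{t}
      \mathrm{KL}\Big( \sigma\big(g(f(x_{ij}))\big) \,\Big\|\,
                       \sigma\big(g(T(f(x_i), c_j))\big) \Big)
```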
S26, the overall optimization objective is
L_all = L_1 + L_2 + αL_3 + βL_4
where α and β are hyper-parameters;
S27, training the deep neural network on the resulting total loss function with a stochastic gradient descent (SGD) optimizer with momentum and the back-propagation algorithm;
S28, repeating steps S21 to S27 until the model converges.
Further, step S3 specifically includes:
S31, given a C-way K-shot classification task with support set S, for each support sample x_u a final feature representation is first obtained through the feature extractor and the conversion network module
Figure BDA0003396268840000041
S32, calculating the visual prototype of each class
Figure BDA0003396268840000042
where c denotes a class, and S_c and |S_c| are the support set of class c and the number of samples in it, respectively.
S33, for a test sample x_u in the query set, the probability that it belongs to class c is
Figure BDA0003396268840000043
where d is a similarity metric function. Finally, the class of the test sample is predicted from its probabilities over the classes: the class with the highest probability is the predicted class.
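The equation images for S32 and S33 are not reproduced here. Writing z_u for the final feature representation of x_u from S31 (a name introduced only for this sketch; how S31 combines the feature extractor and the conversion network module is not recoverable from the text), a prototype-based form consistent with the definitions of S_c, |S_c| and d would be the following. The softmax normalization over classes is an assumption.

```latex
p_c = \frac{1}{|S_c|} \sum_{x_u \in S_c} z_u,
\qquad
P(y = c \mid x_u) =
  \frac{\exp\big(d(z_u, p_c)\big)}{\sum_{c'} \exp\big(d(z_u, p_{c'})\big)}
```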
The small sample classification method incorporating a conversion network and self-supervision has the following advantages:
first, the method synthesizes visual features directly rather than input data, and ensures that the synthesized feature embeddings are discriminative and diverse by introducing self-supervised learning (SSL) supervision;
second, the method shows that synthesized feature embeddings provide an additional mode of feature representation, so that the model adapts better to downstream small sample tasks;
third, the method performs well on four small sample classification benchmarks (miniImageNet, tieredImageNet, CIFAR-FS and Caltech-UCSD), which demonstrates its effectiveness and superior performance.
Drawings
Fig. 1 is a schematic flow chart of the small sample classification method incorporating a conversion network and self-supervision according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
On the contrary, the invention is intended to cover alternatives, modifications and equivalents which may be included within the spirit and scope of the invention as defined by the appended claims.
Referring to Fig. 1, in a preferred embodiment of the present invention, a small sample classification method incorporating a conversion network and self-supervision includes the following steps:
First, an image data set is obtained for training the feature extractor and the conversion network module.
Specifically, a base class data set is given
Figure BDA0003396268840000051
where n is the total number of images in the data set, x_i and y_i denote the i-th image and its corresponding class label respectively, y_i ∈ {1, ..., C}, and C is the total number of classes, each class containing multiple images.
Then, the image data set is fed into the network, discriminative and diverse feature embeddings are obtained by a feature augmentation method, and the feature extractor and the conversion network module are trained in combination with self-supervised learning. This specifically includes the following steps:
In the first step, the deep neural network is trained in mini-batches: a batch of image samples is randomly sampled from the image data set
Figure BDA0003396268840000061
where the batch size N_bs is given in advance.
In the second step, the image samples in batch B are fed into a model consisting of a backbone network and a classifier to obtain their predicted probabilities. Using the cross-entropy (CE) loss, the optimization objective of the model is
Figure BDA0003396268840000062
where f and g denote the feature extractor and the classifier respectively, Θ is the parameter set, L_ce denotes the CE loss, R denotes the regularization term on the parameter set, and λ is a hyper-parameter.
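As an illustration of the second step, the following sketch computes a cross-entropy objective with a regularization term on toy numbers. It assumes that the logits stand in for the classifier outputs g(f(x_i)), that the CE loss is averaged over the batch, and that R is the squared L2 norm of the parameters; none of these choices is stated in the source, and the function names are hypothetical.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax of a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_z for v in logits]


def cross_entropy(logits, label):
    """CE loss of one sample: negative log-probability of the true class."""
    return -log_softmax(logits)[label]


def base_objective(batch_logits, batch_labels, params, lam):
    """Mean CE over the batch plus lam * R(params).

    R is assumed here to be the squared L2 norm of the parameters.
    """
    ce = sum(cross_entropy(z, y) for z, y in zip(batch_logits, batch_labels))
    ce /= len(batch_labels)
    reg = sum(p * p for p in params)
    return ce + lam * reg


# Two samples, three classes; the logits stand in for g(f(x_i)).
toy_logits = [[2.0, 0.5, -1.0], [0.0, 0.0, 0.0]]
toy_targets = [0, 2]
loss = base_objective(toy_logits, toy_targets, params=[0.1, -0.2], lam=5e-4)
```

For uniform logits over three classes the per-sample CE equals log 3, which is a convenient sanity check for the helper.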
In the third step, to ensure that the synthesized feature embeddings are discriminative, they are fed into the classification network of the original visual features, and the predicted class is required to match the class of the original visual feature. The classification objective for the synthesized feature embeddings is
Figure BDA0003396268840000063
where t is the number of additional synthesized feature embeddings, c_j is the j-th noise, which follows a Gaussian distribution, T is the conversion network module, y_ij is the class label of the synthesized feature T(f(x_i), c_j), which is the same as the class label of the original visual feature f(x_i), and Θ denotes the parameter set of the entire model.
In the fourth step, to ensure that the synthesized feature embeddings are diverse, features synthesized with different noises are assigned to different subclasses. The original visual features and the synthesized feature embeddings are fed into a classifier separate from the one above, which assigns them to different categories
Figure BDA0003396268840000064
where l_ij is a self-supervised class label, assigned manually according to the noise distribution used, and h denotes the self-supervised classifier.
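The labelling scheme of the third and fourth steps can be sketched as follows. The function `toy_transform` is only a stand-in for the trained conversion network T (it simply adds the noise vector), the noise distributions in `specs` are hypothetical, and the convention that the original feature receives self-supervised label 0 while the j-th noise distribution receives label j is an assumption.

```python
import random


def toy_transform(feature, noise):
    """Stand-in for the conversion network T: simply adds the noise vector.

    The real T is a trained encoder-decoder; this is only a placeholder.
    """
    return [f + n for f, n in zip(feature, noise)]


def build_training_targets(features, class_labels, noise_specs, rng):
    """Emit, for every feature f(x_i), the original plus one synthesized
    embedding per noise distribution, each with two labels:
      y_ij: the class label, copied from the original sample;
      l_ij: the self-supervised label, 0 for the original feature and
            j for the j-th noise distribution (an assumed convention).
    """
    rows = []
    for feat, y in zip(features, class_labels):
        rows.append((feat, y, 0))
        for j, (mean, std) in enumerate(noise_specs, start=1):
            noise = [rng.gauss(mean, std) for _ in feat]
            rows.append((toy_transform(feat, noise), y, j))
    return rows


rng = random.Random(0)
feats = [[0.2, 1.0, -0.5], [1.5, 0.0, 0.3]]
labels = [3, 7]
# Hypothetical noise distributions, given as (mean, std) pairs.
specs = [(0.0, 0.1), (0.0, 0.5)]
rows = build_training_targets(feats, labels, specs, rng)
```

Each row then feeds two heads: the class label goes to the classifier g, and the self-supervised label goes to the separate classifier h.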
In the fifth step, the synthesized feature embeddings are regularized in the label space with the real visual features, so that the synthesized feature embeddings preserve the inter-class relations of the real visual features
Figure BDA0003396268840000071
where KL denotes the Kullback-Leibler divergence and x_ij is a real sample of class y_i. f(x_ij) acts as the supervisor of the synthesized feature embedding T(f(x_i), c_j) and is not itself optimized.
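A minimal numeric sketch of the fifth step follows. It assumes that "label space" means softmax probabilities over class logits and that the divergence is taken from the real-sample distribution to the synthesized one; the direction and the toy logits are assumptions. In an automatic-differentiation framework the target distribution would be detached so that no gradient reaches the supervisor.

```python
import math


def softmax(logits):
    """Numerically stable softmax of a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)


# Logits of a real sample (the fixed supervisor) and of a synthesized embedding.
real_logits = [2.0, 1.0, 0.1]
synth_logits = [1.5, 1.2, 0.3]
target = softmax(real_logits)   # treated as a constant during training
pred = softmax(synth_logits)
loss_kl = kl_divergence(target, pred)
```

The divergence is zero when the two distributions coincide and positive otherwise, so minimizing it pulls the class probabilities of the synthesized embedding toward those of the real sample.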
In the sixth step, the overall optimization objective is obtained as
L_all = L_1 + L_2 + αL_3 + βL_4
where α and β are hyper-parameters.
In the seventh step, the deep neural network is trained on the resulting total loss function with a stochastic gradient descent (SGD) optimizer with momentum and the back-propagation algorithm.
In the eighth step, the above steps are repeated until the model converges.
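The sixth to eighth steps can be illustrated with a small sketch that combines the four losses and applies a momentum update. The update rule shown (velocity accumulates the gradient, parameters move against the velocity) is one common formulation of SGD with momentum; the learning rate, momentum value and the toy quadratic used in place of the real network loss are assumptions.

```python
def total_loss(l1, l2, l3, l4, alpha, beta):
    """L_all = L_1 + L_2 + alpha * L_3 + beta * L_4."""
    return l1 + l2 + alpha * l3 + beta * l4


def sgd_momentum_step(params, grads, velocity, lr, momentum):
    """One SGD-with-momentum update: v <- momentum * v + g; p <- p - lr * v."""
    new_velocity = [momentum * v + g for v, g in zip(velocity, grads)]
    new_params = [p - lr * v for p, v in zip(params, new_velocity)]
    return new_params, new_velocity


# Toy check on f(p) = sum(p_k ** 2), whose gradient is 2 * p_k.
params, velocity = [1.0, -2.0], [0.0, 0.0]
for _ in range(200):
    grads = [2.0 * p for p in params]
    params, velocity = sgd_momentum_step(params, grads, velocity,
                                         lr=0.05, momentum=0.9)
```

In the actual method the gradients would come from back-propagating L_all through the feature extractor, the conversion network module and both classifiers, and the loop would run over sampled batches until convergence.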
Finally, the trained model is used for a small sample classification task.
Given a C-way K-shot classification task with support set S, for each support sample x_u a final feature representation is first obtained through the feature extractor and the conversion network module
Figure BDA0003396268840000072
Then, the visual prototype of each class is calculated
Figure BDA0003396268840000073
where c denotes a class, and S_c and |S_c| are the support set of class c and the number of samples in it, respectively.
Further, for a test sample x_u in the query set, the probability that it belongs to class c is
Figure BDA0003396268840000074
where d is a similarity metric function. Finally, the class of the test sample is predicted from its probabilities over the classes: the class with the highest probability is the predicted class.
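The inference procedure above can be sketched on toy features. Cosine similarity is used for d, as stated in claim 4; the softmax over scaled similarities, the `scale` temperature, and the 2-D toy vectors standing in for the final feature representations are assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def class_prototypes(support_feats, support_labels):
    """Average the support features of each class into one prototype."""
    groups = {}
    for feat, c in zip(support_feats, support_labels):
        groups.setdefault(c, []).append(feat)
    return {c: [sum(col) / len(fs) for col in zip(*fs)]
            for c, fs in groups.items()}


def predict(query_feat, prototypes, scale=10.0):
    """Softmax over scaled similarities to each prototype.

    Returns (predicted_class, probabilities); `scale` is a hypothetical
    temperature, not a value given in the source.
    """
    classes = list(prototypes)
    scores = [scale * cosine(query_feat, prototypes[c]) for c in classes]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    probs = {c: e / total for c, e in zip(classes, exps)}
    return max(probs, key=probs.get), probs


# 2-way 2-shot toy task; the vectors stand in for final feature representations.
s_feats = [[1.0, 0.1], [0.9, -0.1], [0.0, 1.0], [0.2, 0.8]]
s_labels = ["a", "a", "b", "b"]
protos = class_prototypes(s_feats, s_labels)
pred, probs = predict([0.8, 0.2], protos)
```

The query vector points mostly along the first axis, so it is closest to the prototype of class "a", which is the class returned.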
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims (4)

1. A small sample classification method incorporating a conversion network and self-supervision, characterized by comprising the following steps:
S1, acquiring an image data set for training the feature extractor and the conversion network module;
S2, feeding the image data set into the network, obtaining discriminative and diverse feature embeddings by a feature augmentation method, and training the feature extractor and the conversion network module in combination with self-supervised learning;
S3, using the trained model for a small sample classification task.
2. The small sample classification method incorporating a conversion network and self-supervision according to claim 1, wherein in step S1, a base class data set is given
Figure FDA0003396268830000011
where n is the total number of images in the data set, x_i and y_i denote the i-th image and its corresponding class label respectively, y_i ∈ {1, ..., C}, and C is the total number of classes, each class containing multiple images.
3. The small sample classification method incorporating a conversion network and self-supervision according to claim 2, wherein step S2 specifically includes:
S21, training in mini-batches: a batch of image samples is randomly sampled from the image data set
Figure FDA0003396268830000012
where the batch size N_bs is preset;
S22, feeding the image samples in batch B into a model consisting of a backbone network and a classifier to obtain their predicted probabilities; using the cross-entropy (CE) loss, the optimization objective of the model is
Figure FDA0003396268830000013
where f and g denote the feature extractor and the classifier respectively, Θ is the parameter set, L_ce denotes the CE loss, R denotes the regularization term on the parameter set, and λ is a hyper-parameter;
S23, to ensure that the synthesized feature embeddings are discriminative, feeding them into the classification network of the original visual features and requiring the predicted class to match the class of the original visual feature; the classification objective for the synthesized feature embeddings is
Figure FDA0003396268830000021
where t is the number of additional synthesized feature embeddings, c_j is the j-th noise, which follows a Gaussian distribution, T is the conversion network module, y_ij is the class label of the synthesized feature T(f(x_i), c_j), which is the same as the class label of the original visual feature f(x_i), and Θ denotes the parameter set of the entire model;
S24, to ensure that the synthesized feature embeddings are diverse, assigning features synthesized with different noises to different subclasses: the original visual features and the synthesized feature embeddings are fed into a classifier separate from the one above, which assigns them to different categories
Figure FDA0003396268830000022
where l_ij is a self-supervised class label, assigned manually according to the noise distribution used, and h denotes the self-supervised classifier;
S25, regularizing the synthesized feature embeddings in the label space with the real visual features, so that the synthesized feature embeddings preserve the inter-class relations of the real visual features
Figure FDA0003396268830000023
where KL denotes the Kullback-Leibler divergence and x_ij is a real sample of class y_i; f(x_ij) acts as the supervisor of the synthesized feature embedding T(f(x_i), c_j) and is not itself optimized;
S26, the overall optimization objective is
L_all = L_1 + L_2 + αL_3 + βL_4
where α and β are hyper-parameters;
S27, training the deep neural network on the resulting total loss function with a stochastic gradient descent (SGD) optimizer with momentum and the back-propagation algorithm;
S28, repeating steps S21 to S27 until the model converges.
4. The small sample classification method incorporating a conversion network and self-supervision according to any one of claims 1 to 3, wherein step S3 specifically includes:
S31, given a C-way K-shot classification task with support set S, for each support sample x_u a final feature representation is first obtained through the feature extractor and the conversion network module
Figure FDA0003396268830000031
S32, calculating the visual prototype of each class
Figure FDA0003396268830000032
where c denotes a class, and S_c and |S_c| are the support set of class c and the number of samples in it, respectively;
S33, for a test sample x_u in the query set, the probability that it belongs to class c is
Figure FDA0003396268830000033
where d is a similarity metric function, for which the cosine similarity function is used in the present invention. Finally, the class of the test sample is predicted from its probabilities over the classes: the class with the highest probability is the predicted class.
Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111483193.7A CN114299326A (en) 2021-12-07 2021-12-07 Small sample classification method based on conversion network and self-supervision

Publications (1)

Publication Number Publication Date
CN114299326A (en) 2022-04-08

Family

ID=80965005


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination