CN115908949B

CN115908949B - Long-tail image recognition method based on class balance encoder

Info

Publication number: CN115908949B
Application number: CN202310014823.9A
Authority: CN
Inventors: Wei Xiushen (魏秀参); Shen Yang (沈阳); Sun Xuhao (孙旭豪)
Original assignee: Nanjing University of Science and Technology
Current assignee: Nanjing University of Science and Technology
Priority date: 2023-01-06
Filing date: 2023-01-06
Publication date: 2023-11-17
Anticipated expiration: 2043-01-06
Also published as: CN115908949A

Abstract

The invention discloses a long-tail image recognition method based on a class-balanced encoder, comprising the following steps: two different data augmentations are applied to an input image to obtain two groups of samples; the first group is input into an encoder and a class-balanced encoder, and the second group into a momentum encoder, yielding three groups of feature representations; the feature representations output by the encoder and the class-balanced encoder are fed into classifiers, and a weighted cross-entropy loss is computed against the ground-truth image labels; nonlinear mapping and L2 (two-norm) normalization are applied to each of the three groups of feature representations; cosine similarity losses are computed; the encoder, the class-balanced encoder and the classifiers are optimized by stochastic gradient descent, and the momentum encoder is updated by momentum. In natural species recognition tasks, the invention uses the class-balanced encoder to strengthen feature learning for samples of rare species and uses self-supervised training to learn more comprehensive feature representations, thereby improving the accuracy of species recognition in images of natural scenes.

Description

Long-tail image recognition method based on class balance encoder

Technical Field

The invention belongs to the field of class-imbalanced image recognition, and particularly relates to a long-tail image recognition method for natural scenes based on a class-balanced encoder.

Background

In image classification research, imbalanced data, and in particular data whose per-class sample sizes follow a long-tailed distribution, is a popular research direction, and work in this area addresses the needs of practical applications. Imbalanced image data means that different classes in a dataset contain unequal numbers of images: in a long-tailed distribution, a few classes (head classes) account for most of the samples in the dataset, while most classes (tail classes) have very few images.

A neural network trained on an imbalanced dataset performs poorly on tail classes, mainly because those classes contain few images. During training, most of the data belongs to head classes, and the tail-class images available to the model are far fewer than the head-class images, so the model classifies tail-class data poorly. A conventional remedy is class re-balancing, which increases the contribution of tail-class samples to parameter optimization during training, for example by sampling tail-class samples more frequently (re-sampling) or by assigning larger weights to tail-class samples in the loss function (re-weighting).
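The conventional re-sampling remedy described above (prior art, not the invention itself) can be sketched as follows. This is a minimal illustration assuming inverse class-frequency sampling, so that every class is drawn with equal total probability; the function names `inverse_frequency_sample_weights` and `resample_batch` are hypothetical.

```python
import random
from collections import Counter


def inverse_frequency_sample_weights(labels):
    """Per-sample weights so that every class is drawn with equal total probability."""
    counts = Counter(labels)
    num_classes = len(counts)
    # A sample of class c gets weight 1 / (C * n_c); the weights of each class sum to 1 / C.
    return [1.0 / (num_classes * counts[y]) for y in labels]


def resample_batch(labels, batch_size, seed=0):
    """Draw sample indices with replacement, oversampling tail classes."""
    rng = random.Random(seed)
    weights = inverse_frequency_sample_weights(labels)
    return rng.choices(range(len(labels)), weights=weights, k=batch_size)


# Toy long-tailed label list: one head class, one medium class, one tail class.
labels = [0] * 90 + [1] * 9 + [2] * 1
weights = inverse_frequency_sample_weights(labels)
batch = resample_batch(labels, batch_size=30)
```

With these weights the single tail-class image is drawn as often as all 90 head-class images combined, which illustrates the drawback noted next: the model sees the same few tail images repeatedly while head-class information is under-used.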

Class re-balancing works well for classifying imbalanced data, but tail classes carry only a small amount of information. Increasing their share of training prevents the model from fully exploiting the more informative head-class data and damages the model's representation learning.

Disclosure of Invention

The invention aims to provide a long-tail image recognition method based on a class balance encoder in a natural scene.

The technical scheme for realizing the purpose of the invention is as follows: in a first aspect, the present invention provides a long-tail image data training method based on a class-balanced encoder, comprising the steps of:

Step 1, obtain species image samples in natural scenes and apply data augmentation to each input image twice to obtain two groups of samples; input the first group of samples into the encoder and the class-balanced encoder to obtain two feature representations, and input the second group of samples into the momentum encoder to obtain a third feature representation;

The data augmentation includes AutoAugment, random horizontal flipping, random changes of image brightness, contrast and saturation, random grayscale conversion, and random Gaussian blur. Each time, AutoAugment randomly selects one operation from histogram equalization, inversion, shearing, rotation, sharpening, brightness adjustment and color adjustment to augment the image. The encoder and the momentum encoder are convolutional neural networks with the same structure and the same initial parameters.

Step 2, input the feature representations from the encoder and the class-balanced encoder into their respective classifiers, and compute a weighted cross-entropy loss for each;

Step 3, input the encoder and class-balanced encoder feature representations and the momentum encoder feature representation into different nonlinear mappers, and apply L2 (two-norm) normalization to obtain new feature representations; compute the cosine similarity loss between the new representations, and store the predictions together with their prediction confidences in a feature buffer;

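Step 4, compute the cosine similarity loss between the normalized encoder representation and the normalized momentum encoder representation, and between the normalized class-balanced encoder representation and the normalized momentum encoder representation;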

Step 5, update the parameters of the encoder, the class-balanced encoder, the classifiers and the nonlinear mapper of the encoder branches by stochastic gradient descent, and at the same time update the parameters of the momentum encoder and the nonlinear mapper of the momentum branch by momentum update; complete training and save the model parameters.

In a second aspect, the present invention provides a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the method of the first aspect when the program is executed.

In a third aspect, the present invention provides a computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of the method of the first aspect.

In a fourth aspect, the invention provides a computer program product comprising a computer program which, when executed by a processor, implements the steps of the method of the first aspect.

Compared with the prior art, the invention has the following notable advantages: (1) a dual-branch structure, formed by an encoder that serves the head classes and a class-balanced encoder that serves the tail classes, takes the representation learning of all classes into account; (2) self-supervised learning based on a cosine similarity loss explores more comprehensive image features, giving the model stronger generalization.

Drawings

FIG. 1 is a flowchart of the long-tail image recognition method based on a class-balanced encoder.

Detailed Description

The invention discloses a long-tail image recognition method based on a class-balanced encoder, comprising the following steps: two different data augmentations are applied to an input image to obtain two groups of samples; the first group is input into an encoder and a class-balanced encoder, and the second group into a momentum encoder, yielding three groups of feature representations; the feature representations output by the encoder and the class-balanced encoder are fed into classifiers, and a weighted cross-entropy loss is computed against the ground-truth image labels, with the weight determined by the proportion of the image's class in the dataset; nonlinear mapping and L2 (two-norm) normalization are applied to each of the three groups of feature representations; cosine similarity losses are computed between the first and third groups and between the second and third groups of feature representations; based on these losses, the encoder, the class-balanced encoder and the classifiers are optimized by stochastic gradient descent, and the momentum encoder is updated by momentum. In natural species recognition tasks, the invention uses the class-balanced encoder to strengthen feature learning for samples of rare species and uses self-supervised training to learn more comprehensive feature representations, thereby improving the accuracy of species recognition in images of natural scenes.

The technical scheme of the invention is described in detail below with reference to the accompanying drawings.

Referring to FIG. 1, the long-tail image recognition method based on a class-balanced encoder specifically includes the following steps:

Step 1, acquire a dataset of species images in natural scenes using a terminal device;

Step 2, apply data augmentation to each input image twice to obtain two groups of samples; input the first group of samples into the encoder and the class-balanced encoder to obtain two feature representations, and input the second group of samples into the momentum encoder to obtain a third feature representation. The encoder, the class-balanced encoder and the momentum encoder use a ResNet architecture;

The data augmentation used includes AutoAugment, random horizontal flipping, random changes of image brightness, contrast and saturation, random grayscale conversion, and random Gaussian blur. Each time, AutoAugment randomly selects one operation from histogram equalization, inversion, shearing, rotation, sharpening, brightness adjustment and color adjustment to augment the image. The encoder and the momentum encoder are convolutional neural networks with the same structure and the same initial parameters.
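The two-view augmentation policy above can be sketched at the level of operation names. This is only an illustration of the sampling logic (no pixel operations are performed); the function `sample_view_pipeline`, the fixed operation order and the probabilities `flip_p`, `gray_p` and `blur_p` are hypothetical, since the text does not give them.

```python
import random

# The seven AutoAugment-style operations listed in the text.
AUTOAUGMENT_OPS = ["equalize", "invert", "shear", "rotate", "sharpness", "brightness", "color"]


def sample_view_pipeline(rng, flip_p=0.5, gray_p=0.2, blur_p=0.5):
    """Return the ordered augmentation steps for one view (names only, no pixel work).

    The probabilities are illustrative assumptions, not values from the text.
    """
    steps = [rng.choice(AUTOAUGMENT_OPS)]  # AutoAugment: one operation picked at random
    if rng.random() < flip_p:
        steps.append("horizontal_flip")
    steps.append("color_jitter")  # random brightness / contrast / saturation
    if rng.random() < gray_p:
        steps.append("grayscale")
    if rng.random() < blur_p:
        steps.append("gaussian_blur")
    return steps


rng = random.Random(0)
# First view feeds the encoder and the class-balanced encoder; second feeds the momentum encoder.
view_a = sample_view_pipeline(rng)
view_b = sample_view_pipeline(rng)
```

Because the two calls draw independently, the two views of the same image usually differ, which is what the cosine similarity losses later compare.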

Step 3, input the feature representations from the encoder and the class-balanced encoder into their respective classifiers, and compute a weighted cross-entropy loss for each;

where the weight is determined by the proportion that the class of the input image occupies in the training data; the weight is added to the classifier output, and the cross-entropy loss is computed on the result.
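The weighted cross-entropy above can be sketched under one reading of the text. Because the weight is derived from class proportions and "added to the classifier output", this sketch assumes a logit-adjustment form: the log of each class's training-set proportion is added to its logit before the softmax cross-entropy. The exact weight formula is not given here, and the names `log_class_proportions` and `class_adjusted_cross_entropy` are hypothetical.

```python
import math


def log_class_proportions(class_counts):
    """log(n_j / N) for every class j, computed from training-set class counts."""
    total = sum(class_counts)
    return [math.log(n / total) for n in class_counts]


def class_adjusted_cross_entropy(logits, label, log_props):
    """Cross-entropy after adding each class's log proportion to its logit (assumed form)."""
    adjusted = [z + p for z, p in zip(logits, log_props)]
    m = max(adjusted)  # subtract the max for a numerically stable log-sum-exp
    log_sum_exp = m + math.log(sum(math.exp(a - m) for a in adjusted))
    return log_sum_exp - adjusted[label]


log_props = log_class_proportions([90, 10])  # a head class and a tail class
loss_head = class_adjusted_cross_entropy([0.0, 0.0], 0, log_props)
loss_tail = class_adjusted_cross_entropy([0.0, 0.0], 1, log_props)
```

With equal raw logits, the tail-class sample incurs the larger loss (about 2.30 versus 0.11), so training pushes the classifier to give tail classes larger margins. With uniform class counts the adjustment is a constant shift and the loss reduces to ordinary cross-entropy.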

Step 4, input the encoder and class-balanced encoder feature representations into the nonlinear mapper of the encoder branches and apply L2 (two-norm) normalization to obtain two new feature representations; input the momentum encoder feature representation into the nonlinear mapper of the momentum branch and apply L2 normalization to obtain a third new feature representation;

Each nonlinear mapper consists of a linear layer, batch normalization and ReLU activation. The nonlinear mapper of the encoder branches and that of the momentum branch have the same structure and the same initial parameters.
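The mapper and normalization described above can be sketched in plain Python on a small batch. This follows the layer order exactly as listed (linear layer, batch normalization, ReLU) and then applies L2 normalization; the real mapper may contain further layers or learnable batch-norm scale and shift, which are omitted here. All function names are hypothetical.

```python
import math


def linear(x, weight, bias):
    """Fully connected layer: one output per row of `weight`."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def batch_norm(batch, eps=1e-5):
    """Normalize each feature over the batch (training mode, no learnable affine terms)."""
    out = [list(v) for v in batch]
    for j in range(len(batch[0])):
        col = [v[j] for v in batch]
        mean = sum(col) / len(col)
        var = sum((c - mean) ** 2 for c in col) / len(col)
        for i in range(len(batch)):
            out[i][j] = (batch[i][j] - mean) / math.sqrt(var + eps)
    return out


def relu(v):
    return [max(0.0, a) for a in v]


def l2_normalize(v, eps=1e-12):
    """Scale a vector to unit two-norm (an all-zero vector stays zero)."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / max(norm, eps) for a in v]


def nonlinear_mapper(batch, weight, bias):
    hidden = batch_norm([linear(x, weight, bias) for x in batch])
    return [l2_normalize(relu(h)) for h in hidden]


identity = [[1.0, 0.0], [0.0, 1.0]]
out = nonlinear_mapper([[1.0, 2.0], [3.0, 0.0]], identity, [0.0, 0.0])
```

Because the two mappers start from identical parameters, the normalized outputs of the encoder branches and the momentum branch are directly comparable at the start of training.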

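Step 5, compute the cosine similarity loss between the normalized encoder representation and the normalized momentum encoder representation, and between the normalized class-balanced encoder representation and the normalized momentum encoder representation;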

The cosine similarity loss between the normalized encoder representation and the normalized momentum encoder representation is weighted by a hyperparameter that controls the loss weight.
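The weighted cosine similarity loss above can be sketched as follows. The formula itself is not reproduced in this text, so the sketch assumes the common form `loss_weight * (1 - cos)`, averaged over the batch; `loss_weight` stands in for the hyperparameter that controls the loss weight, and the function names are hypothetical.

```python
import math


def cosine_similarity(u, v, eps=1e-12):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / max(norm_u * norm_v, eps)


def cosine_similarity_loss(encoder_batch, momentum_batch, loss_weight=1.0):
    """Mean of loss_weight * (1 - cos) over the batch (assumed form).

    In an autodiff framework the momentum-branch vectors would be detached,
    so gradients flow only into the encoder branch.
    """
    per_sample = [1.0 - cosine_similarity(p, z) for p, z in zip(encoder_batch, momentum_batch)]
    return loss_weight * sum(per_sample) / len(per_sample)
```

For inputs that are already L2-normalized, the cosine reduces to a dot product; the explicit norms are kept so the sketch also works on raw vectors. Identical views give zero loss and opposite vectors give `2 * loss_weight`.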

The cosine similarity loss between the normalized class-balanced encoder representation and the normalized momentum encoder representation is weighted by a hyperparameter that controls the loss weight and by a class-balance weight. The class-balance weight makes the class-balanced encoder pay more attention to tail-class samples than the encoder does; it is computed from the average number of samples per class in the training set, the number of samples in the class to which the input image belongs, and a hyperparameter that controls how much attention the class-balanced encoder pays to tail classes.
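The class-balance weight above can be sketched using only the three quantities the text names: the mean per-class sample count, the sample count of the input image's class, and a hyperparameter. The power-law form `(mean_count / n_c) ** gamma` is an assumption, not the patented formula, and `class_balance_weights` and `balanced_cosine_loss` are hypothetical names.

```python
def class_balance_weights(class_counts, gamma=1.0):
    """w_c = (mean per-class count / n_c) ** gamma for every class c (assumed form)."""
    mean_count = sum(class_counts) / len(class_counts)
    return [(mean_count / n) ** gamma for n in class_counts]


def balanced_cosine_loss(cos_sim, label, weights, loss_weight=1.0):
    """Scale one sample's (1 - cos) term by the weight of its class."""
    return weights[label] * loss_weight * (1.0 - cos_sim)


counts = [90, 9, 1]  # head, medium, tail
w = class_balance_weights(counts, gamma=1.0)
```

Under this form, classes smaller than average get weights above 1 and larger classes get weights below 1; `gamma = 0` disables the re-weighting, and larger `gamma` shifts the class-balanced encoder's attention further toward the tail.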

Step 6, update the parameters of the encoder, the class-balanced encoder, the classifiers and the nonlinear mapper of the encoder branches by stochastic gradient descent, and at the same time update the parameters of the momentum encoder and the nonlinear mapper of the momentum branch by momentum update; complete training and save the parameters of the encoder, the class-balanced encoder and the classifiers;

The momentum update computes the new momentum encoder parameters from the current momentum encoder parameters, the encoder parameters and the class-balanced encoder parameters, with a hyperparameter that controls the momentum update rate.
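The momentum update above can be sketched as an exponential moving average. The text lists both the encoder and the class-balanced encoder parameters as inputs but the formula is not reproduced here, so this sketch assumes the EMA target is their element-wise average; the function name `momentum_update` and the dictionary-of-lists parameter layout are hypothetical.

```python
def momentum_update(momentum_params, encoder_params, balanced_params, m=0.99):
    """In-place EMA: theta_m <- m * theta_m + (1 - m) * target.

    Assumption: target is the average of the encoder and class-balanced encoder parameters.
    """
    for name, theta_m in momentum_params.items():
        theta_e = encoder_params[name]
        theta_b = balanced_params[name]
        for i in range(len(theta_m)):
            target = 0.5 * (theta_e[i] + theta_b[i])
            theta_m[i] = m * theta_m[i] + (1.0 - m) * target


momentum = {"conv1": [0.0, 1.0]}
momentum_update(momentum, {"conv1": [1.0, 1.0]}, {"conv1": [3.0, 1.0]}, m=0.9)
```

The momentum rate `m` close to 1 makes the momentum encoder change slowly, giving a stable target for the cosine similarity losses. The nonlinear mapper of the momentum branch would be updated the same way from the mapper of the encoder branches.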

Step 7, load the model parameters and recognize natural species images.

As other embodiments, the structures of the encoder, the class-balanced encoder and the momentum encoder can be designed according to actual requirements.

As other embodiments, stochastic gradient descent may be replaced with other parameter optimization methods.

As other embodiments, the natural species image dataset may be replaced with long-tailed image data from other domains according to practical application requirements.

Claims

1. The long-tail image recognition method based on the class balance encoder is characterized by comprising the following steps of:

step 1, obtaining species image samples in natural scenes, applying data augmentation to each input image twice to obtain two groups of samples, inputting the first group of samples into an encoder and a class-balanced encoder to obtain two feature representations, and inputting the second group of samples into a momentum encoder to obtain a third feature representation;

step 3, inputting the encoder and class-balanced encoder feature representations into the nonlinear mapper of the encoder branches and applying L2 (two-norm) normalization to obtain two new feature representations, and inputting the momentum encoder feature representation into the nonlinear mapper of the momentum branch and applying L2 normalization to obtain a third new feature representation;

the cosine similarity loss between the normalized encoder representation and the normalized momentum encoder representation is weighted by a hyperparameter that controls the loss weight;

the cosine similarity loss between the normalized class-balanced encoder representation and the normalized momentum encoder representation is weighted by a hyperparameter that controls the loss weight and by a class-balance weight, which is computed from the average number of samples per class in the training set, the number of samples in the class to which the input image belongs, and a hyperparameter that controls how much attention the class-balanced encoder pays to tail classes;

step 5, updating the parameters of the encoder, the class-balanced encoder, the classifiers and the nonlinear mapper of the encoder branches by stochastic gradient descent, and at the same time updating the parameters of the momentum encoder and the nonlinear mapper of the momentum branch by momentum update; completing training and saving the parameters of the encoder, the class-balanced encoder and the classifiers;

the momentum update computes the new momentum encoder parameters from the current momentum encoder parameters, the encoder parameters and the class-balanced encoder parameters, with a hyperparameter that controls the momentum update rate;

step 6, loading the model parameters and recognizing natural species images.

2. The method of claim 1, wherein the data augmentation used in step 1 comprises AutoAugment, random horizontal flipping, random changes of image brightness, contrast and saturation, random grayscale conversion, and random Gaussian blur; AutoAugment randomly selects one operation each time from histogram equalization, inversion, shearing, rotation, sharpening, brightness adjustment and color adjustment to augment the image; the encoder and the momentum encoder are convolutional neural networks with the same structure and the same initial parameters.

3. The long-tail image recognition method based on the class-balanced encoder according to claim 1, wherein the weight of the cross-entropy loss in step 2 is determined by the proportion that the class of the input image occupies in the training data, and the cross-entropy loss is computed after the weight is added to the classifier output.

4. The long-tail image recognition method based on the class-balanced encoder according to claim 1, wherein in step 3 each nonlinear mapper consists of a linear layer, batch normalization and ReLU activation, and the nonlinear mapper of the encoder branches and the nonlinear mapper of the momentum branch have the same structure and the same initial parameters.

5. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the method of any of claims 1-4 when the computer program is executed.

6. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1-4.