CN115908949B - Long-tail image recognition method based on class balance encoder - Google Patents

Long-tail image recognition method based on class balance encoder

Info

Publication number
CN115908949B
CN115908949B CN202310014823.9A CN202310014823A CN115908949B CN 115908949 B CN115908949 B CN 115908949B CN 202310014823 A CN202310014823 A CN 202310014823A CN 115908949 B CN115908949 B CN 115908949B
Authority
CN
China
Prior art keywords
encoder
class
parameters
momentum
steps
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310014823.9A
Other languages
Chinese (zh)
Other versions
CN115908949A (en)
Inventor
魏秀参
沈阳
孙旭豪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Science and Technology
Original Assignee
Nanjing University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Science and Technology
Priority to CN202310014823.9A
Publication of CN115908949A
Application granted
Publication of CN115908949B
Legal status: Active
Anticipated expiration

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00: Road transport of goods or passengers
    • Y02T10/10: Internal combustion engine [ICE] based vehicles
    • Y02T10/40: Engine management systems

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses a long-tail image recognition method based on a class balance encoder, which comprises the following steps: applying two different data enhancement methods to an input picture to obtain two groups of samples; inputting the two groups of samples into an encoder, a class balance encoder and a momentum encoder to obtain three groups of feature representations; inputting the feature representations output by the encoder and the class balance encoder into classifiers, and calculating a weighted cross entropy loss according to the real labels of the images; applying nonlinear mapping and L2 (two-norm) normalization to each of the three groups of feature representations; calculating cosine similarity losses; and optimizing the encoder, the class balance encoder and the classifiers by stochastic gradient descent while optimizing the momentum encoder by momentum update. In a natural species recognition task, the invention uses the class balance encoder to strengthen feature learning on rare-species samples and uses self-supervised training to learn more comprehensive feature representations, thereby improving the accuracy of species image recognition in natural scenes.

Description

Long-tail image recognition method based on class balance encoder
Technical Field
The invention belongs to the field of class-imbalanced image recognition, and particularly relates to a long-tail image recognition method based on a class balance encoder in natural scenes.
Background
In image classification research, imbalanced data, especially data whose per-class sample counts follow a long-tailed distribution, is one of the popular directions of current research, and the related results also meet the needs of practical applications. In imbalanced image data, different classes occupy unequal numbers of images in the dataset; in a long-tailed data distribution, a very small number of classes (head classes) account for most of the samples in the dataset, while most classes (tail classes) account for very little image data.
After training on an imbalanced dataset, a neural network model performs poorly on the tail classes, mainly because the tail classes contain few pictures. During training, most of the training data comes from head classes, and far fewer tail-class pictures are used than head-class pictures, so the model classifies tail-class data poorly. One conventional solution is a class-weight balancing strategy, i.e., increasing the contribution of tail-class samples to the optimization of model parameters during training, for example by increasing the sampling frequency of tail-class samples or by assigning larger weights to tail-class training samples in the loss function.
Class-weight balancing techniques classify imbalanced data well, but the tail classes contain only a small amount of information; increasing the proportion of tail classes in training prevents the model from fully exploiting the more informative head-class data and damages the model's representation learning.
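The two conventional class-weight balancing strategies mentioned above (re-sampling and re-weighting) can be illustrated with a minimal Python sketch. The function names and the toy label list are hypothetical, and the inverse-frequency form is only one common choice, not something specified in this document.

```python
from collections import Counter

def class_balanced_sampling_probs(labels):
    """Per-sample sampling probability so that every class is drawn equally often."""
    counts = Counter(labels)
    num_classes = len(counts)
    # Each class receives total probability 1/num_classes, split evenly over its samples.
    return [1.0 / (num_classes * counts[y]) for y in labels]

def inverse_frequency_weights(labels):
    """Per-class loss weights proportional to 1/count, normalized to mean 1 over classes."""
    counts = Counter(labels)
    raw = {c: 1.0 / n for c, n in counts.items()}
    mean_raw = sum(raw.values()) / len(raw)
    return {c: w / mean_raw for c, w in raw.items()}

# Toy long-tailed label list (hypothetical): 8 head-class samples, 2 tail-class samples.
labels = ["head"] * 8 + ["tail"] * 2
probs = class_balanced_sampling_probs(labels)
weights = inverse_frequency_weights(labels)
```

In this toy case each tail sample is drawn four times as often as each head sample, and the tail class receives four times the loss weight, which is exactly the kind of shift toward tail classes that the paragraph above warns can under-use head-class data.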
Disclosure of Invention
The invention aims to provide a long-tail image recognition method based on a class balance encoder in natural scenes.
The technical scheme for achieving the purpose of the invention is as follows: in a first aspect, the present invention provides a long-tail image data training method based on a class balance encoder, comprising the following steps:
Step 1, obtain species image data samples in a natural scene, apply data enhancement to each input image twice to obtain two groups of samples, input the first group of samples into the encoder and the class balance encoder to obtain their respective feature representations, and input the second group of samples into the momentum encoder to obtain its feature representation;
The data enhancement includes AutoAugment, random horizontal flipping, random changes of image brightness, contrast, saturation and gray scale, and random Gaussian blur. AutoAugment randomly selects one operation at a time from histogram equalization, inversion, tilting, rotation, sharpening, brightness adjustment and color adjustment to enhance the image. The encoder and the momentum encoder are convolutional neural networks with the same structure and the same initial parameters.
Step 2, input the feature representations of the encoder and the class balance encoder into their respective classifiers, and calculate a weighted cross entropy loss for each;
Step 3, input the feature representations (including s') into different nonlinear mappers and apply L2 (two-norm) normalization to obtain new feature representations, calculate the cosine similarity loss between the new feature representations, and store the prediction and its prediction confidence in a feature buffer;
Step 4, calculate the cosine similarity loss between the new feature representations of the encoder and the momentum encoder, and between those of the class balance encoder and the momentum encoder;
Step 5, update the parameters of the encoder, the class balance encoder, the classifiers and their nonlinear mapper by stochastic gradient descent, simultaneously update the parameters of the momentum encoder and its nonlinear mapper by momentum update, complete training, and save the model parameters.
In a second aspect, the present invention provides a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the method of the first aspect when the program is executed.
In a third aspect, the present invention provides a computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of the method of the first aspect.
In a fourth aspect, the invention provides a computer program product comprising a computer program which, when executed by a processor, implements the steps of the method of the first aspect.
Compared with the prior art, the invention has the following notable advantages: (1) through a dual-branch structure formed by an encoder oriented to the head classes and a class balance encoder oriented to the tail classes, the representation learning of all classes is taken into account; (2) self-supervised learning based on the cosine similarity loss explores more comprehensive image features, giving the model stronger generalization.
Drawings
FIG. 1 is a flow chart of a long tail image recognition method based on a class balance encoder.
Detailed Description
The invention discloses a long-tail image recognition method based on a class balance encoder, which comprises the following steps: applying two different data enhancement methods to an input picture to obtain two groups of samples; inputting the two groups of samples into an encoder, a class balance encoder and a momentum encoder to obtain three groups of feature representations; inputting the feature representations output by the encoder and the class balance encoder into classifiers, and calculating a weighted cross entropy loss according to the real labels of the images, the weight being calculated from the proportion of the image's class in the dataset; applying nonlinear mapping and L2 (two-norm) normalization to each of the three groups of feature representations; calculating the cosine similarity losses between the first and third groups and between the second and third groups of feature representations; and, according to the losses, optimizing the encoder, the class balance encoder and the classifiers by stochastic gradient descent while optimizing the momentum encoder by momentum update. In a natural species recognition task, the invention uses the class balance encoder to strengthen feature learning on rare-species samples and uses self-supervised training to learn more comprehensive feature representations, thereby improving the accuracy of species image recognition in natural scenes.
The technical scheme of the invention is described in detail below with reference to the accompanying drawings.
Referring to FIG. 1, the long-tail image recognition method based on a class balance encoder specifically includes the following steps:
Step 1, acquire a species image dataset in a natural scene using terminal equipment;
Step 2, apply data enhancement to each input image twice to obtain two groups of samples, input the first group of samples into the encoder and the class balance encoder to obtain their respective feature representations, and input the second group of samples into the momentum encoder to obtain its feature representation. The encoder, the class balance encoder and the momentum encoder use a ResNet network structure;
The data enhancement used includes AutoAugment, random horizontal flipping, random changes of image brightness, contrast, saturation and gray scale, and random Gaussian blur. AutoAugment randomly selects one operation at a time from histogram equalization, inversion, tilting, rotation, sharpening, brightness adjustment and color adjustment to enhance the image. The encoder and the momentum encoder are convolutional neural networks with the same structure and the same initial parameters.
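As a rough illustration of how two differently enhanced views of the same image could be produced, the following Python sketch operates on a toy grayscale image stored as nested lists with values in [0, 1]. The operation pool, the jitter range, the flip probability and all function names are hypothetical stand-ins; only the idea of picking one operation at random and generating two groups of samples comes from the text.

```python
import random

def horizontal_flip(img):
    """Mirror every row of the image."""
    return [row[::-1] for row in img]

def invert(img):
    """Invert pixel intensities (values assumed to lie in [0, 1])."""
    return [[1.0 - p for p in row] for row in img]

def adjust_brightness(img, factor):
    """Scale intensities by `factor` and clip the result to [0, 1]."""
    return [[min(1.0, max(0.0, p * factor)) for p in row] for row in img]

# Hypothetical operation pool standing in for the AutoAugment candidates.
POLICY_OPS = [
    invert,
    lambda im: adjust_brightness(im, 1.3),
    lambda im: adjust_brightness(im, 0.7),
]

def augment(img, rng):
    out = rng.choice(POLICY_OPS)(img)        # one policy operation chosen at random
    if rng.random() < 0.5:                   # random horizontal flip
        out = horizontal_flip(out)
    return adjust_brightness(out, rng.uniform(0.8, 1.2))  # random brightness change

def two_views(img, rng):
    """Apply the random enhancement twice to obtain two samples of one image."""
    return augment(img, rng), augment(img, rng)

rng = random.Random(0)
image = [[0.0, 0.25, 0.5], [0.5, 0.75, 1.0]]
view_a, view_b = two_views(image, rng)
```

In the described method the first view would go to the encoder and the class balance encoder, and the second view to the momentum encoder.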
Step 3, input the feature representations of the encoder and the class balance encoder into their respective classifiers, and calculate a weighted cross entropy loss for each;
The weight is determined by the class of the input picture and the proportion of that class in the dataset; the cross entropy loss is calculated after the weight is added to the classifier output.
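The exact weight formula is not given in this text, so the following Python sketch is only one possible reading of step 3. It takes "the weight is added to the classifier output" literally and assumes the weight is the logarithm of the class proportion, which is added to each logit before an ordinary cross entropy. The function names, the toy class counts and the log-proportion form are all assumptions; another common reading multiplies each sample's loss by a per-class weight instead.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]

def adjusted_cross_entropy(logits, target, class_counts):
    """Cross entropy after adding an assumed weight, log(class proportion), to each logit."""
    total = sum(class_counts)
    adjusted = [z + math.log(n / total) for z, n in zip(logits, class_counts)]
    return -log_softmax(adjusted)[target]

# Toy long-tailed class counts (hypothetical): one head class, two rarer classes.
class_counts = [900, 90, 10]
logits = [1.0, 1.0, 1.0]   # a classifier that is undecided between the classes
loss_head = adjusted_cross_entropy(logits, 0, class_counts)
loss_tail = adjusted_cross_entropy(logits, 2, class_counts)
```

With equal logits the adjusted softmax reduces to the class proportions, so the undecided classifier is penalized more on the tail-class sample than on the head-class sample; under this reading the classifier must output a larger margin for tail classes to reach the same loss.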
Step 4, input the feature representations of the encoder and the class balance encoder into one nonlinear mapper and apply L2 (two-norm) normalization to obtain new feature representations; input the feature representation of the momentum encoder into a second nonlinear mapper and apply L2 normalization to obtain its new feature representation.
Each nonlinear mapper consists of a linear layer, batch normalization and ReLU activation. The two nonlinear mappers have the same structure and the same initialization parameters.
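A minimal pure-Python sketch of such a mapper followed by L2 normalization is shown below. The layer order (linear, then batch normalization, then ReLU), the absence of learnable batch-normalization scale and shift, and the toy weights are assumptions made for illustration.

```python
import math

def linear(batch, weight, bias):
    """Apply y = W x + b to each vector in the batch (weight is out_dim x in_dim)."""
    return [[sum(w * x for w, x in zip(row, vec)) + b
             for row, b in zip(weight, bias)] for vec in batch]

def batch_norm(batch, eps=1e-5):
    """Normalize each feature dimension to zero mean and unit variance over the batch."""
    n, dims = len(batch), len(batch[0])
    means = [sum(v[d] for v in batch) / n for d in range(dims)]
    variances = [sum((v[d] - means[d]) ** 2 for v in batch) / n for d in range(dims)]
    return [[(v[d] - means[d]) / math.sqrt(variances[d] + eps) for d in range(dims)]
            for v in batch]

def relu(batch):
    return [[max(0.0, x) for x in v] for v in batch]

def l2_normalize(batch, eps=1e-12):
    """Divide each vector by its two-norm; all-zero vectors stay zero."""
    out = []
    for v in batch:
        norm = math.sqrt(sum(x * x for x in v))
        out.append([x / max(norm, eps) for x in v])
    return out

def mapper(batch, weight, bias):
    """Nonlinear mapping followed by L2 normalization."""
    return l2_normalize(relu(batch_norm(linear(batch, weight, bias))))

# Toy batch of three 2-dimensional features and an identity linear layer (hypothetical).
batch = [[1, 2], [3, 1], [0, 4]]
weight = [[1, 0], [0, 1]]
bias = [0, 0]
mapped = mapper(batch, weight, bias)
```

After normalization every non-zero output has unit length, so the cosine similarity between two mapped features is simply their dot product.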
Step 5, calculate the cosine similarity loss between the new feature representations of the encoder and the momentum encoder, and between those of the class balance encoder and the momentum encoder;
The cosine similarity loss between the new feature representations of the encoder and the momentum encoder is weighted by a hyperparameter that controls the loss weight.
The cosine similarity loss between the new feature representations of the class balance encoder and the momentum encoder is weighted by a hyperparameter that controls the loss weight and by a class balance weight, which makes the class balance encoder pay more attention to tail-class samples than the encoder does.
The class balance weight is calculated from the average sample size over the classes in the training set, the sample size of the class to which the input image belongs, and a hyperparameter that controls how much attention the class balance encoder pays to the tail classes.
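As a rough illustration of step 5, the following Python sketch computes the two cosine similarity losses. The loss form `lam * (1 - cos)`, the class balance weight `(mean_count / class_count) ** gamma`, and all names (`lam`, `lam_b`, `gamma`, `class_counts`) are assumptions chosen to match the quantities described in the text (loss-weight hyperparameters, average class size, size of the image's class, attention hyperparameter); they are not the patent's formulas.

```python
import math

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def class_balance_weight(class_counts, label, gamma):
    """Assumed form: (average class size / size of the sample's class) ** gamma."""
    mean_count = sum(class_counts) / len(class_counts)
    return (mean_count / class_counts[label]) ** gamma

def encoder_loss(z, z_m, lam):
    """Assumed cosine loss between encoder and momentum-encoder features."""
    return lam * (1.0 - cosine_similarity(z, z_m))

def balanced_encoder_loss(z_b, z_m, lam_b, class_counts, label, gamma):
    """Same loss for the class balance encoder, scaled by the class balance weight."""
    w = class_balance_weight(class_counts, label, gamma)
    return lam_b * w * (1.0 - cosine_similarity(z_b, z_m))

class_counts = [900, 90, 10]          # toy long-tailed training set (hypothetical)
w_head = class_balance_weight(class_counts, 0, gamma=1.0)
w_tail = class_balance_weight(class_counts, 2, gamma=1.0)
aligned = encoder_loss([1.0, 0.0], [1.0, 0.0], lam=0.5)
orthogonal = encoder_loss([1.0, 0.0], [0.0, 1.0], lam=0.5)
tail_loss = balanced_encoder_loss([1.0, 0.0], [0.0, 1.0], 1.0, class_counts, 2, 1.0)
```

Under these assumptions a tail-class sample gets a weight above 1 and a head-class sample a weight below 1, so the class balance encoder is pushed harder to align tail-class features with the momentum encoder.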
Step 6, update the parameters of the encoder, the class balance encoder, the classifiers and their nonlinear mapper by stochastic gradient descent, simultaneously update the parameters of the momentum encoder and its nonlinear mapper by momentum update, complete training, and save the parameters of the encoder, the class balance encoder and the classifiers;
The momentum update computes the parameters of the momentum encoder from the parameters of the encoder and the parameters of the class balance encoder, with a hyperparameter controlling the momentum update rate.
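The momentum update can be sketched as an exponential moving average. The text only states that the momentum encoder parameters depend on the encoder parameters, the class balance encoder parameters and a rate hyperparameter; combining the two source encoders by a plain average, as done below, is an assumption, as are the names `theta_m`, `theta`, `theta_b` and `m`.

```python
def momentum_update(theta_m, theta, theta_b, m):
    """EMA update of momentum-encoder parameters.

    Assumed rule: theta_m <- m * theta_m + (1 - m) * (theta + theta_b) / 2,
    applied element-wise to flat parameter lists.
    """
    return [m * pm + (1.0 - m) * 0.5 * (pe + pb)
            for pm, pe, pb in zip(theta_m, theta, theta_b)]

# Toy flat parameter vectors (hypothetical values).
theta_m = [0.0, 1.0]   # momentum encoder
theta = [1.0, 1.0]     # encoder
theta_b = [3.0, 1.0]   # class balance encoder
updated = momentum_update(theta_m, theta, theta_b, m=0.9)
```

A value of `m` close to 1 makes the momentum encoder change slowly, which keeps its features stable as targets for the cosine similarity losses; no gradient flows into the momentum encoder in this scheme.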
Step 7, load the model parameters to recognize natural species images.
In other embodiments, the structures of the encoder, the class balance encoder and the momentum encoder can be designed according to actual requirements.
In other embodiments, the stochastic gradient descent method may be replaced with other parameter optimization methods.
In other embodiments, the natural species image dataset may be replaced with long-tail image data from other fields according to practical application requirements.

Claims (6)

1. A long-tail image recognition method based on a class balance encoder, characterized by comprising the following steps:
Step 1, obtain species image data samples in a natural scene, apply data enhancement to each input image twice to obtain two groups of samples, input the first group of samples into the encoder and the class balance encoder to obtain their respective feature representations, and input the second group of samples into the momentum encoder to obtain its feature representation;
Step 2, input the feature representations of the encoder and the class balance encoder into their respective classifiers, and calculate a weighted cross entropy loss for each;
Step 3, input the feature representations of the encoder and the class balance encoder into one nonlinear mapper and apply L2 (two-norm) normalization to obtain new feature representations; input the feature representation of the momentum encoder into a second nonlinear mapper and apply L2 normalization to obtain its new feature representation;
Step 4, calculate the cosine similarity loss between the new feature representations of the encoder and the momentum encoder, and between those of the class balance encoder and the momentum encoder;
the cosine similarity loss between the new feature representations of the encoder and the momentum encoder is weighted by a hyperparameter that controls the loss weight;
the cosine similarity loss between the new feature representations of the class balance encoder and the momentum encoder is weighted by a hyperparameter that controls the loss weight and by a class balance weight;
the class balance weight is calculated from the average sample size over the classes in the training set, the sample size of the class to which the input image belongs, and a hyperparameter that controls how much attention the class balance encoder pays to the tail classes;
Step 5, update the parameters of the encoder, the class balance encoder, the classifiers and their nonlinear mapper by stochastic gradient descent, simultaneously update the parameters of the momentum encoder and its nonlinear mapper by momentum update, complete training, and save the parameters of the encoder, the class balance encoder and the classifiers;
the momentum update computes the parameters of the momentum encoder from the parameters of the encoder and the parameters of the class balance encoder, with a hyperparameter controlling the momentum update rate;
Step 6, load the model parameters to recognize natural species images.
2. The long-tail image recognition method based on the class balance encoder according to claim 1, wherein the data enhancement used in step 1 comprises AutoAugment, random horizontal flipping, random changes of image brightness, contrast, saturation and gray scale, and random Gaussian blur; AutoAugment randomly selects one operation at a time from histogram equalization, inversion, tilting, rotation, sharpening, brightness adjustment and color adjustment to enhance the image; the encoder and the momentum encoder are convolutional neural networks with the same structure and the same initial parameters.
3. The long-tail image recognition method based on the class balance encoder according to claim 1, wherein the weight of the cross entropy loss in step 2 is determined by the class of the input picture and the proportion of that class in the dataset, and the cross entropy loss is calculated after the weight is added to the classifier output.
4. The long-tail image recognition method based on the class balance encoder according to claim 1, characterized in that in step 3, each nonlinear mapper consists of a linear layer, batch normalization and ReLU activation; the two nonlinear mappers have the same structure and the same initialization parameters.
5. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the method of any of claims 1-4 when the computer program is executed.
6. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1-4.
CN202310014823.9A 2023-01-06 2023-01-06 Long-tail image recognition method based on class balance encoder Active CN115908949B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310014823.9A CN115908949B (en) 2023-01-06 2023-01-06 Long-tail image recognition method based on class balance encoder

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310014823.9A CN115908949B (en) 2023-01-06 2023-01-06 Long-tail image recognition method based on class balance encoder

Publications (2)

Publication Number Publication Date
CN115908949A CN115908949A (en) 2023-04-04
CN115908949B CN115908949B (en) 2023-11-17

Family

ID=86484722

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310014823.9A Active CN115908949B (en) 2023-01-06 2023-01-06 Long-tail image recognition method based on class balance encoder

Country Status (1)

Country Link
CN (1) CN115908949B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114241564A (en) * 2021-12-17 2022-03-25 东南大学 Facial expression recognition method based on inter-class difference strengthening network
CN115205592A (en) * 2022-07-15 2022-10-18 东北大学 Multi-mode data based rebalance long-tail image data classification method

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US20220067506A1 (en) * 2020-08-28 2022-03-03 Salesforce.Com, Inc. Systems and methods for partially supervised learning with momentum prototypes

Non-Patent Citations (2)

Title
Balanced Contrastive Learning for Long-Tailed Visual Recognition; Jianggang Zhu et al.; 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); pp. 6908-6917 *
Momentum Contrast for Unsupervised Visual Representation Learning; Kaiming He et al.; https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9157636; pp. 9726-9735 *

Also Published As

Publication number Publication date
CN115908949A (en) 2023-04-04

Similar Documents

Publication Publication Date Title
CN107729819B (en) Face labeling method based on sparse fully-convolutional neural network
CN109583501B (en) Method, device, equipment and medium for generating image classification and classification recognition model
CN109447906B (en) Picture synthesis method based on generation countermeasure network
CN109299716A (en) Training method, image partition method, device, equipment and the medium of neural network
CN107392919B (en) Adaptive genetic algorithm-based gray threshold acquisition method and image segmentation method
CN111145116A (en) Sea surface rainy day image sample augmentation method based on generation of countermeasure network
CN106778852A (en) A kind of picture material recognition methods for correcting erroneous judgement
CN113033537A (en) Method, apparatus, device, medium and program product for training a model
CN114511576B (en) Image segmentation method and system of scale self-adaptive feature enhanced deep neural network
CN111986125A (en) Method for multi-target task instance segmentation
CN110929836B (en) Neural network training and image processing method and device, electronic equipment and medium
CN107480723B (en) Texture Recognition based on partial binary threshold learning network
CN111489364A (en) Medical image segmentation method based on lightweight full convolution neural network
CN110298898B (en) Method for changing color of automobile image body and algorithm structure thereof
CN112381030A (en) Satellite optical remote sensing image target detection method based on feature fusion
CN114581552A (en) Gray level image colorizing method based on generation countermeasure network
CN108647696A (en) Picture face value determines method and device, electronic equipment, storage medium
CN117253071B (en) Semi-supervised target detection method and system based on multistage pseudo tag enhancement
CN115908949B (en) Long-tail image recognition method based on class balance encoder
Jolly et al. Bringing monochrome to life: A GAN-based approach to colorizing black and white images
CN116561622A (en) Federal learning method for class unbalanced data distribution
CN107729992B (en) Deep learning method based on back propagation
CN116030302A (en) Long-tail image recognition method based on characterization data enhancement and loss rebalancing
CN107274357B (en) Gray level image enhancement processing system with optimal parameters
CN114913588A (en) Face image restoration and recognition method applied to complex scene

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information
Inventor after: Wei Xiucan, Shen Yang, Sun Xuhao
Inventor before: Shen Yang, Sun Xuhao, Wei Xiucan
GR01 Patent grant