CN115908949B - Long-tail image recognition method based on class balance encoder - Google Patents
Long-tail image recognition method based on class balance encoder
- Publication number
- CN115908949B
- Authority
- CN
- China
- Prior art keywords
- encoder
- class
- parameters
- momentum
- steps
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Landscapes
- Image Analysis (AREA)
Abstract
The invention discloses a long-tail image recognition method based on a class balance encoder, comprising the following steps: applying two different data augmentation methods to an input picture to obtain two groups of samples; inputting the two groups of samples into an encoder, a class balance encoder, and a momentum encoder to obtain three groups of feature representations; feeding the feature representations output by the encoder and the class balance encoder into classifiers and computing a weighted cross entropy loss against the ground-truth image labels; applying nonlinear mapping and L2 normalization to each of the three groups of feature representations; computing cosine similarity losses; and optimizing the encoder, the class balance encoder, and the classifiers by stochastic gradient descent while updating the momentum encoder by momentum update. In natural species recognition tasks, the invention uses the class balance encoder to strengthen feature learning on samples of rare species and uses self-supervised training to learn more comprehensive feature representations, thereby improving species image recognition accuracy in natural scenes.
Description
Technical Field
The invention belongs to the field of class-imbalanced image recognition, and in particular relates to a long-tail image recognition method based on a class balance encoder for natural scenes.
Background
In image classification research, imbalanced data, and in particular data whose per-class sample counts follow a long-tailed distribution, is an active research direction, and the resulting methods address practical application needs. In imbalanced image data, the classes in a dataset contain unequal numbers of images; under a long-tailed distribution, a small number of classes (head classes) account for most of the samples, while most classes (tail classes) have very few images.
A neural network model trained on an imbalanced dataset performs poorly on tail classes, mainly because the tail classes contain few pictures. During training, head classes make up most of the training data, and the model sees far fewer tail-class pictures than head-class pictures, so its classification performance on tail-class data is poor. One conventional remedy is a class-weight balancing strategy, which increases the contribution of tail-class samples to parameter optimization during training, for example by sampling tail-class samples more frequently or by assigning larger weights to tail-class training samples in the loss function.
Class-weight balancing classifies imbalanced data well, but tail classes carry only a small amount of information; increasing their share in training prevents the model from fully exploiting the more informative head-class data and damages the model's representation learning.
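As an illustration of the conventional class-weight balancing strategy described in this Background, the following minimal sketch assigns each class a loss weight inversely proportional to its sample count. The inverse-frequency rule, the function name `inverse_frequency_weights`, and the toy species labels are assumptions chosen for illustration; they are not taken from the patent.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Return per-class loss weights inversely proportional to class frequency.

    Assumed illustrative rule (not from the patent): weight_c ~ 1 / count_c,
    rescaled so that the weights average to 1 across classes.
    """
    counts = Counter(labels)
    raw = {cls: 1.0 / n for cls, n in counts.items()}
    scale = len(raw) / sum(raw.values())
    return {cls: w * scale for cls, w in raw.items()}


# Toy long-tailed label set: one head class and two tail classes (hypothetical names).
labels = ["sparrow"] * 90 + ["kiwi"] * 8 + ["kakapo"] * 2
weights = inverse_frequency_weights(labels)

# Rarer classes receive larger weights in the loss.
for cls in ("sparrow", "kiwi", "kakapo"):
    print(cls, round(weights[cls], 4))
```

The rarest class receives the largest weight, which is exactly the effect the Background criticizes when taken too far: head-class samples contribute little to the gradient.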
Disclosure of Invention
The invention aims to provide a long-tail image recognition method based on a class balance encoder in a natural scene.
The technical solution for achieving the purpose of the invention is as follows. In a first aspect, the invention provides a long-tail image data training method based on a class balance encoder, comprising the following steps:
Step 1: obtain species image data samples in a natural scene; apply data augmentation to each input image twice to obtain two groups of samples; input the first group of samples into an encoder and a class balance encoder to obtain two feature representations, and input the second group of samples into a momentum encoder to obtain a third feature representation;
The data augmentation includes AutoAugment, random horizontal flipping, random changes to image brightness, contrast, saturation, and grayscale, and random Gaussian blur. Each time it is applied, AutoAugment randomly selects one operation from histogram equalization, inversion, tilting, rotation, sharpening, brightness adjustment, and color adjustment to augment the image. The encoder and the momentum encoder are convolutional neural networks with the same structure and the same initial parameters.
Step 2: input the feature representations from the encoder and the class balance encoder into their respective classifiers, and compute a weighted cross entropy loss for each;
Step 3: input the feature representations into different nonlinear mappers and apply L2 normalization to obtain new feature representations; compute the cosine similarity loss between the new feature representations, and store the predictions and their prediction confidences in a feature buffer;
Step 4: compute the cosine similarity loss between the encoder and momentum encoder feature representations, and between the class balance encoder and momentum encoder feature representations;
Step 5: update the parameters of the encoder, the class balance encoder, the classifiers, and their nonlinear mapper by stochastic gradient descent, and at the same time update the parameters of the momentum encoder and its nonlinear mapper by momentum update; then complete training and save the model parameters.
In a second aspect, the present invention provides a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the method of the first aspect when the program is executed.
In a third aspect, the present invention provides a computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of the method of the first aspect.
In a fourth aspect, the invention provides a computer program product comprising a computer program which, when executed by a processor, implements the steps of the method of the first aspect.
Compared with the prior art, the invention has the following notable advantages: (1) a dual-branch structure, formed by an encoder oriented toward head classes and a class balance encoder oriented toward tail classes, accounts for representation learning across all classes; (2) self-supervised learning based on cosine similarity loss explores more comprehensive image features, giving the model stronger generalization.
Drawings
FIG. 1 is a flow chart of a long tail image recognition method based on a class balance encoder.
Detailed Description
The invention discloses a long-tail image recognition method based on a class balance encoder, comprising the following steps: applying two different data augmentation methods to an input picture to obtain two groups of samples; inputting the two groups of samples into an encoder, a class balance encoder, and a momentum encoder to obtain three groups of feature representations; feeding the feature representations output by the encoder and the class balance encoder into classifiers and computing a weighted cross entropy loss against the ground-truth image labels, with the weight computed from the proportion of the image's class in the dataset; applying nonlinear mapping and L2 normalization to each of the three groups of feature representations; computing cosine similarity losses between the first and third groups and between the second and third groups of feature representations; and, according to these losses, optimizing the encoder, the class balance encoder, and the classifiers by stochastic gradient descent while updating the momentum encoder by momentum update. In natural species recognition tasks, the invention uses the class balance encoder to strengthen feature learning on samples of rare species and uses self-supervised training to learn more comprehensive feature representations, thereby improving species image recognition accuracy in natural scenes.
The technical scheme of the invention is described in detail below with reference to the accompanying drawings.
Referring to FIG. 1, a long-tail image recognition method based on a class balance encoder specifically includes the following steps:
Step 1: acquire a species image dataset in a natural scene using terminal equipment;
Step 2: apply data augmentation to each input image twice to obtain two groups of samples; input the first group of samples into the encoder and the class balance encoder to obtain two feature representations, and input the second group of samples into the momentum encoder to obtain a third feature representation. The encoder, the class balance encoder, and the momentum encoder use a ResNet network structure;
The data augmentation used includes AutoAugment, random horizontal flipping, random changes to image brightness, contrast, saturation, and grayscale, and random Gaussian blur. Each time it is applied, AutoAugment randomly selects one operation from histogram equalization, inversion, tilting, rotation, sharpening, brightness adjustment, and color adjustment to augment the image. The encoder and the momentum encoder are convolutional neural networks with the same structure and the same initial parameters.
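The two-view augmentation of Step 2 can be sketched as follows. This is a toy illustration only: images are nested lists of grayscale values in [0, 1], the operation pool contains just two stand-in operations (inversion and brightness adjustment) out of those the text lists, and the names `OP_POOL`, `augment`, and `two_views` are hypothetical.

```python
import random


def hflip(img):
    """Mirror each row (horizontal flip)."""
    return [row[::-1] for row in img]


def invert(img):
    """Invert pixel intensities in [0, 1]."""
    return [[1.0 - p for p in row] for row in img]


def brighten(img, factor=1.2):
    """Scale brightness and clip to 1.0."""
    return [[min(1.0, p * factor) for p in row] for row in img]


# Stand-in for the AutoAugment operation pool (assumed subset for illustration).
OP_POOL = [invert, brighten]


def augment(img, rng):
    """Pick ONE pool operation at random, then flip horizontally with probability 0.5."""
    out = rng.choice(OP_POOL)(img)
    if rng.random() < 0.5:
        out = hflip(out)
    return out


def two_views(img, rng):
    """Augment the same image twice, giving one sample for each group."""
    return augment(img, rng), augment(img, rng)


rng = random.Random(0)
image = [[0.0, 0.5], [0.25, 1.0]]
view_1, view_2 = two_views(image, rng)
```

Under the method described above, `view_1` would go to the encoder and the class balance encoder, and `view_2` to the momentum encoder.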
Step 3: input the feature representations from the encoder and the class balance encoder into their respective classifiers, and compute a weighted cross entropy loss for each;
The weight is computed from the proportion of the input picture's class in the dataset [formula not preserved in this text]; the weight is added to the classifier output before the cross entropy loss is computed.
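Because the weight formula did not survive in this text, the following is only a sketch of one common way to add a class-proportion weight to the classifier output before computing cross entropy. It assumes the weight is the logarithm of the class proportion; that choice, and the names `log_prior_weights` and `adjusted_cross_entropy`, are assumptions for illustration and may differ from the patent's formula.

```python
import math


def log_prior_weights(class_counts):
    """Assumed weight: log of each class's proportion in the dataset."""
    total = sum(class_counts)
    return [math.log(n / total) for n in class_counts]


def adjusted_cross_entropy(logits, label, weights):
    """Add the per-class weight to the logits, then compute cross entropy."""
    adjusted = [z + w for z, w in zip(logits, weights)]
    peak = max(adjusted)  # subtract the max for a numerically stable log-sum-exp
    log_norm = peak + math.log(sum(math.exp(a - peak) for a in adjusted))
    return log_norm - adjusted[label]


class_counts = [900, 90, 10]  # toy head, medium, and tail classes
weights = log_prior_weights(class_counts)
logits = [2.0, 2.0, 2.0]      # classifier is undecided between the classes

loss_head = adjusted_cross_entropy(logits, 0, weights)
loss_tail = adjusted_cross_entropy(logits, 2, weights)
```

With equal logits, the adjusted softmax equals the class proportions, so the tail-class sample incurs a much larger loss than the head-class sample and the classifier must raise the tail logit to compensate.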
Step 4: input the feature representations from the encoder and the class balance encoder into the nonlinear mapper paired with them and apply L2 normalization to obtain new feature representations; input the feature representation from the momentum encoder into its own nonlinear mapper and apply L2 normalization to obtain a new feature representation;
Each nonlinear mapper consists of a linear classifier, batch normalization, and ReLU activation. The two nonlinear mappers have the same structure and the same initialization parameters.
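The L2 normalization in Step 4 rescales each mapped feature vector to unit length, after which the cosine similarity of two vectors is simply their dot product. A minimal sketch, with toy vectors and hypothetical names `l2_normalize`, `z`, and `z_m`:

```python
import math


def l2_normalize(vec, eps=1e-12):
    """Divide a vector by its L2 norm (eps guards against a zero vector)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / max(norm, eps) for x in vec]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


z = l2_normalize([3.0, 4.0])    # stand-in for a mapped encoder feature
z_m = l2_normalize([4.0, 3.0])  # stand-in for a mapped momentum encoder feature

# For unit vectors, the dot product is the cosine similarity.
similarity = dot(z, z_m)
```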
Step 5: compute the cosine similarity loss between the normalized encoder features and the normalized momentum encoder features, and between the normalized class balance encoder features and the normalized momentum encoder features;
The cosine similarity loss between the encoder features and the momentum encoder features is [formula not preserved in this text], where a hyperparameter controls the loss weight.
The cosine similarity loss between the class balance encoder features and the momentum encoder features is [formula not preserved in this text], where a hyperparameter controls the loss weight and a class balance weight makes the class balance encoder pay more attention to tail-class samples than the encoder does. The class balance weight is computed as [formula not preserved in this text], where the terms are the mean per-class sample count in the training set, the sample count of the class to which the input image belongs, and a hyperparameter controlling how strongly the class balance encoder attends to tail classes.
Step 6: update the parameters of the encoder, the class balance encoder, the classifiers, and their nonlinear mapper by stochastic gradient descent, and at the same time update the parameters of the momentum encoder and its nonlinear mapper by momentum update; when training is complete, save the parameters of the encoder, the class balance encoder, and the classifiers;
The momentum update is [formula not preserved in this text], where the terms are the momentum encoder parameters, the encoder parameters, the class balance encoder parameters, and a hyperparameter controlling the momentum update rate.
Step 7: load the model parameters to recognize natural species images.
In other embodiments, the structures of the encoder, the class balance encoder, and the momentum encoder can be designed according to actual requirements.
In other embodiments, stochastic gradient descent may be replaced with other parameter optimization methods.
In other embodiments, the natural species image dataset may be replaced with long-tail image data from other fields according to practical application requirements.
Claims (6)
1. A long-tail image recognition method based on a class balance encoder, characterized by comprising the following steps:
Step 1: obtain species image data samples in a natural scene; apply data augmentation to each input image twice to obtain two groups of samples; input the first group of samples into an encoder and a class balance encoder to obtain two feature representations, and input the second group of samples into a momentum encoder to obtain a third feature representation;
Step 2: input the feature representations from the encoder and the class balance encoder into their respective classifiers, and compute a weighted cross entropy loss for each;
Step 3: input the feature representations from the encoder and the class balance encoder into the nonlinear mapper paired with them and apply L2 normalization to obtain new feature representations; input the feature representation from the momentum encoder into its own nonlinear mapper and apply L2 normalization to obtain a new feature representation;
Step 4: compute the cosine similarity loss between the normalized encoder features and the normalized momentum encoder features, and between the normalized class balance encoder features and the normalized momentum encoder features;
the cosine similarity loss between the encoder features and the momentum encoder features is [formula not preserved in this text], where a hyperparameter controls the loss weight;
the cosine similarity loss between the class balance encoder features and the momentum encoder features is [formula not preserved in this text], where a hyperparameter controls the loss weight and a class balance weight is computed as [formula not preserved in this text], where the terms are the mean per-class sample count in the training set, the sample count of the class to which the input image belongs, and a hyperparameter controlling how strongly the class balance encoder attends to tail classes;
Step 5: update the parameters of the encoder, the class balance encoder, the classifiers, and their nonlinear mapper by stochastic gradient descent, and at the same time update the parameters of the momentum encoder and its nonlinear mapper by momentum update; when training is complete, save the parameters of the encoder, the class balance encoder, and the classifiers;
the momentum update is [formula not preserved in this text], where the terms are the momentum encoder parameters, the encoder parameters, the class balance encoder parameters, and a hyperparameter controlling the momentum update rate;
Step 6: load the model parameters to recognize natural species images.
2. The long-tail image recognition method based on a class balance encoder according to claim 1, wherein the data augmentation used in step 1 comprises AutoAugment, random horizontal flipping, random changes to image brightness, contrast, saturation, and grayscale, and random Gaussian blur; each time it is applied, AutoAugment randomly selects one operation from histogram equalization, inversion, tilting, rotation, sharpening, brightness adjustment, and color adjustment to augment the image; the encoder and the momentum encoder are convolutional neural networks with the same structure and the same initial parameters.
3. The long-tail image recognition method based on a class balance encoder according to claim 1, wherein the weight of the cross entropy loss in step 2 is [formula not preserved in this text], whose terms refer to the class of the input picture; the cross entropy loss is computed after the weight is added to the classifier output.
4. The long-tail image recognition method based on a class balance encoder according to claim 1, wherein in step 3 each nonlinear mapper consists of linear classifiers, batch normalization, and ReLU activation; the two nonlinear mappers have the same structure and the same initialization parameters.
5. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the method of any of claims 1-4 when the computer program is executed.
6. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1-4.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310014823.9A CN115908949B (en) | 2023-01-06 | 2023-01-06 | Long-tail image recognition method based on class balance encoder |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310014823.9A CN115908949B (en) | 2023-01-06 | 2023-01-06 | Long-tail image recognition method based on class balance encoder |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115908949A CN115908949A (en) | 2023-04-04 |
CN115908949B CN115908949B (en) | 2023-11-17 |
Family
ID=86484722
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310014823.9A Active CN115908949B (en) | 2023-01-06 | 2023-01-06 | Long-tail image recognition method based on class balance encoder |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115908949B (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114241564A (en) * | 2021-12-17 | 2022-03-25 | 东南大学 | Facial expression recognition method based on inter-class difference strengthening network |
CN115205592A (en) * | 2022-07-15 | 2022-10-18 | 东北大学 | Multi-mode data based rebalance long-tail image data classification method |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20220067506A1 (en) * | 2020-08-28 | 2022-03-03 | Salesforce.Com, Inc. | Systems and methods for partially supervised learning with momentum prototypes |
Non-Patent Citations (2)
Title |
---|
Balanced Contrastive Learning for Long-Tailed Visual Recognition;Jianggang Zhu等;2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR);第6908-6917页 * |
Momentum Contrast for Unsupervised Visual Representation Learning;Kaiming He等;https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9157636;第9726-9735页 * |
Also Published As
Publication number | Publication date |
---|---|
CN115908949A (en) | 2023-04-04 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
CB03 | Change of inventor or designer information | Inventors after: Wei Xiucan, Shen Yang, Sun Xuhao; inventors before: Shen Yang, Sun Xuhao, Wei Xiucan |
GR01 | Patent grant | ||