CN113255832A - Method for identifying long tail distribution of double-branch multi-center - Google Patents

Method for identifying long tail distribution of double-branch multi-center

Info

Publication number
CN113255832A
CN113255832A
Authority
CN
China
Prior art keywords
branch
resampling
pictures
probability
loss
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110697276.XA
Other languages
Chinese (zh)
Other versions
CN113255832B (en)
Inventor
徐行
范峻植
沈复民
邵杰
申恒涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Koala Youran Technology Co ltd
Original Assignee
Chengdu Koala Youran Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Koala Youran Technology Co ltd filed Critical Chengdu Koala Youran Technology Co ltd
Priority to CN202110697276.XA priority Critical patent/CN113255832B/en
Publication of CN113255832A publication Critical patent/CN113255832A/en
Application granted granted Critical
Publication of CN113255832B publication Critical patent/CN113255832B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/047Probabilistic or stochastic networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention belongs to the field of computer vision and provides a double-branch multi-center long-tail distribution identification method for solving the problems caused by long-tail distributed data sets. Pictures are fed into a default branch and a resampling branch, data enhancement is applied, and the pictures are then input into deep convolutional neural networks to obtain low-dimensional feature representations. The default branch obtains the probability of each category through a fully connected layer; the resampling branch multiplies its features by a matrix representing multiple centers to obtain a feature matrix and takes the maximum over the centers to obtain the final probability of each category. The losses of the two branches are calculated separately, weighted and added to obtain the final loss, which is back-propagated through the network to update the weights; this is iterated continuously. When a recognition task is required, the picture is input into the resampling branch to obtain the probability that it belongs to each category. The double-branch multi-center design reduces the influence of the data-distribution change caused by resampling, further mitigates the influence of long-tail distribution, yields better recognition and classification results, and gives the model better generalization ability.

Description

Method for identifying long tail distribution of double-branch multi-center
Technical Field
The invention relates to the field of computer vision, in particular to a double-branch multi-center method for identifying long-tail distribution.
Background
With the rapid development of deep convolutional neural networks, the performance of image classification has become impressive. This progress is inseparable from increasingly rich data sets. In academia, most data sets have an almost uniform distribution of samples over class labels, but real-world data are not uniform and often follow a long-tail distribution: a few classes account for most of the pictures (these are called head classes), while the remaining classes each contain only a few pictures (these are called tail classes), as illustrated in FIG. 1.
The more popular existing methods for dealing with long-tail distributions are resampling and re-weighting. The essence of resampling is to weight the sampling frequencies of the different classes inversely to their numbers of samples: the more pictures a class contains, the lower the sampling probability assigned to its pictures, and vice versa. Re-weighting mainly acts on the classification loss: the loss of head classes is given a lower weight, and the loss of tail classes is given a higher weight.
Both of the above methods, while improving prediction results, to some extent compromise the representation ability of the deep features. Existing methods therefore have the following specific defects:
1. When no measure is taken against the long-tail distribution problem, the model classifies the head classes well but the tail classes poorly, and the larger the ratio between the number of pictures in the largest head class and in the smallest tail class, the worse the model's classification and recognition of the tail classes.
2. When a resampling strategy is applied to a long-tail data set, the sampling probability of the head classes is reduced and that of the tail classes is increased. This alleviates the long-tail problem but creates another one: the high sampling probability of tail-class pictures changes the distribution of the data in feature space, which affects the recognition and classification performance of the model, as illustrated in FIG. 4 and FIG. 5.
Disclosure of Invention
The invention aims to provide a double-branch multi-center method for identifying long-tail distribution, which solves the problems caused by long-tail distributed data sets through a double-branch architecture and a multi-center design.
To solve the above technical problem, the invention adopts the following technical scheme:
A double-branch multi-center long-tail distribution identification method, characterized by comprising the following steps:
step 1, initializing two samplers, wherein one sampler uses default sampling and the pictures it obtains are input into the default branch, and the other sampler uses a resampling strategy and the pictures it obtains are input into the resampling branch;
step 2, performing data enhancement on the pictures obtained by the two samplers respectively;
step 3, inputting the enhanced pictures of the default branch and the resampling branch into their respective deep convolutional neural networks, extracting the high-dimensional feature representations of the two pictures, and performing global average pooling on each high-dimensional feature representation to obtain the corresponding low-dimensional feature representation, wherein the convolutional layers of the two deep convolutional neural networks share all parameters except those of the last residual block;
step 4, passing the low-dimensional feature representation of the default branch through a fully connected layer to obtain the probability of belonging to each category, multiplying the low-dimensional feature representation of the resampling branch by a matrix representing multiple centers to obtain a feature matrix, and taking the maximum value of each row of the feature matrix to obtain the final probability of belonging to each category;
step 5, calculating, through a loss function, the losses of the per-category probabilities obtained by the default branch and the resampling branch respectively;
step 6, multiplying the loss of the default branch by a weight α and the loss of the resampling branch by a weight 1-α, wherein α is a variable that decreases from 1 to 0, adding the two weighted losses to obtain the final loss, and then back-propagating the loss through the deep convolutional neural networks and updating their weights;
step 7, iterating continuously until the deep convolutional neural networks converge and the recognition accuracy exceeds 90 percent;
and step 8, when a recognition task needs to be carried out, inputting the picture into the resampling branch and, after data enhancement, into the deep convolutional neural network to obtain the probability that the picture belongs to each category.
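The eight steps above can be summarized in the following PyTorch-style sketch of a single training iteration; the dataloaders, module names and the alpha value passed in are illustrative assumptions rather than the exact implementation of the invention.

```python
import torch.nn.functional as F

def train_one_iteration(default_batch, resample_batch, backbone, fc_head,
                        multicenter_head, optimizer, alpha):
    """One training iteration of the double-branch multi-center method (illustrative sketch)."""
    (x_d, y_d), (x_r, y_r) = default_batch, resample_batch

    # steps 1-3: shared convolutional features + global average pooling -> low-dimensional features
    feat_d, feat_r = backbone(x_d, x_r)

    # step 4: default branch -> fully connected classifier; resampling branch -> multi-center classifier
    logits_d = fc_head(feat_d)
    logits_r = multicenter_head(feat_r)

    # step 5: per-branch cross-entropy losses
    loss_d = F.cross_entropy(logits_d, y_d)
    loss_r = F.cross_entropy(logits_r, y_r)

    # step 6: weighted sum with alpha decreasing from 1 to 0 over training
    loss = alpha * loss_d + (1.0 - alpha) * loss_r

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```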
Further, step 1, in which two samplers are initialized, one using default sampling with its pictures input into the default branch and the other using a resampling strategy with its pictures input into the resampling branch, specifically means: the default sampler samples every picture with the same probability and then inputs the pictures into the default branch; the resampling sampler counts the training data set, calculates the number of pictures in each category, assigns sampling probabilities through the resampling strategy so that the pictures of every category are sampled with the same overall probability, and then inputs the pictures into the resampling branch.
Further, in step 2, the data enhancement performed on the pictures obtained by the two samplers includes left-right flipping and/or random cropping and/or random padding operations on the pictures.
Further, in step 3, the dimension of the high-dimensional feature representation is 4096, and the dimension of the low-dimensional feature representation is 64.
Further, in step 4, the dimension of the obtained vector of probabilities of belonging to each class equals the number of classes in the training data set, and the dimensions of the matrix representing the multiple centers are the feature dimension, the number of classes in the training set, and the number of centers.
Further, step 5 specifically comprises: calculating the losses from the per-category probabilities obtained by the two branches and the labels of the corresponding pictures, and obtaining the respective loss of each branch through a loss function.
Further, in step 5, the per-category probabilities obtained by the two branches and the labels of the corresponding pictures are each passed through a cross-entropy loss function to obtain the loss of each branch, the cross-entropy loss function having the formula:
$L_{CE} = -\sum_{i=1}^{C} y_i \log(p_i)$
wherein $L_{CE}$ denotes the cross-entropy loss function, $C$ denotes the total number of categories, $y_i$ denotes the probability that the current picture belongs to the i-th class (its label), and $p_i$ denotes the probability, predicted by the model, that the current picture belongs to the i-th class.
The method has the following advantages. First, the double-branch framework uses both a default sampler and a resampling sampler, which preliminarily relieves the influence of the long-tail distribution; the default branch learns the overall distribution of the data, while the resampling branch fine-tunes the distribution learned by the default branch. Second, the multi-center classifier architecture used in the resampling branch reduces the influence of the data-distribution change caused by resampling. Experiments show that, thanks to the double-branch multi-center framework, the influence of the long-tail distribution is further handled on top of resampling, a better recognition and classification effect is achieved, and the model has better generalization ability.
Drawings
FIG. 1 is a schematic diagram illustrating a long-tail distribution and the head and tail classes;
FIG. 2 is a flowchart of a double-branch multi-center long-tail distribution identification method according to an embodiment of the present invention;
FIG. 3 is a diagram of the double-branch multi-center architecture according to an embodiment of the present invention;
FIG. 4 is a schematic illustration of the feature-space distribution before resampling;
FIG. 5 is a schematic illustration of the feature-space distribution after resampling.
Detailed Description
The technical solution of the present invention is described in detail below with reference to the accompanying drawings and embodiments.
The invention provides a double-branch multi-center method for identifying long-tail distribution, which comprises the following steps:
step 1, initializing two samplers, wherein one sampler uses default sampling and the pictures it obtains are input into the default branch, and the other sampler uses a resampling strategy and the pictures it obtains are input into the resampling branch;
step 2, performing data enhancement on the pictures obtained by the two samplers respectively;
step 3, inputting the pictures of the two branches into their respective networks, extracting the high-dimensional feature representations of the two pictures, and performing global average pooling in each branch to obtain the respective low-dimensional feature representations, wherein the convolutional layers of the two networks share all parameters except those of the last residual block;
step 4, passing the low-dimensional feature representation of the default branch through a fully connected layer to obtain the probability of belonging to each category, multiplying the low-dimensional feature representation of the resampling branch by a matrix representing multiple centers to obtain a feature matrix, and taking the maximum value of each row of the feature matrix to obtain the final probability of belonging to each category;
step 5, calculating, through a loss function, the losses of the per-category probability vectors obtained by the two branches respectively;
step 6, multiplying the two losses by their respective weights, namely a weight that decreases from 1 to 0 for the default branch and a weight that increases from 0 to 1 for the resampling branch, adding them to obtain the final loss, and then back-propagating the loss through the network and updating the weights;
step 7, iterating continuously until the network converges;
and step 8, when a recognition task needs to be carried out, inputting the picture into the resampling branch to obtain the probability that the picture belongs to each category.
FIG. 1 illustrates the long-tail distribution and the notions of head classes and tail classes, and FIG. 4 and FIG. 5 illustrate the influence of resampling on the feature space, showing the feature space before and after resampling respectively. Before resampling, the classification surface separates the head classes well but not the tail classes: the head classes are classified well because they have enough samples and a rich feature distribution, whereas the small number of tail-class pictures cannot fully represent the tail classes' feature distribution, so they are classified poorly and are easily misclassified. After resampling, both the head classes and the tail classes have enough pictures for the model to learn their feature distributions, so the model classifies both well. However, resampling changes the original feature-space distribution, making the head classes and the tail classes more dispersed within their respective feature spaces.
Examples
The present embodiment provides a double-branch multi-center method for identifying long-tail distribution, a flowchart of which is shown in FIG. 2. The method of the present embodiment includes the following steps:
s1, initializing two samplers, wherein one sampler adopts default sampling, the picture obtained by the sampling is input into a default branch, the other sampler adopts a resampling strategy for sampling, and the picture obtained by the sampling is input into a resampling branch.
Here, the default sampler samples every picture with the same probability. For the resampling strategy, before the sampling probability of each picture can be calculated, the training data set is first counted to obtain the number of pictures in each category: the i-th category contains $n_i$ pictures, the largest category contains $n_{max}$ pictures, and the total number of pictures over all categories is $N$. The resampling strategy then assigns sampling probabilities inversely proportional to the class sizes (each picture of the i-th category is weighted by $n_{max}/n_i$), so that the sampling probability of every class is made the same.
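A minimal sketch of this class-balanced resampling sampler, assuming a PyTorch WeightedRandomSampler; the inverse-frequency weight n_max/n_i used here is one standard choice consistent with making every class equally likely to be sampled, not necessarily the exact formula of the original filing.

```python
from collections import Counter

import torch
from torch.utils.data import WeightedRandomSampler

def build_resampling_sampler(labels):
    """Give every class the same overall sampling probability by weighting each
    picture inversely to the size of its class (illustrative sketch)."""
    counts = Counter(labels)                  # n_i: number of pictures per class
    n_max = max(counts.values())              # size of the largest class
    # weight every picture of class i by n_max / n_i; each class then has equal total weight
    weights = torch.tensor([n_max / counts[y] for y in labels], dtype=torch.double)
    return WeightedRandomSampler(weights, num_samples=len(labels), replacement=True)
```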
S2, performing data enhancement on the pictures obtained by the two samplers in the first step.
Data enhancement is performed on the sampled pictures, including left-right flipping and/or random cropping operations.
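A sketch of this data enhancement step using torchvision transforms; the crop size and padding amount below are assumptions (values commonly used for CIFAR-sized pictures), not taken from the original filing.

```python
from torchvision import transforms

# left-right flipping plus random cropping with padding (CIFAR-style 32x32 inputs assumed)
train_augmentation = transforms.Compose([
    transforms.RandomHorizontalFlip(),        # random left-right flip
    transforms.RandomCrop(32, padding=4),     # random crop with random padding
    transforms.ToTensor(),
])
```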
S3, inputting the pictures of the two branches into their respective networks, whose convolutional-layer parameters are shared except for the last residual block, and performing global average pooling on the obtained high-dimensional features to obtain low-dimensional feature representations.
In this embodiment, the pictures of the two branches are passed through ResNet-32 convolutional layers to obtain 4096-dimensional high-dimensional feature representations, and the parameters of the convolutional layers of the two branches are shared except for the last residual block. The ResNet-32 convolutional layers are shown in FIG. 3; the high-dimensional features of the two branches are each subjected to global average pooling to obtain their respective low-dimensional feature representations, whose dimension is 64.
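A sketch of how the two branches can share all convolutional parameters except the last residual block; the split of the backbone into a shared trunk and two branch-specific last blocks, and the module names, are assumptions made for illustration.

```python
import torch.nn as nn

class DualBranchBackbone(nn.Module):
    """Shared convolutional trunk with a separate last residual block per branch (sketch)."""
    def __init__(self, shared_trunk, last_block_default, last_block_resample):
        super().__init__()
        self.shared_trunk = shared_trunk          # all convolutional layers up to the last residual block
        self.last_default = last_block_default    # last residual block of the default branch
        self.last_resample = last_block_resample  # last residual block of the resampling branch
        self.pool = nn.AdaptiveAvgPool2d(1)       # global average pooling

    def forward(self, x_default, x_resample):
        h_d = self.last_default(self.shared_trunk(x_default))
        h_r = self.last_resample(self.shared_trunk(x_resample))
        # global average pooling turns the high-dimensional maps into low-dimensional vectors
        f_d = self.pool(h_d).flatten(1)
        f_r = self.pool(h_r).flatten(1)
        return f_d, f_r
```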
S4, passing the low-dimensional features of the default branch through a fully connected layer to obtain the probability of each category, multiplying the low-dimensional features of the resampling branch by a multi-center matrix to obtain a multi-center low-dimensional matrix, and then taking the maximum value over the centers to obtain the final probability of belonging to each category.
In the default branch, the 64-dimensional features are passed through a fully connected layer to obtain a vector of probabilities of belonging to each category. In the resampling branch, the 64-dimensional features are multiplied by a matrix of dimensions (64, number of categories, 10) to obtain a matrix of dimensions (number of categories, 10); the maximum value of each row of this matrix is then taken to obtain the vector of probabilities that the picture belongs to each category.
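A sketch of the multi-center classifier of the resampling branch: the 64-dimensional feature is multiplied by a weight of shape (64, number of categories, 10) and the maximum over the 10 centers is taken per category; the parameter initialization is an assumption.

```python
import torch
import torch.nn as nn

class MultiCenterClassifier(nn.Module):
    """Each category is represented by several centers; the score of a category is the
    maximum response over its centers (illustrative sketch)."""
    def __init__(self, feat_dim=64, num_classes=10, num_centers=10):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(feat_dim, num_classes, num_centers) * 0.01)

    def forward(self, feat):                     # feat: (batch, feat_dim)
        # (batch, feat_dim) x (feat_dim, num_classes * num_centers) -> (batch, num_classes, num_centers)
        scores = feat @ self.centers.flatten(1)
        scores = scores.view(feat.size(0), self.centers.size(1), self.centers.size(2))
        # take the maximum over the centers of each category ("maximum row value")
        return scores.max(dim=-1).values         # (batch, num_classes)
```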
S5, calculating the losses of the per-category probability vectors obtained by the two branches respectively.
The vectors obtained by the two branches are each passed through a cross-entropy loss function to obtain the loss of each branch.
The cross-entropy loss function is formulated as:
$L_{CE} = -\sum_{i=1}^{C} y_i \log(p_i)$
wherein $L_{CE}$ denotes the cross-entropy loss function, $C$ denotes the total number of categories, $y_i$ denotes the probability that the current picture belongs to the i-th class (its label), and $p_i$ denotes the probability, predicted by the model, that the current picture belongs to the i-th class.
S6, multiplying the two losses by their respective weights, where the weight of the default branch decreases from 1 to 0 and the weight of the resampling branch increases from 0 to 1.
The loss of the default branch is multiplied by a weight α, the loss of the resampling branch is multiplied by the weight 1-α, and the two weighted losses are added to obtain the final loss.
Here α is computed from E, the current iteration round number, and $E_{max}$, the expected maximum number of iteration rounds, so that α decays from 1 to 0 as training proceeds.
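A sketch of the loss-weighting schedule: α depends on the current iteration round E and the expected maximum number of rounds E_max and decays from 1 to 0; the parabolic form below is an assumed example of such a schedule, not the exact formula of the original filing.

```python
def alpha_schedule(epoch, max_epochs):
    """Weight of the default branch: starts at 1 and decays to 0 (assumed parabolic decay)."""
    return 1.0 - (epoch / max_epochs) ** 2

def combined_loss(loss_default, loss_resample, epoch, max_epochs):
    """Final training loss: alpha * default-branch loss + (1 - alpha) * resampling-branch loss."""
    a = alpha_schedule(epoch, max_epochs)
    return a * loss_default + (1.0 - a) * loss_resample
```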
S7, iterating continuously until the model converges sufficiently and generalizes well.
S8, when a recognition task is required, the picture does not need data enhancement; it is input into the resampling branch to obtain the probability that it belongs to each category.
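A sketch of the recognition step: the test picture is passed through the resampling branch only, without data enhancement, and the per-category probabilities are read off with a softmax; the function and module names follow the illustrative sketches above and are assumptions.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def recognize(picture, shared_trunk, last_block_resample, multicenter_head):
    """Predict the probability of each category with the resampling branch only (sketch)."""
    h = last_block_resample(shared_trunk(picture.unsqueeze(0)))  # add a batch dimension
    feat = F.adaptive_avg_pool2d(h, 1).flatten(1)                # 64-d low-dimensional feature
    logits = multicenter_head(feat)                              # multi-center category scores
    return F.softmax(logits, dim=-1)                             # probability of each category
```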
In this embodiment, the accuracy of the trained ResNet-32 model on the CIFAR-10 data set is as follows:
[Table: classification accuracy of the trained ResNet-32 model on long-tailed CIFAR-10, provided as an image in the original publication.]
The long-tail ratio in the table is the maximum ratio between the number of pictures of a head class and the number of pictures of a tail class. As can be seen from the table, the double-branch multi-center architecture brings a stable improvement on the long-tail data task, indicating that the method improves the recognition performance of the model and gives it better generalization ability.

Claims (7)

1. A double-branch multi-center long-tail distribution identification method, characterized by comprising the following steps:
step 1, initializing two samplers, wherein one sampler uses default sampling and the pictures it obtains are input into the default branch, and the other sampler uses a resampling strategy and the pictures it obtains are input into the resampling branch;
step 2, performing data enhancement on the pictures obtained by the two samplers respectively;
step 3, inputting the enhanced pictures of the default branch and the resampling branch into their respective deep convolutional neural networks, extracting the high-dimensional feature representations of the two pictures, and performing global average pooling on each high-dimensional feature representation to obtain the corresponding low-dimensional feature representation, wherein the convolutional layers of the two deep convolutional neural networks share all parameters except those of the last residual block;
step 4, passing the low-dimensional feature representation of the default branch through a fully connected layer to obtain the probability of belonging to each category, multiplying the low-dimensional feature representation of the resampling branch by a matrix representing multiple centers to obtain a feature matrix, and taking the maximum value of each row of the feature matrix to obtain the final probability of belonging to each category;
step 5, calculating, through a loss function, the losses of the per-category probabilities obtained by the default branch and the resampling branch respectively;
step 6, multiplying the loss of the default branch by a weight α and the loss of the resampling branch by a weight 1-α, wherein α is a variable that decreases from 1 to 0, adding the two weighted losses to obtain the final loss, and then back-propagating the loss through the deep convolutional neural networks and updating their weights;
step 7, iterating continuously until the deep convolutional neural networks converge and the recognition accuracy exceeds 90 percent;
and step 8, when a recognition task needs to be carried out, inputting the picture into the resampling branch and, after data enhancement, into the deep convolutional neural network to obtain the probability that the picture belongs to each category.
2. The double-branch multi-center long-tail distribution identification method according to claim 1, wherein step 1, in which two samplers are initialized, one using default sampling with its pictures input into the default branch and the other using a resampling strategy with its pictures input into the resampling branch, specifically means: the default sampler samples every picture with the same probability and then inputs the pictures into the default branch; the resampling sampler counts the training data set, calculates the number of pictures in each category, assigns sampling probabilities through the resampling strategy so that the pictures of every category are sampled with the same overall probability, and then inputs the pictures into the resampling branch.
3. The double-branch multi-center long-tail distribution identification method according to claim 1, wherein in step 2, the data enhancement performed on the pictures obtained by the two samplers includes left-right flipping and/or random cropping and/or random padding operations on the pictures.
4. The double-branch multi-center long-tail distribution identification method according to claim 1, wherein in step 3, the dimension of the high-dimensional feature representation is 4096 and the dimension of the low-dimensional feature representation is 64.
5. The double-branch multi-center long-tail distribution identification method according to claim 1, wherein in step 4, the dimension of the obtained vector of probabilities of belonging to each class equals the number of classes in the training data set, and the dimensions of the matrix representing the multiple centers are the feature dimension, the number of classes in the training set, and the number of centers.
6. The double-branch multi-center long-tail distribution identification method according to claim 1, wherein step 5 specifically comprises: calculating the losses from the per-category probabilities obtained by the two branches and the labels of the corresponding pictures, and obtaining the respective loss of each branch through a loss function.
7. The double-branch multi-center long-tail distribution identification method according to claim 6, wherein in step 5, the per-category probabilities obtained by the two branches and the labels of the corresponding pictures are each passed through a cross-entropy loss function to obtain the loss of each branch, the cross-entropy loss function having the formula:
$L_{CE} = -\sum_{i=1}^{C} y_i \log(p_i)$
wherein $L_{CE}$ denotes the cross-entropy loss function, $C$ denotes the total number of categories, $y_i$ denotes the probability that the current picture belongs to the i-th class (its label), and $p_i$ denotes the probability, predicted by the model, that the current picture belongs to the i-th class.
CN202110697276.XA 2021-06-23 2021-06-23 Method for identifying long tail distribution of double-branch multi-center Active CN113255832B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110697276.XA CN113255832B (en) 2021-06-23 2021-06-23 Method for identifying long tail distribution of double-branch multi-center

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110697276.XA CN113255832B (en) 2021-06-23 2021-06-23 Method for identifying long tail distribution of double-branch multi-center

Publications (2)

Publication Number Publication Date
CN113255832A (en) 2021-08-13
CN113255832B (en) 2021-10-01

Family

ID=77189215

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110697276.XA Active CN113255832B (en) 2021-06-23 2021-06-23 Method for identifying long tail distribution of double-branch multi-center

Country Status (1)

Country Link
CN (1) CN113255832B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114694150A (en) * 2022-05-31 2022-07-01 成都考拉悠然科技有限公司 Method and system for improving generalization capability of digital image classification model
CN114863193A (en) * 2022-07-07 2022-08-05 之江实验室 Long-tail learning image classification and training method and device based on mixed batch normalization
CN114881213A (en) * 2022-05-07 2022-08-09 天津大学 Sound event detection method based on three-branch feature fusion neural network
WO2024125380A1 (en) * 2022-12-13 2024-06-20 广电运通集团股份有限公司 Classification recognition method, computer device, and storage medium

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130322745A1 (en) * 2012-05-31 2013-12-05 Apple Inc. Local Image Statistics Collection
CN104794497A (en) * 2015-05-06 2015-07-22 山东大学 Multicenter fitting method used in classification of hyperspectral images
US20200285943A1 (en) * 2019-03-04 2020-09-10 International Business Machines Corporation Optimizing Hierarchical Classification with Adaptive Node Collapses
CN111738303A (en) * 2020-05-28 2020-10-02 华南理工大学 Long-tail distribution image identification method based on hierarchical learning
CN111738301A (en) * 2020-05-28 2020-10-02 华南理工大学 Long-tail distribution image data identification method based on two-channel learning
CN111898685A (en) * 2020-08-03 2020-11-06 华南理工大学 Target detection method based on long-tail distribution data set
CN112102314A (en) * 2020-11-02 2020-12-18 成都考拉悠然科技有限公司 Computing method for judging quality of face image based on uncertainty
CN112101544A (en) * 2020-08-21 2020-12-18 清华大学 Training method and device of neural network suitable for long-tail distributed data set
CN112330582A (en) * 2020-12-24 2021-02-05 黑龙江省网络空间研究中心 Unmanned aerial vehicle image and satellite remote sensing image fusion algorithm
CN112966767A (en) * 2021-03-19 2021-06-15 焦点科技股份有限公司 Data unbalanced processing method for separating feature extraction and classification tasks

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130322745A1 (en) * 2012-05-31 2013-12-05 Apple Inc. Local Image Statistics Collection
CN104794497A (en) * 2015-05-06 2015-07-22 山东大学 Multicenter fitting method used in classification of hyperspectral images
US20200285943A1 (en) * 2019-03-04 2020-09-10 International Business Machines Corporation Optimizing Hierarchical Classification with Adaptive Node Collapses
CN111738303A (en) * 2020-05-28 2020-10-02 华南理工大学 Long-tail distribution image identification method based on hierarchical learning
CN111738301A (en) * 2020-05-28 2020-10-02 华南理工大学 Long-tail distribution image data identification method based on two-channel learning
CN111898685A (en) * 2020-08-03 2020-11-06 华南理工大学 Target detection method based on long-tail distribution data set
CN112101544A (en) * 2020-08-21 2020-12-18 清华大学 Training method and device of neural network suitable for long-tail distributed data set
CN112102314A (en) * 2020-11-02 2020-12-18 成都考拉悠然科技有限公司 Computing method for judging quality of face image based on uncertainty
CN112330582A (en) * 2020-12-24 2021-02-05 黑龙江省网络空间研究中心 Unmanned aerial vehicle image and satellite remote sensing image fusion algorithm
CN112966767A (en) * 2021-03-19 2021-06-15 焦点科技股份有限公司 Data unbalanced processing method for separating feature extraction and classification tasks

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
BOYAN ZHOU et al.: "BBN: Bilateral-Branch Network with Cumulative Learning for Long-Tailed Visual Recognition", arXiv:1912.02413v4 *
KAIMING HE et al.: "Deep Residual Learning for Image Recognition", arXiv:1512.03385v1 *
YU LI et al.: "Overcoming Classifier Imbalance for Long-tail Object Detection with Balanced Group Softmax", Computer Vision and Pattern Recognition *
周玉灿: "Structured Deep Learning Models and Algorithms for Long-Tailed Classification Tasks", China Doctoral Dissertations Full-text Database, Information Science and Technology *
林志威: "Research on an Intention-Enhanced Next-Location Prediction Model Based on Pedestrian Movement Trajectories", China Master's Theses Full-text Database, Information Science and Technology *
林恩禄: "Research on Long-Tail Recognition Algorithms for Visual Images", China Master's Theses Full-text Database, Information Science and Technology *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114881213A (en) * 2022-05-07 2022-08-09 天津大学 Sound event detection method based on three-branch feature fusion neural network
CN114694150A (en) * 2022-05-31 2022-07-01 成都考拉悠然科技有限公司 Method and system for improving generalization capability of digital image classification model
CN114694150B (en) * 2022-05-31 2022-10-21 成都考拉悠然科技有限公司 Method and system for improving generalization capability of digital image classification model
CN114863193A (en) * 2022-07-07 2022-08-05 之江实验室 Long-tail learning image classification and training method and device based on mixed batch normalization
WO2024125380A1 (en) * 2022-12-13 2024-06-20 广电运通集团股份有限公司 Classification recognition method, computer device, and storage medium

Also Published As

Publication number Publication date
CN113255832B (en) 2021-10-01

Similar Documents

Publication Publication Date Title
CN113255832B (en) Method for identifying long tail distribution of double-branch multi-center
CN111738301B (en) Long-tail distribution image data identification method based on double-channel learning
CN108256482B (en) Face age estimation method for distributed learning based on convolutional neural network
CN111126488B (en) Dual-attention-based image recognition method
CN111414942A (en) Remote sensing image classification method based on active learning and convolutional neural network
CN111126386A (en) Sequence field adaptation method based on counterstudy in scene text recognition
CN112580590A (en) Finger vein identification method based on multi-semantic feature fusion network
CN113326731A (en) Cross-domain pedestrian re-identification algorithm based on momentum network guidance
CN109993236A (en) Few sample language of the Manchus matching process based on one-shot Siamese convolutional neural networks
CN114998602B (en) Domain adaptive learning method and system based on low confidence sample contrast loss
CN114038055A (en) Image generation method based on contrast learning and generation countermeasure network
WO2023125456A1 (en) Multi-level variational autoencoder-based hyperspectral image feature extraction method
CN112232395B (en) Semi-supervised image classification method for generating countermeasure network based on joint training
CN112633257A (en) Potato disease identification method based on improved convolutional neural network
CN113066065A (en) No-reference image quality detection method, system, terminal and medium
CN113743474A (en) Digital picture classification method and system based on cooperative semi-supervised convolutional neural network
CN114006870A (en) Network flow identification method based on self-supervision convolution subspace clustering network
Yu et al. Deep metric learning with dynamic margin hard sampling loss for face verification
CN117315534A (en) Short video classification method based on VGG-16 and whale optimization algorithm
CN110288002B (en) Image classification method based on sparse orthogonal neural network
CN116756391A (en) Unbalanced graph node neural network classification method based on graph data enhancement
CN116823782A (en) Reference-free image quality evaluation method based on graph convolution and multi-scale features
CN113435480B (en) Method for improving long tail distribution visual recognition capability through channel sequential switching and self-supervision
CN115294424A (en) Sample data enhancement method based on generation countermeasure network
CN114821184A (en) Long-tail image classification method and system based on balanced complementary entropy

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20221214

Address after: No. 202, Floor 2, Building 2-1, Yuancheng Logistics Park, Yuancheng Road, High tech Zone, Suining City, Sichuan Province, 629000

Patentee after: Suining koala Youran Technology Co.,Ltd.

Address before: Room 1001, 1002 and 1003, 10 / F, area a, building 4, No. 200, Tianfu 5th Street, Chengdu hi tech Zone, China (Sichuan) pilot Free Trade Zone, Chengdu, Sichuan 610041

Patentee before: CHENGDU KOALA YOURAN TECHNOLOGY CO.,LTD.

TR01 Transfer of patent right
TR01 Transfer of patent right

Effective date of registration: 20240424

Address after: Room 1001, 1002, and 1003, 10th Floor, Block A, Building 4, No. 200 Tianfu Fifth Street, Chengdu High tech Zone, China (Sichuan) Pilot Free Trade Zone, Chengdu, Sichuan Province, 610000

Patentee after: CHENGDU KOALA YOURAN TECHNOLOGY CO.,LTD.

Country or region after: China

Address before: No. 202, Floor 2, Building 2-1, Yuancheng Logistics Park, Yuancheng Road, High tech Zone, Suining City, Sichuan Province, 629000

Patentee before: Suining koala Youran Technology Co.,Ltd.

Country or region before: China

TR01 Transfer of patent right