CN115294381B - Small sample image classification method and device based on feature migration and orthogonal prior - Google Patents

Small sample image classification method and device based on feature migration and orthogonal prior

Info

Publication number
CN115294381B
CN115294381B (application CN202210487137.9A)
Authority
CN
China
Prior art keywords
orthogonal
feature
module
training
class
Prior art date
Legal status
Active
Application number
CN202210487137.9A
Other languages
Chinese (zh)
Other versions
CN115294381A (en)
Inventor
李晓旭
张志敏
刘俊
汤卓和
刘忠源
张文斌
曾俊瑀
马占宇
陶剑
董洪飞
Current Assignee
Lanzhou University of Technology
Original Assignee
Lanzhou University of Technology
Priority date
Filing date
Publication date
Application filed by Lanzhou University of Technology filed Critical Lanzhou University of Technology
Priority to CN202210487137.9A priority Critical patent/CN115294381B/en
Publication of CN115294381A publication Critical patent/CN115294381A/en
Application granted granted Critical
Publication of CN115294381B publication Critical patent/CN115294381B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a small sample image classification method and device based on feature migration and an orthogonal prior, and studies a small sample classification framework for extracting highly discriminative features on the basis of small sample image classification based on deep metrics. By introducing feature migration and orthogonal-prior small sample image feature learning, and assuming that the new classes and the base classes share a feature extraction mode and that the features of different new classes are orthogonal to one another, an orthogonalization feature adaptation network is constructed to learn an orthogonal feature subspace, so that the features of different classes are mutually orthogonal and the discriminability of the features is improved. The invention is of great significance for theoretical research on small sample learning and for promoting the wide application of machine recognition technology, and at the same time contributes to breaking through the theoretical bottleneck of small sample learning and to mastering advanced artificial intelligence technology in China.

Description

Small sample image classification method and device based on feature migration and orthogonal prior
Technical Field
The invention relates to the technical field of image classification, in particular to a small sample image classification method and device based on feature migration and orthogonal prior.
Background
In recent years, with the development of deep learning, the recognition performance of machines has exceeded that of humans on many large sample image classification tasks. However, when the sample size is small, the recognition level of machines still lags far behind that of humans. Therefore, image classification with a small number of training samples, in particular small sample image classification (Few-shot Image Classification) with only one or a few labeled samples per class, has received considerable attention from researchers in the last two years.
Small sample classification (Few-shot Classification) belongs to the category of small sample learning (Few-shot Learning) and usually involves two types of data with disjoint class spaces, namely base class data and new class data. Small sample classification aims to learn classification rules using the knowledge learned from the base class data together with a small number of labeled samples (support samples) of the new class data, and to accurately predict the classes of the unlabeled samples (query samples) in new class tasks; the framework of small sample classification is shown in fig. 1.
Small sample image classification is a research problem that urgently needs to be solved in the fields of computer vision and artificial intelligence. Existing successful large sample image classification methods depend heavily on the number of samples, whereas the sample sizes of things in the real world follow a long-tail distribution, i.e. for a large number of things the sample size is severely insufficient. For example, in fields such as the military, medicine, industry and astronomy, sample collection consumes a large amount of manpower, material resources, time and money, and large-scale image samples are difficult to collect. Therefore, research on small sample image classification is of great value to the wide application of image classification technology.
In the prior art, classification methods based on deep metrics mainly judge the class by comparing the distances between samples, or between a sample and a class prototype. They are often combined with techniques such as data augmentation and transfer learning to compensate for the insufficient amount of data and the tendency of the model to overfit, and obtain good classification performance on many small sample classification tasks. However, compared with large sample image classification, the performance of existing small sample image classification is still unsatisfactory, which greatly limits the practicability of small sample image classification technology, and several problems remain to be solved, among them highly discriminative feature learning. For large sample image classification, existing deep learning techniques can learn highly discriminative image features by increasing the model capacity and the sample size. However, these techniques are not applicable to small sample classification tasks with few labeled samples. Therefore, how to learn a highly discriminative feature representation from the base class data and the new class data with few labeled samples is a problem worth exploring.
Disclosure of Invention
Aiming at the technical problem of learning highly discriminative features in small sample image classification, the invention provides a small sample image classification method and device based on feature migration and an orthogonal prior.
In order to achieve the above object, the present invention provides the following technical solutions:
the invention firstly provides a small sample image classification method based on feature migration and orthogonal prior, which comprises the following steps:
S1, preparing data and pre-training to obtain an embedding module f_θ for extracting image features, wherein the images comprise a training set and a test set;
S2, introducing the idea of an orthogonal prior into the convolutional neural network model and constructing a feature learning network model based on feature migration and the orthogonal prior;
S3, training and optimizing the objective function of the feature learning network model based on the orthogonal prior;
S4, classifying the test set images using the optimized orthogonal-prior feature learning network model.
Further, step S1 includes:
S11, dividing the data set D into two parts, D_train and D_test, which are mutually exclusive in class space; D_train serves as the base class data for training the model, and D_test serves as the new class data for testing the model;
S12, for a C-way K-shot classification task, randomly selecting C classes from D_train and randomly selecting M samples from each class, among which K samples are used as support samples S_i and the remaining M−K samples are used as query samples Q_i; S_i and Q_i form a task T_i, and tasks T^test are likewise constructed from D_test (a sketch of this task construction is given after step S13 below);
S13, the first stage of training: pre-training the embedding module f_θ using the base class data; f_θ comprises 4 convolution blocks, each containing a convolution layer, batch normalization, a nonlinear activation layer and a pooling layer; each convolution block uses a 3×3 convolution kernel window, the input is a three-channel RGB image, a 2×2 max pooling layer is adopted as the pooling layer, the max pooling layers of the last two blocks are removed, and ReLU is adopted as the activation function of the nonlinear activation layer.
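For illustration, a minimal Python sketch of the C-way K-shot task construction in step S12 is given below; the dictionary-of-samples data layout and the fixed per-class sample count M are assumptions made only for this sketch.

```python
import random

def sample_task(data_by_class, C=5, K=1, M=16):
    """Build one C-way K-shot task T_i from a dict {class_label: [samples]}.

    The dict layout and the per-class sample count M are illustrative
    assumptions; only C, K and the support/query split follow the text.
    """
    classes = random.sample(list(data_by_class.keys()), C)   # randomly select C classes
    support, query = [], []
    for c in classes:
        samples = random.sample(data_by_class[c], M)          # M samples per class
        support += [(x, c) for x in samples[:K]]              # K support samples S_i
        query   += [(x, c) for x in samples[K:]]              # remaining M-K query samples Q_i
    return support, query                                     # S_i and Q_i form a task T_i
```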
Further, in step S2, in the feature learning network model based on feature migration and the orthogonal prior, the orthogonalization feature adaptation network consists of three parts: the embedding module f_θ, the orthogonal adaptation module g_φ, and the metric module; the orthogonal adaptation module g_φ consists of two convolution layers with 5×5 convolution kernels and is used to transform the features of new class samples and learn an orthogonalized feature subspace.
Further, step S3 includes:
S31, in the second stage of training, a classification task is performed on the new class data, and all support samples are input into the embedding module f_θ with fixed parameters to obtain the corresponding support sample features f_θ(S_ck);
S32, feature transformation is performed with the orthogonal adaptation module to obtain the transformed features g_φ(f_θ(S_ck));
S33, the transformed features are multiplied by the mask M_c corresponding to each class so that the features of different classes are pairwise orthogonal;
S34, the metric module is used to calculate the cosine distance cos(P_ci, P_cj) (i, j ∈ [0, K), i ≠ j) between features of the same class;
S35, the orthogonal adaptation module g_φ is optimized with a mean square error loss function.
Further, the calculation formula in step S33 is:

P_ck = g_φ(f_θ(S_ck)) ⊙ M_c    (1)

wherein S_ck is the kth support sample of class c, ⊙ denotes element-wise multiplication of matrices of the same order, and M_cijh is the value at row i, column j, channel h of the class-c mask M_c; the elements of M_c are:

M_cijh = 1 when the channel index h lies within the range assigned to class c (channel indices start from 0), and M_cijh = 0 at the remaining positions    (2)

wherein C is the total number of categories under the current task, H is the number of feature channels, and H is an integer multiple of C.
Further, the calculation formula in step S34 is as follows:

cos(P_ci, P_cj) = Σ(P_ci ⊙ P_cj) / (‖P_ci‖₂ · ‖P_cj‖₂)    (3)

wherein cos(P_ci, P_cj) is the cosine distance between features of the same class, K is the number of support samples, c denotes the c-th class, P_ci denotes the feature of the ith support sample in class c, P_cj denotes the feature of the jth support sample in class c, ⊙ denotes element-wise multiplication of the matrices, and ‖P_ci‖₂ denotes the 2-norm of the matrix P_ci.
Further, the mean square error loss function in step S35 is calculated as follows:

L = (1/N) Σ_{c=1}^{N} Σ_{i≠j} MSE[cos(P_ci, P_cj), 1]    (4)

wherein N is the total number of categories under the current task, cos(P_ci, P_cj) is the cosine distance between features of the same class, and MSE[cos(P_ci, P_cj), 1] = [cos(P_ci, P_cj) − 1]².
After the loss over the support samples is calculated, gradient descent is performed; a mini-batch Adam optimizer is used to update the orthogonal adaptation module g_φ, and training is repeated over multiple tasks until the network converges.
Further, the Adam adaptive optimization algorithm specifically comprises the following steps:
Initialize the data: v_dW = 0, S_dW = 0, v_db = 0, S_db = 0, which represent the biased first- and second-moment estimates; dW and db denote the gradients with respect to W and b, respectively.
Calculate the Momentum exponentially weighted averages:

v_dW = β₁·v_dW + (1 − β₁)·dW    (5)
v_db = β₁·v_db + (1 − β₁)·db    (6)

Calculate the exponentially weighted averages of the squared gradients according to the RMSprop formula:

S_dW = β₂·S_dW + (1 − β₂)·(dW)²    (7)
S_db = β₂·S_db + (1 − β₂)·(db)²    (8)

Calculate the bias corrections of the Momentum and RMSprop estimates:
Bias correction of the Momentum terms:

v_dW^corrected = v_dW / (1 − β₁^t)    (9)
v_db^corrected = v_db / (1 − β₁^t)    (10)

Bias correction of the RMSprop terms:

S_dW^corrected = S_dW / (1 − β₂^t)    (11)
S_db^corrected = S_db / (1 − β₂^t)    (12)

Perform gradient descent and update the weights:

W = W − α·v_dW^corrected / (√(S_dW^corrected) + ε)    (13)
b = b − α·v_db^corrected / (√(S_db^corrected) + ε)    (14)

In equations (5)-(14), t denotes the t-th iteration, α denotes the learning rate, which controls the update rate of the weights, ε denotes a very small constant, β₁ and β₂ respectively denote the exponential decay rates of the first- and second-moment estimates, and v_dW^corrected, v_db^corrected, S_dW^corrected, S_db^corrected denote the first- and second-moment estimates after bias correction.
Further, step S4 includes:
S41, testing process: each task T^test consists of a support set S^test and a query set Q^test; the query set Q^test of the test set is input into the embedding module f_θ and the fine-tuned orthogonal adaptation module g_φ to obtain the query features;
S42, the features output by the orthogonal adaptation module are multiplied by the masks M_c of the different classes; the specific operation is shown in formula (1):

P_ck = g_φ(f_θ(Q_k)) ⊙ M_c    (1)

wherein Q_k is the kth query sample, ⊙ denotes element-wise multiplication of matrices of the same order, and M_cijh is the value at row i, column j, channel h of the class-c mask M_c, whose elements are:

M_cijh = 1 when the channel index h lies within the range assigned to class c (channel indices start from 0), and M_cijh = 0 at the remaining positions    (2)

wherein C is the total number of categories under the current task, H is the number of feature channels, and H is an integer multiple of C;
S43, the masked features are fed into the metric module to calculate the cosine distances between the query sample and all support samples;
S44, the class of the support sample closest to the query sample is taken as the predicted class of the query sample.
On the other hand, the invention also provides a small sample image classification device based on feature migration and orthogonal prior, which is used to implement the above method and comprises the following functional modules:
a pre-training module, which pre-trains on the images to obtain the embedding module f_θ for extracting image features, wherein the images comprise a training set and a test set;
a processing module, which introduces the idea of an orthogonal prior and constructs a feature learning network model based on feature migration and the orthogonal prior;
a computing module, which trains and optimizes the objective function of the orthogonal-prior feature learning network model to solve the model parameters;
a classification module, which classifies the test set images using the optimized orthogonal-prior feature learning network model.
Compared with the prior art, the invention has the beneficial effects that:
the small sample image classification method and device based on feature migration and orthogonal prior, provided by the invention, are based on a depth convolutional neural network (Deep Convolutional Neural Networks, DCNN for short), and a small sample classification framework of high-identification feature extraction is researched on the basis of small sample image classification research based on depth measurement. By introducing feature migration and small sample image feature learning of orthogonal priori, the feature extraction mode is assumed to be shared by the new class and the base class, and the feature orthogonality of the new class data among different classes is assumed to be free of correlation, and by constructing an orthogonalization feature adaptation network, an orthogonalization feature subspace is learned, so that the different classes of features are orthogonalized with each other, the different classes are easy to distinguish, and the identification degree of the features is improved. The invention has very important significance for theoretical research of small sample learning and promotion of wide application of machine identification technology. Meanwhile, the method plays a role in adding bricks and tiles for the advanced technology of breaking through the theoretical bottleneck of small sample learning and mastering artificial intelligence in China.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are needed in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments described in the present invention, and other drawings may be obtained according to these drawings for a person having ordinary skill in the art.
FIG. 1 is a small sample classification (Few-shot Classification) framework.
Fig. 2 is a flowchart of a small sample image classification method and apparatus based on feature migration and orthogonal prior according to an embodiment of the present invention.
Fig. 3 is a structure diagram of the embedding module f_θ according to an embodiment of the present invention.
Fig. 4 is a network diagram for feature learning of a small sample image with feature migration and orthogonal prior introduced according to an embodiment of the present invention.
Fig. 5 is a model structure diagram of the orthogonal adaptation module g_φ according to an embodiment of the present invention.
Fig. 6 is a schematic diagram of a functional module of a small sample image classification device based on feature migration and orthogonal prior according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all embodiments of the invention. Embodiments of the present invention are intended to be within the scope of the present invention as defined by the appended claims.
The invention provides a small sample image classification method based on feature migration and orthogonal prior; the flow is shown in fig. 2 and comprises the following steps.
The method comprises the following steps:
S1, preparing data and pre-training to obtain an embedding module f_θ for extracting image features, wherein the images comprise a training set and a test set;
specifically, step S1 includes:
S11, the data set D is divided into two parts, D_train and D_test, which are mutually exclusive in class space; D_train serves as the base class data for training the model, and D_test serves as the new class data for testing the model;
S12, for a C-way K-shot classification task, C classes are randomly selected from D_train and M samples are randomly selected from each class, among which K samples are used as support samples S_i and the remaining M−K samples are used as query samples Q_i; S_i and Q_i form a task T_i, and tasks T^test are likewise constructed from D_test;
S13, the first stage of training: the embedding module f_θ is pre-trained using the base class data; f_θ comprises 4 convolution blocks, each containing a convolution layer, batch normalization, a nonlinear activation layer and a pooling layer; each convolution block uses a 3×3 convolution kernel window, the input is a three-channel RGB image, a 2×2 max pooling layer is adopted as the pooling layer, the max pooling layers of the last two blocks are removed, and ReLU is adopted as the activation function of the nonlinear activation layer. For example, for an 84×84×3 RGB image, each block uses a 3×3 convolution kernel with 64 filters, and each block consists of one convolution, one ReLU and one pooling operation, as shown in fig. 3. The pre-trained embedding module can be reused in different scenarios.
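For illustration, a PyTorch-style sketch of such an embedding module f_θ is given below; it follows the Conv-4 layout described above (3×3 convolutions with 64 filters, batch normalization, ReLU, 2×2 max pooling only in the first two blocks), while the exact ordering of layers inside a block and the padding are assumptions.

```python
import torch.nn as nn

def conv_block(in_ch, out_ch, pool=True):
    # one block: 3x3 convolution, batch normalization, ReLU, optional 2x2 max pooling
    layers = [nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
              nn.BatchNorm2d(out_ch),
              nn.ReLU(inplace=True)]
    if pool:
        layers.append(nn.MaxPool2d(2))
    return nn.Sequential(*layers)

class EmbeddingModule(nn.Module):
    """f_theta: four convolution blocks; max pooling removed from the last two blocks."""
    def __init__(self, in_channels=3, hidden=64):
        super().__init__()
        self.blocks = nn.Sequential(
            conv_block(in_channels, hidden, pool=True),
            conv_block(hidden, hidden, pool=True),
            conv_block(hidden, hidden, pool=False),
            conv_block(hidden, hidden, pool=False),
        )

    def forward(self, x):          # x: (B, 3, 84, 84) RGB images
        return self.blocks(x)      # feature maps, e.g. (B, 64, 21, 21)
```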
S2, the idea of an orthogonal prior is introduced into the convolutional neural network model, and a feature learning network model based on feature migration and the orthogonal prior is constructed, as shown in fig. 4.
Specifically, in step S2, in the feature learning network model based on feature migration and the orthogonal prior, the orthogonalization feature adaptation network consists of three parts: the embedding module f_θ, the orthogonal adaptation module g_φ, and the metric module; the orthogonal adaptation module g_φ consists of two convolution layers with 5×5 convolution kernels and is used to transform the features of new class samples and learn an orthogonalized feature subspace, as shown in fig. 5.
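A hedged PyTorch-style sketch of the orthogonal adaptation module g_φ follows; the channel count, the padding and the ReLU between the two 5×5 convolution layers are illustrative choices not specified above.

```python
import torch.nn as nn

class OrthogonalAdaptationModule(nn.Module):
    """g_phi: two 5x5 convolution layers that transform new-class features."""
    def __init__(self, channels=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=5, padding=2),
            nn.ReLU(inplace=True),   # assumed non-linearity between the two layers
            nn.Conv2d(channels, channels, kernel_size=5, padding=2),
        )

    def forward(self, feat):         # feat: output of the embedding module f_theta
        return self.net(feat)
```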
S3, the objective function of the feature learning network model based on the orthogonal prior is trained and optimized;
specifically, step S3 includes:
S31, in the second stage of training, a classification task is performed on the new class data, and all support samples are input into the embedding module f_θ with fixed parameters to obtain the corresponding support sample features f_θ(S_ck);
S32, feature transformation is performed with the orthogonal adaptation module to obtain the transformed features g_φ(f_θ(S_ck));
S33, the transformed features are multiplied by the mask (Mask) M_c corresponding to each class so that the features of different classes are pairwise orthogonal;
the calculation formula in step S33 is:

P_ck = g_φ(f_θ(S_ck)) ⊙ M_c    (1)

wherein S_ck is the kth support sample of class c, ⊙ denotes element-wise multiplication of matrices of the same order, and M_cijh is the value at row i, column j, channel h of the class-c mask M_c; the elements of M_c are:

M_cijh = 1 when the channel index h lies within the range assigned to class c (channel indices start from 0), and M_cijh = 0 at the remaining positions    (2)

wherein C is the total number of categories under the current task, H is the number of feature channels, and H is an integer multiple of C.
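The following sketch shows one way to build the class masks M_c consistent with the above description, under the assumption that the range assigned to class c is the contiguous block of H/C channels starting at channel c·H/C (the original formula image is not reproduced in this text).

```python
import torch

def build_class_masks(C, H, height, width):
    """Return a (C, H, height, width) tensor of masks M_c.

    H must be an integer multiple of C. For class c, the mask is 1 on an
    assumed contiguous block of H // C channels and 0 elsewhere, so that
    features masked with different M_c occupy disjoint channels and are
    therefore pairwise orthogonal.
    """
    assert H % C == 0, "H must be an integer multiple of C"
    block = H // C
    masks = torch.zeros(C, H, height, width)
    for c in range(C):
        masks[c, c * block:(c + 1) * block] = 1.0   # channel indices start from 0
    return masks
```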
S34, the metric module is used to calculate the cosine distance cos(P_ci, P_cj) (i, j ∈ [0, K), i ≠ j) between features of the same class, so that the within-class cosine distances of the features under all classes are obtained;
the calculation formula in step S34 is as follows:

cos(P_ci, P_cj) = Σ(P_ci ⊙ P_cj) / (‖P_ci‖₂ · ‖P_cj‖₂)    (3)

wherein cos(P_ci, P_cj) is the cosine distance between features of the same class, K is the number of support samples, c denotes the c-th class, P_ci denotes the feature of the ith support sample in class c, P_cj denotes the feature of the jth support sample in class c, ⊙ denotes element-wise multiplication of the matrices, and ‖P_ci‖₂ denotes the 2-norm of the matrix P_ci.
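A short sketch of the within-class cosine similarity of formula (3), computed on flattened masked features, is given below; the flattening of the feature maps into vectors is an implementation assumption.

```python
import torch
import torch.nn.functional as F

def within_class_cosines(P_c):
    """P_c: (K, ...) masked features of the K support samples of one class.

    Returns the cosine similarities cos(P_ci, P_cj) for all pairs i != j.
    """
    flat = P_c.flatten(start_dim=1)                      # treat each feature as a vector
    sims = F.cosine_similarity(flat.unsqueeze(1), flat.unsqueeze(0), dim=-1)  # (K, K)
    off_diag = ~torch.eye(flat.size(0), dtype=torch.bool, device=flat.device)
    return sims[off_diag]                                # keep only the i != j pairs
```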
S35, the orthogonal adaptation module g_φ is optimized with a mean square error loss function.
The mean square error loss function is calculated as follows:

L = (1/N) Σ_{c=1}^{N} Σ_{i≠j} MSE[cos(P_ci, P_cj), 1]    (4)

wherein N is the total number of categories under the current task, cos(P_ci, P_cj) is the cosine distance between features of the same class, and MSE[cos(P_ci, P_cj), 1] = [cos(P_ci, P_cj) − 1]².
After the loss over the support samples is calculated, gradient descent is performed; a mini-batch Adam optimizer is used to update the orthogonal adaptation module g_φ, and training is repeated over multiple tasks until the network converges.
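Putting these pieces together, the following sketch illustrates the second-stage fine-tuning described in steps S31-S35: the mean-square-error-to-1 loss of formula (4) and an Adam update of the orthogonal adaptation module only. The batch organization, the learning rate and the assumption that class labels are numbered 0 to C−1 (matching the mask indices) are illustrative.

```python
import torch
import torch.nn.functional as F

def orthogonal_prior_loss(masked_support_feats):
    """masked_support_feats: list whose element c is the (K, ...) masked features P_c of class c.

    Implements L = (1/N) * sum_c sum_{i != j} [cos(P_ci, P_cj) - 1]^2 (formula (4)).
    """
    loss = 0.0
    for P_c in masked_support_feats:
        flat = P_c.flatten(start_dim=1)
        sims = F.cosine_similarity(flat.unsqueeze(1), flat.unsqueeze(0), dim=-1)
        off_diag = ~torch.eye(flat.size(0), dtype=torch.bool, device=flat.device)
        loss = loss + ((sims[off_diag] - 1.0) ** 2).sum()
    return loss / len(masked_support_feats)

def finetune_adaptation_module(f_theta, g_phi, tasks, masks, lr=1e-3):
    """Second-stage fine-tuning: update only g_phi, keep f_theta fixed (S31-S35)."""
    for p in f_theta.parameters():
        p.requires_grad_(False)                           # embedding module parameters fixed
    optimizer = torch.optim.Adam(g_phi.parameters(), lr=lr)
    for support_images, support_labels in tasks:          # one C-way K-shot task per step
        feats = g_phi(f_theta(support_images))            # transformed support features
        masked = [feats[support_labels == c] * masks[c]   # P_ck = g_phi(f_theta(S_ck)) ⊙ M_c
                  for c in range(masks.size(0))]          # labels assumed to be 0..C-1
        loss = orthogonal_prior_loss(masked)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()                                  # repeat over tasks until convergence
```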
The Adam adaptive optimization algorithm comprises the following specific steps:
Initialize the data: v_dW = 0, S_dW = 0, v_db = 0, S_db = 0, which represent the biased first- and second-moment estimates; dW and db denote the gradients with respect to W and b, respectively.
Calculate the Momentum exponentially weighted averages:

v_dW = β₁·v_dW + (1 − β₁)·dW    (5)
v_db = β₁·v_db + (1 − β₁)·db    (6)

Calculate the exponentially weighted averages of the squared gradients according to the RMSprop formula:

S_dW = β₂·S_dW + (1 − β₂)·(dW)²    (7)
S_db = β₂·S_db + (1 − β₂)·(db)²    (8)

Calculate the bias corrections of the Momentum and RMSprop estimates:
Bias correction of the Momentum terms:

v_dW^corrected = v_dW / (1 − β₁^t)    (9)
v_db^corrected = v_db / (1 − β₁^t)    (10)

Bias correction of the RMSprop terms:

S_dW^corrected = S_dW / (1 − β₂^t)    (11)
S_db^corrected = S_db / (1 − β₂^t)    (12)

Perform gradient descent and update the weights:

W = W − α·v_dW^corrected / (√(S_dW^corrected) + ε)    (13)
b = b − α·v_db^corrected / (√(S_db^corrected) + ε)    (14)

In equations (5)-(14), t denotes the t-th iteration, α denotes the learning rate, which controls the update rate of the weights, ε denotes a very small constant, β₁ and β₂ respectively denote the exponential decay rates of the first- and second-moment estimates, and v_dW^corrected, v_db^corrected, S_dW^corrected, S_db^corrected denote the first- and second-moment estimates after bias correction.
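For completeness, a plain-Python sketch of a single Adam iteration implementing equations (5)-(14) for one weight matrix W and bias b is given below (NumPy is used for the element-wise operations; the hyper-parameter defaults are assumptions).

```python
import numpy as np

def adam_step(W, b, dW, db, state, t, alpha=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam iteration; `state` holds v_dW, S_dW, v_db, S_db, initialised to zeros."""
    # Momentum exponentially weighted averages, eqs. (5)-(6)
    state["v_dW"] = beta1 * state["v_dW"] + (1 - beta1) * dW
    state["v_db"] = beta1 * state["v_db"] + (1 - beta1) * db
    # RMSprop exponentially weighted averages of the squared gradients, eqs. (7)-(8)
    state["S_dW"] = beta2 * state["S_dW"] + (1 - beta2) * dW ** 2
    state["S_db"] = beta2 * state["S_db"] + (1 - beta2) * db ** 2
    # Bias corrections, eqs. (9)-(12)
    v_dW_c = state["v_dW"] / (1 - beta1 ** t)
    v_db_c = state["v_db"] / (1 - beta1 ** t)
    S_dW_c = state["S_dW"] / (1 - beta2 ** t)
    S_db_c = state["S_db"] / (1 - beta2 ** t)
    # Gradient descent with the corrected moments, eqs. (13)-(14)
    W = W - alpha * v_dW_c / (np.sqrt(S_dW_c) + eps)
    b = b - alpha * v_db_c / (np.sqrt(S_db_c) + eps)
    return W, b
```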
S4, the test set images are classified using the optimized orthogonal-prior feature learning network model.
The step S4 includes:
S41, testing process: each task T^test consists of a support set S^test and a query set Q^test; the query set Q^test of the test set is input into the embedding module f_θ and the fine-tuned orthogonal adaptation module g_φ to obtain the query features;
S42, the features output by the orthogonal adaptation module are multiplied by the masks M_c of the different classes; the specific operation is shown in formula (1):

P_ck = g_φ(f_θ(Q_k)) ⊙ M_c    (1)

wherein Q_k is the kth query sample, ⊙ denotes element-wise multiplication of matrices of the same order, and M_cijh is the value at row i, column j, channel h of the class-c mask M_c, whose elements are:

M_cijh = 1 when the channel index h lies within the range assigned to class c (channel indices start from 0), and M_cijh = 0 at the remaining positions    (2)

wherein C is the total number of categories under the current task, H is the number of feature channels, and H is an integer multiple of C;
S43, the masked features are fed into the metric module to calculate the cosine distances between the query sample and all support samples; note that in the training stage the metric module only calculates cosine distances among features of the same class and does not compute them across different classes, which differs from how the metric module is used in the testing stage;
S44, the class of the support sample closest to the query sample is taken as the predicted class of the query sample. Unlike conventional training, the model is fine-tuned with the support samples of the new classes, and the query samples are tested directly after the optimization is finished.
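A sketch of the test-stage prediction of steps S41-S44 under the same assumptions as above is given below: each query feature is masked with every class mask in turn, compared by cosine similarity with that class's masked support features, and the class of the nearest support sample is returned.

```python
import torch
import torch.nn.functional as F

def predict_query(f_theta, g_phi, masks, support_feats_by_class, query_image):
    """support_feats_by_class: list whose element c holds the masked support features P_c of class c.

    Returns the predicted class index of one query image (steps S41-S44).
    """
    with torch.no_grad():
        q = g_phi(f_theta(query_image.unsqueeze(0)))          # (1, H, h, w) query feature
        best_class, best_sim = None, -float("inf")
        for c, P_c in enumerate(support_feats_by_class):
            q_c = (q * masks[c]).flatten(start_dim=1)          # mask the query with M_c
            s_c = P_c.flatten(start_dim=1)                     # (K, D) masked supports of class c
            sims = F.cosine_similarity(q_c, s_c, dim=-1)       # cosine to each support sample
            if sims.max().item() > best_sim:                   # nearest support sample wins
                best_sim, best_class = sims.max().item(), c
        return best_class
```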
On the other hand, the invention also provides a small sample image classification device based on feature migration and orthogonal prior, which is used to implement the above method and, as shown in fig. 6, comprises the following functional modules:
a pre-training module, which pre-trains on the images to obtain the embedding module f_θ for extracting image features, wherein the images comprise a training set and a test set;
a processing module, which introduces the idea of an orthogonal prior and constructs a feature learning network model based on feature migration and the orthogonal prior;
a computing module, which trains and optimizes the objective function of the orthogonal-prior feature learning network model to solve the model parameters;
a classification module, which classifies the test set images using the optimized orthogonal-prior feature learning network model.
According to the method, feature migration and orthogonal-prior small sample image feature learning are introduced; it is assumed that the new classes and the base classes share a feature extraction mode and that the features of different new classes are orthogonal, and an orthogonalization feature adaptation network is constructed to learn an orthogonal feature subspace, so that the features of different classes are mutually orthogonal and the discriminability of the features is improved.
The specific embodiments of the small sample image classification method and device based on feature migration and orthogonal prior have been set forth above with reference to the accompanying drawings. The implementation of the method and device will be apparent to those skilled in the art from the above description of the embodiments.
The algorithms and displays presented herein are not inherently related to any particular computer, virtual system, or other apparatus. Various general-purpose systems may also be used with the teachings herein. The required structure for a construction of such a system is apparent from the description above. In addition, the disclosure herein is not directed to any particular programming language. It will be appreciated that a variety of programming languages may be used to implement the teachings of the present disclosure as described herein, and the above description of specific languages is provided for disclosure of enablement and best mode of the present disclosure.
Similarly, it should be appreciated that in the above description of exemplary embodiments, various features are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding the understanding of one or more of the disclosed aspects. However, this method of disclosure is not to be interpreted as reflecting an intention that the claims require more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this disclosure.
The foregoing examples are merely specific embodiments of the present application and are not intended to limit its protection scope. Any person skilled in the art may, within the technical scope disclosed in the present application, modify or easily conceive of changes to the technical solutions described in the foregoing embodiments, or make equivalent substitutions for some of the technical features; such modifications, changes or substitutions do not cause the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments, and are intended to be encompassed within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (6)

1. A small sample image classification method based on feature migration and orthogonal prior, characterized by comprising the following steps:
S1, preparing data and pre-training to obtain an embedding module f_θ for extracting image features, wherein the images comprise a training set and a test set;
S2, introducing the idea of an orthogonal prior into the convolutional neural network model and constructing a feature learning network model based on feature migration and the orthogonal prior;
S3, training and optimizing the objective function of the feature learning network model based on the orthogonal prior;
wherein the training and optimization of the feature learning network model based on the orthogonal prior in step S3 comprises:
S31, in the second stage of training, performing a classification task on the new class data, and inputting all support samples into the embedding module f_θ with fixed parameters to obtain the corresponding support sample features f_θ(S_ck);
S32, performing feature transformation with the orthogonal adaptation module to obtain the transformed features g_φ(f_θ(S_ck));
S33, multiplying the transformed features by the mask M_c corresponding to each class so that the features of different classes are pairwise orthogonal; the calculation formula in step S33 is:
P_ck = g_φ(f_θ(S_ck)) ⊙ M_c    (1)
wherein S_ck is the kth support sample of class c, ⊙ denotes element-wise multiplication of matrices of the same order, and M_cijh is the value at row i, column j, channel h of the class-c mask M_c, whose elements are:
M_cijh = 1 when the channel index h lies within the range assigned to class c, the channel index starting from 0, and M_cijh = 0 at the remaining positions    (2)
wherein C is the total number of categories under the current task, H is the number of feature channels, and H is an integer multiple of C;
S34, using the metric module to calculate the cosine distance cos(P_ci, P_cj) (i, j ∈ [0, K), i ≠ j) between features of the same class; the calculation formula in step S34 is as follows:
cos(P_ci, P_cj) = Σ(P_ci ⊙ P_cj) / (‖P_ci‖₂ · ‖P_cj‖₂)    (3)
wherein cos(P_ci, P_cj) is the cosine distance between features of the same class, K is the number of support samples, c denotes the c-th class, P_ci denotes the feature of the ith support sample in class c, P_cj denotes the feature of the jth support sample in class c, ⊙ denotes element-wise multiplication of the matrices, and ‖P_ci‖₂ denotes the 2-norm of the matrix P_ci;
S35, optimizing the orthogonal adaptation module g_φ with a mean square error loss function; the mean square error loss function in step S35 is calculated as follows:
L = (1/C) Σ_c Σ_{i≠j} MSE[cos(P_ci, P_cj), 1]    (4)
wherein C is the total number of categories under the current task, cos(P_ci, P_cj) is the cosine distance between features of the same class, and MSE[cos(P_ci, P_cj), 1] = [cos(P_ci, P_cj) − 1]²;
after the loss over the support samples is calculated, gradient descent is performed; a mini-batch Adam optimizer is used to update the orthogonal adaptation module g_φ, and training is repeated over multiple tasks until the network converges;
S4, classifying the test set images using the optimized orthogonal-prior feature learning network model.
2. The small sample image classification method based on feature migration and orthogonal prior according to claim 1, wherein step S1 comprises:
S11, dividing the data set D into two parts, D_train and D_test, which are mutually exclusive in class space, D_train serving as the base class data for training the model and D_test serving as the new class data for testing the model;
S12, for a C-way K-shot classification task, randomly selecting C classes from D_train and randomly selecting M samples from each class, among which K samples are used as support samples S_i and the remaining M−K samples are used as query samples Q_i, S_i and Q_i forming a task T_i; tasks T^test are likewise constructed from D_test;
S13, the first stage of training: pre-training the embedding module f_θ using the base class data, f_θ comprising 4 convolution blocks, each containing a convolution layer, batch normalization, a nonlinear activation layer and a pooling layer; each convolution block uses a 3×3 convolution kernel window, the input is a three-channel RGB image, a 2×2 max pooling layer is adopted as the pooling layer, the max pooling layers of the last two blocks are removed, and ReLU is adopted as the activation function of the nonlinear activation layer.
3. The small sample image classification method based on feature migration and orthogonal prior according to claim 1, wherein in step S2, in the feature learning network model based on feature migration and the orthogonal prior, the orthogonalization feature adaptation network consists of three parts: the embedding module f_θ, the orthogonal adaptation module g_φ and the metric module; the orthogonal adaptation module g_φ consists of two convolution layers with 5×5 convolution kernels and is used to transform the features of new class samples and learn an orthogonalized feature subspace.
4. The small sample image classification method based on feature migration and orthogonal prior according to claim 1, wherein the Adam adaptive optimization algorithm specifically comprises the following steps:
initializing the data: v_dW = 0, S_dW = 0, v_db = 0, S_db = 0, wherein v_dW, v_db, S_dW, S_db respectively represent the biased first- and second-moment estimates, and dW, db respectively denote the gradients with respect to W and b;
calculating the Momentum exponentially weighted averages:
v_dW = β₁·v_dW + (1 − β₁)·dW    (5)
v_db = β₁·v_db + (1 − β₁)·db    (6)
calculating the exponentially weighted averages of the squared gradients according to the RMSprop formula:
S_dW = β₂·S_dW + (1 − β₂)·(dW)²    (7)
S_db = β₂·S_db + (1 − β₂)·(db)²    (8)
calculating the bias corrections of the Momentum and RMSprop estimates:
bias correction of the Momentum terms:
v_dW^corrected = v_dW / (1 − β₁^t)    (9)
v_db^corrected = v_db / (1 − β₁^t)    (10)
bias correction of the RMSprop terms:
S_dW^corrected = S_dW / (1 − β₂^t)    (11)
S_db^corrected = S_db / (1 − β₂^t)    (12)
performing gradient descent and updating the weights:
W = W − α·v_dW^corrected / (√(S_dW^corrected) + ε)    (13)
b = b − α·v_db^corrected / (√(S_db^corrected) + ε)    (14)
in equations (5)-(14), t denotes the t-th iteration, α denotes the learning rate, which controls the update rate of the weights, ε denotes a very small constant, β₁ and β₂ respectively denote the exponential decay rates of the first- and second-moment estimates, and v_dW^corrected, v_db^corrected, S_dW^corrected, S_db^corrected denote the first- and second-moment estimates after bias correction.
5. The small sample image classification method based on feature migration and orthogonal prior according to claim 1, wherein step S4 comprises:
S41, testing process: each task T^test consists of a support set S^test and a query set Q^test; the query set Q^test of the test set is input into the embedding module f_θ and the fine-tuned orthogonal adaptation module g_φ to obtain the query features;
S42, multiplying the features output by the orthogonal adaptation module by the masks M_c of the different classes, the specific operation being shown in formula (1):
P_ck = g_φ(f_θ(Q_k)) ⊙ M_c    (1)
wherein Q_k is the kth query sample, ⊙ denotes element-wise multiplication of matrices of the same order, and M_cijh is the value at row i, column j, channel h of the class-c mask M_c, whose elements are:
M_cijh = 1 when the channel index h lies within the range assigned to class c, the channel index starting from 0, and M_cijh = 0 at the remaining positions    (2)
wherein C is the total number of categories under the current task, H is the number of feature channels, and H is an integer multiple of C;
S43, feeding the masked features into the metric module to calculate the cosine distances between the query sample and all support samples;
S44, taking the class of the support sample closest to the query sample as the predicted class of the query sample.
6. A small sample image classification device based on feature migration and orthogonal prior, characterized by being adapted to implement the method of any of claims 1-5 and comprising the following functional modules:
a pre-training module, which pre-trains on the images to obtain the embedding module f_θ for extracting image features, wherein the images comprise a training set and a test set;
a processing module, which introduces the idea of an orthogonal prior and constructs a feature learning network model based on feature migration and the orthogonal prior;
a computing module, which trains and optimizes the objective function of the orthogonal-prior feature learning network model to solve the model parameters;
a classification module, which classifies the test set images using the optimized orthogonal-prior feature learning network model.
CN202210487137.9A 2022-05-06 2022-05-06 Small sample image classification method and device based on feature migration and orthogonal prior Active CN115294381B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210487137.9A CN115294381B (en) 2022-05-06 2022-05-06 Small sample image classification method and device based on feature migration and orthogonal prior

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210487137.9A CN115294381B (en) 2022-05-06 2022-05-06 Small sample image classification method and device based on feature migration and orthogonal prior

Publications (2)

Publication Number Publication Date
CN115294381A CN115294381A (en) 2022-11-04
CN115294381B true CN115294381B (en) 2023-06-30

Family

ID=83819949

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210487137.9A Active CN115294381B (en) 2022-05-06 2022-05-06 Small sample image classification method and device based on feature migration and orthogonal prior

Country Status (1)

Country Link
CN (1) CN115294381B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116778268A (en) * 2023-04-20 2023-09-19 江苏济远医疗科技有限公司 Sample selection deviation relieving method suitable for medical image target classification

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110929603A (en) * 2019-11-09 2020-03-27 北京工业大学 Weather image identification method based on lightweight convolutional neural network
CN113379614A (en) * 2021-03-31 2021-09-10 西安理工大学 Computed ghost imaging reconstruction recovery method based on Resnet network

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB201703908D0 (en) * 2017-03-10 2017-04-26 Artificial Intelligence Res Group Ltd Object identification and motion tracking
CN109508655B (en) * 2018-10-28 2023-04-25 北京化工大学 SAR target recognition method based on incomplete training set of twin network
CN110188795B (en) * 2019-04-24 2023-05-09 华为技术有限公司 Image classification method, data processing method and device
CN111985611A (en) * 2020-07-21 2020-11-24 上海集成电路研发中心有限公司 Computing method based on physical characteristic diagram and DCNN machine learning reverse photoetching solution

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110929603A (en) * 2019-11-09 2020-03-27 北京工业大学 Weather image identification method based on lightweight convolutional neural network
CN113379614A (en) * 2021-03-31 2021-09-10 西安理工大学 Computed ghost imaging reconstruction recovery method based on Resnet network

Also Published As

Publication number Publication date
CN115294381A (en) 2022-11-04

Similar Documents

Publication Publication Date Title
WO2022160771A1 (en) Method for classifying hyperspectral images on basis of adaptive multi-scale feature extraction model
CN110197205A (en) A kind of image-recognizing method of multiple features source residual error network
CN104616029B (en) Data classification method and device
CN111401156B (en) Image identification method based on Gabor convolution neural network
CN110826056B (en) Recommended system attack detection method based on attention convolution self-encoder
Golovko et al. Development of solar panels detector
CN115294381B (en) Small sample image classification method and device based on feature migration and orthogonal prior
CN111832580B (en) SAR target recognition method combining less sample learning and target attribute characteristics
CN114782752B (en) Small sample image integrated classification method and device based on self-training
CN114943859B (en) Task related metric learning method and device for small sample image classification
Termritthikun et al. Accuracy improvement of Thai food image recognition using deep convolutional neural networks
CN112926485A (en) Few-sample sluice image classification method
CN114612450A (en) Image detection segmentation method and system based on data augmentation machine vision and electronic equipment
CN114329031A (en) Fine-grained bird image retrieval method based on graph neural network and deep hash
CN113837046A (en) Small sample remote sensing image scene classification method based on iterative feature distribution learning
CN116341629A (en) Self-supervision learning method based on position coding prediction combined with information packet
CN114818945A (en) Small sample image classification method and device integrating category adaptive metric learning
CN114627496A (en) Robust pedestrian re-identification method based on depolarization batch normalization of Gaussian process
CN110110769A (en) Image classification method based on width radial basis function network
Han et al. An Image Classification Approach based on Deep Learning and Transfer Learning
CN114782779B (en) Small sample image feature learning method and device based on feature distribution migration
CN113807400B (en) Hyperspectral image classification method, hyperspectral image classification system and hyperspectral image classification equipment based on attack resistance
CN116721278B (en) Hyperspectral image collaborative active learning classification method based on capsule network
CN111984816B (en) Tobacco bale random code image retrieval method based on L2 normalization
CN114372537B (en) Image description system-oriented universal countermeasure patch generation method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CB03 Change of inventor or designer information
Inventor after: Zhang Zhimin, Dong Hongfei, Li Xiaoxu, Liu Jun, Tang Zhuohe, Liu Zhongyuan, Zhang Wenbin, Zeng Junyu, Ma Zhanyu, Tao Jian
Inventor before: Li Xiaoxu, Dong Hongfei, Zhang Zhimin, Liu Jun, Tang Zhuohe, Liu Zhongyuan, Zhang Wenbin, Zeng Junyu, Ma Zhanyu, Tao Jian