CN113326892A - Relation network-based few-sample image classification method - Google Patents

Relation network-based few-sample image classification method

Info

Publication number
CN113326892A
CN113326892A
Authority
CN
China
Prior art keywords
image
sample
local feature
classification
few
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110691161.XA
Other languages
Chinese (zh)
Inventor
刘洋
蔡登
张伟锋
项超
郑途
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN202110691161.XA priority Critical patent/CN113326892A/en
Publication of CN113326892A publication Critical patent/CN113326892A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a few-sample image classification method based on a relation network, which comprises the following steps: (1) a neural network model forward-infers local visual feature representations of the query image and the support images; (2) a bidirectional affiliation network is constructed to mine the deep correlations between local features of the query image and of the support image set; (3) classification probabilities for the query image are predicted from the graph centrality of the relation network; (4) a labeled image data set is divided into few-sample tasks and the neural network model is trained; (5) the classification probabilities between the query image and each class in the support image set are ranked, and the class with the maximum classification probability is selected as the class prediction for the image. With the method and the device, the correlations among local features in few-sample image classification can be fully mined during training, so that the classification results are more accurate.

Description

Relation network-based few-sample image classification method
Technical Field
The invention belongs to the technical field of image classification, and particularly relates to a few-sample image classification method based on a relational network.
Background
In recent years, object classification, an important branch of computer vision, has received considerable attention from researchers in industry and academia. Supervised object classification has advanced greatly thanks to the rapid development of deep learning techniques, but this supervised training paradigm has a limitation: each class requires enough labeled training samples. In practical applications, however, a class may not have enough training samples, and labeling image data requires a certain amount of expertise and often a great deal of manpower.
The objective of few-sample image classification is to learn a machine learning model for image classification that, after being trained on image classification tasks with abundant data from some classes, can quickly classify new image classes given only a small number of samples. Few-sample image classification has become a rapidly developing direction in machine learning and has achieved notable results in the classification of medical images, satellite pictures, and some rare species.
Recent few-sample image classification methods no longer represent the whole image with a single global feature vector; instead, they learn representations based on local feature descriptors, which retain as much local information as possible. The inference pipeline is as follows: given an input image, the model first infers a number of local feature vectors and represents the image as a set of local features; in a second stage, various metric-learning-based methods measure the distance between the query image and the support images, which consist of a small number of samples. For example, the DN4 model proposed in "Revisiting Local Descriptor based Image-to-Class Measure for Few-shot Learning" at the Conference on Computer Vision and Pattern Recognition (CVPR) in 2019 aggregates the similarity measures between each local feature representation of the query image and the support image set using a naive-Bayes-nearest-neighbor-based measure. "Dense Classification and Implanting for Few-shot Learning", also at CVPR 2019, proposes the dense classification method: a classification prediction is made for each local feature representation of an image, and the predictions of the local features are averaged to obtain the classification result for the whole image. "DeepEMD: Few-Shot Image Classification with Differentiable Earth Mover's Distance and Structured Classifiers" at CVPR 2020 proposes splitting an image into a number of patches, introducing the Earth Mover's Distance as the distance measure between patches, and computing the best matching cost between the patches of the query image and a support image to represent the similarity between the two.
In addition to using local feature representations of images, some recent work has adopted subspace learning methods. For example, the TapNet model proposed in "TapNet: Neural Network Augmented with Task-Adaptive Projection for Few-shot Learning" at the International Conference on Machine Learning (ICML) in 2019 learns task-specific subspace projections and classifies based on the projections between query image features and support image features. "Adaptive Subspaces for Few-shot Learning", presented at CVPR 2020, proposes learning a class-specific subspace from the few samples of each class and classifying query images by their projection distances to the respective class subspaces.
Disclosure of Invention
The invention provides a few-sample image classification method based on a relation network. The method can be understood as building a bidirectional affiliation network over the local feature set of the query image and the local feature sets of the support images, emphasizing the deep correlations between the local features of the query image and of the support images, and thereby achieving better few-sample image classification.
A few-sample image classification method based on a relational network comprises the following steps:
(1) constructing a deep neural network model, and obtaining local feature set representations of the query image and the support images by forward inference;
(2) constructing a bidirectional affiliation network between the local feature set q of the query image and the local feature set S of all classes of support images;
(3) calculating the degree of association between the query image and each support image using the graph centrality of the bidirectional affiliation network;
(4) during training, dividing the few-sample training data set into a plurality of few-sample image classification tasks; for the query image and support image set in each few-sample classification task, repeating steps (2) to (3), calculating the probability of classifying the query image into the class of each support image according to the computed associations, and training the parameters of the deep neural network model with a negative log-likelihood loss function;
(5) during testing, for the query image and support image set in each few-sample classification task, calculating the probability of classifying the query image into the class of each support image according to steps (2) to (3), and selecting the class with the highest probability as the classification prediction for the image.
The invention provides a few-sample image classification method based on a relation network, a novel algorithm that classifies according to the centrality of the relation network. Compared with previous methods, the method considers the deep bidirectional correlations between the local features of the query image and of the support images within the relation network, rather than simply using the local features of the query image in a single direction to find the closest local features in the support image set, or predicting a probability for each local feature in isolation.
The specific process of the step (1) is as follows:
extracting the deep visual feature $\theta \in \mathbb{R}^{C\times H\times W}$ of the query image input $x$ using a pre-trained deep neural network model, and converting it into the local feature set representation of the image $\mathbf{q} = \{q_1, \dots, q_M\}$, where $M = H \times W$ denotes the number of local features of a single image and $q \in \mathbb{R}^C$ denotes one of the local feature vectors;
extracting, with the same deep neural network model, the deep feature $\tilde{\theta}_k^c \in \mathbb{R}^{C\times H\times W}$ of the $k$-th support image input $\tilde{x}_k^c$ from class $c$; averaging all local features of the $K$ support images from the same class $c$ to obtain the average local feature map of the class,

$$\bar{\theta}^c = \frac{1}{K}\sum_{k=1}^{K}\tilde{\theta}_k^c,$$

and converting it into the local feature set representation of the class, $S^c = \{s_1^c, \dots, s_M^c\}$.
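For illustration only, and not as part of the patented disclosure, the following PyTorch-style sketch shows one way step (1) could be realized; the backbone model, tensor shapes, and function names are assumptions for the sake of the example.

```python
import torch

# Hypothetical sketch of step (1): turn backbone feature maps into local
# feature sets. `backbone` is assumed to map images of shape
# (B, 3, 84, 84) to feature maps of shape (B, C, H, W).

def local_feature_set(feature_map):
    """(B, C, H, W) -> (B, M, C) with M = H * W local feature vectors."""
    B, C, H, W = feature_map.shape
    return feature_map.reshape(B, C, H * W).transpose(1, 2)

def class_feature_sets(backbone, support_images):
    """support_images: (N, K, 3, 84, 84) -> per-class sets S^c of shape (N, M, C).

    The feature maps of the K support images of each class are averaged
    into one average local feature map per class, as described in step (1).
    """
    N, K = support_images.shape[:2]
    feats = backbone(support_images.flatten(0, 1))   # (N*K, C, H, W)
    feats = feats.reshape(N, K, *feats.shape[1:])    # (N, K, C, H, W)
    avg_map = feats.mean(dim=1)                      # (N, C, H, W): mean over the K shots
    return local_feature_set(avg_map)                # (N, M, C)
```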
The specific process of the step (2) is as follows:
(2-1) first, constructing the local feature set $S$ of all classes in the support image set:

$$S = \bigcup_{c=1}^{N} S^c = \{s_1^1, \dots, s_M^1, \dots, s_1^N, \dots, s_M^N\},$$

wherein $N$ is the number of all classes in one few-sample classification task;
(2-2) for any local feature $q \in \mathbf{q}$ of the query image, calculating its random walk probability to each local feature in the set $S$:

$$p(s \mid q) = \frac{\exp(\phi_\gamma(s, q))}{\sum_{s' \in S}\exp(\phi_\gamma(s', q))},$$

wherein $\exp(\cdot)$ denotes the exponential function and $\phi_\gamma(v_1, v_2) = \gamma\,\frac{v_1^\top v_2}{\|v_1\|\,\|v_2\|}$ denotes the cosine similarity between two feature vectors $v_1, v_2$ scaled up by the parameter $\gamma$;
(2-3) expressing the relations between every $q \in \mathbf{q}$ and every $s \in S$ in (2-2) in matrix form:

$$P_{Sq} = \Phi_\gamma D^{-1},$$

wherein each column of $P_{Sq}$ corresponds to the random walk probabilities from one local feature in $\mathbf{q}$ to each local feature in $S$; $\Phi_\gamma$ reflects the relation between any $s$ and $q$, with $[\Phi_\gamma]_{sq} = \exp(\phi_\gamma(s, q))$; $D$ is a diagonal matrix whose element in the $j$-th row and $j$-th column equals the sum of all elements in the $j$-th column of $\Phi_\gamma$;
(2-4) for any local feature $s \in S$ of the support images, calculating its random walk probability to each local feature in the set $\mathbf{q}$:

$$p(q \mid s) = \frac{\exp(\phi_\tau(q, s))}{\sum_{q' \in \mathbf{q}}\exp(\phi_\tau(q', s))},$$

wherein $\phi_\tau(v_1, v_2)$ denotes the cosine similarity between two feature vectors $v_1, v_2$ scaled up by the parameter $\tau$;
(2-5) expressing the relations between every $s \in S$ and every $q \in \mathbf{q}$ in (2-4) in matrix form:

$$P_{qS} = \Phi_\tau W^{-1},$$

wherein each column of $P_{qS}$ corresponds to the random walk probabilities from one local feature in $S$ to each local feature in $\mathbf{q}$; $\Phi_\tau$ reflects the relation between any $q$ and $s$, with $[\Phi_\tau]_{qs} = \exp(\phi_\tau(q, s))$; $W$ is a diagonal matrix whose element in the $j$-th row and $j$-th column equals the sum of all elements in the $j$-th column of $\Phi_\tau$;
(2-6) constructing the bidirectional affiliation network, whose nodes and inter-node connection matrix can be expressed as

$$P = \begin{pmatrix} \mathbf{0} & P_{Sq} \\ P_{qS} & \mathbf{0} \end{pmatrix},$$

wherein the size of the connection matrix $P$ is $(NM+M)\times(NM+M)$; nodes within the local feature set $S$ are not connected to one another, and nodes within the local feature set $\mathbf{q}$ are not connected to one another, which is represented by the zero blocks; the weights of the directed edges from nodes in the local feature set $\mathbf{q}$ to the local feature set $S$ are reflected in the submatrix $P_{Sq}$, and the weights of the directed edges from nodes in the local feature set $S$ to the local feature set $\mathbf{q}$ are reflected in the submatrix $P_{qS}$.
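The matrices of step (2) can be assembled as in the following illustrative sketch; the scaling values used for $\gamma$ and $\tau$ are placeholder assumptions, and the block layout follows the connection matrix $P$ defined in (2-6).

```python
import torch
import torch.nn.functional as F

def random_walk_matrices(S, q, gamma=20.0, tau=20.0):
    """Sketch of step (2). S: (NM, C) support local features, ordered class
    by class as in (2-1); q: (M, C) query local features. gamma and tau are
    assumed scaling values. Returns the (NM+M) x (NM+M) connection matrix P
    of the bidirectional affiliation network.
    """
    cos = F.normalize(S, dim=1) @ F.normalize(q, dim=1).t()  # (NM, M) cosine similarities

    phi_gamma = torch.exp(gamma * cos)                       # [Phi_gamma]_{sq}
    P_Sq = phi_gamma / phi_gamma.sum(dim=0, keepdim=True)    # column-normalize: Phi_gamma D^{-1}

    phi_tau = torch.exp(tau * cos.t())                       # [Phi_tau]_{qs}, shape (M, NM)
    P_qS = phi_tau / phi_tau.sum(dim=0, keepdim=True)        # column-normalize: Phi_tau W^{-1}

    NM, M = cos.shape
    P = torch.zeros(NM + M, NM + M)
    P[:NM, NM:] = P_Sq                                       # directed edges q -> S
    P[NM:, :NM] = P_qS                                       # directed edges S -> q
    return P
```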
The specific process of the step (3) is as follows:
(3-1) for the bidirectional affiliation network constructed in step (2), calculating the Katz centrality of the graph network:

$$x_{\mathrm{Katz}} = ((I - \alpha P)^{-1} - I)\,e,$$

wherein $I$ is an identity matrix of size $(NM+M)\times(NM+M)$, $\alpha$ is an attenuation parameter, and $e$ is a column vector of length $NM+M$ with every element equal to 1; the computed Katz centrality vector $x_{\mathrm{Katz}}$ also has length $NM+M$ and represents the importance of each node in the bidirectional affiliation network;
(3-2) according to the Katz centrality vector $x_{\mathrm{Katz}}$, calculating the probability that the query image is classified into the class of each support image:

$$P(\hat{y} = c \mid x) = \frac{\sum_{s \in S^c} x_s}{\sum_{s \in S} x_s},$$

wherein $x_s$ denotes the node centrality scalar of the local feature $s$, $x$ denotes the query image input, and $P(\hat{y} = c \mid x)$ is the probability that the class $\hat{y}$ predicted by the few-sample classification algorithm equals $c$ when the input query image is $x$.
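A minimal sketch of step (3) follows, assuming the first $NM$ rows of $P$ correspond to the support local features ordered class by class as in (2-1); the value of the attenuation parameter $\alpha$ is an assumption and must keep $I - \alpha P$ invertible (i.e. $\alpha$ below the reciprocal of the spectral radius of $P$).

```python
import torch

def classify_by_katz(P, NM, M, N, alpha=0.5):
    """Sketch of step (3): Katz centrality of the affiliation network and
    the class probabilities derived from it. alpha is an assumed value.
    """
    I = torch.eye(NM + M)
    e = torch.ones(NM + M, 1)
    x_katz = (torch.linalg.inv(I - alpha * P) - I) @ e       # (NM+M, 1) centralities

    x_s = x_katz[:NM, 0]                                     # centralities of support features
    per_class = x_s.reshape(N, -1).sum(dim=1)                # sum over the M features of each S^c
    return per_class / per_class.sum()                       # P(y_hat = c | x) for c = 1..N
```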
The specific process of the step (4) is as follows:
(4-1) in the data preparation process, for an $N$-class few-sample classification task with $K$ samples per class, randomly sampling the training data set into a set of $E$ few-sample tasks

$$D_{\mathrm{train}} = \{(S^{(i)}, x^{(i)}, y^{(i)})\}_{i=1}^{E}, \quad S^{(i)} = \{\tilde{x}_{j,k}^{(i)}\}_{j=1,\,k=1}^{N,\,K},$$

wherein $\tilde{x}_{j,k}^{(i)}$ denotes the $k$-th image of the $j$-th class in the support image set of the $i$-th few-sample task, $x^{(i)}$ denotes the query image of the $i$-th few-sample task, and $y^{(i)}$ denotes the class of the query image;
in the network training process, each few-sample task involves the $N \times K$ support images $S^{(i)}$ together with the query image $(x^{(i)}, y^{(i)})$;
(4-2) in the training process, for the images $(S^{(i)}, x^{(i)})$ of each few-sample task and the query image label $y^{(i)}$, calculating $P(\hat{y}^{(i)} = c \mid x^{(i)})$ according to the formula in (3-2), and training the neural network with the negative log-likelihood loss function $L$, whose specific calculation formula is:

$$L = -\frac{1}{E}\sum_{i=1}^{E}\sum_{c=1}^{N}\delta(y^{(i)} = c)\,\log P(\hat{y}^{(i)} = c \mid x^{(i)}),$$

wherein $\delta(y^{(i)} = c)$ is the indicator function, which takes the value 1 when the condition $y^{(i)} = c$ is satisfied and 0 otherwise.
The specific process of the step (5) is as follows:
(5-1) in the test data preparation process, analogously to step (4-1), randomly sampling the test data set into a set of $E'$ few-sample test tasks

$$D_{\mathrm{test}} = \{(S^{(i)}, x^{(i)})\}_{i=1}^{E'},$$

wherein $D_{\mathrm{test}}$ differs from the $D_{\mathrm{train}}$ of step (4-1) in two respects: (a) the image data sources differ, i.e. the classes of the training set and the test set do not intersect; (b) $D_{\mathrm{test}}$ does not include the classification information $y^{(i)}$ of the query image $x^{(i)}$; that information serves only as the standard for measuring the quality of the few-sample classification neural network model and does not participate in the calculation;
(5-2) in the test prediction stage, for the images $(S^{(i)}, x^{(i)})$ of each few-sample task, first calculating $P(\hat{y}^{(i)} = c \mid x^{(i)})$ by the formula in (3-2), then obtaining the predicted output class of the query image $x^{(i)}$ as

$$\hat{y}^{(i)} = \arg\max_{c}\, P(\hat{y}^{(i)} = c \mid x^{(i)});$$

(5-3) in the test evaluation stage, if the $\hat{y}^{(i)}$ calculated in step (5-2) equals $y^{(i)}$, the few-sample task is considered to be predicted successfully; over the test set $D_{\mathrm{test}}$ consisting of $E'$ few-sample tasks, the average prediction accuracy measures the robustness of the few-sample neural network model trained in step (4).
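An illustrative evaluation sketch of step (5); the `model(support, query)` interface is an assumed wrapper around steps (2) to (3), and the labels are consulted only for scoring, as required by (5-1).

```python
def evaluate(model, test_tasks):
    """Sketch of step (5): average prediction accuracy over E' few-sample
    test tasks. `model` is assumed to return the (N,) class probabilities
    P(y_hat = c | x) of steps (2)-(3) for one task.
    """
    correct = 0
    for support, query, y in test_tasks:
        probs = model(support, query)          # (N,) class probabilities
        if int(probs.argmax()) == int(y):      # class with the highest probability
            correct += 1
    return correct / len(test_tasks)           # average prediction accuracy
```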
The relation-network-based few-sample image classification method of the invention fully considers the deep relations between the local features of the query image and the local features of the support image set, and greatly improves the accuracy of few-sample classification methods based on local feature classification.
The invention also constructs a relation-network-based few-sample image classification system, comprising a computer system that includes:
the visual feature module, which captures the deep visual features of the input image using a convolutional neural network;
the relation network module, which constructs a bidirectional affiliation network based on the local visual features;
the classification prediction module, which classifies few-sample images using the graph centrality of the relation network;
and the classification generation module, which outputs the classification result externally after model classification is finished.
Compared with the prior art, the invention has the following beneficial effects:
1. The bidirectional affiliation network algorithm provided by the invention improves the accuracy of few-sample image classification methods based on local feature classification by mining the deep relations between the local features of the query image and the local features of the support image set.
2. Extensive experiments demonstrate model performance superior to other baseline algorithms, confirming the superiority of the model.
Drawings
FIG. 1 is a schematic overall framework diagram of the process of the present invention;
FIG. 2 is a block diagram of a system according to the present invention;
FIG. 3 is a heat map for visualizing the centrality of local features of a set of sample-less classification tasks according to an embodiment of the present invention.
Detailed Description
The invention will be described in further detail below with reference to the drawings and examples, which are intended to facilitate the understanding of the invention without limiting it in any way.
As shown in FIG. 1, the main model of the invention is divided into a visual feature module, a relation network module and a classification prediction module, and the final probability prediction is used in the optimization process of the whole model. The specific steps are as follows:
(a) The visual feature module learns the deep visual feature $\theta$ of an input image $x$ in the few-sample image classification training process. The basic steps are as follows:
(a-1) the query image of the $i$-th few-sample image classification task is first cropped and scaled to an image $x^{(i)}$ of size 84 × 84 as the query input to the network; the support image set $S^{(i)} = \{\tilde{x}_{j,k}^{(i)}\}_{j=1,\,k=1}^{N,\,K}$ is cropped and scaled to size 84 × 84 in the same way as the support input to the network.
(a-2) for the query image $x^{(i)}$, the feature map of the last non-classification layer of the neural network is extracted as the deep visual feature of the image, $\theta^{(i)} \in \mathbb{R}^{C\times H\times W}$, where $C$ denotes the number of channels of the feature map and $H$ and $W$ denote its height and width, respectively. In the few-sample image classification method based on local feature descriptors, the feature map $\theta^{(i)}$ of the query image is converted into the local feature set representation of the query image, $\mathbf{q} = \{q_1, \dots, q_M\}$, where $M = H \times W$ denotes the number of local features of a single image and $q \in \mathbb{R}^C$ denotes one of the local feature vectors.
(a-3) for the $k$-th support image input $\tilde{x}_k^c$ from class $c$ in the support set, the feature map of the last non-classification layer of the same neural network, $\tilde{\theta}_k^c \in \mathbb{R}^{C\times H\times W}$, is extracted; the local features of the $K$ support images from the same class $c$ are averaged to obtain the average local feature map of the class, $\bar{\theta}^c = \frac{1}{K}\sum_{k=1}^{K}\tilde{\theta}_k^c$, which is converted into the local feature set representation of the class, $S^c = \{s_1^c, \dots, s_M^c\}$.
(b) The relation network module constructs a bidirectional affiliation network from the local features of the images and mines the correlations among the local features. The basic steps are as follows:
(b-1) constructing the local feature set $S$ of all classes in the support image set:

$$S = \bigcup_{c=1}^{N} S^c = \{s_1^1, \dots, s_M^1, \dots, s_1^N, \dots, s_M^N\},$$

wherein $N$ is the number of classes of the support image set in one few-sample classification task.
(b-2) for each local feature $q$ in the query image local feature set $\mathbf{q}$, calculating its random walk probability to each local feature in the set $S$:

$$p(s \mid q) = \frac{\exp(\phi_\gamma(s, q))}{\sum_{s' \in S}\exp(\phi_\gamma(s', q))},$$

wherein $\exp(\cdot)$ denotes the exponential function and $\phi_\gamma(v_1, v_2)$ denotes the cosine similarity between two feature vectors $v_1, v_2$ scaled up by the parameter $\gamma$.
(b-3) expressing the relations between every $q \in \mathbf{q}$ and every $s \in S$ in step (b-2) in matrix form:

$$P_{Sq} = \Phi_\gamma D^{-1},$$

wherein each column of $P_{Sq}$ corresponds to the random walk probabilities from one local feature in $\mathbf{q}$ to each local feature in $S$; $\Phi_\gamma$ reflects the random walk relation between any $s$ and $q$, with $[\Phi_\gamma]_{sq} = \exp(\phi_\gamma(s, q))$; $D$ is a diagonal matrix whose element in the $j$-th row and $j$-th column equals the sum of all elements in the $j$-th column of $\Phi_\gamma$.
(b-4) for each local feature $s$ of the support image local feature set $S$, calculating its random walk probability to each local feature in the set $\mathbf{q}$:

$$p(q \mid s) = \frac{\exp(\phi_\tau(q, s))}{\sum_{q' \in \mathbf{q}}\exp(\phi_\tau(q', s))},$$

wherein $\phi_\tau(v_1, v_2)$ denotes the cosine similarity between two feature vectors $v_1, v_2$ scaled up by the parameter $\tau$.
(b-5) expressing the relations between every $s \in S$ and every $q \in \mathbf{q}$ in step (b-4) in matrix form:

$$P_{qS} = \Phi_\tau W^{-1},$$

wherein each column of $P_{qS}$ corresponds to the random walk probabilities from one local feature in $S$ to each local feature in $\mathbf{q}$; $\Phi_\tau$ reflects the relation between any $q$ and $s$, with $[\Phi_\tau]_{qs} = \exp(\phi_\tau(q, s))$; $W$ is a diagonal matrix whose element in the $j$-th row and $j$-th column equals the sum of all elements in the $j$-th column of $\Phi_\tau$.
(b-6) constructing the bidirectional affiliation network, which consists of $NM + M$ local feature nodes; the nodes and the inter-node connection matrix can be expressed as

$$P = \begin{pmatrix} \mathbf{0} & P_{Sq} \\ P_{qS} & \mathbf{0} \end{pmatrix},$$

wherein the size of the connection matrix $P$ is $(NM+M)\times(NM+M)$; nodes within the local feature set $S$ are not connected to one another, and nodes within the local feature set $\mathbf{q}$ are not connected to one another, which is represented by the zero blocks; the weights of the directed edges from nodes in $\mathbf{q}$ to $S$ are reflected in the submatrix $P_{Sq}$, and the weights of the directed edges from nodes in $S$ to $\mathbf{q}$ are reflected in the submatrix $P_{qS}$.
(c) The classification prediction module provides a classification probability calculation function for a few-sample image classification process, and the basic steps are as follows:
(c-1) for the bidirectional affiliation network constructed in step (b-6), the Katz centrality of the graph network is calculated (with the adjustable attenuation parameter of the Katz centrality denoted $\alpha$):

$$x_{\mathrm{Katz}} = ((I - \alpha P)^{-1} - I)\,e,$$

wherein $I$ is an identity matrix of size $(NM+M)\times(NM+M)$ and $e$ is a column vector of length $NM+M$ with every element equal to 1. The computed Katz centrality vector $x_{\mathrm{Katz}}$ also has length $NM+M$ and indicates how important each node is in the bidirectional affiliation network.
(c-2) according to the Katz centrality vector $x_{\mathrm{Katz}}$ of the relation graph network, the probability that the query image is classified into the class of each support image is calculated:

$$P(\hat{y} = c \mid x) = \frac{\sum_{s \in S^c} x_s}{\sum_{s \in S} x_s},$$

wherein $x_s$ denotes the node centrality scalar of the local feature $s$, $x$ denotes the query image input, and $P(\hat{y} = c \mid x)$ is the probability that the class $\hat{y}$ predicted by the few-sample classification algorithm equals $c$ when the input query image is $x$.
The training steps of the few-sample image classification method based on the relational network are as follows:
1. For the $N$-class few-sample image classification task with $K$ support samples per class, a training data set consisting of $E$ few-sample tasks is initialized from a standard classification training data set by random sampling:

$$D_{\mathrm{train}} = \{(S^{(i)}, x^{(i)}, y^{(i)})\}_{i=1}^{E}, \quad S^{(i)} = \{\tilde{x}_{j,k}^{(i)}\}_{j=1,\,k=1}^{N,\,K},$$

wherein $\tilde{x}_{j,k}^{(i)}$ denotes the $k$-th image of the $j$-th class in the support image set of the $i$-th few-sample task, $x^{(i)}$ denotes the query image of the $i$-th few-sample task, and $y^{(i)}$ denotes the class of that query image. The network training process of each few-sample task involves the $N \times K$ support images $S^{(i)}$ together with the query image $(x^{(i)}, y^{(i)})$.
2. The images $(S^{(i)}, x^{(i)})$ of a few-sample task are selected as the input to the network model, with the query image label $y^{(i)}$ as the correct classification result of the task. Local features of the images are extracted by the visual feature module; the deep correlations of the local features are mined by the relation network module; and the probability of the query image being classified into each class of the support image set is calculated by the classification prediction module.
3. The classification probability of the correct class is maximized by minimizing a negative log-likelihood loss; the specific calculation formula of the loss function $L$ is:

$$L = -\frac{1}{E}\sum_{i=1}^{E}\sum_{c=1}^{N}\delta(y^{(i)} = c)\,\log P(\hat{y}^{(i)} = c \mid x^{(i)}),$$

wherein $\delta(y^{(i)} = c)$ is the indicator function, which takes the value 1 when the condition $y^{(i)} = c$ is satisfied and 0 otherwise.
4. Steps 2-3 are repeated using gradient descent to train the parameters of the visual feature module.
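Training steps 1-4 can be tied together as in the following assumed end-to-end sketch, reusing the helper functions sketched earlier in this document; the optimizer, learning rate, and $\alpha$ value are illustrative choices, not disclosed parameters.

```python
import torch

def train(backbone, train_tasks, epochs=1, lr=1e-3, alpha=0.5):
    """Assumed sketch of training steps 1-4: extract local features
    (visual feature module), build the affiliation network (relation
    network module), classify by Katz centrality (classification
    prediction module), and minimize the negative log-likelihood.
    """
    opt = torch.optim.Adam(backbone.parameters(), lr=lr)
    for _ in range(epochs):
        for support_images, query_image, y in train_tasks:
            S = class_feature_sets(backbone, support_images)       # (N, M, C)
            N, M, C = S.shape
            q = local_feature_set(backbone(query_image[None]))[0]  # (M, C)
            P = random_walk_matrices(S.reshape(N * M, C), q)       # (NM+M, NM+M)
            probs = classify_by_katz(P, N * M, M, N, alpha)        # (N,)
            loss = -torch.log(probs[y] + 1e-12)                    # per-task NLL (step 3)
            opt.zero_grad()
            loss.backward()
            opt.step()
```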
The sample classification steps of the relation network-based few-sample image classification method are as follows:
1. for input query image x(i)Using the trained model to calculate the classification probability of each class in the support image set
Figure BDA0003126784690000116
2. And sequencing the probabilities of all the categories, and selecting the category with the highest classification probability in the support image set as the prediction category of the query image.
As shown in FIG. 2, the relation-network-based few-sample classification system is divided into four major modules: a visual feature module, a relation network module, a classification prediction module, and a classification generation module.
The method is applied to the following embodiments to achieve the technical effects of the present invention, and detailed steps in the embodiments are not described again.
This embodiment compares the method with other current state-of-the-art few-sample image classification methods on two large public data sets, miniImageNet and tieredImageNet. miniImageNet is the best-known evaluation data set for the few-sample image classification task; it contains 100 classes randomly selected from the large-scale image data set ImageNet, with 600 images per class. On miniImageNet, 64 classes are used to train the few-sample classification neural network, 16 classes to cross-validate the robustness of the network, and 20 classes to evaluate the generalization ability of the network. tieredImageNet, like miniImageNet, is a subset of ImageNet and contains a broader range of classes: 351 sub-classes from 20 major classes are used for training, 97 sub-classes from 6 major classes for cross-validation, and images of 160 sub-classes from 8 major classes for testing. In this challenging data set, the information overlap between the training, cross-validation, and test sets is very small. The evaluation index of this embodiment is the average classification accuracy over 10000 $N$-class, $K$-shot few-sample classification tasks sampled from the test set (covering the two settings $N=5$, $K=1$ and $N=5$, $K=5$). The overall comparison results are shown in Table 1 and Table 2, which compare current few-sample image classification algorithms under the two mainstream backbone networks (Conv4 and ResNet12) used in the visual feature module.
Table 1. Classification results with Conv4 as the backbone network of the visual feature module (N = 5). [Table published as an image in the original document; values not reproduced here.]
Table 2. Classification results with ResNet12 as the backbone network of the visual feature module (N = 5). [Table published as an image in the original document; values not reproduced here.]
As can be seen from Tables 1 and 2, the relation-network-based few-sample image classification framework provided by the invention achieves the best results on every major evaluation index, fully demonstrating the superiority of the algorithm of the invention.
To further show that the proposed algorithm indeed classifies according to the centrality of the relation network, i.e. that the class whose local visual features lie closer to the center of the relation network is more likely to be the correct class of the query image, the centrality heat maps of the local features of a set of few-sample classification tasks are visualized, with the result shown in FIG. 3. It can be seen that the local features of the correct prediction class of the query image tend to have higher heat-map values.
The embodiments described above are intended to illustrate the technical solutions and advantages of the present invention, and it should be understood that the above-mentioned embodiments are only specific embodiments of the present invention, and are not intended to limit the present invention, and any modifications, additions and equivalents made within the scope of the principles of the present invention should be included in the scope of the present invention.

Claims (6)

1. A few-sample image classification method based on a relational network is characterized by comprising the following steps:
(1) constructing a deep neural network model, and obtaining local feature set representations of the query image and the support images by forward inference;
(2) constructing a bidirectional affiliation network between the local feature set q of the query image and the local feature set S of all classes of support images;
(3) calculating the degree of association between the query image and each support image using the graph centrality of the bidirectional affiliation network;
(4) during training, dividing the few-sample training data set into a plurality of few-sample image classification tasks; for the query image and support image set in each few-sample classification task, repeating steps (2) to (3), calculating the probability of classifying the query image into the class of each support image according to the computed associations, and training the parameters of the deep neural network model with a negative log-likelihood loss function;
(5) during testing, for the query image and support image set in each few-sample classification task, calculating the probability of classifying the query image into the class of each support image according to steps (2) to (3), and selecting the class with the highest probability as the classification prediction for the image.
2. The relational network-based few-sample image classification method according to claim 1, wherein the specific process of the step (1) is as follows:
extracting the deep visual feature $\theta \in \mathbb{R}^{C\times H\times W}$ of the query image input $x$ using a pre-trained deep neural network model, and converting it into the local feature set representation of the image $\mathbf{q} = \{q_1, \dots, q_M\}$, where $M = H \times W$ denotes the number of local features of a single image and $q \in \mathbb{R}^C$ denotes one of the local feature vectors;
extracting, with the same deep neural network model, the deep feature $\tilde{\theta}_k^c \in \mathbb{R}^{C\times H\times W}$ of the $k$-th support image input $\tilde{x}_k^c$ from class $c$; averaging all local features of the $K$ support images from the same class $c$ to obtain the average local feature map of the class, $\bar{\theta}^c = \frac{1}{K}\sum_{k=1}^{K}\tilde{\theta}_k^c$, and converting it into the local feature set representation of the class, $S^c = \{s_1^c, \dots, s_M^c\}$.
3. The relational network-based few-sample image classification method according to claim 2, wherein the specific process of the step (2) is as follows:
(2-1) first, constructing the local feature set $S$ of all classes in the support image set:

$$S = \bigcup_{c=1}^{N} S^c = \{s_1^1, \dots, s_M^1, \dots, s_1^N, \dots, s_M^N\},$$

wherein $N$ is the number of all classes in one few-sample classification task;
(2-2) for any local feature $q \in \mathbf{q}$ of the query image, calculating its random walk probability to each local feature in the set $S$:

$$p(s \mid q) = \frac{\exp(\phi_\gamma(s, q))}{\sum_{s' \in S}\exp(\phi_\gamma(s', q))},$$

wherein $\exp(\cdot)$ denotes the exponential function and $\phi_\gamma(v_1, v_2)$ denotes the cosine similarity between two feature vectors $v_1, v_2$ scaled up by the parameter $\gamma$;
(2-3) expressing the relations between every $q \in \mathbf{q}$ and every $s \in S$ in (2-2) in matrix form:

$$P_{Sq} = \Phi_\gamma D^{-1},$$

wherein each column of $P_{Sq}$ corresponds to the random walk probabilities from one local feature in $\mathbf{q}$ to each local feature in $S$; $\Phi_\gamma$ reflects the relation between any $s$ and $q$, with $[\Phi_\gamma]_{sq} = \exp(\phi_\gamma(s, q))$; $D$ is a diagonal matrix whose element in the $j$-th row and $j$-th column equals the sum of all elements in the $j$-th column of $\Phi_\gamma$;
(2-4) for any local feature $s \in S$ of the support images, calculating its random walk probability to each local feature in the set $\mathbf{q}$:

$$p(q \mid s) = \frac{\exp(\phi_\tau(q, s))}{\sum_{q' \in \mathbf{q}}\exp(\phi_\tau(q', s))},$$

wherein $\phi_\tau(v_1, v_2)$ denotes the cosine similarity between two feature vectors $v_1, v_2$ scaled up by the parameter $\tau$;
(2-5) expressing the relations between every $s \in S$ and every $q \in \mathbf{q}$ in (2-4) in matrix form:

$$P_{qS} = \Phi_\tau W^{-1},$$

wherein each column of $P_{qS}$ corresponds to the random walk probabilities from one local feature in $S$ to each local feature in $\mathbf{q}$; $\Phi_\tau$ reflects the relation between any $q$ and $s$, with $[\Phi_\tau]_{qs} = \exp(\phi_\tau(q, s))$; $W$ is a diagonal matrix whose element in the $j$-th row and $j$-th column equals the sum of all elements in the $j$-th column of $\Phi_\tau$;
(2-6) constructing the bidirectional affiliation network, whose nodes and inter-node connection matrix can be expressed as

$$P = \begin{pmatrix} \mathbf{0} & P_{Sq} \\ P_{qS} & \mathbf{0} \end{pmatrix},$$

wherein the size of the connection matrix $P$ is $(NM+M)\times(NM+M)$; nodes within the local feature set $S$ are not connected to one another, and nodes within the local feature set $\mathbf{q}$ are not connected to one another, which is represented by the zero blocks; the weights of the directed edges from nodes in the local feature set $\mathbf{q}$ to the local feature set $S$ are reflected in the submatrix $P_{Sq}$, and the weights of the directed edges from nodes in the local feature set $S$ to the local feature set $\mathbf{q}$ are reflected in the submatrix $P_{qS}$.
4. The relational network-based few-sample image classification method according to claim 1, wherein the specific process of the step (3) is as follows:
(3-1) for the bidirectional affiliation network constructed in step (2), calculating the Katz centrality of the graph network:

$$x_{\mathrm{Katz}} = ((U - \alpha P)^{-1} - U)\,e,$$

wherein $U$ is an identity matrix of size $(NM+M)\times(NM+M)$, $\alpha$ is an attenuation parameter, and $e$ is a column vector of length $NM+M$ with every element equal to 1; the computed Katz centrality vector $x_{\mathrm{Katz}}$ also has length $NM+M$ and represents the importance of each node in the bidirectional affiliation network;
(3-2) according to the Katz centrality vector $x_{\mathrm{Katz}}$, calculating the probability that the query image is classified into the class of each support image:

$$P(\hat{y} = c \mid x) = \frac{\sum_{s \in S^c} x_s}{\sum_{s \in S} x_s},$$

wherein $x_s$ denotes the node centrality scalar of the local feature $s$, $x$ denotes the query image input, and $P(\hat{y} = c \mid x)$ is the probability that the class $\hat{y}$ predicted by the few-sample classification algorithm equals $c$ when the input query image is $x$.
5. The relational network-based few-sample image classification method according to claim 4, wherein the specific process of the step (4) is as follows:
(4-1) in the data preparation process, for an $N$-class few-sample classification task with $K$ samples per class, randomly sampling the training data set into a set of $E$ few-sample tasks

$$D_{\mathrm{train}} = \{(S^{(i)}, x^{(i)}, y^{(i)})\}_{i=1}^{E}, \quad S^{(i)} = \{\tilde{x}_{j,k}^{(i)}\}_{j=1,\,k=1}^{N,\,K},$$

wherein $\tilde{x}_{j,k}^{(i)}$ denotes the $k$-th image of the $j$-th class in the support image set of the $i$-th few-sample task, $x^{(i)}$ denotes the query image of the $i$-th few-sample task, and $y^{(i)}$ denotes the class of the query image;
in the network training process, each few-sample task involves the $N \times K$ support images $S^{(i)}$ together with the query image $(x^{(i)}, y^{(i)})$;
(4-2) in the training process, for the images $(S^{(i)}, x^{(i)})$ of each few-sample task and the query image label $y^{(i)}$, calculating $P(\hat{y}^{(i)} = c \mid x^{(i)})$ according to the formula in (3-2), and training the neural network with the negative log-likelihood loss function $L$, whose specific calculation formula is:

$$L = -\frac{1}{E}\sum_{i=1}^{E}\sum_{c=1}^{N}\delta(y^{(i)} = c)\,\log P(\hat{y}^{(i)} = c \mid x^{(i)}),$$

wherein $\delta(y^{(i)} = c)$ is the indicator function, which takes the value 1 when the condition $y^{(i)} = c$ is satisfied and 0 otherwise.
6. The relational network-based few-sample image classification method according to claim 5, wherein the specific process of the step (5) is as follows:
(5-1) in the test data preparation process, analogously to step (4-1), randomly sampling the test data set into a set of $E'$ few-sample test tasks

$$D_{\mathrm{test}} = \{(S^{(i)}, x^{(i)})\}_{i=1}^{E'},$$

wherein $D_{\mathrm{test}}$ differs from the $D_{\mathrm{train}}$ of step (4-1) in two respects: (a) the image data sources differ, i.e. the classes of the training set and the test set do not intersect; (b) $D_{\mathrm{test}}$ does not include the classification information $y^{(i)}$ of the query image $x^{(i)}$; that information serves only as the standard for measuring the quality of the few-sample classification neural network model and does not participate in the calculation;
(5-2) in the test prediction stage, for the images $(S^{(i)}, x^{(i)})$ of each few-sample task, first calculating $P(\hat{y}^{(i)} = c \mid x^{(i)})$ by the formula in (3-2), then obtaining the predicted output class of the query image $x^{(i)}$ as

$$\hat{y}^{(i)} = \arg\max_{c}\, P(\hat{y}^{(i)} = c \mid x^{(i)});$$

(5-3) in the test evaluation stage, if the $\hat{y}^{(i)}$ calculated in step (5-2) equals $y^{(i)}$, the few-sample task is considered to be predicted successfully; over the test set $D_{\mathrm{test}}$ consisting of $E'$ few-sample tasks, the average prediction accuracy measures the robustness of the few-sample neural network model trained in step (4).
CN202110691161.XA 2021-06-22 2021-06-22 Relation network-based few-sample image classification method Pending CN113326892A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110691161.XA CN113326892A (en) 2021-06-22 2021-06-22 Relation network-based few-sample image classification method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110691161.XA CN113326892A (en) 2021-06-22 2021-06-22 Relation network-based few-sample image classification method

Publications (1)

Publication Number Publication Date
CN113326892A true CN113326892A (en) 2021-08-31

Family

ID=77424264

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110691161.XA Pending CN113326892A (en) 2021-06-22 2021-06-22 Relation network-based few-sample image classification method

Country Status (1)

Country Link
CN (1) CN113326892A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116597419A (en) * 2023-05-22 2023-08-15 宁波弗浪科技有限公司 Vehicle height limiting scene identification method based on parameterized mutual neighbors

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110163258A (en) * 2019-04-24 2019-08-23 浙江大学 A kind of zero sample learning method and system reassigning mechanism based on semantic attribute attention
CN112633382A (en) * 2020-12-25 2021-04-09 浙江大学 Mutual-neighbor-based few-sample image classification method and system

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110163258A (en) * 2019-04-24 2019-08-23 浙江大学 A kind of zero sample learning method and system reassigning mechanism based on semantic attribute attention
CN112633382A (en) * 2020-12-25 2021-04-09 浙江大学 Mutual-neighbor-based few-sample image classification method and system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
YANG LIU ET AL.: "Learning to Affiliate: Mutual Centralized Learning for Few-shot Classification", arXiv:2106.05517v1 [cs.CV], 10 June 2021, pages 1-17 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116597419A (en) * 2023-05-22 2023-08-15 宁波弗浪科技有限公司 Vehicle height limiting scene identification method based on parameterized mutual neighbors
CN116597419B (en) * 2023-05-22 2024-02-02 宁波弗浪科技有限公司 Vehicle height limiting scene identification method based on parameterized mutual neighbors

Similar Documents

Publication Publication Date Title
CN112633382A (en) Mutual-neighbor-based few-sample image classification method and system
CN110163258A (en) A kind of zero sample learning method and system reassigning mechanism based on semantic attribute attention
WO2019015246A1 (en) Image feature acquisition
EP2431918B1 (en) Graph lattice method for image clustering, classification, and repeated structure finding
CN112926405A (en) Method, system, equipment and storage medium for detecting wearing of safety helmet
CN111027377B (en) Double-flow neural network time sequence action positioning method
US8429163B1 (en) Content similarity pyramid
CN115995018A (en) Long tail distribution visual classification method based on sample perception distillation
Li et al. Shrec’16 track: 3D sketch-based 3D shape retrieval
CN105320764A (en) 3D model retrieval method and 3D model retrieval apparatus based on slow increment features
CN112766229A (en) Human face point cloud image intelligent identification system and method based on attention mechanism
CN111652273A (en) Deep learning-based RGB-D image classification method
CN111916144A (en) Protein classification method based on self-attention neural network and coarsening algorithm
CN113988147A (en) Multi-label classification method and device for remote sensing image scene based on graph network, and multi-label retrieval method and device
CN110163130B (en) Feature pre-alignment random forest classification system and method for gesture recognition
Nan et al. Fast margin-based cost-sensitive classification
CN113326892A (en) Relation network-based few-sample image classification method
CN114693923A (en) Three-dimensional point cloud semantic segmentation method based on context and attention
CN113723558A (en) Remote sensing image small sample ship detection method based on attention mechanism
CN111274986B (en) Dish identification and classification method based on image analysis
CN117036897A (en) Method for detecting few sample targets based on Meta RCNN
CN115937492A (en) Transformer equipment infrared image identification method based on feature identification
CN109815889A (en) A kind of across resolution ratio face identification method based on character representation collection
CN111274893B (en) Aircraft image fine-grained identification method based on part segmentation and feature fusion
CN110941994B (en) Pedestrian re-identification integration method based on meta-class-based learner

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210831