CN116543192A - Remote sensing image small sample classification method based on multi-view feature fusion - Google Patents

Remote sensing image small sample classification method based on multi-view feature fusion

Info

Publication number
CN116543192A
CN116543192A
Authority
CN
China
Prior art keywords
view
image
class
representing
images
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310070568.XA
Other languages
Chinese (zh)
Inventor
王琦 (Wang Qi)
贾玉钰 (Jia Yuyu)
袁媛 (Yuan Yuan)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northwestern Polytechnical University
Original Assignee
Northwestern Polytechnical University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northwestern Polytechnical University filed Critical Northwestern Polytechnical University
Priority to CN202310070568.XA priority Critical patent/CN116543192A/en
Publication of CN116543192A publication Critical patent/CN116543192A/en
Pending legal-status Critical Current


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/42 Global feature extraction by analysis of the whole pattern, e.g. using frequency domain transformations or autocorrelation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V 10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/10 Terrestrial scenes
    • G06V 20/13 Satellite images
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00 Road transport of goods or passengers
    • Y02T 10/10 Internal combustion engine [ICE] based vehicles
    • Y02T 10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Remote Sensing (AREA)
  • Astronomy & Astrophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a remote sensing image small sample classification method based on multi-view feature fusion. Firstly, rotation enhancement is applied to the input training set images; then, features are extracted from the images at all views; next, the extracted features are fed into a classification model network for training, where the model comprises three parallel branches, namely a full-connection-layer rotation angle classifier, a basic semantic classifier and a multi-view feature fusion semantic classifier, with a corresponding loss function designed for each branch; finally, the trained network is used to perform classification prediction on the remote sensing image dataset to be processed. The method addresses the insufficient generalization of models trained for remote sensing scene recognition when only a few labeled samples are available, and has the beneficial effects of encouraging the model to learn transferable knowledge, suppressing semantically irrelevant content in remote sensing images, and strengthening the association information used in nearest-neighbor prototype matching.

Description

Remote sensing image small sample classification method based on multi-view feature fusion
Technical Field
The invention belongs to the technical field of computer vision and image processing, and particularly relates to a remote sensing image small sample classification method based on multi-view feature fusion.
Background
Remote sensing scene recognition has high application value in many fields, such as natural disaster prediction, land use monitoring, and autonomous security perception. In recent years, deep models driven by annotated data have greatly improved the performance of remote sensing scene classification algorithms thanks to their strong learning capability. However, with the development of various high-resolution sensors, remote sensing scene images are becoming ever larger and more varied, which makes the labeling of remote sensing scene images extremely difficult. Inspired by the ability of humans to rapidly transfer knowledge, small sample (few-shot) classification aims to recognize target samples using only a very small amount of labeled data, effectively alleviating the difficulties of data labeling and sample collection. Related studies on small sample learning can be roughly divided into three categories: small sample learning based on data augmentation, on meta-learning, and on metric learning. Among these, algorithms that combine metric learning with meta-learning have achieved notable success.
The document "H. Ji, Z. Gao, Y. Zhang, Y. Wan, C. Li and T. Mei. Few-Shot Scene Classification of Optical Remote Sensing Images Leveraging Calibrated Pretext Tasks. IEEE Transactions on Geoscience and Remote Sensing, 60, 1-13, 2022" proposes a remote sensing scene small sample classification method based on self-supervised auxiliary tasks. The method introduces a rotation angle prediction task to improve the transferable feature extraction capability of the model; it then uses contrastive learning as an auxiliary task to pull features of the same class together and push features of different classes apart, improving the feature expression capability of the model; finally, to further alleviate overfitting and improve generalization, the method corrects the model parameters with an AMP-based regularization method.
A task-specific improved contrastive learning method is proposed in the document "Q. Zeng and J. Geng. Task-Specific Contrastive Learning for Few-Shot Remote Sensing Image Scene Classification. ISPRS Journal of Photogrammetry and Remote Sensing, 191, 143-154, 2022" to improve model performance in remote sensing scene small sample classification. Firstly, a "self-attention + mutual-attention" feature enhancement module is designed to filter background noise in remote sensing images and to help the model capture potential associations between test samples and class centers; in addition, the paper improves traditional contrastive learning by introducing semantic labels to expand the screening range of positive and negative sample pairs, yielding a task-specific contrastive learning method.
However, these methods have limitations: they only use rotation prediction as an auxiliary task to enhance the feature extraction capability of the model and do not fully exploit the rotation insensitivity of remote sensing images. On the one hand, since a remote sensing image carries no explicit azimuth or attitude information, its semantic prediction probability distributions under different rotation views should remain consistent; on the other hand, since the content in the different views of each remote sensing image contributes almost equally to the semantic label, the information they potentially share can provide important value for matching test samples to class centers.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides a remote sensing image small sample classification method based on multi-view feature fusion. Firstly, rotation enhancement is applied to the input training set images; then, features are extracted from the images at all views; next, the extracted features are fed into a classification model network for training, where the model comprises three parallel branches, namely a full-connection-layer rotation angle classifier, a basic semantic classifier and a multi-view feature fusion semantic classifier, with a corresponding loss function designed for each branch; finally, the trained network is used to perform classification prediction on the remote sensing image dataset to be processed. The method addresses the insufficient generalization of models trained for remote sensing scene recognition when only a few labeled samples are available, and has the advantages of encouraging the model to learn transferable knowledge, suppressing semantically irrelevant content in remote sensing images, and strengthening the association information used in nearest-neighbor prototype matching.
A remote sensing image small sample classification method based on multi-view feature fusion is characterized by comprising the following steps:
step 1: inputting a training image dataset, and performing rotation enhancement processing on all images in the dataset, wherein the rotation enhancement processing refers to respectively rotating each image by 0°, 90°, 180° and 270° to obtain the image at the corresponding view; the multi-view image set obtained after rotation enhancement of the $i$-th image in the dataset is denoted $X_i=\{x_i^1,x_i^2,x_i^3,x_i^4\}$, where $x_i^1,\dots,x_i^4$ correspond to the images at the four views, $i=1,2,\dots,|\varepsilon|$, $\varepsilon$ represents the training image dataset, and $|\varepsilon|$ represents the total number of images contained in the dataset;
step 2: performing feature extraction on all the multi-view images obtained after the processing in the step 1 by adopting a ResNet-12 feature extraction network to obtain features corresponding to the images under each view, wherein all the features are one-dimensional vectors with the length of d=640;
step 3: inputting the characteristics of all the multi-view images into a classification model, and performing model overall optimization training in an end-to-end mode to obtain a trained model; the classification model comprises three parallel branches of a full-connection layer rotation angle classifier, a basic semantic classifier and a multi-view feature fusion semantic classifier;
the full-connection layer rotation angle classifier consists of a single fully connected layer followed by a ReLU activation, with input dimension 640 and output dimension 4; features are mapped to the angle class space through this classifier, and the corresponding rotation angle prediction loss function is as follows:
$$\mathcal{L}_{rot}(\theta,\phi)=\frac{1}{R\,|\varepsilon|}\sum_{i=1}^{|\varepsilon|}\sum_{r=1}^{R}\ell_{CE}\left(x_i^{r}\right)$$
where $\mathcal{L}_{rot}$ represents the rotation angle prediction loss, $\theta$ represents the network parameters of the feature extractor, $\phi$ represents the parameters of the full-connection-layer rotation angle classifier, and $\ell_{CE}$ is a cross entropy loss function calculated by the following equation:
$$\ell_{CE}\left(x_i^{r}\right)=-\log\left(\left[\operatorname{softmax}\left(g_{\phi}\left(f_{\theta}\left(x_i^{r}\right)\right)\right)\right]_{r}\right)$$
where $R=4$ denotes the four rotation views, $r$ denotes the $r$-th rotation view, $f_{\theta}(\cdot)$ represents the feature extraction operation, $g_{\phi}(\cdot)$ represents the full-connection-layer rotation angle classification operation, and $[\cdot]_{r}$ denotes taking the $r$-th element of a vector;
the basic semantic classifier adopts the nearest neighbor prototype characterization principle: the class center closest to the feature of a test image is selected as the semantic class of that image, and from these distances the semantic probability distribution of the test image at each view is obtained; the corresponding class distribution consistency loss function is as follows:
$$\mathcal{L}_{cdc}=\frac{1}{R\,|Q|}\sum_{i=1}^{|Q|}\sum_{r=1}^{R}D_{KL}\left(p_i^{r}\,\Vert\,\bar{p}_i\right)$$
where $\mathcal{L}_{cdc}$ represents the class distribution consistency loss, $P_r\in\mathbb{R}^{|Q|\times N}$ represents the class probability distribution of all query set images under $r\times 90°$ rotation enhancement, the query set $Q$ is the set of all test images, $N$ represents the number of categories in each mini-batch during training, $p_i^{r}$ represents the class probability distribution obtained for the $i$-th test image in the query set under $r\times 90°$ rotation enhancement, $i=1,2,\dots,|Q|$, and $|Q|$ represents the number of images contained in the set $Q$; $\bar{p}_i$ represents the average class probability distribution over all views, calculated as $\bar{p}_i=\frac{1}{R}\sum_{r=1}^{R}p_i^{r}$; $D_{KL}(\cdot\Vert\cdot)$ represents the KL divergence between two vectors; the $c$-th element of $p_i^{r}$ is calculated as follows:
$$\left[p_i^{r}\right]_{c}=\frac{\exp\left(-d\left(f_{\theta}(x_i^{r}),\,c_{c}^{r}\right)/\tau\right)}{\sum_{c'=1}^{N}\exp\left(-d\left(f_{\theta}(x_i^{r}),\,c_{c'}^{r}\right)/\tau\right)}$$
where $\tau$ is a scaling factor with a value range of 128 to 512, $c_{c}^{r}$ represents the class center of class $c$ obtained by averaging the features of all support set images under $r\times 90°$ rotation enhancement, the support set $S$ is the set of training images with known labels in each mini-batch during training, $d\left(f_{\theta}(x_i^{r}),\,c_{c}^{r}\right)$ denotes the Euclidean distance between the $i$-th test image feature and class center $c$ under $r\times 90°$ rotation enhancement, and $c$ ranges from 1 to $N$;
the KL divergence is calculated as follows:
$$D_{KL}\left(p\,\Vert\,q\right)=\sum_{c=1}^{N}\left[p\right]_{c}\log\frac{\left[p\right]_{c}}{\left[q\right]_{c}}$$
the multi-view feature fusion semantic classifier adopts a transducer structure, obtains fused test image features and class center features through the transducer structure, and outputs class probability distribution of each test image based on a nearest neighbor prototype characterization principle, wherein the specific process is as follows: splicing the R eigenvectors to obtain a multi-view eigenvector F i For the support set image, recording the obtained multi-view characteristic diagram as F i S For the query set image, recording the obtained multi-view characteristic diagram as F i Q I represents the image number in the collection; then, for each test image in the query set, the corresponding multi-view feature images and all the class centers are spliced according to rows to obtain the corresponding augmented multi-view feature images
Wherein, the liquid crystal display device comprises a liquid crystal display device,the multi-view class center feature map is a multi-view class center feature map of class c, and is obtained by averaging the support set image features under all view angles;
and then carrying out feature fusion through a transducer structure, wherein the specific expression is as follows:
wherein (Q, K, V) is the receive triplet input of the transducer structure, W Q 、W K And W is V Is provided with three full-connection layers, wherein the three full-connection layers are arranged on the same substrate,is a fused feature;
for a pair ofEqually split according to rows to obtain two features, respectively marked as +.>And->Will->And->Expanding according to the 2 nd and 3 rd dimensions and calculating the Euclidean distance D according to the following formula i The distance between the fused test image feature map and the fused class center feature map is:
wherein d (·) represents the Euclidean distance function, row j Representing the j-th row of the matrix;
following the nearest neighbor prototype characterization principle, the class whose fused class center is closest to the fused test image feature is selected as the predicted class of the test image;
the loss function corresponding to the multi-view feature fusion semantic classifier is as follows:
$$\mathcal{L}_{main}=-\frac{1}{|Q|}\sum_{i=1}^{|Q|}\log\frac{\exp\left(-\left[D_i\right]_{y_i}\right)}{\sum_{c=1}^{N}\exp\left(-\left[D_i\right]_{c}\right)}$$
where $\mathcal{L}_{main}$ represents the multi-view feature fusion main classification loss, $y_i$ represents the true semantic label of the $i$-th test image in the query set, and $\left[D_i\right]_{c}$ represents the Euclidean distance between the $i$-th test image feature in the query set and the class center of class $c$;
the total loss function of the classification model is as follows:
$$\mathcal{L}_{total}=\mathcal{L}_{main}+\beta\,\mathcal{L}_{rot}+\gamma\,\mathcal{L}_{cdc}$$
where $\mathcal{L}_{total}$ represents the total loss of the classification model network, $\beta$ is the weight hyperparameter of the rotation angle prediction loss $\mathcal{L}_{rot}$ with a value range of 1 to 5, and $\gamma$ is the weight hyperparameter of the class distribution consistency loss $\mathcal{L}_{cdc}$ with a value range of 10 to 50;
step 4: inputting the remote sensing image data set to be processed into the classification model trained in the step 3, wherein the output of the multi-view feature fusion semantic classifier in the classification model is the final class prediction result of each image.
The beneficial effects of the invention are as follows: a full-connection-layer rotation angle classifier is adopted, fully exploiting the rotation insensitivity of remote sensing images, and a class distribution consistency loss function is designed, which effectively suppresses semantically irrelevant features in remote sensing images; because the rotation angle classification task and the class distribution consistency task are self-supervised auxiliary tasks, the transferable feature extraction capability of the model is better improved; because a new multi-view attention capturing module is designed and a supervised small sample classifier is embedded into it to form the multi-view feature fusion semantic classifier, which simultaneously extracts the information shared among multi-view features and the strong association information between query set samples and class centers in nearest neighbor matching, redundant information can be effectively removed, the strong association between samples and class centers is captured, and the classification accuracy of the model is improved; the classification model provided by the invention is a multi-task deep neural network that can be trained end to end without a redundant pre-training process, making the whole model framework simpler and more efficient.
Drawings
FIG. 1 is a flow chart of a method for classifying small samples of a remote sensing image based on multi-view feature fusion;
FIG. 2 is a diagram of a classification confusion matrix on the NWPU-RESISC45 dataset using the method of the invention;
wherein (a) is a schematic diagram of the 5-way 1-shot task results, and (b) is a schematic diagram of the 5-way 5-shot task results;
FIG. 3 is a graph of a classification confusion matrix on a WHU-RS19 dataset using the method of the present invention;
wherein (a) is a schematic diagram of the 5-way 1-shot task results, and (b) is a schematic diagram of the 5-way 5-shot task results.
Detailed Description
The invention is further illustrated with reference to the following figures and embodiments; the invention includes but is not limited to these embodiments.
The invention provides a remote sensing image small sample classification method based on multi-view feature fusion. A multi-task deep neural network classification model is constructed, and by designing self-supervised auxiliary tasks, nearest neighbor prototype characterization and a multi-view feature fusion supervised classifier, and by adopting a multi-task training method, a simple and efficient classification model suitable for small sample remote sensing images is obtained. As shown in fig. 1, the specific implementation process of the invention is as follows:
1. generating multi-view remote sensing images
A training image dataset is input, and rotation enhancement is applied to all images in the dataset. Rotation enhancement rotates each image by $(r-1)\times 90°$, $r=1,2,3,4$, i.e. by 0°, 90°, 180° and 270° respectively, to obtain the image at the corresponding view. The multi-view image set obtained after rotation enhancement of the $i$-th image in the dataset is denoted $X_i=\{x_i^1,x_i^2,x_i^3,x_i^4\}$, where $x_i^1,\dots,x_i^4$ correspond to the images at the four views, $i=1,2,\dots,|\varepsilon|$, $\varepsilon$ represents the training image dataset, and $|\varepsilon|$ represents the total number of images contained in the dataset.
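As an illustration of this step, the following sketch builds the four-view set for a batch of images with torch.rot90 (the helper name and tensor shapes are assumptions for illustration, not taken from the patent):

```python
import torch

def rotation_enhance(images: torch.Tensor) -> torch.Tensor:
    """Rotation enhancement: stack the 0/90/180/270-degree views of each image.

    images: (B, C, H, W) batch of remote sensing images.
    returns: (B, R, C, H, W) with R = 4 views per image.
    """
    views = [torch.rot90(images, k=r, dims=(-2, -1)) for r in range(4)]  # (r-1) x 90 degrees
    return torch.stack(views, dim=1)

# Example: a dummy batch of 8 RGB images of size 84x84.
x = torch.randn(8, 3, 84, 84)
x_views = rotation_enhance(x)   # shape (8, 4, 3, 84, 84)
```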
2. Extracting original features
A ResNet-12 feature extraction network is applied to all multi-view images obtained after the processing in step 1 to perform multi-level deep feature extraction, i.e. convolution and pooling operations at different layers; the feature output for the image at each view is a one-dimensional vector of length 640.
The ResNet-12 network is described in the document "K. He, X. Zhang, S. Ren, et al. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 770-778, 2016".
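A minimal sketch of the per-view feature extraction follows. The small convolutional encoder is only a stand-in for the ResNet-12 backbone described above (its layer sizes are assumptions); the point is that each view is mapped to a d=640 vector:

```python
import torch
import torch.nn as nn

class TinyEncoder(nn.Module):
    """Stand-in for ResNet-12: conv blocks + global average pooling to a 640-d vector."""
    def __init__(self, out_dim: int = 640):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
            nn.Conv2d(64, 160, 3, stride=2, padding=1), nn.BatchNorm2d(160), nn.ReLU(),
            nn.Conv2d(160, out_dim, 3, stride=2, padding=1), nn.BatchNorm2d(out_dim), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.body(x).flatten(1)          # (N, 640)

encoder = TinyEncoder()
x_views = torch.randn(8, 4, 3, 84, 84)          # (B, R, C, H, W), e.g. from the previous sketch
b, r = x_views.shape[:2]
feats = encoder(x_views.flatten(0, 1))          # (B*R, 640): one vector per view
feats = feats.view(b, r, -1)                    # (B, R, 640)
```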
3. Building classification models and model training
The features of all multi-view images are input into a classification model, and overall optimization training of the model is performed in an end-to-end manner to obtain the trained model. The classification model comprises three parallel branches: a full-connection-layer rotation angle classifier, a basic semantic classifier, and a multi-view feature fusion semantic classifier.
(1) Full-link rotation angle classifier and rotation angle prediction loss
Because supervised learning that uses the different rotation angles as task-independent labels improves the transferable feature extraction capability of the model, the invention introduces a rotation angle prediction self-supervision task.
The full-connection layer rotation angle classifier consists of a single fully connected layer followed by a ReLU activation, with input dimension 640 and output dimension 4. Features are mapped to the angle class space through this classifier, and the corresponding rotation angle prediction loss function is as follows:
$$\mathcal{L}_{rot}(\theta,\phi)=\frac{1}{R\,|\varepsilon|}\sum_{i=1}^{|\varepsilon|}\sum_{r=1}^{R}\ell_{CE}\left(x_i^{r}\right)$$
where $\mathcal{L}_{rot}$ represents the rotation angle prediction loss, $\theta$ represents the network parameters of the feature extractor, $\phi$ represents the parameters of the full-connection-layer rotation angle classifier, and $\ell_{CE}$ is a cross entropy loss function calculated by the following equation:
$$\ell_{CE}\left(x_i^{r}\right)=-\log\left(\left[\operatorname{softmax}\left(g_{\phi}\left(f_{\theta}\left(x_i^{r}\right)\right)\right)\right]_{r}\right)$$
where $R=4$ denotes the four rotation views, $r$ denotes the $r$-th rotation view, $f_{\theta}(\cdot)$ represents the feature extraction operation, $g_{\phi}(\cdot)$ represents the full-connection-layer rotation angle classification operation, and $[\cdot]_{r}$ denotes taking the $r$-th element of a vector.
The rotation angle prediction task introduces different rotation angles of the original image as supervision signals, and designs a full-connection layer rotation angle classifier to map the original features output by the feature extractor to an angle class space.
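A sketch of this auxiliary branch under the loss above (variable names and the batch layout are assumptions): a single fully connected layer followed by ReLU maps each 640-d view feature to four angle logits, and the loss is the cross entropy against the view index r:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# FC + ReLU head as described: input dim 640, output dim 4 (one logit per rotation angle).
rot_classifier = nn.Sequential(nn.Linear(640, 4), nn.ReLU())

def rotation_loss(feats: torch.Tensor) -> torch.Tensor:
    """feats: (B, R, 640) per-view features; view index r is the (r x 90 degree) label."""
    b, r, d = feats.shape
    logits = rot_classifier(feats.reshape(b * r, d))   # (B*R, 4)
    labels = torch.arange(r).repeat(b)                 # 0,1,2,3, 0,1,2,3, ... matching the reshape
    return F.cross_entropy(logits, labels)             # -log softmax at index r

loss_rot = rotation_loss(torch.randn(8, 4, 640))
```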
(2) Basic semantic classifier and class distribution consistency loss
The invention designs another loss function within the self-supervised auxiliary task: by optimizing the model, the semantic prediction probability distributions of the same original image at different views are kept consistent. This is realized by the basic semantic classifier together with a class distribution consistency loss.
The basic semantic classifier adopts the nearest neighbor prototype characterization principle, namely the class center closest to the feature of a test image is selected as the semantic class of that image, and from the distances to the class centers the semantic probability distribution of the test image at each view is obtained.
In order to keep the class probability distribution consistency of the test images in the query set under different rotation view angles, the invention minimizes the Kullback-Leibler (KL) divergence of the probability distribution between each view angle and the average value of all view angles, and the designed corresponding class distribution consistency loss function is as follows:
$$\mathcal{L}_{cdc}=\frac{1}{R\,|Q|}\sum_{i=1}^{|Q|}\sum_{r=1}^{R}D_{KL}\left(p_i^{r}\,\Vert\,\bar{p}_i\right)$$
where $\mathcal{L}_{cdc}$ represents the class distribution consistency loss, $P_r\in\mathbb{R}^{|Q|\times N}$ represents the class probability distribution (a matrix of $|Q|$ rows and $N$ columns) of all query set images under $r\times 90°$ rotation enhancement, the query set $Q$ is the set of all test images, $N$ represents the number of categories in each mini-batch, $p_i^{r}$ represents the class probability distribution obtained for the $i$-th test image in the query set under $r\times 90°$ rotation enhancement, $i=1,2,\dots,|Q|$, and $|Q|$ represents the number of images contained in the set $Q$; $\bar{p}_i$ represents the average class probability distribution over all views, calculated as $\bar{p}_i=\frac{1}{R}\sum_{r=1}^{R}p_i^{r}$; and $D_{KL}(\cdot\Vert\cdot)$ represents the KL divergence between two vectors. The $c$-th element of $p_i^{r}$ is calculated as follows:
$$\left[p_i^{r}\right]_{c}=\frac{\exp\left(-d\left(f_{\theta}(x_i^{r}),\,c_{c}^{r}\right)/\tau\right)}{\sum_{c'=1}^{N}\exp\left(-d\left(f_{\theta}(x_i^{r}),\,c_{c'}^{r}\right)/\tau\right)}$$
where $\tau$ is a scaling factor with a value range of 128 to 512, $c_{c}^{r}$ represents the class center of class $c$ obtained by averaging the features of all support set images under $r\times 90°$ rotation enhancement, the support set $S$ is the set of training images with known labels in each mini-batch, $d\left(f_{\theta}(x_i^{r}),\,c_{c}^{r}\right)$ denotes the Euclidean distance between the $i$-th test image feature and class center $c$ under $r\times 90°$ rotation enhancement, and $c$ ranges from 1 to $N$.
The KL divergence is calculated as follows:
$$D_{KL}\left(p\,\Vert\,q\right)=\sum_{c=1}^{N}\left[p\right]_{c}\log\frac{\left[p\right]_{c}}{\left[q\right]_{c}}$$
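A sketch of this branch under the formulas above (helper names, shapes and the per-view prototype layout are assumptions): per-view class probabilities come from a softmax over negative scaled Euclidean distances to the class centers, and the loss is the KL divergence of each view's distribution from the mean over views:

```python
import torch
import torch.nn.functional as F

def view_probabilities(q_feats: torch.Tensor, prototypes: torch.Tensor,
                       tau: float = 128.0) -> torch.Tensor:
    """q_feats: (Q, R, d) query features per view; prototypes: (R, N, d) class centers per view.
    Returns per-view class probabilities of shape (Q, R, N); tau is in the stated range 128-512."""
    dists = torch.cdist(q_feats.transpose(0, 1), prototypes)   # (R, Q, N) Euclidean distances
    return F.softmax(-dists / tau, dim=-1).transpose(0, 1)     # (Q, R, N)

def consistency_loss(probs: torch.Tensor) -> torch.Tensor:
    """KL divergence between each view's distribution and the average over views."""
    mean_p = probs.mean(dim=1, keepdim=True)                   # (Q, 1, N)
    kl = (probs * (probs.clamp_min(1e-12).log()
                   - mean_p.clamp_min(1e-12).log())).sum(dim=-1)  # (Q, R)
    return kl.mean()

probs = view_probabilities(torch.randn(75, 4, 640), torch.randn(4, 5, 640))
loss_cdc = consistency_loss(probs)
```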
(3) Multi-view feature fusion module and main classification loss
In order to fuse the multi-view features, the invention embeds a new multi-view attention module into a supervised classifier based on nearest neighbor prototype characterization, forming the multi-view feature fusion semantic classifier. This classifier extracts the information shared by each test image across its different rotation views, effectively suppressing semantically irrelevant content in the remote sensing images; in addition, it captures the strong association information between a test image and a class center in nearest neighbor prototype matching, improving the accuracy of nearest neighbor prototype matching.
the multi-view feature fusion semantic classifier is a transformer structure,the fused test image features and class center features are obtained, and class probability distribution of each test image is output based on the nearest neighbor prototype characterization principle. Specifically, for the multi-view collection corresponding to the ith test image in the query setThrough the feature extractor, R one-dimensional feature vectors with a length of d=640 can be obtained. Obtaining a multi-view feature map by stitching R feature vectors>Similarly, a multi-view support set feature map and a multi-view query set feature map, respectively, may be represented as F i S And F i Q For each test image in the query set, splicing the corresponding multi-view feature image and all the class centers according to rows to obtain a corresponding augmented multi-view feature image:
wherein, the liquid crystal display device comprises a liquid crystal display device,the multi-view class center feature map of class c is calculated from the average of the support set image features at all view angles, and can be expressed as +.>K is the number of images in the support set, and the transducer structure in the classifier receives the triplet inputs (F, F, F) as (Query, key, and Value), respectively. For the ith test image in the query set, a corresponding augmented multi-view feature map can be calculated by equation (17), and the process of transform feature fusion can be expressed as follows:
wherein, the liquid crystal display device comprises a liquid crystal display device,W Q ,W K and W is V Three fully connected layers are shown, respectively. By means of->Splitting equally according to the rows to obtain a fused test image characteristic diagram +.>And a fused class center feature map
Next, the process will be describedAnd->Expanded to +.2, 3 dimension>Where m=r×d. Further, the distance vector between the fused test image feature map and the fused class center feature map may be expressed as:
wherein, the liquid crystal display device comprises a liquid crystal display device,d (·) represents the Euclidean distance function, row j Representing the j-th row of the matrix;
and finally, adopting a nearest neighbor prototype characterization principle, and selecting a class center closest to the nearest neighbor prototype as a prediction class of the test image.
The loss function corresponding to the multi-view feature fusion semantic classifier is as follows:
$$\mathcal{L}_{main}=-\frac{1}{|Q|}\sum_{i=1}^{|Q|}\log\frac{\exp\left(-\left[D_i\right]_{y_i}\right)}{\sum_{c=1}^{N}\exp\left(-\left[D_i\right]_{c}\right)}$$
where $\mathcal{L}_{main}$ represents the multi-view feature fusion main classification loss, $y_i$ represents the true semantic label of the $i$-th test image in the query set, and $\left[D_i\right]_{c}$ represents the Euclidean distance between the $i$-th test image feature in the query set and the class center of class $c$.
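Given the distance vector $D_i$ for every query image, the main classification loss above reduces to a cross entropy over negative distances; a minimal sketch (hypothetical helper, same assumed shapes as before):

```python
import torch
import torch.nn.functional as F

def main_loss(distances: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """distances: (Q, N) fused Euclidean distances; labels: (Q,) true class indices y_i."""
    return F.cross_entropy(-distances, labels)   # -log softmax(-D_i) at the true class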
(4) Total loss of model
For the total loss of the classification model, the invention adopts a joint loss formed by the weighted sum of the rotation angle prediction loss, the class distribution consistency loss, and the multi-view feature fusion main classification loss, namely:
$$\mathcal{L}_{total}=\mathcal{L}_{main}+\beta\,\mathcal{L}_{rot}+\gamma\,\mathcal{L}_{cdc}$$
where $\mathcal{L}_{total}$ represents the total loss of the classification model network, $\beta$ is the weight hyperparameter of the rotation angle prediction loss $\mathcal{L}_{rot}$ with a value range of 1 to 5, and $\gamma$ is the weight hyperparameter of the class distribution consistency loss $\mathcal{L}_{cdc}$ with a value range of 10 to 50.
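Putting the three branches together, one training step could combine the losses with the weight hyperparameters β and γ as sketched below, building on the earlier sketches (the specific values are just examples within the ranges stated above):

```python
import torch

beta, gamma = 2.0, 20.0          # example values within the stated ranges 1-5 and 10-50

def total_loss(loss_main: torch.Tensor, loss_rot: torch.Tensor,
               loss_cdc: torch.Tensor) -> torch.Tensor:
    """Weighted joint loss of the three parallel branches."""
    return loss_main + beta * loss_rot + gamma * loss_cdc

# Typical end-to-end step (names refer to the earlier sketches):
# loss = total_loss(main_loss(D, y), rotation_loss(feats), consistency_loss(probs))
# loss.backward(); optimizer.step()
```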
4. Remote sensing image classification prediction
Inputting the remote sensing image data set to be processed into the classification model trained in the step 3, wherein the final class prediction result of each image is obtained by a multi-view feature fusion semantic classifier in the classification model.
The effect of the invention can be further illustrated by the following simulation experiment:
1. simulation conditions
The experimental equipment in this embodiment is configured as follows: an Intel central processing unit, 64 GB of memory, an NVIDIA GeForce GTX 1080 Ti graphics processor, and the Ubuntu operating system; the simulation uses the PyTorch deep learning framework.
Two datasets are used in the simulation: the NWPU-RESISC45 dataset and the WHU-RS19 dataset. The NWPU-RESISC45 dataset was proposed by Cheng et al. in the document "G. Cheng, J. Han, and X. Lu. Remote Sensing Image Scene Classification: Benchmark and State of the Art. Proceedings of the IEEE, 105(10), 1865-1883, 2017" and comprises 31500 high-definition 256×256 remote sensing images covering 45 scene categories, with 700 images per category. The WHU-RS19 dataset was proposed by Sheng et al. in the document "G. Sheng, W. Yang, T. Xu, et al. High-Resolution Satellite Scene Classification Using a Sparse Coding Based Multiple Feature Combination. International Journal of Remote Sensing, 33(8), 2395-2412, 2012" and comprises 1005 high-definition 600×600 remote sensing images covering 19 scene categories, with at least 50 images per category.
In the experiment, the 45 categories of the NWPU-RESISC45 data set are divided into 3 groups of 25/10/10 and respectively used as a training set, a verification set and a test set; the 19 categories of the WHU-RS19 dataset were divided into 3 groups of 9/5/5 as training set, validation set and test set, respectively.
2. Emulation content
Firstly, the end-to-end multi-task deep learning framework is trained with the training set data, and the trained model is saved; then, the trained model is tested with the test set data. All images in the test set are divided into several mini-batches, and each mini-batch is divided into a support set (categories known) and a query set (categories unknown). To evaluate model performance under different numbers of labeled images, a 5-way 1-shot task (the support set contains 5 categories with 1 sample per category) and a 5-way 5-shot task (the support set contains 5 categories with 5 samples per category) are designed, and the average classification accuracy with a 95% confidence interval is calculated for each as the final evaluation metric.
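As a sketch of how the N-way K-shot test episodes and the 95% confidence interval could be formed (the episode sampler and its parameters are purely illustrative; the patent does not give this procedure as code):

```python
import random
import statistics

def sample_episode(images_by_class: dict, n_way: int = 5, k_shot: int = 1, n_query: int = 15):
    """Draw one mini-batch: a support set (labels known) and a query set (labels to predict)."""
    classes = random.sample(list(images_by_class), n_way)
    support, query = [], []
    for label, cls in enumerate(classes):
        samples = random.sample(images_by_class[cls], k_shot + n_query)
        support += [(img, label) for img in samples[:k_shot]]
        query += [(img, label) for img in samples[k_shot:]]
    return support, query

def mean_with_ci95(accuracies):
    """Average accuracy over test episodes with a 95% confidence interval half-width."""
    mean = statistics.mean(accuracies)
    half_width = 1.96 * statistics.stdev(accuracies) / len(accuracies) ** 0.5
    return mean, half_width
```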
To demonstrate the effectiveness of the method, five existing algorithms are selected as comparison algorithms: ProtoNet, SPNet, DANet, SGMNet and TSC. The ProtoNet algorithm is proposed in "J. Snell, K. Swersky, and R. Zemel. Prototypical Networks for Few-Shot Learning. Advances in Neural Information Processing Systems, 2017"; the SPNet algorithm in "G. Cheng, L. Cai, C. Lang, X. Yao, et al. SPNet: Siamese-Prototype Network for Few-Shot Remote Sensing Image Scene Classification. IEEE Transactions on Geoscience and Remote Sensing, 60, 1-11, 2022"; the DANet algorithm in "M. Gong, J. Li, Y. Zhang, et al. Two-Path Aggregation Attention Network With Quad-Patch Data Augmentation for Few-Shot Scene Classification. IEEE Transactions on Geoscience and Remote Sensing, 60, 1-16, 2022"; the SGMNet algorithm in "B. Zhang, S. Feng, X. Li, et al. SGMNet: Scene Graph Matching Network for Few-Shot Remote Sensing Scene Classification. IEEE Transactions on Geoscience and Remote Sensing, 60, 1-15, 2022"; and the TSC algorithm in "Q. Zeng and J. Geng. Task-Specific Contrastive Learning for Few-Shot Remote Sensing Image Scene Classification. ISPRS Journal of Photogrammetry and Remote Sensing, 191, 143-154, 2022".
The average classification accuracies of the different algorithms are shown in Table 1. On both datasets, the average classification accuracy of the method of the invention is higher than that of the other algorithms under both the 5-way 1-shot and 5-way 5-shot task settings.
TABLE 1
Fig. 2 and fig. 3 show the classification confusion matrices of the method of the invention on the two datasets: fig. 2(a) and fig. 2(b) are the confusion matrices of the 5-way 1-shot task and the 5-way 5-shot task on the NWPU-RESISC45 dataset, respectively, and fig. 3(a) and fig. 3(b) are the confusion matrices of the 5-way 1-shot task and the 5-way 5-shot task on the WHU-RS19 dataset, respectively. In the figures, the abscissa and ordinate are the class labels in the test set (Airport and other categories), and the element in the i-th row and j-th column represents the probability that the model predicts an image belonging to class i as class j. As can be seen from fig. 2 and fig. 3, the probability of correct classification for each category is relatively stable and close to the overall average classification accuracy of the dataset, indicating that the method of the invention achieves high classification accuracy.
Through a small sample classification framework based on multi-view feature fusion under nearest neighbor prototype characterization, the invention fully exploits the rotation insensitivity of remote sensing images: rich deep features of the remote sensing image are first extracted with a fully convolutional network, and then the transferable feature extraction capability and generalization capability of the model are improved through two self-supervised auxiliary tasks and the multi-view feature fusion main classifier.

Claims (1)

1. A remote sensing image small sample classification method based on multi-view feature fusion is characterized by comprising the following steps:
step 1: inputting a training image dataset, and performing rotation enhancement processing on all images in the dataset, wherein the rotation enhancement processing refers to respectively rotating each image by 0°, 90°, 180° and 270° to obtain the image at the corresponding view; the multi-view image set obtained after rotation enhancement of the $i$-th image in the dataset is denoted $X_i=\{x_i^1,x_i^2,x_i^3,x_i^4\}$, where $x_i^1,\dots,x_i^4$ correspond to the images at the four views respectively, $i=1,2,\dots,|\varepsilon|$, $\varepsilon$ represents the training image dataset, and $|\varepsilon|$ represents the total number of images contained in the dataset;
step 2: performing feature extraction on all the multi-view images obtained after the processing in the step 1 by adopting a ResNet-12 feature extraction network to obtain features corresponding to the images under each view, wherein all the features are one-dimensional vectors with the length of d=640;
step 3: inputting the characteristics of all the multi-view images into a classification model, and performing model overall optimization training in an end-to-end mode to obtain a trained model; the classification model comprises three parallel branches of a full-connection layer rotation angle classifier, a basic semantic classifier and a multi-view feature fusion semantic classifier;
the full-connection layer rotation angle classifier consists of a single fully connected layer followed by a ReLU activation, with input dimension 640 and output dimension 4; features are mapped to the angle class space through this classifier, and the corresponding rotation angle prediction loss function is as follows:
$$\mathcal{L}_{rot}(\theta,\phi)=\frac{1}{R\,|\varepsilon|}\sum_{i=1}^{|\varepsilon|}\sum_{r=1}^{R}\ell_{CE}\left(x_i^{r}\right)$$
where $\mathcal{L}_{rot}$ represents the rotation angle prediction loss, $\theta$ represents the network parameters of the feature extractor, $\phi$ represents the parameters of the full-connection-layer rotation angle classifier, and $\ell_{CE}$ is a cross entropy loss function calculated by the following equation:
$$\ell_{CE}\left(x_i^{r}\right)=-\log\left(\left[\operatorname{softmax}\left(g_{\phi}\left(f_{\theta}\left(x_i^{r}\right)\right)\right)\right]_{r}\right)$$
where $R=4$ denotes the four rotation views, $r$ denotes the $r$-th rotation view, $f_{\theta}(\cdot)$ represents the feature extraction operation, $g_{\phi}(\cdot)$ represents the full-connection-layer rotation angle classification operation, and $[\cdot]_{r}$ denotes taking the $r$-th element of a vector;
the basic semantic classifier adopts the nearest neighbor prototype characterization principle: the class center closest to the feature of a test image is selected as the semantic class of that image, and from these distances the semantic probability distribution of the test image at each view is obtained; the corresponding class distribution consistency loss function is as follows:
$$\mathcal{L}_{cdc}=\frac{1}{R\,|Q|}\sum_{i=1}^{|Q|}\sum_{r=1}^{R}D_{KL}\left(p_i^{r}\,\Vert\,\bar{p}_i\right)$$
where $\mathcal{L}_{cdc}$ represents the class distribution consistency loss, $P_r\in\mathbb{R}^{|Q|\times N}$ represents the class probability distribution of all query set images under $r\times 90°$ rotation enhancement, the query set $Q$ is the set of all test images, $N$ represents the number of categories in each mini-batch during training, $p_i^{r}$ represents the class probability distribution obtained for the $i$-th test image in the query set under $r\times 90°$ rotation enhancement, $i=1,2,\dots,|Q|$, and $|Q|$ represents the number of images contained in the set $Q$; $\bar{p}_i$ represents the average class probability distribution over all views, calculated as $\bar{p}_i=\frac{1}{R}\sum_{r=1}^{R}p_i^{r}$; $D_{KL}(\cdot\Vert\cdot)$ represents the KL divergence between two vectors; the $c$-th element of $p_i^{r}$ is calculated as follows:
$$\left[p_i^{r}\right]_{c}=\frac{\exp\left(-d\left(f_{\theta}(x_i^{r}),\,c_{c}^{r}\right)/\tau\right)}{\sum_{c'=1}^{N}\exp\left(-d\left(f_{\theta}(x_i^{r}),\,c_{c'}^{r}\right)/\tau\right)}$$
where $\tau$ is a scaling factor with a value range of 128 to 512, $c_{c}^{r}$ represents the class center of class $c$ obtained by averaging the features of all support set images under $r\times 90°$ rotation enhancement, the support set $S$ is the set of training images with known labels in each mini-batch during training, $d\left(f_{\theta}(x_i^{r}),\,c_{c}^{r}\right)$ denotes the Euclidean distance between the $i$-th test image feature and class center $c$ under $r\times 90°$ rotation enhancement, and $c$ ranges from 1 to $N$;
the KL divergence is calculated as follows:
$$D_{KL}\left(p\,\Vert\,q\right)=\sum_{c=1}^{N}\left[p\right]_{c}\log\frac{\left[p\right]_{c}}{\left[q\right]_{c}}$$
the multi-view feature fusion semantic classifier adopts a Transformer structure, obtains fused test image features and fused class center features through this structure, and outputs the class probability distribution of each test image based on the nearest neighbor prototype characterization principle; the specific process is as follows: the $R$ feature vectors of an image are concatenated to obtain a multi-view feature map $F_i$; for a support set image the resulting multi-view feature map is denoted $F_i^{S}$, and for a query set image it is denoted $F_i^{Q}$, where $i$ is the index of the image within its set; then, for each test image in the query set, the corresponding multi-view feature map and all class centers are concatenated by rows to obtain the corresponding augmented multi-view feature map
$$\tilde{F}_i^{Q}=\left[F_i^{Q};\,\bar{F}_1;\,\bar{F}_2;\,\dots;\,\bar{F}_N\right]$$
where $\bar{F}_c$ is the multi-view class center feature map of class $c$, obtained by averaging the support set image features of class $c$ over all views;
feature fusion is then carried out through the Transformer structure, with the specific expression
$$\hat{F}_i=\operatorname{softmax}\left(\frac{\left(QW_Q\right)\left(KW_K\right)^{\top}}{\sqrt{d}}\right)\left(VW_V\right)$$
where $(Q,K,V)$ is the triplet input received by the Transformer structure, with $Q=K=V=\tilde{F}_i^{Q}$, $W_Q$, $W_K$ and $W_V$ are three fully connected layers, and $\hat{F}_i$ is the fused feature;
$\hat{F}_i$ is split by rows into two parts, denoted $\hat{F}_i^{Q}$ (the fused test image feature map) and $\hat{F}_i^{S}$ (the fused class center feature map); $\hat{F}_i^{Q}$ and $\hat{F}_i^{S}$ are expanded along the 2nd and 3rd dimensions, and the distance vector $D_i$ between the fused test image feature map and the fused class center feature map is calculated according to the following formula:
$$\left[D_i\right]_{j}=d\left(\hat{F}_i^{Q},\,\operatorname{Row}_{j}\left(\hat{F}_i^{S}\right)\right),\quad j=1,2,\dots,N$$
where $d(\cdot)$ represents the Euclidean distance function and $\operatorname{Row}_{j}(\cdot)$ denotes the $j$-th row of the matrix;
following the nearest neighbor prototype characterization principle, the class whose fused class center is closest to the fused test image feature is selected as the predicted class of the test image;
the loss function corresponding to the multi-view feature fusion semantic classifier is as follows:
$$\mathcal{L}_{main}=-\frac{1}{|Q|}\sum_{i=1}^{|Q|}\log\frac{\exp\left(-\left[D_i\right]_{y_i}\right)}{\sum_{c=1}^{N}\exp\left(-\left[D_i\right]_{c}\right)}$$
where $\mathcal{L}_{main}$ represents the multi-view feature fusion main classification loss, $y_i$ represents the true semantic label of the $i$-th test image in the query set, and $\left[D_i\right]_{c}$ represents the Euclidean distance between the $i$-th test image feature in the query set and the class center of class $c$;
the total loss function of the classification model is as follows:
$$\mathcal{L}_{total}=\mathcal{L}_{main}+\beta\,\mathcal{L}_{rot}+\gamma\,\mathcal{L}_{cdc}$$
where $\mathcal{L}_{total}$ represents the total loss of the classification model network, $\beta$ is the weight hyperparameter of the rotation angle prediction loss $\mathcal{L}_{rot}$ with a value range of 1 to 5, and $\gamma$ is the weight hyperparameter of the class distribution consistency loss $\mathcal{L}_{cdc}$ with a value range of 10 to 50;
step 4: inputting the remote sensing image data set to be processed into the classification model trained in the step 3, wherein the output of the multi-view feature fusion semantic classifier in the classification model is the final class prediction result of each image.
CN202310070568.XA 2023-02-07 2023-02-07 Remote sensing image small sample classification method based on multi-view feature fusion Pending CN116543192A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310070568.XA CN116543192A (en) 2023-02-07 2023-02-07 Remote sensing image small sample classification method based on multi-view feature fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310070568.XA CN116543192A (en) 2023-02-07 2023-02-07 Remote sensing image small sample classification method based on multi-view feature fusion

Publications (1)

Publication Number Publication Date
CN116543192A true CN116543192A (en) 2023-08-04

Family

ID=87451233

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310070568.XA Pending CN116543192A (en) 2023-02-07 2023-02-07 Remote sensing image small sample classification method based on multi-view feature fusion

Country Status (1)

Country Link
CN (1) CN116543192A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117407796A (en) * 2023-12-15 2024-01-16 合肥工业大学 Cross-component small sample fault diagnosis method, system and storage medium
CN117407796B (en) * 2023-12-15 2024-03-01 合肥工业大学 Cross-component small sample fault diagnosis method, system and storage medium

Similar Documents

Publication Publication Date Title
Kang et al. Deep unsupervised embedding for remotely sensed images based on spatially augmented momentum contrast
Zhong et al. SatCNN: Satellite image dataset classification using agile convolutional neural networks
Sharma et al. YOLOrs: Object detection in multimodal remote sensing imagery
Ren et al. Overview of object detection algorithms using convolutional neural networks
EP4002161A1 (en) Image retrieval method and apparatus, storage medium, and device
Akey Sungheetha Classification of remote sensing image scenes using double feature extraction hybrid deep learning approach
Song et al. Two-stage cross-modality transfer learning method for military-civilian SAR ship recognition
CN116758130A (en) Monocular depth prediction method based on multipath feature extraction and multi-scale feature fusion
Ye et al. Parallel multi-stage features fusion of deep convolutional neural networks for aerial scene classification
CN114780767A (en) Large-scale image retrieval method and system based on deep convolutional neural network
Khellal et al. Pedestrian classification and detection in far infrared images
CN113378938A (en) Edge transform graph neural network-based small sample image classification method and system
CN116543192A (en) Remote sensing image small sample classification method based on multi-view feature fusion
Kavitha et al. Convolutional Neural Networks Based Video Reconstruction and Computation in Digital Twins.
Li et al. Enhanced bird detection from low-resolution aerial image using deep neural networks
Biswas et al. Improving the energy efficiency of real-time DNN object detection via compression, transfer learning, and scale prediction
CN116935188A (en) Model training method, image recognition method, device, equipment and medium
CN109886160B (en) Face recognition method under non-limited condition
Qin et al. Depth estimation by parameter transfer with a lightweight model for single still images
Huang et al. HFC-SST: improved spatial-spectral transformer for hyperspectral few-shot classification
Sahbi et al. Active learning for interactive satellite image change detection
CN116977265A (en) Training method and device for defect detection model, computer equipment and storage medium
CN114120076B (en) Cross-view video gait recognition method based on gait motion estimation
Chen et al. Mmml: Multi-manifold metric learning for few-shot remote sensing image scene classification
CN116091946A (en) Yolov 5-based unmanned aerial vehicle aerial image target detection method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination