CN114898157A - Global learning device and method for hyperspectral image classification - Google Patents


Info

Publication number
CN114898157A
Authority
CN
China
Prior art keywords: layer, output, module, convolution, sampling
Legal status: Pending (an assumption by Google Patents, not a legal conclusion)
Application number
CN202210563560.2A
Other languages
Chinese (zh)
Inventor
党兰学
刘崇阳
侯彦娥
左宪禹
刘扬
田军锋
林英豪
周黎鸣
Current Assignee
Henan University
Original Assignee
Henan University
Priority date
Application filed by Henan University
Priority to CN202210563560.2A
Publication of CN114898157A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • G06F18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G06N3/08: Learning methods
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A: TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00: Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/30: Assessment of water resources

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a global learning device and method for hyperspectral image classification. The device includes an encoder and a decoder. The encoder sequentially comprises a spectral dimension adjustment layer, a first feature extraction layer and a second feature extraction layer, in image processing order. The first feature extraction layer comprises three MLBSA structural layers stacked together. Each MLBSA structural layer comprises three Shuffle Spectral Attention (SSA) modules, two MLB layers and a down-sampling layer. The input of the first SSA module passes through a Zero-padded convolution module and then undergoes an additive fusion operation with the output of the down-sampling layer; the result is used as the output of the MLBSA structural layer. The decoder sequentially comprises a first up-sampling layer, a Concat layer and an output layer, in image processing order. The output of the first MLBSA structural layer in the first feature extraction layer is used as the input of the first up-sampling layer; the output of the second feature extraction layer and the output of the first up-sampling layer are fused through the Concat layer, and the fusion result is processed by the output layer to complete the classification of the hyperspectral image.

Description

Global learning device and method for hyperspectral image classification
Technical Field
The invention relates to the technical field of hyperspectral image classification, in particular to a global learning device and method for hyperspectral image classification.
Background
Hyperspectral imaging technology can simultaneously capture two-dimensional geometric spatial information and one-dimensional continuous spectral information of a target object, so that a hyperspectral image combines imagery and spectra in a single data cube. The geometric spatial information reflects external characteristics such as the size and shape of the target object, while the spectral information reflects its internal physical structure and chemical composition. Hyperspectral remote sensing is therefore widely applied in fields such as rock and mineral detection, marine plant detection, water resource assessment, and land resource utilization.
How to construct a more accurate and effective classification method is a key problem in the application of hyperspectral remote sensing technology. Traditional classification algorithms, such as the Support Vector Machine (SVM), three-dimensional wavelet transform, and Gaussian mixture models, generally reduce the dimensionality of the original image through band selection and feature extraction, projecting the image into a lower-dimensional feature space. These methods often alter the band correlations of the original image, lose part of the spectral information, and cannot fully extract the abstract features in the hyperspectral image, thereby reducing classification accuracy.
In recent years, with the application and development of deep learning, algorithm models built on the Convolutional Neural Network (CNN) have been widely applied to image classification (LeCun Y, Bottou L. Gradient-based learning applied to document recognition [J]. Proceedings of the IEEE, 1998, 86(11): 2278-2324.), speech recognition, target detection, image semantic segmentation, and other fields, where CNNs show strong feature extraction capability. More and more researchers use CNNs in place of traditional methods for hyperspectral image classification. Current CNN-based classification models evolve toward deeper or wider complex architectures. Although good results are achieved to some extent, a deeper network means more parameters, which not only increases computational overhead and lowers classification speed, but also places higher demands on computer hardware.
Disclosure of Invention
In order to improve the classification accuracy and the classification speed of the hyperspectral images, the invention provides a global learning device and method for hyperspectral image classification.
In one aspect, the present invention provides a global learning apparatus for hyperspectral image classification, comprising: an encoder and a decoder; the encoder sequentially comprises a spectrum dimension adjusting layer, a first feature extraction layer and a second feature extraction layer according to an image processing sequence; the first feature extraction layer comprises three MLBSA structural layers stacked together; the MLBSA structural layer comprises three shuffling spectral attention SSA modules, two MLB layers and a down-sampling layer; wherein the SSA module and the MLB layer are stacked to cross each other; the downsampling layer is used as the last sublayer of the MLBSA structural layer; the input of the first SSA module passes through a Zero-padded convolution module and then is subjected to addition fusion operation with the output of the down-sampling layer, and the output is used as the output of the MLBSA structural layer; the MLB layer represents a modified linear bottleneck layer;
the decoder sequentially comprises a first up-sampling layer, a Concat layer and an output layer according to an image processing sequence; and the output of the first MLBSA structural layer in the first feature extraction layer is used as the input of the first up-sampling layer, the output of the second feature extraction layer and the output of the first up-sampling layer are fused through the Concat layer, and the fusion result is processed through the output layer to complete the classification of the hyperspectral images.
Further, the spectral dimension adjustment layer comprises three sublayers, namely an SSA module, a 1 × 1 convolution layer and an MLB layer from a shallow layer to a deep layer.
Further, the MLB layer comprises a first convolution module, a second convolution module and a third convolution module which are stacked together in sequence; the first convolution module and the second convolution module are sequentially composed of a convolution layer, a GN layer and a ReLU layer from a shallow layer to a deep layer; the third convolution module includes a convolution layer and a GN layer in this order.
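A minimal PyTorch sketch of the MLB layer as described above (two conv + GN + ReLU modules followed by a conv + GN projection with no activation), using the kernel/stride/padding values given later in Example 2. The channel widths and the group count of 16 are illustrative assumptions, not values from the patent. Note that the 1 × 1 convolution with padding 1 grows the spatial size by 2, which the final 3 × 3 convolution with padding 0 exactly removes, so the layer preserves spatial size overall.

```python
import torch
import torch.nn as nn

class MLB(nn.Module):
    """Modified Linear Bottleneck sketch: expand, filter, project.

    Kernel/stride/padding follow Example 2 of the patent; channel widths
    (in_ch, hidden_ch, out_ch) and groups=16 are illustrative assumptions.
    """
    def __init__(self, in_ch, hidden_ch, out_ch, groups=16):
        super().__init__()
        # First conv module: 1x1 conv (padding 1 per the text), GN, ReLU
        self.block1 = nn.Sequential(
            nn.Conv2d(in_ch, hidden_ch, kernel_size=1, stride=1, padding=1),
            nn.GroupNorm(groups, hidden_ch),
            nn.ReLU(inplace=True))
        # Second conv module: 3x3 conv, GN, ReLU (nonlinearity kept in high dim)
        self.block2 = nn.Sequential(
            nn.Conv2d(hidden_ch, hidden_ch, kernel_size=3, stride=1, padding=1),
            nn.GroupNorm(groups, hidden_ch),
            nn.ReLU(inplace=True))
        # Third conv module: 3x3 conv + GN, no activation (linear projection)
        self.block3 = nn.Sequential(
            nn.Conv2d(hidden_ch, out_ch, kernel_size=3, stride=1, padding=0),
            nn.GroupNorm(groups, out_ch))

    def forward(self, x):
        return self.block3(self.block2(self.block1(x)))
```

With these settings the spatial size is preserved: 16 → 18 (1 × 1 conv, padding 1) → 18 (3 × 3 conv, padding 1) → 16 (3 × 3 conv, padding 0).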
Further, the second feature extraction layer comprises an SSA module, a first branch extraction layer for extracting global information, a second branch extraction layer for extracting local information, a feature fusion layer and a second up-sampling layer; the output of the SSA module passes through the first branch extraction layer and the second branch extraction layer respectively; and the output of the two branch extraction layers is subjected to feature fusion through the feature fusion layer, and the output of the feature fusion layer after passing through the second up-sampling layer is taken as the output of the second feature extraction layer.
Further, the first branch extraction layer comprises two connected cross-attention CCA modules.
Further, the second branch extraction layer adopts an atrous spatial pyramid pooling (ASPP) structure.
Further, the output layer comprises three sublayers, and two fourth convolution modules and a 1 × 1 convolution layer are sequentially arranged from the shallow layer to the deep layer; the fourth convolution module is composed of a convolution layer, a GN layer and a ReLU layer from a shallow layer to a deep layer in sequence.
In another aspect, the present invention provides a hyperspectral image classification method based on the above apparatus, including:
dividing a data set into a training set, a verification set and a test set by adopting a universal global random layering UGSS sampling strategy; the data set is a set formed by all extracted ground object sample data after ground object samples are extracted from the hyperspectral image;
training the device according to any one of claims 1 to 7 by using a training set and a validation set to obtain a trained classification model;
and classifying the test set by using the classification model.
Further, the SSA module processes the input data by sequentially using formula (1), formula (2), and formula (3):

z_c = F_sq(u_c) = (1/(H × W)) Σ_{i=1}^{H} Σ_{j=1}^{W} u_c(i, j)  (1)

s_c = F_ex(z_c, W) = W_2(ReLU(GN(W_1(Shuffle(z_c)))))  (2)

x̃_c = F_scale(u_c, s_c) = s_c · u_c  (3)

wherein z_c represents the encoded value of all the pixel values of the c-th band, H and W represent the height and width of the hyperspectral image respectively, u_c(i, j) represents the pixel in the i-th row and j-th column of the c-th band, Shuffle represents a shuffle function that scrambles the spectral dimension to increase interactivity, W_1 and W_2 denote two fully connected layers, s_c represents the intermediate output of the c-th band after processing by the SSA module, and x̃_c represents the final output of the c-th band after processing by the SSA module.
The invention has the beneficial effects that:
(1) on the basis of the traditional linear bottleneck, a modified linear bottleneck (MLB) is designed according to the characteristics of hyperspectral images, and the MLBSA (Modified Linear Bottleneck and Spectral Attention) structural layer is built on top of it; stacking MLBSA structures for feature extraction allows the feature information of the hyperspectral image to be fully extracted;
(2) a second feature extraction layer with a double-branch structure at the end of the encoder fully extracts local and global spatial information; combined with the SSA (Shuffle Spectral Attention) layers used throughout the learning device, the spatial-spectral information of the hyperspectral image can be fully extracted;
(3) a shortcut connection is added behind the first MLBSA structural layer: its output is up-sampled and combined with the up-sampled output of the second feature extraction layer, thereby making full use of both low-level and high-level features;
(4) the classification method is different from a classification method using a data cube as input, a universal global random layering UGSS sampling strategy is adopted, a complete image is input into a classification model every time, global space-spectrum information can be fully extracted, and high classification speed is achieved while high precision is guaranteed.
Drawings
Fig. 1 is a schematic structural diagram of a global learning apparatus for hyperspectral image classification according to an embodiment of the present invention;
FIG. 2 is a structural diagram of an MLBSA structural layer provided in the embodiment of the present invention;
fig. 3 is a structural diagram of an SSA module provided in an embodiment of the present invention;
FIG. 4 is a diagram illustrating classification results of different models on an IP data set according to an embodiment of the present invention;
FIG. 5 is a diagram illustrating classification results of different models on a PU data set according to an embodiment of the present invention;
FIG. 6 is a diagram illustrating classification results of different models on an SA data set according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly described below with reference to the accompanying drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example 1
With reference to fig. 1 and 2, the present invention provides a global learning apparatus for hyperspectral image classification, including: an encoder and a decoder;
the encoder sequentially comprises a spectral dimension adjustment layer, a first feature extraction layer and a second feature extraction layer, in image processing order; the first feature extraction layer comprises three MLBSA structural layers stacked together; each MLBSA structural layer comprises three Shuffle Spectral Attention (SSA) modules, two MLB layers and a down-sampling layer, wherein the SSA modules and the MLB layers are stacked alternately; the down-sampling layer is the last sublayer of the MLBSA structural layer; the input of the first SSA module passes through a Zero-padded convolution module and then undergoes an additive fusion operation with the output of the down-sampling layer to obtain the output of the MLBSA structural layer; the MLB layer denotes a modified linear bottleneck layer; the down-sampling layer extracts abstract features while saving computing resources. MLBSA: Modified Linear Bottleneck and Spectral Attention; SSA: Shuffle Spectral Attention; MLB: Modified Linear Bottleneck.
The decoder sequentially comprises a first up-sampling layer, a Concat layer and an output layer according to an image processing sequence; and the output of the first MLBSA structural layer in the first feature extraction layer is used as the input of the first up-sampling layer, the output of the second feature extraction layer and the output of the first up-sampling layer are fused through the Concat layer, and the fusion result is processed through the output layer to complete the classification of the hyperspectral images.
In the embodiment of the invention, the MLBSA structures are stacked for feature extraction, and a shortcut connection is added behind the first MLBSA structural layer: its output is up-sampled and combined with the output of the second feature extraction layer in the encoder, so that low-level and high-level features are fully utilized.
In addition, due to the existence of the down-sampling layer, the input and output dimensions of the MLBSA structural layer are different, so for the residual structure, a Zero-padded convolution module is used.
As an implementation, as shown in fig. 1, the spectral dimension adjustment layer includes three sublayers, which are the SSA module, the 1 × 1 convolution layer and the MLB layer in sequence from the shallow layer to the deep layer.
Specifically, the spectral dimension adjustment layer performs initial processing on the input hyperspectral image. In contrast to the conventional convolution from high dimension to low dimension, the spectral dimension adjustment layer in the embodiment of the invention uses a 1 × 1 convolution layer as an expansion operation, mapping the low-dimensional feature space into a high-dimensional space so that the tensor dimension is not reduced.
As an implementation, as shown in fig. 2, the MLB layer includes a first convolution module, a second convolution module, and a third convolution module stacked together in sequence; the first convolution module and the second convolution module are sequentially composed of a convolution layer, a GN layer and a ReLU layer from a shallow layer to a deep layer; the third convolution module includes a convolution layer and a GN layer in this order. GN: group Normalization, Group Normalization.
In the structural design of the MLB layer, a nonlinear ReLU activation is used when mapping data from the low-dimensional space to the high-dimensional space, instead of the linear activation of the conventional design. The reason is the high dimensionality of the hyperspectral image itself: after the low-dimensional tensor is mapped to a high-dimensional tensor, its dimensionality is high enough that little information is lost by the nonlinear activation, and extracting features in the high-dimensional tensor captures enough information to benefit classification. In the third convolution module, however, the tensor is low-dimensional, and a nonlinear activation there would lose more useful information; therefore no activation function is used in the third convolution module.
As an implementable manner, as shown in fig. 1, the second feature extraction layer includes an SSA module, a first branch extraction layer for extracting global information, a second branch extraction layer for extracting local information, a feature fusion layer, and a second upsampling layer; the output of the SSA module passes through the first branch extraction layer and the second branch extraction layer respectively; and the output of the two branch extraction layers is subjected to feature fusion through the feature fusion layer, and the output of the feature fusion layer after passing through the second upper sampling layer is used as the output of the second feature extraction layer.
In this embodiment of the present invention, the first branch extraction layer includes two connected CCA modules, and the second branch extraction layer adopts an ASPP structure. The CCA module is a simplification of the non-local module and can extract spatial information globally; ASPP extracts local spatial information from receptive fields of different sizes. CCA: Criss-Cross Attention; the module structure is described in "Z. Huang, X. Wang, L. Huang, C. Huang, Y. Wei, and W. Liu, 'CCNet: Criss-Cross Attention for Semantic Segmentation,' in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019, pp. 603-612." ASPP: Atrous Spatial Pyramid Pooling; the structure is described in "L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, 'DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs,' IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 40, no. 4, pp. 834-848, 2017."
Local and global spatial information can be fully extracted by adopting a double-branch extraction layer in the second feature extraction layer; and the space-spectrum information can be fully extracted by matching with an SSA module used in the global learning device.
As an implementation, as shown in fig. 1, the output layer includes three sublayers, namely, two fourth convolution modules and a 1 × 1 convolution layer from the shallow layer to the deep layer; the fourth convolution module is composed of a convolution layer, a GN layer and a ReLU layer from a shallow layer to a deep layer in sequence.
Appropriate cross-channel interaction is very important for learning channel attention with high performance and efficiency. The two FC layers of the existing SEBlock are intended to capture nonlinear cross-channel interaction, but the dimensionality reduction inside SEBlock negatively affects its performance to some extent. The embodiment of the invention therefore designs the SSA module, which adds a Shuffle operation before global pooling, i.e. the spectral dimension of the HSI is shuffled before dimensionality reduction and then normalized with a GN layer. In this way, the SSA module enables rich information interaction between channels; combined with the first feature extraction layer, the model can continually learn effective spectral and spatial features, improving its performance.
Example 2
On the basis of the above embodiments, the embodiments of the present invention provide a global learning device for hyperspectral image classification, and provide parameter settings of each network layer or each module of the device.
In the embodiment of the present invention, in the MLB layer, the parameters of the convolutional layer of the first convolutional module are set as follows: the convolution kernel is 1, Stride is 1, Padding is 1; the parameters of the convolutional layer of the second convolutional module are set as: the convolution kernel is 3, Stride is 1, Padding is 1; the parameters of the convolutional layer of the third convolutional module are set as: the convolution kernel is 3, Stride is 1, Padding is 0.
In the MLBSA structural layer, the down-sampling layer is a convolution layer with convolution kernel 3, Stride of 2 and Padding of 1. Under these parameter settings, because Stride is 2 and Padding is 1, the spatial size is reduced while the spectral dimension is increased. In practical applications, a 2 × 2 average pooling may therefore be performed after the Zero-padded convolution module to ensure that the input and output dimensions of the MLBSA structural layer match, so that the two paths can undergo the additive fusion operation shown in fig. 2. Using Zero-padded convolution combined with an average-pooled skip connection guarantees a normal add operation without introducing additional parameters.
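A quick sanity check of these dimensions with the standard convolution output-size formula (an illustrative calculation, not from the patent) confirms that the stride-2 down-sampling convolution and the average-pooled shortcut produce matching spatial sizes, so the additive fusion is well-defined:

```python
def conv_out(size, kernel, stride, padding):
    """Standard convolution output-size formula."""
    return (size + 2 * padding - kernel) // stride + 1

# Down-sampling layer of the MLBSA block: 3x3 conv, Stride 2, Padding 1.
main_path = conv_out(64, kernel=3, stride=2, padding=1)   # 64 -> 32

# Shortcut path: the Zero-padded convolution keeps the spatial size, so a
# 2x2 average pooling (stride 2) halves it to match before the addition.
shortcut = 64 // 2                                        # 64 -> 32

assert main_path == shortcut
```

The example input size of 64 is arbitrary; any even spatial size gives the same agreement between the two paths.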
In the ASPP structure employed in the second branch extraction layer, the expansion factors of the hole convolution used in the hole space pyramid pooling are set to be rate 12, rate 24, and rate 36, respectively.
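The effect of these dilation rates can be illustrated with the standard formula for the effective spatial extent of a dilated (atrous) kernel, k + (k − 1)(rate − 1); the 3 × 3 kernel size is an assumption based on the ASPP design cited above:

```python
def dilated_extent(kernel, rate):
    """Effective spatial extent of a dilated (atrous) convolution kernel."""
    return kernel + (kernel - 1) * (rate - 1)

# Assuming 3x3 kernels, the three rates give progressively larger
# receptive fields over the feature map, capturing multi-scale context.
for rate in (12, 24, 36):
    print(rate, dilated_extent(3, rate))   # extents 25, 49 and 73
```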
The characteristic fusion layer sequentially comprises a Concat layer and a convolution module consisting of a 1 multiplied by 1 convolution layer, a GN layer and a ReLU layer; and the outputs of the two branch extraction layers after fusion by the Concat layer pass through the convolution module, and the output of the convolution module is the output of the feature fusion layer.
The first and second upsampling layers are set to 2 times upsampling and 8 times upsampling, respectively.
Before the output of the first up-sampling layer is fused with the output of the second feature extraction layer, the output of the first up-sampling layer passes through a convolution module consisting of a 1 × 1 convolution layer, a GN layer and a ReLU layer, and the output of the convolution module is fused with the output of the second feature extraction layer through the Concat layer.
In the output layer, the convolution kernels of the convolution layers in the two fourth convolution modules are both 3.
Example 3
The embodiment of the invention provides a hyperspectral image classification method, which adopts the global learning device for hyperspectral image classification in the embodiments and comprises the following steps:
s301: dividing a data set into a training set, a verification set and a test set by adopting the universal global random stratified (UGSS) sampling strategy; the data set is the set formed by all extracted ground object sample data after ground object samples are extracted from the hyperspectral image;
the UGSS sampling strategy takes the whole hyperspectral image as the input of a classification model, and the specific idea is as follows: a fixed number of training samples is set and if the total number of samples of a feature is insufficient to provide the fixed number of samples, proportional extraction is used. Meanwhile, during training, the extracted training samples are divided into a training set and a verification set again according to a set proportion. All the extracted samples are divided into groups with the specified parameter quantity according to the set parameters, data of the whole graph is input during input, but only the extracted samples are updated during updating, so that the effect of layered training is achieved, the model can be ensured to be converged during training, and the mode of extracting the samples enables a UGSS sampling strategy to be universally used for most training tasks.
For example, before training the network on a data set, samples of all ground features are extracted to form the training and verification sets, and the remainder serves as the test set. If 200 samples are to be extracted but some ground feature has only 150 samples, the invention extracts according to a set proportion: with a proportion of 0.8, 120 samples are extracted as the training and verification sets and the remainder serves as the test set. The extracted 120 samples are then divided into a training set and a verification set according to the set proportion, and the extracted training samples are evenly distributed into each group according to the set number of groups (a hyperparameter). It should be noted that although the training, verification and test sets are divided, all samples are still input at each training step; only the samples belonging to the training set participate in the gradient descent operation, and the remaining samples do not.
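The per-class extraction logic described above can be sketched in plain Python. This is a minimal sketch under the stated parameters (fixed count 200, extraction proportion 0.8, train/verification split 0.75/0.25); the function name and the dict-based input format are illustrative assumptions, not the patent's implementation.

```python
import random

def ugss_split(labels, fixed=200, ratio=0.8, train_frac=0.75, seed=0):
    """UGSS-style per-class split sketch (names and I/O format are illustrative).

    labels: dict mapping class id -> list of sample indices for that class.
    Returns (train, val, test) index lists.
    """
    rng = random.Random(seed)
    train, val, test = [], [], []
    for cls, idx in labels.items():
        idx = idx[:]
        rng.shuffle(idx)
        # Extract a fixed number of samples, or a proportion if too few.
        n_extract = fixed if len(idx) >= fixed else int(len(idx) * ratio)
        extracted, rest = idx[:n_extract], idx[n_extract:]
        # Re-split the extracted samples into training and verification sets.
        n_train = int(len(extracted) * train_frac)
        train += extracted[:n_train]
        val += extracted[n_train:]
        test += rest
    return train, val, test
```

For a class with 150 samples this extracts 120 (90 train, 30 verification) and leaves 30 for test, matching the worked example in the text; a class with 400 samples yields 200 extracted (150/50) and 200 for test.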
S302: training the global learning device by utilizing a training set and a verification set to obtain a trained classification model;
specifically, model parameters are trained by adopting a random gradient descent algorithm based on a training set, and a model with the minimum loss rate stored on a verification set is an optimal classification model.
S303: and classifying the test set by using the classification model.
As an implementation manner, the SSA module processes input data by sequentially using formula (1), formula (2), and formula (3); fig. 3 shows the structure of the SSA module.

z_c = F_sq(u_c) = (1/(H × W)) Σ_{i=1}^{H} Σ_{j=1}^{W} u_c(i, j)  (1)

s_c = F_ex(z_c, W) = W_2(ReLU(GN(W_1(Shuffle(z_c)))))  (2)

x̃_c = F_scale(u_c, s_c) = s_c · u_c  (3)

wherein z_c represents the encoded value of all the pixel values of the c-th band, H and W represent the height and width of the hyperspectral image respectively, u_c(i, j) represents the pixel in the i-th row and j-th column of the c-th band, Shuffle represents a shuffle function that scrambles the spectral dimension to increase interactivity, W_1 and W_2 denote two fully connected layers, s_c represents the intermediate output of the c-th band after processing by the SSA module, and x̃_c represents the final output of the c-th band after processing by the SSA module.
On the basis of the above embodiment, before step S301, the method further includes: performing a zero-mean standardization operation on the labeled samples of the input hyperspectral image.
Specifically, the zero-mean normalization operation can be expressed by equation (4):

x̃_{ij}^n = (x_{ij}^n − μ_n) / σ_n  (4)

wherein x_{ij}^n represents the pixel value in the i-th row and j-th column of the n-th band of the labeled sample, μ_n represents the mean of all pixels in the n-th band, σ_n represents the standard deviation of the pixel values of the n-th band, and W, H and N respectively represent the width, height and total number of bands of the hyperspectral image.
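Equation (4) is a per-band standardization and translates directly to NumPy; the (N, H, W) band-first layout is an assumption for illustration:

```python
import numpy as np

def zero_mean_normalize(x):
    """Per-band zero-mean, unit-variance normalization, equation (4).

    x: (N, H, W) hyperspectral cube, one (H, W) slice per band.
    """
    mu = x.mean(axis=(1, 2), keepdims=True)      # per-band mean
    sigma = x.std(axis=(1, 2), keepdims=True)    # per-band standard deviation
    return (x - mu) / sigma
```

After this operation every band has mean 0 and standard deviation 1, which puts all N bands on a common scale before training.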
In order to verify the effect of the device and the method, the invention also provides the following experimental data.
1. Experimental Environment
Hardware equipment: CPU is Intel (R) Xeon (R) E5-2682 [email protected], GPU is NVIDIA GeForce RTX 3060;
a software platform: the Python version is 3.7, the Cuda version is 11.0.194, and the model structure is built by using a deep learning framework with the Pythroch version 1.8.0.
2. Experimental data set
In order to measure the classification effect of the invention, three benchmark hyperspectral data sets are selected for experimental study: Indian Pines (IP), Pavia University (PU) and Salinas (SA). The details of the three data sets are shown in table 1.
TABLE 1 detailed information of the three data sets
3. Experimental setup
In order to make the size of the input image meet the down-sampling requirement, the spectral dimension of the input image is kept at a multiple of 16 by zero padding; the number of groups in group normalization is set to 16, ensuring that the number of channels after each down-sampling remains a multiple of 16.
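The spectral zero-padding step can be sketched as below; the function name and the (H, W, N) layout are illustrative assumptions:

```python
import numpy as np

def pad_spectral_to_multiple(cube, multiple=16):
    """Zero-pad the spectral axis of an (H, W, N) cube so N becomes a multiple of
    `multiple`, as required by the down-sampling / group-normalization setup."""
    n = cube.shape[-1]
    pad = (-n) % multiple                 # bands to append (0 if already aligned)
    if pad == 0:
        return cube
    zeros = np.zeros(cube.shape[:-1] + (pad,), dtype=cube.dtype)
    return np.concatenate([cube, zeros], axis=-1)
```

For example, a 200-band cube (as in the IP data set after band removal) would be padded to 208 bands.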
For all experiments, the global learning device is optimized using SGD with a poly learning-rate strategy: the learning rate is the initial rate of 0.001 multiplied by (1 − iter / max_iter)^power, where the power value is set to 0.9 and the momentum to 0.9. No data augmentation strategy is adopted. A fixed 200 samples per class are extracted from each data set; if the total number of samples of a certain land-cover class is insufficient, a proportion parameter of 0.8 is applied. Of the extracted samples, the training set accounts for a proportion of 0.75 and the validation set for 0.25. For the selected samples, the number of divided groups is set to 10. To evaluate the performance of the method of the invention, three common indicators are used: Overall Accuracy (OA), Average Accuracy (AA) and the Kappa coefficient (Kappa).
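The poly learning-rate schedule described above can be sketched in a few lines; the function name is illustrative:

```python
def poly_lr(base_lr, iteration, max_iter, power=0.9):
    """Poly schedule used with SGD: lr = base_lr * (1 - iter/max_iter) ** power."""
    return base_lr * (1 - iteration / max_iter) ** power

# with base_lr = 0.001 and power = 0.9 the rate decays smoothly toward 0
lrs = [poly_lr(0.001, it, 100) for it in range(0, 100, 25)]
```

In a PyTorch training loop this value would typically be written into each optimizer parameter group before the step.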
4. Comparison between different classification models
Several typical classification models from recent years, SVM-RBF (Document 1), 1D-CNN (Document 2), M3D-DCNN (Document 3), SSRN (Document 4), DBDA (Document 5), and the classification model of the invention (abbreviated as Proposed), are selected for detailed comparison. Tables 2, 3 and 4 show the final classification results of the different models on the IP, PU and SA data sets, respectively. Wherein:
Document 1: Kuo B C, Ho H H, Li C H, et al. A Kernel-Based Feature Selection Method for SVM With RBF Kernel for Hyperspectral Image Classification [J]. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2014, 7(1): 317-326;
Document 2: Wei H, Yangyu H, Li W, et al. Deep Convolutional Neural Networks for Hyperspectral Image Classification [J]. Journal of Sensors, 2015, 2015: 1-12;
Document 3: M. He, B. Li and H. Chen, "Multi-scale 3D deep convolutional neural network for hyperspectral image classification," 2017 IEEE International Conference on Image Processing (ICIP), Beijing, 2017, pp. 3904-3908, doi: 10.1109/ICIP.2017.8297014;
Document 4: Z. Zhong, J. Li, Z. Luo, and M. Chapman, "Spectral-spatial residual network for hyperspectral image classification: A 3-D deep learning framework," IEEE Transactions on Geoscience and Remote Sensing, vol. 56, no. 2, pp. 847-858, 2017;
Document 5: R. Li, S. Zheng, C. Duan, Y. Yang, and X. Wang, "Classification of hyperspectral image based on double-branch dual-attention mechanism network," Remote Sensing, vol. 12, no. 3, p. 582, 2020.
TABLE 2 comparison of classification accuracy of different models on IP datasets
Class of ground object SVM-RBF 1D-CNN M3D-DCNN SSRN DBDA Proposed
C1 58.77 92.22 96.67 100 88.1 100
C2 77.96 81.68 86.9 98.11 98.62 99.26
C3 71.68 81.44 90.59 98.02 98.34 97.79
C4 19.93 92.97 99.73 91.62 94.22 100
C5 89.84 94.59 98.34 99.08 99.09 100
C6 97.01 97.81 99.74 99.61 98.29 100
C7 75.88 98 100 96.67 83.83 100
C8 99.13 98.67 99.93 99.93 99.96 100
C9 51.78 100 100 100 93.33 100
C10 72.46 90.4 91.27 92.71 94.45 99.75
C11 90.22 72.36 78.29 99.33 99.45 99.34
C12 73.16 91.6 95.24 98.07 98.91 99.47
C13 72.62 100 100 96.67 93.81 100
C14 97.9 91.35 95.76 99.78 99.79 100
C15 57.92 85.16 98.44 97.03 98.59 100
C16 78.59 98.33 100 90.62 85.08 100
OA(%) 82.32±0.753 84.07±1.05 88.95±1.46 98.16±0.46 98.48±0.521 99.46±0.09
AA(%) 74.05±1.71 91.66±0.95 95.68±0.64 97.33±0.955 95.24±2.277 99.73±0.03
Kappa x 100 79.38±0.872 81.44±1.14 87.07±1.66 97.83±0.543 98.20±0.615 99.35±0.1
TABLE 3 comparison of classification accuracy of different models on PU data set
Class of ground object SVM-RBF 1D-CNN M3D-DCNN SSRN DBDA Proposed
C1 96.83 85.27 94.59 99.9 99.56 99.98
C2 97.41 88.08 96.67 99.11 99.92 99.97
C3 76.04 82.91 93.52 92.14 98.98 100
C4 87.68 96.49 98.38 99.47 97.27 99.63
C5 97.56 99.74 100 100 99.9 100
C6 76.75 91.5 97.73 94.2 96.27 100
C7 66.36 91.6 97.69 98.17 99.32 100
C8 85.76 85.15 94.18 97.4 97.1 99.96
C9 99.87 99.88 99.83 99.78 97.55 100
OA(%) 90.34±0.678 88.78±2.25 96.41±0.84 97.9±1.83 98.90±0.607 99.95±0.00
AA(%) 87.14±0.694 91.18±0.86 96.95±0.43 97.80±1.3 98.46±0.630 99.95±0.00
Kappa x 100 87.22±0.868 85.26±2.78 95.2±1.11 97.20±2.4 98.53±0.809 99.94±0.00
TABLE 4 comparison of classification accuracy of different models on SA data set
From the experimental results in Tables 2 to 4, it can be seen that the OA, AA and Kappa values of the classification model proposed by the invention are higher than those of the other classification models on all three data sets. For the SVM-RBF and 1D-CNN models, which classify using only spectral features, the accuracy is significantly lower than that of M3D-DCNN, SSRN, DBDA and the proposed model, which use spatial-spectral features, showing that spectral features alone cannot achieve high classification accuracy. In the IP data set, land-cover classes 2 and 11 contain similar information, and the accuracy of these two classes in M3D-DCNN is not ideal; for classes 1, 7 and 9, which have few samples, the accuracy of class 7 in SSRN is only 96.67%, and the accuracies of these classes in DBDA are 88.1%, 83.83% and 93.33% respectively, indicating that these models lack the ability to handle small-sample data. The proposed method achieves more than 99% accuracy on classes with similar bands or few samples, showing that the SSA module can accurately select bands and that the UGSS sampling strategy handles the small-sample problem well. Moreover, the proposed method has the smallest standard deviations among all classification methods, showing very good stability.
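The three indicators reported above (OA, AA and Kappa) can all be derived from a confusion matrix. A small sketch, with an illustrative function name:

```python
import numpy as np

def classification_scores(conf):
    """Compute OA, AA and Kappa from a square confusion matrix
    (rows = true classes, columns = predicted classes)."""
    conf = np.asarray(conf, dtype=np.float64)
    total = conf.sum()
    oa = np.trace(conf) / total                                   # overall accuracy
    aa = np.mean(np.diag(conf) / conf.sum(axis=1))                # mean per-class accuracy
    pe = (conf.sum(axis=0) * conf.sum(axis=1)).sum() / total**2   # chance agreement
    kappa = (oa - pe) / (1 - pe)
    return oa, aa, kappa

# usage: a perfectly diagonal matrix scores 1.0 on all three indicators
oa, aa, kappa = classification_scores([[5, 0], [0, 5]])  # → (1.0, 1.0, 1.0)
```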
In addition, Figs. 4-6 show the classification results of the different classification models on the IP, PU and SA data sets, respectively, where (a) is a false-color image and (b) is the label image. As is clear from Figs. 4-6, classification using only spectral features, as in the SVM-RBF and 1D-CNN models, produces many noise points; methods based on spatial-spectral features, such as the M3D-DCNN, SSRN and DBDA models, overcome this disadvantage and achieve better results. Compared with the ground-truth image, the classification result of the proposed model is smoother.
TABLE 5 comparison of parameters, training time and testing time of different classification models on three datasets
As can be seen from the data in Table 5, on the IP data set the training time of the proposed classification model is far shorter than that of all comparison models, at only 223.7 seconds. On the PU data set, the proposed model has a clear advantage in training time over most methods; SVM-RBF is a simple model and the DBDA code uses an early-stopping strategy, so the training time of the proposed model is longer than theirs, but its accuracy is much higher than both and approaches saturation. On the SA data set, SVM-RBF and DBDA are again faster, but the training time of the proposed model is very close to these two. On all data sets, the proposed model achieves the highest classification accuracy. Its test time is the shortest on all three data sets, at only 0.06, 0.33 and 0.21 seconds respectively, showing that once trained the proposed model can be readily applied to related work.
In conclusion, the global learning device and classification method for hyperspectral image classification provided by the invention ensure high accuracy in classification work while achieving a faster classification speed. In addition, the classification model fully extracts the features of the hyperspectral image and still shows high classification accuracy on land-cover classes with small sample sizes.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (9)

1. Global learning apparatus for hyperspectral image classification, comprising: an encoder and a decoder; the encoder sequentially comprises a spectrum dimension adjusting layer, a first feature extraction layer and a second feature extraction layer according to an image processing sequence; the first feature extraction layer comprises three MLBSA structural layers which are stacked together; the MLBSA structural layer comprises three shuffling spectral attention SSA modules, two MLB layers and a down-sampling layer; wherein the SSA module and the MLB layer are stacked to cross each other; the downsampling layer is used as the last sublayer of the MLBSA structural layer; the input of the first SSA module passes through a Zero-padded convolution module and then is subjected to addition fusion operation with the output of the down-sampling layer to obtain an output which is used as the output of the MLBSA structural layer; the MLB layer represents a modified linear bottleneck layer;
the decoder sequentially comprises a first up-sampling layer, a Concat layer and an output layer according to an image processing sequence; and the output of the first MLBSA structural layer in the first feature extraction layer is used as the input of the first up-sampling layer, the output of the second feature extraction layer and the output of the first up-sampling layer are fused through the Concat layer, and the fusion result is processed through the output layer to complete the classification of the hyperspectral images.
2. The global learning device for hyperspectral image classification according to claim 1, wherein the spectral dimension adjustment layer comprises three sublayers, namely an SSA module, a 1 x 1 convolutional layer and an MLB layer from a shallow layer to a deep layer.
3. The global learning apparatus for hyperspectral image classification according to claim 1, wherein the MLB layer comprises a first convolution module, a second convolution module and a third convolution module stacked in sequence; the first convolution module and the second convolution module are sequentially composed of a convolution layer, a GN layer and a ReLU layer from a shallow layer to a deep layer; the third convolution module includes a convolution layer and a GN layer in this order.
4. The global learning apparatus for hyperspectral image classification according to claim 1, wherein the second feature extraction layer comprises an SSA module, a first branch extraction layer for extracting global information, a second branch extraction layer for extracting local information, a feature fusion layer and a second up-sampling layer; the output of the SSA module passes through the first branch extraction layer and the second branch extraction layer respectively; and the output of the two branch extraction layers is subjected to feature fusion through the feature fusion layer, and the output of the feature fusion layer after passing through the second upper sampling layer is used as the output of the second feature extraction layer.
5. The global learning apparatus for hyperspectral image classification according to claim 4, wherein the first branch extraction layer comprises two connected cross-attention CCA modules.
6. The global learning apparatus for hyperspectral image classification according to claim 4, wherein the second branch extraction layer adopts a void space pyramid pooling ASPP structure.
7. The global learning device for hyperspectral image classification according to claim 1, wherein the output layer comprises three sublayers, namely two fourth convolution modules and a 1 x 1 convolution layer from a shallow layer to a deep layer; the fourth convolution module is composed of a convolution layer, a GN layer and a ReLU layer from a shallow layer to a deep layer in sequence.
8. The hyperspectral image classification method based on the device of any one of claims 1 to 7 is characterized by comprising the following steps:
dividing a data set into a training set, a verification set and a test set by adopting a universal global random layering UGSS sampling strategy; the data set is a set formed by all extracted surface feature sample data after surface feature sample extraction is carried out on the hyperspectral image;
training the device according to any one of claims 1 to 7 by using a training set and a validation set to obtain a trained classification model;
and classifying the test set by using the classification model.
9. The hyperspectral image classification method according to claim 8, wherein the SSA module processes input data sequentially using formula (1), formula (2) and formula (3):
z_c = F_sq(u_c) = (1 / (H × W)) Σ_{i=1}^{H} Σ_{j=1}^{W} u_c(i, j)    (1)
s_c = F_ex(z_c, W) = W_2(ReLU(GN(W_1(Shuffle(z_c)))))    (2)
ũ_c = F_scale(u_c, s_c) = s_c · u_c    (3)
wherein z_c represents the encoded value of all the pixel values of the c-th band, H and W respectively represent the height and width of the hyperspectral image, u_c(i, j) represents the pixel in the i-th row and j-th column of the c-th band, Shuffle represents a shuffle function that scrambles the spectral dimension to increase interactivity, W_1 and W_2 denote two fully connected layers, s_c represents the intermediate output of the c-th band after being processed by the SSA module, and ũ_c represents the final output of the c-th band after being processed by the SSA module.
CN202210563560.2A 2022-05-23 2022-05-23 Global learning device and method for hyperspectral image classification Pending CN114898157A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210563560.2A CN114898157A (en) 2022-05-23 2022-05-23 Global learning device and method for hyperspectral image classification

Publications (1)

Publication Number Publication Date
CN114898157A true CN114898157A (en) 2022-08-12

Family

ID=82723154



Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116704328A (en) * 2023-04-24 2023-09-05 中国科学院空天信息创新研究院 Ground object classification method, device, electronic equipment and storage medium
CN116563649A (en) * 2023-07-10 2023-08-08 西南交通大学 Tensor mapping network-based hyperspectral image lightweight classification method and device
CN116563649B (en) * 2023-07-10 2023-09-08 西南交通大学 Tensor mapping network-based hyperspectral image lightweight classification method and device


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination