CN114155443A - Hyperspectral image classification method based on multi-receptive-field attention network - Google Patents

Hyperspectral image classification method based on multi-receptive-field attention network Download PDF

Info

Publication number
CN114155443A
CN114155443A (application number CN202111153710.4A)
Authority
CN
China
Prior art keywords
attention
node
features
receptive
pixel
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111153710.4A
Other languages
Chinese (zh)
Inventor
丁遥
张志利
赵晓枫
蔡伟
阳能军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Rocket Force University of Engineering of PLA
Original Assignee
Rocket Force University of Engineering of PLA
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Rocket Force University of Engineering of PLA filed Critical Rocket Force University of Engineering of PLA
Priority to CN202111153710.4A priority Critical patent/CN114155443A/en
Publication of CN114155443A publication Critical patent/CN114155443A/en
Pending legal-status Critical Current


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F 18/2135 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on approximation criteria, e.g. principal component analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/23 Clustering techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a hyperspectral image classification method based on a multi-receptive-field attention network. The original hyperspectral image is accurately divided into adaptive local regions by an unsupervised principal component analysis method and simple linear iterative clustering; a double-layer convolutional neural network extracts the spectral features of the pixels, which reduces the number of nodes to be computed and suppresses the noise of the original hyperspectral image; a multi-receptive-field graph is constructed based on the superpixel cube; a multi-feature attention module extracts local node features and edge features, and a graph attention network learns the importance coefficients of the two kinds of features; finally, the image features are interpreted with a cross-entropy loss to obtain the label of each pixel, so that the hyperspectral image is correctly classified. Compared with the prior art, the method automatically extracts and classifies hyperspectral features, and the classification accuracy reaches over 93%.

Description

Hyperspectral image classification method based on multi-receptive-field attention network
Technical Field
The invention relates to the technical field of geographical remote sensing, in particular to a hyperspectral image classification method based on a multi-receptive-field attention network.
Background
A hyperspectral image (HSI) is a three-dimensional image taken by an aerospace vehicle equipped with a hyperspectral imager. Each pixel in the image contains reflectance information from hundreds of different bands, which makes HSIs suitable for many practical applications, such as military target detection, mineral exploration, and agricultural production. The goal of hyperspectral image classification is to assign each pixel in an image to a specific label based on sample features; these labels can then be rendered as different image colors. In recent years, hyperspectral image classification has increasingly become a research hotspot. However, label scarcity, high data dimensionality, spectral similarity, pixel mixing, and similar issues present significant challenges to hyperspectral image classification.
In the early stages, various machine learning classification methods were applied to HSI classification, such as the Support Vector Machine (SVM), k-Nearest Neighbor (KNN), naive Bayes, decision trees, and the Extreme Learning Machine (ELM). The SVM is one of the most representative and effective machine learning classifiers, and with its development, variants using kernel transformation techniques, such as multi-kernel learning and the Kernel Support Vector Machine (KSVM), became widely used. Meanwhile, sparse representation-based classifiers (SRC) also attracted increasing attention, and a large number of SRC-based algorithms were proposed. However, all of the above methods rely only on the spectral characteristics of the HSI, do not consider its spatial information, and yield classification performance that is not ideal. To fully utilize spatial information, classifiers based on spectral-spatial information, built on superpixel segmentation, morphological segmentation, and graph construction, have also been studied. Nevertheless, conventional machine learning classification methods rely heavily on expert knowledge and cannot learn deep features from the HSI, as discussed in L. Fang, S. Li, X. Kang, and J. A. Benediktsson, "Spectral-spatial classification of hyperspectral images with a superpixel-based discriminative sparse model," IEEE Trans. Geosci. Remote Sens., vol. 53, no. 8, pp. 4186-4201, Aug. 2015.
With the development of deep learning and the progress of Artificial Intelligence (AI), deep learning methods have been widely applied in natural language processing, computer vision, intelligent decision making, and related fields. Inspired by these successes, many deep learning classifiers with excellent performance have been developed for HSI classification, such as Stacked Autoencoders (SAE), Recurrent Neural Networks (RNN), and Convolutional Neural Networks (CNN). Compared with machine learning methods, deep learning classifiers can automatically learn deep, complex features of the HSI. Among these methods, the CNN shows great promise and is widely used; as deep learning has progressed, various CNN-based methods have been proposed, from one-dimensional to three-dimensional CNNs, from single to hybrid CNNs, and from shallow to deep CNNs. However, early CNN methods learned only local spatial information and could not extract deep spatial-spectral features of the HSI, as discussed in J. Feng et al., "CNN-based multilayer spatial-spectral feature fusion and sample augmentation with local and nonlocal constraints for hyperspectral image classification," IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 12, no. 4, pp. 1299-1313, Apr. 2019. To overcome these drawbacks, different forms of CNN methods began to emerge, and recently some advanced techniques such as spectral-spatial attention networks and dynamic convolution kernels have been explored to improve CNN-based approaches. However, CNNs require a large number of training labels, which conflicts with the label scarcity of HSIs. Moreover, these advanced CNN models are designed for Euclidean data, so the inherent correlation between adjacent land covers is often neglected, and such methods cannot learn the relationship between labeled and unlabeled data (see, e.g., "A CNN with scale association and transformed method for hyperspectral identification").
Therefore, both traditional machine learning methods and CNN-based deep learning methods face certain limitations in hyperspectral classification.
Disclosure of Invention
In order to overcome the limitations that convolutional neural networks face in hyperspectral classification, in both traditional machine learning and deep learning settings, the invention aims to provide a novel hyperspectral image classification method based on a multi-scale receptive-field graph attention neural network.
The concept and technical solution of the present invention are described as follows:
The basic concept of the invention is as follows: within a convolutional neural network framework, a superpixel segmentation algorithm refines the local spatial features of the original HSI; a two-layer one-dimensional CNN extracts automatically learnable superpixel spectral features; graph edges and the local semantic features of the graph are introduced into a graph attention network (GAT); a multi-scale receptive-field GAT extracts local-to-global neighbor node features and edge features; and finally, a graph attention network and a softmax function perform multi-receptive-field feature fusion and pixel label prediction.
The technical solution of the invention is a hyperspectral image multi-scale classification method combining a graph structure with a convolutional neural network, provided with a multi-receptive-field graph constructed from a superpixel cube and a convolutional neural network, and characterized in that: the original HSI is accurately partitioned into adaptive local regions (superpixels) by an unsupervised Principal Component Analysis (PCA) method and Simple Linear Iterative Clustering (SLIC); a double-layer 1D CNN extracts the spectral features of the pixels, reducing the number of nodes to be computed and suppressing the noise of the original HSI; a multi-receptive-field graph is constructed based on the superpixel cube; a multi-feature attention module extracts local node features and edge features, and a graph attention network learns the importance coefficients of the two kinds of features; a feature fusion attention module fuses the node-edge features of each receptive field to give the features of the classification nodes; finally, the image features are interpreted with a cross-entropy loss to obtain the label of each pixel. The method specifically comprises the following steps:
Step 1: constructing a spectral-spatial conversion module
Given an HSI cube I_B = {x_1, x_2, …, x_m} ∈ R^(W×H×B) composed of m pixels and B bands, where W and H respectively denote the spatial width and height of the hyperspectral image; to improve computational efficiency, the Principal Component Analysis (PCA) method is used for dimensionality reduction, and the first principal components are selected to generate the reduced image I_r ∈ R^(W×H×b) with m pixels and b bands, where b ≪ B and the subscript r denotes dimensionality reduction; then SLIC is adopted to segment the pixels into superpixels, and the local superpixel HSI can be expressed mathematically as:
I_r = {S_1, S_2, …, S_K}   (1)
where S_i = {x_1^i, x_2^i, …, x_(n_i)^i} denotes a superpixel comprising n_i pixels, Σ_(i=1)^K n_i = m, and K is the total number of superpixels; the pixels within a superpixel have strong spectral-spatial correlation; in the method of the invention, the superpixels are taken as the nodes of the graph; by controlling the number of superpixels K, the scale of the graph can be controlled and the computational complexity of the algorithm reduced;
step 2: constructing a frequency domain conversion module:
Step 2.1: in order to extract discriminative and robust spectral features for the proposed method, a spectral transformer is provided, whose framework is shown in FIG. 1; two layers of 1×1 CNN kernels are used to extract the spectral values of a single pixel in each band, and in the proposed method the spectral feature vector of the pixel at spatial position p_0 can be written as:
x(p_0) = [X_1(p_0), X_2(p_0), …, X_B(p_0)]   (2)
where p_0 = (x, y) is the spatial position of the pixel in the HSI and X_i(p_0) represents the spectral value of the pixel at spatial position p_0 in the i-th band;
Step 2.2: the output feature H_b^(l) of the l-th convolution layer in the b-th spectral band is
H_b^(l) = σ(w^(l) H_b^(l−1) + β^(l))   (3)
where w^(l) and β^(l) are respectively the trainable weights (1×1 convolution kernels) and biases, and σ(·) is the activation function, i.e., ReLU;
Step 2.3: in order to express the correspondence between pixels and superpixels, an association matrix M ∈ R^(m×K) is constructed; specifically, M can be calculated as
M_(i,j) = 1 if x_i ∈ S_j, and M_(i,j) = 0 otherwise   (4)
where x_i is the i-th pixel of Flatten(I_B) and Flatten(·) denotes the flattening operation of the hyperspectral image in the spatial dimension; as described in equation (4), a mapping between spatial pixels and superpixels is achieved;
Step 2.4: finally, the average spectral feature of each superpixel is taken as the node feature vector, and the graph node features can be expressed mathematically as
H_i = (1/N_i) Σ_(x_j∈S_i) h_j   (5)
where H_i is the i-th node feature vector, N_i represents the number of pixels contained in superpixel S_i, and h_j is the spectral feature vector of pixel x_j, as shown in equation (2);
Step 3: constructing a multi-feature attention module
The attention mechanism and multiple features are very important for extracting graph features, and the features of neighbor nodes and of edges are extracted by the multi-feature attention module; the multi-feature attention can be divided into three parts, namely neighbor node attention, edge attention, and feature fusion attention; the specific details of these three parts are as follows:
step 3.1: neighbor node attention
In order to aggregate the information of neighbor nodes, this module adopts a multilayer graph attention mechanism so that the neighbor-node attention coefficients of the classified nodes can be obtained; the output of the l-th convolution layer can be calculated as
n^(l) = σ(Σ_(k∈N(n)) γ_(nk)^(l) W^(l) n_k^(l−1))   (6)
and after the multilayer convolution, n_i can be converted into the form
n_i ← Σ_(k=1)^K γ_(ik) · n_(ik)   (7)
where ← represents the assignment symbol, i is the i-th hop neighborhood of node n, K represents the number of neighbor nodes in the i-th hop neighborhood of node n, and γ_(ik) represents the importance coefficient of n_(ik); n_i is the neighbor-node feature of the central node n, and according to equation (7), n_i can be decomposed into a linear sum of the neighbor node features;
Step 3.2: edge attention
Edges contain rich semantic information; however, most graph attention networks focus only on the aggregation of adjacent nodes and lack the extraction of edge features; inspired by the edge computation in graph convolution, this step introduces the Gaussian distance a_(ij) to define the relationship between nodes, i.e.
a_(ij) = exp(−‖h_i − h_j‖₂² / γ) if h_i ∈ N_t(h_j), and 0 otherwise   (8)
where h_i and h_j represent the spectral features of nodes i and j (calculated by equation (5)), the Euclidean distance between the two nodes is written ‖h_i − h_j‖₂, N_t(h_j) represents the t-hop neighbor node set of h_j, and γ is empirically set to 0.2;
similar to equation (6), the output of the l-th convolution layer in edge attention can be expressed as
a^(l) = σ(Σ_(k∈N(n)) μ_(nk)^(l) W^(l) a_k^(l−1))   (9)
where a^(l−1) represents the output of the (l−1)-th convolution layer in edge attention and μ_(nk)^(l) represents the learned attention coefficient of edge attention; a_i can be converted into
a_i ← Σ_(k=1)^K μ_(ik) · a_(ik)   (10)
where a_i is the i-th hop edge feature of the central node n and μ_(ik) denotes the importance coefficient of a_(ik);
step 3.3: feature fusion attention
In order to comprehensively utilize the edge and node features, feature fusion attention is provided, which adaptively fuses the features extracted by the two attention branches; the fusion for the central node n can be expressed mathematically as
n ← α · n_i + β · a_i   (11)
where α and β respectively represent the importance coefficients of the neighbor-node features and the edge features; as described above, substituting equations (7) and (10), the central node n may ultimately be represented as
n ← Σ_(k=1)^K (α_k · n_(ik) + β_k · a_(ik))
where α_k and β_k respectively represent the weight coefficients of n_(ik) and a_(ik).
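The sketch below illustrates the multi-feature attention idea under stated assumptions: edge weights follow the Gaussian distance of equation (8), each branch is a single-layer aggregation rather than the patent's full multilayer attention, and the fusion coefficients α and β of equation (11) come from a small learned gate; the adjacency matrix is assumed to include self-loops:

```python
# Illustrative sketch of step 3 (not the patent's exact parameterization):
# Gaussian edge weights (eq. (8)), one-layer node/edge aggregation, and a
# learned gate producing the fusion coefficients alpha, beta (eq. (11)).
import torch
import torch.nn as nn
import torch.nn.functional as F

def gaussian_edges(H: torch.Tensor, adj: torch.Tensor, gamma: float = 0.2):
    """H: (K, d) node features; adj: (K, K) 0/1 adjacency with self-loops."""
    d2 = torch.cdist(H, H).pow(2)          # squared Euclidean distances
    return torch.exp(-d2 / gamma) * adj    # a_ij = exp(-||h_i - h_j||^2 / gamma)

class MultiFeatureAttention(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.W_node = nn.Linear(dim, dim, bias=False)
        self.W_edge = nn.Linear(dim, dim, bias=False)
        self.gate = nn.Linear(2 * dim, 2)  # produces (alpha, beta) per node

    def forward(self, H: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        A = gaussian_edges(H, adj)
        # Neighbor-node branch: softmax-normalized attention over neighbors.
        attn = F.softmax(A.masked_fill(adj == 0, float('-inf')), dim=-1)
        n_feat = attn @ self.W_node(H)
        # Edge branch: aggregation weighted by the raw Gaussian edge strengths.
        e_feat = (A / A.sum(-1, keepdim=True).clamp_min(1e-8)) @ self.W_edge(H)
        ab = F.softmax(self.gate(torch.cat([n_feat, e_feat], dim=-1)), dim=-1)
        alpha, beta = ab[:, :1], ab[:, 1:]
        return alpha * n_feat + beta * e_feat   # fused node features (eq. (11))
```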
Step 4: constructing the multi-receptive-field module
In order to extract the global information of the HSI, a multi-branch receptive-field aggregation mechanism is designed; the mechanism of the multi-receptive-field design is shown in FIG. 1; the multi-receptive-field module can avoid the negative influence of low-order hop nodes and learn long-range feature information of the HSI; in branch i, the receptive field of node n can be formed as
R_i(n) = R_(i−1)(n) ∪ R_1(R_(i−1)(n))   (12)
where R_1(n) is the set of 1-hop neighbors of node n and R_0(n) = {n}, i.e., the i-hop receptive field augments the (i−1)-hop field with the 1-hop neighbors of its nodes; the feature of the central node n in branch i can finally be represented as
n_i = σ(Σ_(v∈R_i(n)) γ_(nv) W n_v)   (13)
where i denotes the i-th hop neighborhood of node n;
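As an illustration of equation (12), this sketch grows hop-wise receptive fields from a boolean adjacency matrix; the dense-matrix representation and the function name are assumptions for clarity:

```python
# Illustrative sketch of eq. (12): R_0(n) = {n}, and each branch enlarges the
# field by the 1-hop neighbors of the previous field.
import torch

def receptive_fields(adj: torch.Tensor, hops: int):
    """adj: (K, K) 0/1 adjacency. Returns masks R_1..R_hops, where
    R_i[n, v] = 1 iff node v lies within i hops of node n."""
    A = ((adj + torch.eye(adj.size(0), device=adj.device)) > 0).float()
    fields, R = [], A                  # R_1: node itself plus 1-hop neighbors
    for _ in range(hops):
        fields.append(R)
        R = ((R @ A) > 0).float()      # R_i = R_{i-1} united with its neighbors
    return fields
```

Each returned mask can then drive one attention branch, so that branch i aggregates over R_i(n) as in equation (13).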
and 5: feature fusion attention and decision module
And finally, graph attention is adopted to fuse the multi-scale receptive-field features; the output feature O of the central node n can be calculated as
O = σ(Σ_(i∈S) e_i · Wᵀ n_i)   (14)
where e_i represents the importance coefficient of the features in branch i, S is the number of scales (branches), W is a trainable weight, and σ is the activation function, i.e., LeakyReLU; to determine the label of each pixel, the output feature O is classified by a softmax classifier, i.e.
ŷ_c = exp(O_c) / Σ_(c′=1)^C exp(O_(c′))   (15)
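A minimal sketch of the decision stage of equations (14)-(15) follows, under the assumptions that the branch importance coefficients e_i form a learned softmax-normalized vector and that the classifier returns log-probabilities:

```python
# Illustrative sketch of eqs. (14)-(15): attention fusion over the S branch
# features of each node, then a softmax decision (log-probabilities here).
import torch
import torch.nn as nn
import torch.nn.functional as F

class FusionDecision(nn.Module):
    def __init__(self, dim: int, n_classes: int, n_branches: int):
        super().__init__()
        self.e = nn.Parameter(torch.ones(n_branches))   # e_i of eq. (14)
        self.W = nn.Linear(dim, n_classes, bias=False)  # trainable weight W

    def forward(self, branch_feats: torch.Tensor) -> torch.Tensor:
        # branch_feats: (S, K, dim) -> (K, n_classes) log-probabilities
        e = F.softmax(self.e, dim=0)                    # normalized e_i
        fused = torch.einsum('s,skd->kd', e, branch_feats)
        O = F.leaky_relu(self.W(fused))                 # eq. (14)
        return F.log_softmax(O, dim=-1)                 # eq. (15)
```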
Step 6: loss function and model training
The differences between the network output and the original ground-truth labels are penalized with a cross-entropy function, i.e.
L = −Σ_(z∈y_G) Σ_(f=1)^C Y_(zf) ln O_(zf)   (16)
where y_G is the set of labeled samples, C denotes the number of classes, and Y_(zf) is the training label matrix; end-to-end training is adopted, and Adam is used to update the network parameters of the invention;
in the invention, four hyperparameters are set, namely the number of superpixels K, the number of convolution layers L of the multi-feature attention module (MFAM), the number of iterations T, and the learning rate lr; the optimal hyperparameter settings are shown in Table 1.
The beneficial effects compared with the prior art are as follows: first, a one-dimensional CNN is provided to learn the spectral features of the superpixels; second, graph edges are introduced into the GAT, a GAT-based feature fusion mechanism is studied, and the features of the classification nodes are represented by combining neighbor node information with edge information; third, a multi-scale receptive-field mechanism is provided to extract multi-scale local semantic features and learn local-global spatial context information. The method can automatically extract hyperspectral features and complete the classification, with a classification accuracy of more than 93%.
Drawings
FIG. 1: framework of the hyperspectral image classification method based on the multi-receptive-field attention network according to the invention.
In the experiments, the invention is trained in PyTorch 1.8 on a computer with a GeForce GTX 1080Ti 11 GB GPU and a 3.70 GHz Intel i9-10900K CPU. First, superpixel segmentation is performed on the original hyperspectral image and the spectral feature of each superpixel block is extracted; then a multi-receptive-field graph with the superpixels as nodes is constructed; next, the multi-feature attention module extracts graph node and edge features; finally, the image features are interpreted with the cross-entropy loss to obtain the label of each pixel and predict the nodes.
Detailed Description
The embodiments of the present invention will now be described in further detail with reference to the accompanying drawings.
The method of the invention comprises four parts: a spectral-spatial transformation module for HSI preprocessing, a multi-receptive-field graph construction module for spectral-spatial feature extraction, a multi-feature attention module, and a feature fusion attention and decision module. The process of the method is realized as follows: first, the original HSI is accurately divided into adaptive local regions (superpixels) by the unsupervised Principal Component Analysis (PCA) method and Simple Linear Iterative Clustering (SLIC); the double-layer 1D CNN extracts the spectral features of the pixels, reducing the number of nodes to be computed and suppressing the noise of the original HSI; a multi-receptive-field graph is constructed based on the superpixel cube; the multi-feature attention module extracts local node features and edge features, and the graph attention network learns the importance coefficients of the two kinds of features; the feature fusion attention module fuses the node-edge features of each receptive field to give the features of the classification nodes; finally, the image features are interpreted with the cross-entropy loss to obtain the label of each pixel. As shown above, the HSI cube can be divided into superpixels by the spatial transformer. However, PCA dimensionality reduction loses spectral information of the HSI, i.e., the superpixels alone cannot adequately capture its spectral features. The usual approach is to extract the spectral value of each pixel directly from the original HSI and then compute the spectral average of the pixels in each superpixel. This is simple and intuitive, but it cannot suppress or eliminate the noise of the original HSI through network training, which is why the trainable spectral transformer of step 2 is used instead.
TABLE 1 hyper-parameter settings for different datasets
(Table 1 is reproduced as an image in the original publication; it lists the optimal settings of K, L, T and lr for each dataset.)
Classification results:
the experiment was conducted on the invention in a pytorch 1.8, using a GeForce GTX 1080Ti 11G GPU and a 3.70G Intel i9-10900K CPU computer. The method is used for classifying the data sets of Pavia University, Salinas and Houston 2013, overall classification accuracy (OA), average classification accuracy (AA) and Kappa coefficient (Kappa) are used as measuring indexes, 10 times of test operation are carried out for each time, and the average value is taken to obtain the result (OA: the proportion correctly classified in all samples; AA: the average value of the proportion correctly classified in each class; Kappa: the consistency of correctly classified data is expressed.)
TABLE 2 Experimental results on three standard datasets

          Pavia University    Salinas         Houston 2013
OA (%)    98.62 ± 0.76        98.65 ± 0.28    94.87 ± 0.76
AA (%)    98.38 ± 0.92        99.05 ± 0.15    95.70 ± 0.52
Kappa     0.98                0.98            0.94
The results on the three standard datasets in Table 2 show that the method achieves a good classification effect on every dataset, adapts well to different datasets, and attains high classification accuracy.

Claims (7)

1. A hyperspectral image classification method based on a multi-receptive-field attention network, provided with a multi-receptive-field graph constructed from a superpixel cube and a convolutional neural network, characterized in that: an original hyperspectral image (HSI) is accurately partitioned into adaptive local regions by an unsupervised Principal Component Analysis (PCA) method and Simple Linear Iterative Clustering (SLIC); a double-layer 1D Convolutional Neural Network (CNN) extracts the spectral features of the pixels, reducing the number of nodes to be computed and suppressing the noise of the original HSI; a multi-receptive-field graph is constructed based on the superpixel cube; a multi-feature attention module extracts local node features and edge features, and a graph attention network learns the importance coefficients of the two kinds of features; a feature fusion attention module fuses the node-edge features of each receptive field to give the features of the classification nodes; and finally, the image features are interpreted with a cross-entropy loss to obtain the label of each pixel; the method specifically comprises the following steps:
Step 1: constructing a spectral-spatial conversion module;
Step 2: constructing a frequency domain conversion module;
Step 3: constructing a multi-feature attention module;
Step 4: constructing a multi-receptive-field module;
Step 5: constructing a feature fusion attention and decision module;
Step 6: loss function and model training.
2. The hyperspectral image classification method based on the multi-receptive-field attention network of claim 1, wherein the construction of the spectral-spatial conversion module in step 1 specifically comprises:
given an HSI cube I_B = {x_1, x_2, …, x_m} ∈ R^(W×H×B) composed of m pixels and B bands, where W and H respectively denote the spatial width and height of the hyperspectral image; to improve computational efficiency, the Principal Component Analysis (PCA) method is used for dimensionality reduction, and the first principal components are selected to generate the reduced image I_r ∈ R^(W×H×b) with m pixels and b bands, where b ≪ B and the subscript r denotes dimensionality reduction; then SLIC is adopted to segment the pixels into superpixels, and the local superpixel HSI can be expressed mathematically as:
I_r = {S_1, S_2, …, S_K}   (1)
where S_i = {x_1^i, x_2^i, …, x_(n_i)^i} denotes a superpixel comprising n_i pixels, Σ_(i=1)^K n_i = m, and K is the total number of superpixels; the pixels within a superpixel have strong spectral-spatial correlation; in the method of the invention, the superpixels are taken as the nodes of the graph; by controlling the number of superpixels K, the scale of the graph can be controlled and the computational complexity of the algorithm reduced.
3. The hyperspectral image classification method based on the multi-receptive-field attention network of claim 1, wherein the construction of the frequency domain conversion module in step 2 specifically comprises:
Step 2.1: in order to extract discriminative and robust spectral features for the proposed method, a spectral transformer is provided that employs two layers of 1×1 CNN kernels to extract the spectral values of a single pixel in each band; in the proposed method, the spectral feature vector of the pixel at spatial position p_0 can be written as:
x(p_0) = [X_1(p_0), X_2(p_0), …, X_B(p_0)]   (2)
where p_0 = (x, y) is the spatial position of the pixel in the HSI and X_i(p_0) represents the spectral value of the pixel at spatial position p_0 in the i-th band;
Step 2.2: the output feature H_b^(l) of the l-th convolution layer in the b-th spectral band is
H_b^(l) = σ(w^(l) H_b^(l−1) + β^(l))   (3)
where w^(l) and β^(l) are respectively the trainable weights (1×1 convolution kernels) and biases, and σ(·) is the activation function, i.e., ReLU;
Step 2.3: in order to express the correspondence between pixels and superpixels, an association matrix M ∈ R^(m×K) is constructed; specifically, M can be calculated as
M_(i,j) = 1 if x_i ∈ S_j, and M_(i,j) = 0 otherwise   (4)
where x_i is the i-th pixel of Flatten(I_B) and Flatten(·) denotes the flattening operation of the hyperspectral image in the spatial dimension; as described in equation (4), a mapping between spatial pixels and superpixels is achieved;
Step 2.4: finally, the average spectral feature of each superpixel is taken as the node feature vector, and the graph node features can be expressed mathematically as
H_i = (1/N_i) Σ_(x_j∈S_i) h_j   (5)
where H_i is the i-th node feature vector, N_i represents the number of pixels contained in superpixel S_i, and h_j is the spectral feature vector of pixel x_j, as shown in equation (2).
4. The hyperspectral image classification method based on the multi-receptive-field attention network of claim 1, wherein the multi-feature attention in step 3 is divided into three parts, namely neighbor node attention, edge attention, and feature fusion attention; the specific details of these three parts are as follows:
Step 3.1: neighbor node attention
in order to aggregate the information of neighbor nodes, this module adopts a multilayer graph attention mechanism so that the neighbor-node attention coefficients of the classified nodes can be obtained; the output of the l-th convolution layer can be calculated as
n^(l) = σ(Σ_(k∈N(n)) γ_(nk)^(l) W^(l) n_k^(l−1))   (6)
and after the multilayer convolution, n_i can be converted into the form
n_i ← Σ_(k=1)^K γ_(ik) · n_(ik)   (7)
where ← represents the assignment symbol, i is the i-th hop neighborhood of node n, K represents the number of neighbor nodes in the i-th hop neighborhood of node n, and γ_(ik) represents the importance coefficient of n_(ik); n_i is the neighbor-node feature of the central node n, and according to equation (7), n_i can be decomposed into a linear sum of the neighbor node features;
Step 3.2: edge attention
edges contain rich semantic information; however, most graph attention networks focus only on the aggregation of adjacent nodes and lack the extraction of edge features; inspired by the edge computation in graph convolution, this step introduces the Gaussian distance a_(ij) to define the relationship between nodes, i.e.
a_(ij) = exp(−‖h_i − h_j‖₂² / γ) if h_i ∈ N_t(h_j), and 0 otherwise   (8)
where h_i and h_j represent the spectral features of nodes i and j (calculated by equation (5)), the Euclidean distance between the two nodes is written ‖h_i − h_j‖₂, N_t(h_j) represents the t-hop neighbor node set of h_j, and γ is empirically set to 0.2;
similar to equation (6), the output of the l-th convolution layer in edge attention can be expressed as
a^(l) = σ(Σ_(k∈N(n)) μ_(nk)^(l) W^(l) a_k^(l−1))   (9)
where a^(l−1) represents the output of the (l−1)-th convolution layer in edge attention and μ_(nk)^(l) represents the learned attention coefficient of edge attention; a_i can be converted into
a_i ← Σ_(k=1)^K μ_(ik) · a_(ik)   (10)
where a_i is the i-th hop edge feature of the central node n and μ_(ik) denotes the importance coefficient of a_(ik);
Step 3.3: feature fusion attention
in order to comprehensively utilize the edge and node features, feature fusion attention is provided, which adaptively fuses the features extracted by the two attention branches; the fusion for the central node n can be expressed mathematically as
n ← α · n_i + β · a_i   (11)
where α and β respectively represent the importance coefficients of the neighbor-node features and the edge features; as described above, substituting equations (7) and (10), the central node n may ultimately be represented as
n ← Σ_(k=1)^K (α_k · n_(ik) + β_k · a_(ik))
where α_k and β_k respectively represent the weight coefficients of n_(ik) and a_(ik).
5. The hyperspectral image classification method based on the multi-receptive-field attention network of claim 1, wherein the construction of the multi-receptive-field module in step 4 specifically comprises:
in order to extract the global information of the HSI, a multi-branch receptive-field aggregation mechanism is designed; the multi-receptive-field module can avoid the negative influence of low-order hop nodes and learn long-range feature information of the HSI; in branch i, the receptive field of node n can be formed as
R_i(n) = R_(i−1)(n) ∪ R_1(R_(i−1)(n))   (12)
where R_1(n) is the set of 1-hop neighbors of node n and R_0(n) = {n}; the feature of the central node n in branch i can finally be represented as
n_i = σ(Σ_(v∈R_i(n)) γ_(nv) W n_v)   (13)
where i denotes the i-th hop neighborhood of node n.
6. The hyperspectral image classification method based on the multi-receptive-field attention network of claim 1, wherein the feature fusion attention and decision module of step 5 is specifically as follows:
finally, graph attention is adopted to fuse the multi-scale receptive-field features; the output feature O of the central node n can be calculated as
O = σ(Σ_(i∈S) e_i · Wᵀ n_i)   (14)
where e_i represents the importance coefficient of the features in branch i, S is the number of scales (branches), W is a trainable weight, and σ is the activation function, i.e., LeakyReLU; to determine the label of each pixel, the output feature O is classified by a softmax classifier, i.e.
ŷ_c = exp(O_c) / Σ_(c′=1)^C exp(O_(c′))   (15).
7. The hyperspectral image classification method based on the multi-receptive-field attention network of claim 1, wherein the loss function and model training of step 6 are specifically as follows:
the differences between the network output and the original ground-truth labels are penalized with a cross-entropy function, i.e.
L = −Σ_(z∈y_G) Σ_(f=1)^C Y_(zf) ln O_(zf)   (16)
where y_G is the set of labeled samples, C denotes the number of classes, and Y_(zf) is the training label matrix; end-to-end training is adopted, and Adam is used to update the network parameters of the invention.
CN202111153710.4A 2021-09-29 2021-09-29 Hyperspectral image classification method based on multi-receptive-field attention network Pending CN114155443A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111153710.4A CN114155443A (en) 2021-09-29 2021-09-29 Hyperspectral image classification method based on multi-receptive-field attention network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111153710.4A CN114155443A (en) 2021-09-29 2021-09-29 Hyperspectral image classification method based on multi-receptive-field attention network

Publications (1)

Publication Number Publication Date
CN114155443A true CN114155443A (en) 2022-03-08

Family

ID=80462272

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111153710.4A Pending CN114155443A (en) 2021-09-29 2021-09-29 Hyperspectral image classification method based on multi-receptive-field attention network

Country Status (1)

Country Link
CN (1) CN114155443A (en)


Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114758203A (en) * 2022-03-31 2022-07-15 长江三峡技术经济发展有限公司 Residual dense visual transformation method and system for hyperspectral image classification
CN114758203B (en) * 2022-03-31 2023-01-10 长江三峡技术经济发展有限公司 Residual intensive visual transformation method and system for hyperspectral image classification
CN115908908A (en) * 2022-11-14 2023-04-04 北京卫星信息工程研究所 Remote sensing image gathering type target identification method and device based on graph attention network
CN115908908B (en) * 2022-11-14 2023-09-15 北京卫星信息工程研究所 Remote sensing image aggregation type target recognition method and device based on graph attention network
CN116824282A (en) * 2023-08-31 2023-09-29 中国石油大学(华东) Semi-supervised hyperspectral image classification method integrating multi-jump graph convolution and attention
CN116824282B (en) * 2023-08-31 2023-11-10 中国石油大学(华东) Semi-supervised hyperspectral image classification method integrating multi-jump graph convolution and attention
CN117422932A (en) * 2023-11-17 2024-01-19 中国矿业大学 Hyperspectral image classification method based on multi-mode enhanced graph attention network
CN117422932B (en) * 2023-11-17 2024-05-28 中国矿业大学 Hyperspectral image classification method based on multi-mode enhanced graph attention network
CN117893839A (en) * 2024-03-15 2024-04-16 华东交通大学 Multi-label classification method and system based on graph attention mechanism
CN117893839B (en) * 2024-03-15 2024-06-07 华东交通大学 Multi-label classification method and system based on graph attention mechanism


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination