CN116843975A - Hyperspectral image classification method combined with spatial pyramid attention mechanism


Publication number: CN116843975A
Authority: CN (China)
Legal status: Pending
Application number: CN202310838661.0A
Other languages: Chinese (zh)
Inventors: 刘和, 胡紫林, 韩啸, 赵妍, 张洋, 王维英, 王东
Assignee: State Grid Heilongjiang Electric Power Co Ltd Harbin Power Supply Co; State Grid Corp of China SGCC

Classifications

    • G06V10/764 Image or video recognition or understanding using pattern recognition or machine learning; classification, e.g. of video objects
    • G06V10/806 Fusion of extracted features at the sensor, preprocessing, feature extraction or classification level
    • G06V10/82 Image or video recognition or understanding using neural networks
    • G06V20/10 Scenes; scene-specific elements; terrestrial scenes
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G06N3/084 Learning methods; backpropagation, e.g. using gradient descent


Abstract

A hyperspectral image classification method combining a spatial pyramid attention mechanism, belonging to the field of hyperspectral image classification. The invention addresses the low classification accuracy of existing classification methods. The method removes spectral redundancy from the hyperspectral image to be classified by principal component analysis; builds a sample set; performs multi-scale feature extraction and preliminary fusion on the sample set with a ResNet34 shallow feature extraction network combined with dynamic convolution, and weights and fuses the linearly convolved samples with the preliminary fusion features to obtain pooled fusion features; constructs an SPC module and extracts multi-scale spatial features and local and global information from the pooled fusion features; combines the extracted information to obtain joint spectral-spatial features; converts the joint spectral-spatial features into a fixed-size matrix with the pooling layer of a spatial feature pyramid, and linearly weights the fixed-size matrix with a Softmax-based fully connected layer to obtain the classification result. The method is suitable for hyperspectral image classification.

Description

Hyperspectral image classification method combined with spatial pyramid attention mechanism
Technical Field
The invention belongs to the field of hyperspectral image classification.
Background
A hyperspectral remote sensing image (Hyperspectral Imagery, HSI) is an image acquired by a hyperspectral imager and is rich in both spatial and spectral information. Compared with ordinary images, hyperspectral remote sensing images have many more bands and very high spectral resolution, integrating image and spectrum in a single data cube. Hyperspectral remote sensing earth observation is widely applied, for example in precision agriculture, land cover analysis, marine hydrological detection and geological exploration, and plays an increasingly important role in the economy, agriculture, environmental monitoring and other areas.
At present, many deep-learning-based classification algorithms still suffer from data redundancy, insufficient fine-feature extraction, and loss of global and local spectral information, which leads to poor classification accuracy.
Disclosure of Invention
The invention aims to solve the problem of low classification accuracy of existing classification methods, and provides a hyperspectral image classification method combining a spatial pyramid attention mechanism.
The hyperspectral image classification method combining a spatial pyramid attention mechanism according to the invention comprises the following steps:
Step one, removing spectral redundancy from the hyperspectral image to be classified by principal component analysis;
Step two, sampling the hyperspectral image with the spectral redundancy removed by a sliding-window sampling method to obtain a sample set;
Step three, performing multi-scale feature extraction and fusion on the sample set with a ResNet34 shallow feature extraction network combined with dynamic convolution to obtain preliminary fusion features;
Step four, applying a linear convolution transformation to the sample set, and weighting and fusing the linearly convolved samples with the preliminary fusion features to obtain pooled fusion features;
Step five, combining the spatial feature pyramid network with the coordinate squeeze attention mechanism SPCS and the coordinate attention excitation mechanism SPCE to construct the SPC module, stacking three SPC modules, and extracting multi-scale spatial features, local information and global information from the pooled fusion features; combining the extracted information to obtain the joint spectral-spatial features;
Step six, converting the joint spectral-spatial features into a fixed-size matrix with the pooling layer of the spatial feature pyramid, and linearly weighting the fixed-size matrix with a Softmax-based fully connected layer to obtain the classification result.
Further, in step one of the present invention, the spectral redundancy in the hyperspectral image to be classified is removed by principal component analysis as follows:
Step 1.1, computing the covariance matrix, eigenvalues and eigenvectors of the hyperspectral image data to be classified;
Step 1.2, sorting the eigenvectors in descending order of their eigenvalues; using the eigenvectors as weighting coefficients, forming an eigenvector matrix from the n eigenvectors with the largest eigenvalues, projecting the hyperspectral image data to be classified onto the eigenvector matrix, and computing the principal components of the feature matrix with the weighting coefficients to obtain B principal components and the dimension-reduced feature matrix, thereby removing the redundancy of the hyperspectral image to be classified, where n = 3 and B is the number of bands of the hyperspectral image.
Further, in the present invention, in step 1.1 the covariance matrix of the hyperspectral image data to be classified is computed as:
σ(q_j, q_k) = E[(q_j - E(q_j))(q_k - E(q_k))]
where σ(q_j, q_k) denotes the covariance between q_j and q_k, j, k = 1 ... m, m denotes the number of columns of the eigenvalue matrix, E denotes the matrix expectation, A denotes the covariance matrix, and q_j and q_k denote the normalized j-th and k-th random vectors, respectively.
Further, in step three of the present invention, the specific method for multi-scale feature extraction and fusion of the sample set with the ResNet34 shallow feature extraction network combined with dynamic convolution is as follows:
Step 3.1, obtaining the weights of the convolution kernels in the identity residual blocks of the ResNet34 network by dynamic convolution matching;
Step 3.2, constructing a residual joint feature network containing dynamic convolution from the convolution-kernel weights, and performing multi-scale feature extraction on the sample set to obtain a feature map;
Step 3.3, fusing the features in the feature map with a linear function to obtain the preliminary fusion features.
Further, in step 3.1 of the present invention, the weights of the convolution kernels in the identity residual blocks of the ResNet34 network are obtained by dynamic convolution matching using the formula:
y(x_i) = α[(k_1 w_1 + k_2 w_2 + ··· + k_n w_n) * x_i]
where x denotes the input from which the attention module derives the data-dependent dynamic convolution kernel weights, k_n = s_n(x) is the n-th scalar weight depending on the input sample, n is the number of base kernels in the convolution operation, w_1, ..., w_n and the corresponding bias terms denote the convolution kernel parameters and bias parameters of the individual kernels, α denotes the activation function, and x_i denotes the i-th input to the dynamic convolution.
Further, in step 3.2 of the present invention, the feature map is obtained as:
U(x) = W_2 σ(W_1 x)
where U(x) denotes the feature map, σ(·) denotes the improved linear function, and W_1 and W_2 are the weights of weight layer 1 and weight layer 2, respectively.
Further, in step 3.3 of the present invention, the preliminary fusion features are obtained as:
H(x) = U(x, {W_e}) + W_s x
where W_s denotes the linear transformation, W_e denotes W_1 or W_2, and H(x) denotes the features after preliminary fusion.
Further, in step four of the present invention, the linearly convolved samples and the preliminary fusion features are weighted and fused, and the pooled fusion features are obtained as follows:
X_{q+2} = X_q + F(X_q; ξ)
where ξ = {H_{q+1}, H_{q+2}, b_{q+1}, b_{q+2}}; X_q and X_{q+2} denote the input feature volumes of the q-th and (q+2)-th layers after the dynamic convolution operation, respectively; H_{q+1} and b_{q+1} denote the n spatial convolution kernels of the (q+1)-th layer and the linear transformation of the (q+1)-th layer, respectively; F(X_q; ξ) is the residual function constructed from the two convolution layers (q+1) and (q+2); σ(·) denotes the improved linear function, here the Mish function; and the weighted result is the pooled fusion feature.
Further, in step five of the present invention, the joint spectral-spatial features are obtained as follows:
Step 5.1, aggregating the pooled fusion features into the two directional joint feature vectors PFW and PFH with the SPCS module, and applying two-dimensional adaptive average pooling to the pooled fusion features to aggregate them into the three feature maps PF1, PF2 and PF3;
Step 5.2, flattening and reshaping the three feature maps PF1, PF2 and PF3 to generate the multi-scale spatial feature RF1, local information RF2 and global information RF3, and then aggregating PFW, PFH, RF1, RF2 and RF3 with the SPCE module to obtain the aggregated feature information;
Step 5.3, combining the pooled fusion features with the aggregated feature information through a skip connection to obtain the joint spectral-spatial features.
Further, in step 5.2 of the present invention, PFW, PFH, RF1, RF2 and RF3 are aggregated with the SPCE module, and the aggregated feature information is obtained as follows:
First, the channel features of every spectral dimension are aggregated in the same way; the aggregation of the c-th channel is described as:
where H and W denote the height and width of the pooled fusion feature, and the aggregated quantities denote the features of the c-th channels of PFW, PFH, PF1, PF2 and PF3 obtained from the pooled fusion features; c ranges over (0, 32); AdaptiveAvgPool(3), AdaptiveAvgPool(6) and AdaptiveAvgPool(8) denote adaptive average pooling with output sizes of 3×3, 6×6 and 8×8, respectively.
The concatenation of the c-th channels of the features RF1, RF2 and RF3 in the spatial dimension is:
where f_sc denotes the output feature of the c-th channel after SPCS aggregation, [ ] denotes the concatenation operation in the spatial dimension, and the remaining terms denote the reshaped features RFW, RFH, RF1, RF2 and RF3 in the c-th channel, respectively.
Further, in step 5.3 of the present invention, the pooled fusion features are combined with the aggregated feature information through a skip connection, and the joint spectral-spatial features are obtained as follows:
where the first term is the feature output after aggregation of the c-th channel and the superscript sp denotes the multi-scale feature, and the second term is the sum of the spectral attention features g_1, g_2 and g_3 of the c-th channel at different scales, with g_1 ∈ R^{C×1×1}, g_2 ∈ R^{C×1×1} and g_3 ∈ R^{C×1×1};
where the corresponding inputs denote the three scale-space features MF1, MF2 and MF3 derived from the pooled fusion features;
where f_c(h, w) is the feature of the pooled fusion feature in the c-th channel and the superscript co denotes the coordinate feature;
where F_h and F_w denote convolution operations with a kernel size of E×E/r×1, their inputs are the feature values of MFH and MFW, respectively, MFH denotes a row feature of size (C/r, H, 1), MFW denotes a column feature of size (C/r, W, 1), g_h and g_w are the per-channel importance features learned automatically by convolution, C denotes the number of convolution kernels, and E denotes the convolution kernel size.
Further, in step six of the present invention, the classification result is obtained by linearly weighting the fixed-size matrix with the Softmax-based fully connected layer as follows:
The Softmax fully connected layer linearly weights the joint spectral-spatial features to obtain the predicted output, and the difference between the predicted output and the actual result is computed with a loss function to obtain the loss:
The back-propagation algorithm is then applied to obtain the hyperspectral image features, and the classification result is obtained by passing the hyperspectral image feature representation through the Softmax output layer, where y is the input sample, n is the number of classes, p(y_i) is the label of the training sample, and q(y_i) is the class predicted by the model.
Further, in the present invention, the Softmax output layer function is:
where z_g is the output value of the g-th node and L is the number of output nodes, i.e., the total number of classes of the classifier.
According to the invention, the hyperspectral image classification model based on a residual network with a spatial pyramid attention mechanism attends to useful information and suppresses useless information, thereby effectively realizing multi-scale joint feature attention, improving the sensitivity to joint features, and achieving information interaction by effectively emphasizing and attending to spatial and spectral information. Classification accuracy is thus effectively improved.
Drawings
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 is the ground-truth map of the MUUFL dataset;
FIG. 3 is a result image of classifying the MUUFL dataset with a radial basis function-support vector machine;
FIG. 4 is a result image of classifying the MUUFL dataset with an extended morphological profile-support vector machine;
FIG. 5 is a result image of classifying the MUUFL dataset with a convolutional neural network;
FIG. 6 is a result image of classifying the MUUFL dataset with a residual network;
FIG. 7 is a result image of classifying the MUUFL dataset with a spectral-spatial residual network;
FIG. 8 is a result image of classifying the MUUFL dataset with the method of the present invention;
FIG. 9 is the ground-truth map of the Trento dataset;
FIG. 10 is a result image of classifying the Trento dataset with a radial basis function-support vector machine;
FIG. 11 is a result image of classifying the Trento dataset with an extended morphological profile-support vector machine;
FIG. 12 is a result image of classifying the Trento dataset with a convolutional neural network;
FIG. 13 is a result image of classifying the Trento dataset with a residual network;
FIG. 14 is a result image of classifying the Trento dataset with a spectral-spatial residual network;
FIG. 15 is a result image of classifying the Trento dataset with the method of the present invention;
FIG. 16 is a network schematic diagram of the hyperspectral image classification method of the present invention combining a spatial pyramid attention mechanism to improve the residual network;
FIG. 17 is a schematic diagram of the spatial pyramid coordinate excitation mechanism in the method of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention. It should be noted that, without conflict, the embodiments of the present invention and features of the embodiments may be combined with each other.
Embodiment 1: this embodiment describes a hyperspectral image classification method combining a spatial pyramid attention mechanism, comprising:
Step one, removing spectral redundancy from the hyperspectral image to be classified by principal component analysis;
Step two, sampling the hyperspectral image with the spectral redundancy removed by a sliding-window sampling method to obtain a sample set;
Step three, performing multi-scale feature extraction and fusion on the sample set with a ResNet34 shallow feature extraction network combined with dynamic convolution to obtain preliminary fusion features;
Step four, applying a linear convolution transformation to the sample set, and weighting and fusing the linearly convolved samples with the preliminary fusion features to obtain pooled fusion features;
Step five, combining the spatial feature pyramid network with the coordinate squeeze attention mechanism SPCS and the coordinate attention excitation mechanism SPCE to construct the SPC module, stacking three SPC modules, and extracting multi-scale spatial features, local information and global information from the pooled fusion features; combining the extracted information to obtain the joint spectral-spatial features;
Step six, converting the joint spectral-spatial features into a fixed-size matrix with the pooling layer of the spatial feature pyramid, and linearly weighting the fixed-size matrix with a Softmax-based fully connected layer to obtain the classification result.
In this embodiment, the input image is sampled with sliding windows of different sizes, the overlap ratio is set to 50%, windows of size 3×3, 6×6 and 8×8 are generated, and the samples are cut and divided into training samples (10%), validation samples (10%) and test samples (80%). The training samples are input into a ResNet34 shallow feature extraction network improved with dynamic convolution (Dynamic Convolution ResNet, DCResNet); several convolution kernels are placed in the last two convolution layers of the identity residual blocks of the network, and the data-dependent dynamic convolution kernels are obtained by weighted combination, realizing dynamic convolution matching. The obtained fusion features are added element by element to the original input data after a linear convolution transformation to obtain refined features. The spatial feature pyramid network is combined with the coordinate attention mechanism to construct the spatial pyramid coordinate attention combination module (Spatial Pyramid Coordinate Attention, SPC), and the SPC module is stacked 3 times. The spatial pyramid coordinate squeeze module embedded in the SPC module acquires and aggregates the multi-scale spatial features and the local and global information. The spatial pyramid coordinate excitation module embedded in the SPC module then adaptively enhances the spectral features in the different spaces, realizing joint attention to the spectral-spatial features. The joint feature map is input into the SPP layer and converted into a fixed-size matrix, which is input into the Softmax-based fully connected layer; the extracted features are linearly weighted and the classification result is finally obtained from the output layer. With the hyperspectral image classification model based on the residual network with the spatial pyramid attention mechanism, the network attends more to useful information and suppresses useless information, thereby effectively realizing multi-scale joint feature attention, improving the sensitivity to joint features, and effectively emphasizing and attending to spatial and spectral information to realize information interaction. The classification accuracy of the HSI is effectively improved.
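As an illustration of this sampling step only, a minimal NumPy sketch is given below; the function names extract_patches and split_samples, the 9×9 window, the centre-pixel labelling rule and the random split are assumptions made for the example and are not taken from the patent text.

```python
import numpy as np

def extract_patches(cube, labels, patch=9, stride=None):
    """Cut labelled patches from a (H, W, B) hyperspectral cube with a sliding window.
    A stride of patch // 2 roughly corresponds to the 50% overlap mentioned above."""
    if stride is None:
        stride = patch // 2  # ~50% overlap
    H, W, _ = cube.shape
    patches, targets = [], []
    for r in range(0, H - patch + 1, stride):
        for c in range(0, W - patch + 1, stride):
            center = labels[r + patch // 2, c + patch // 2]
            if center > 0:  # keep only windows whose centre pixel is labelled
                patches.append(cube[r:r + patch, c:c + patch, :])
                targets.append(center - 1)
    return np.stack(patches), np.array(targets)

def split_samples(x, y, train=0.1, val=0.1, seed=0):
    """Randomly divide samples into 10% train, 10% validation, 80% test."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    n_tr, n_va = int(train * len(y)), int(val * len(y))
    tr, va, te = idx[:n_tr], idx[n_tr:n_tr + n_va], idx[n_tr + n_va:]
    return (x[tr], y[tr]), (x[va], y[va]), (x[te], y[te])
```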
Further, in step one of the present invention, the spectral redundancy in the hyperspectral image to be classified is removed by principal component analysis as follows:
Step 1.1, computing the covariance matrix, eigenvalues and eigenvectors of the hyperspectral image data to be classified;
Step 1.2, sorting the eigenvectors in descending order of their eigenvalues; using the eigenvectors as weighting coefficients, forming an eigenvector matrix from the n eigenvectors with the largest eigenvalues, projecting the hyperspectral image data to be classified onto the eigenvector matrix, and computing the principal components of the feature matrix with the weighting coefficients to obtain B principal components and the dimension-reduced feature matrix, thereby removing the redundancy of the hyperspectral image to be classified, where n = 3 and B is the number of bands of the hyperspectral image.
Further, in the present invention, in step 1.1 the covariance matrix of the hyperspectral image data to be classified is computed as:
σ(q_j, q_k) = E[(q_j - E(q_j))(q_k - E(q_k))]
where σ(q_j, q_k) denotes the covariance between q_j and q_k, j, k = 1 ... m, m denotes the number of columns of the eigenvalue matrix, E denotes the matrix expectation, A denotes the covariance matrix, and q_j and q_k denote the normalized j-th and k-th random vectors, respectively.
Further, in step three of the present invention, the specific method for feature extraction and fusion of the sample set with the ResNet34 shallow feature extraction network combined with dynamic convolution is as follows:
Step 3.1, obtaining the weights of the convolution kernels in the identity residual blocks of the ResNet34 network by dynamic convolution matching;
Step 3.2, constructing a residual joint feature network containing dynamic convolution from the convolution-kernel weights, and performing multi-scale feature extraction on the sample set to obtain a feature map;
Step 3.3, fusing the features in the feature map with a linear function to obtain the preliminary fusion features.
Further, in step 3.1 of the present invention, the weights of the convolution kernels in the identity residual blocks of the ResNet34 network are obtained by dynamic convolution matching using the formula:
y(x_i) = α[(k_1 w_1 + k_2 w_2 + ··· + k_n w_n) * x_i]
where x denotes the input from which the attention module derives the data-dependent dynamic convolution kernel weights, k_n = s_n(x) is the n-th scalar weight depending on the input sample, n is the number of base kernels in the convolution operation, w_1, ..., w_n and the corresponding bias terms denote the convolution kernel parameters and bias parameters of the individual kernels, α denotes the activation function, and x_i denotes the i-th input to the dynamic convolution.
Further, in step 3.2 of the present invention, the feature map is obtained as:
U(x) = W_2 σ(W_1 x)
where U(x) denotes the feature map, σ(·) denotes the improved linear function, and W_1 and W_2 are the weights of weight layer 1 and weight layer 2, respectively.
Further, in step 3.3 of the present invention, the preliminary fusion features are obtained as:
H(x) = U(x, {W_e}) + W_s x
where W_s denotes the linear transformation, W_e denotes W_1 or W_2, and H(x) denotes the features after preliminary fusion.
Further, in step four of the present invention, the linearly convolved samples and the preliminary fusion features are weighted and fused, and the pooled fusion features are obtained as follows:
X_{q+2} = X_q + F(X_q; ξ)
where ξ = {H_{q+1}, H_{q+2}, b_{q+1}, b_{q+2}}; X_q and X_{q+2} denote the input feature volumes of the q-th and (q+2)-th layers after the dynamic convolution operation, respectively; H_{q+1} and b_{q+1} denote the n spatial convolution kernels of the (q+1)-th layer and the linear transformation of the (q+1)-th layer, respectively; F(X_q; ξ) is the residual function constructed from the two convolution layers (q+1) and (q+2); σ(·) denotes the improved linear function, specifically the Mish function; and the weighted result is the pooled fusion feature.
Further, in step five of the present invention, the joint spectral-spatial features are obtained as follows:
Step 5.1, aggregating the pooled fusion features into the two directional joint feature vectors PFW and PFH with the SPCS module, and applying two-dimensional adaptive average pooling to the pooled fusion features to aggregate them into the three feature maps PF1, PF2 and PF3;
Step 5.2, flattening and reshaping the three feature maps PF1, PF2 and PF3 to generate the multi-scale spatial feature RF1, local information RF2 and global information RF3, and then aggregating PFW, PFH, RF1, RF2 and RF3 with the SPCE module to obtain the aggregated feature information;
Step 5.3, combining the pooled fusion features with the aggregated feature information through a skip connection to obtain the joint spectral-spatial features.
Further, in step 5.2 of the present invention, PFW, PFH, RF1, RF2 and RF3 are aggregated, and the aggregated feature information is obtained as follows:
First, the pooling of the channel features of every spectral dimension is the same; the fusion of the c-th channel is described as:
where H and W denote the height and width of the pooled fusion feature, and the pooled quantities denote the features of the c-th channels of PFW, PFH, PF1, PF2 and PF3 fused from the pooled fusion features; c ranges over (0, 32); AdaptiveAvgPool(3), AdaptiveAvgPool(6) and AdaptiveAvgPool(8) denote adaptive average pooling with output sizes of 3×3, 6×6 and 8×8, respectively.
The concatenation of the c-th channels of the features RF1, RF2 and RF3 in the spatial dimension is:
where f_sc denotes the output feature of the c-th channel after SPCS aggregation, [ ] denotes the concatenation operation in the spatial dimension, and the remaining terms denote the reshaped features RFW, RFH, RF1, RF2 and RF3 in the c-th channel, respectively.
Further, in step 5.3 of the present invention, the pooled fusion features are combined with the aggregated feature information through a skip connection, and the joint spectral-spatial features are obtained as follows:
where the first term is the feature output after aggregation of the c-th channel and the superscript sp denotes the multi-scale feature, and the second term is the sum of the spectral attention features g_1, g_2 and g_3 of the c-th channel at different scales, with g_1 ∈ R^{C×1×1}, g_2 ∈ R^{C×1×1} and g_3 ∈ R^{C×1×1};
where the corresponding inputs denote the three scale-space features MF1, MF2 and MF3 derived from the pooled fusion features;
where f_c(h, w) is the feature of the pooled fusion feature in the c-th channel and the superscript co denotes the coordinate feature;
where F_h and F_w denote convolution operations with a kernel size of C×C/r×1, their inputs are the feature values of MFH and MFW, respectively, MFH denotes a row feature of size (C/r, H, 1), MFW denotes a column feature of size (C/r, W, 1), g_h and g_w are the per-channel importance features learned automatically by convolution, and C denotes the number of convolution kernels.
Further, in step six of the present invention, the classification result is obtained by linearly weighting the fixed-size matrix with the Softmax-based fully connected layer as follows:
The Softmax fully connected layer linearly weights the joint spectral-spatial features to obtain the predicted output, and the difference between the predicted output and the actual result is computed with a loss function to obtain the loss:
The back-propagation algorithm is then applied to obtain the hyperspectral image features, and the classification result is obtained from the hyperspectral image features through the Softmax output layer, where y is the input sample, n is the number of classes, p(y_i) is the label of the training sample, and q(y_i) is the class predicted by the model.
Further, in the present invention, the Softmax output layer function is:
where z_i is the output value of the i-th node and b is the number of output nodes, i.e., the number of classes of the classifier.
The hyperspectral image classification method combining the spatial pyramid attention mechanism to improve the residual network in the specific embodiment is shown in fig. 1, the network principle schematic diagram is shown in fig. 4, and the method comprises the following steps:
Step a1, inputting the hyperspectral image to be classified.
In this embodiment, two public datasets are used: the MUUFL dataset and the Trento dataset.
(1) MUUFL dataset: the MUUFL dataset was acquired over the Gulfport campus of the University of Southern Mississippi in November 2010; it has 64 bands, a spatial resolution of 1 m and 11 classes, with details shown in Table 1.
Table 1 MUUFL dataset details
No. Class name Number of samples
1 Trees 23246
2 Mostly grass 4270
3 Mixed ground surface 6882
4 Dirt and sand 1826
5 Road 6687
6 Water 466
7 Building shadow 2233
8 Building 6240
9 Sidewalk 1385
10 Yellow curb 183
11 Cloth panels 269
Total 53687
(2) Trento dataset: the Trento dataset was acquired over a rural area south of Trento, Italy; it is 166×600 pixels with 63 bands and 6 classes, at a spatial resolution of 1 m, with details shown in Table 2.
Table 2 Trento dataset details
No. Class name Number of samples
1 Apple trees 4034
2 Buildings 2903
3 Ground 479
4 Woods 9123
5 Vineyard 10591
6 Roads 3174
Total 30304
Step a2, removing spectral redundancy with principal component analysis (Principal Component Analysis, PCA).
The principle of principal component analysis is to represent most of the energy of the original data with fewer components: components of the original data with high correlation are converted into new, mutually uncorrelated components, called principal components. Redundancy is removed from the hyperspectral data with each band treated as a vector. Specifically, the 200-dimensional spectral channel of each pixel in the hyperspectral image matrix is expanded into a 1×200 feature matrix. The elements of the feature matrix are averaged by column, and the mean of the corresponding column is subtracted from each element of the feature matrix.
Step a3, computing the covariance matrix of the three-dimensional original hyperspectral data and solving for its eigenvalues and eigenvectors.
The covariance of every pair of columns of the feature matrix is computed to construct the covariance matrix of the feature matrix, calculated in turn according to the following two formulas:
σ(x_j, x_k) = E[(x_j - E(x_j))(x_k - E(x_k))]
where σ(x_j, x_k) denotes the covariance between x_j and x_k, j, k = 1 ... m, m denotes the number of columns of the feature matrix, E denotes the matrix expectation, and A denotes the covariance matrix; the eigenvalues and eigenvectors are then obtained from the covariance matrix.
Step b, sorting the eigenvectors in descending order of eigenvalue, and computing B principal components with the eigenvectors as weighting coefficients, where B is the number of bands of the hyperspectral image.
All eigenvalues are sorted from largest to smallest, the first 3 are selected, and the eigenvectors corresponding to these 3 eigenvalues are arranged by column to form an eigenvector matrix. Using the eigenvectors as weighting coefficients, B principal components are computed, where B is the number of bands of the hyperspectral image. The hyperspectral image matrix is projected onto the selected eigenvector matrix to obtain the dimension-reduced feature matrix.
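The covariance, eigendecomposition and projection of steps a2-b can be sketched as follows; the function name pca_reduce and the use of NumPy are illustrative assumptions rather than part of the patent.

```python
import numpy as np

def pca_reduce(cube, n_components=3):
    """Project a (H, W, B) hyperspectral cube onto its leading principal components."""
    H, W, B = cube.shape
    x = cube.reshape(-1, B).astype(np.float64)        # one 1 x B spectrum per pixel
    x -= x.mean(axis=0, keepdims=True)                # subtract the column means
    cov = np.cov(x, rowvar=False)                     # B x B covariance matrix A
    eigvals, eigvecs = np.linalg.eigh(cov)            # eigenvalues and eigenvectors
    order = np.argsort(eigvals)[::-1][:n_components]  # largest eigenvalues first
    proj = eigvecs[:, order]                          # eigenvector matrix (B x n)
    return (x @ proj).reshape(H, W, n_components)     # dimension-reduced cube
```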
Step c, placing several convolution kernels in the last two convolution layers of the ResNet34 identity residual blocks, determining the weight of each convolution kernel from the input of the convolution layer, weighting the convolution kernels to obtain dynamic convolution kernels, and finally performing the convolution operation to realize DConv matching. A residual joint feature network (DCResNet) containing dynamic convolution is constructed. The obtained feature fusion data are added element by element to the original input data after a linear convolution transformation to obtain the post-fusion features.
Step c1, placing several convolution kernels in the last two convolution layers of the ResNet34 identity residual blocks, determining the weight of each convolution kernel from the input of the convolution layer, weighting the convolution kernels to obtain dynamic convolution kernels, and finally performing the convolution operation to realize dynamic convolution matching.
A conventional convolution layer has a static convolution kernel applied to all input samples. In this model, DConv dynamically weights a given input over a four-dimensional convolution kernel space, so that the convolution operation depends on the model input, which enhances the model's ability to capture input information and rich features. Mathematically, the dynamic convolution operation can be defined as:
y(x_i) = α[(k_1 w_1 + k_2 w_2 + ··· + k_n w_n) * x_i]
Data-dependent dynamic convolution kernel weights are obtained for x through the attention module. For a specific input x_i, the dynamic convolution kernel can be generated as above, where k_i = s_i(x) is a scalar weight depending on the input sample, computed with a routing function containing learnable parameters; w_1, ..., w_n and the corresponding bias terms denote the convolution kernel parameters and bias parameters of the individual kernels; n is the number of base kernels in the convolution operation; and α denotes the activation function. The linear mixing process of the dynamic convolution layer is:
α[(k_1 w_1 + k_2 w_2 + ··· + k_n w_n) * x_i] = α(k_1 (w_1 * x_i) + ··· + k_n (w_n * x_i))
After passing through the attention module, x yields k attention parameters with different weights; these are multiplied with k initialized convolution kernels to obtain k weighted kernels, which are combined to form the data-dependent dynamic convolution kernel. Thus, dynamic convolution has the same capacity as a linear mixture of n static convolutions, and each routing function contributes to the dynamic convolution performance; the routing weights are computed as:
s(x) = Sigmoid(GlobalAveragePool(x) * F)
where F is a learnable weight.
Dynamic convolution introduces only two additional computations, the attention module and the weighted combination of convolution kernels, so the network gains a notable capacity improvement at a small extra computational cost.
The goal of the DConv network is to increase the representation and expression capability of a lightweight neural network without increasing its depth or width. Unlike the single kernel of a conventional convolution, dynamic convolution combines several parallel convolution kernels into one dynamic kernel; this dynamic kernel is data dependent, its parameters adjust to differences in the data, and the representation and expression capability of the network is effectively improved. The dynamic convolution flow is shown in fig. 5.
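A possible PyTorch sketch of such a dynamic convolution layer is shown below. The class name DynamicConv2d, the Linear routing layer standing in for the learnable weight F, and the grouped-convolution trick for applying a different mixed kernel to every sample are implementation assumptions; only the Sigmoid routing over global average pooling and the weighted kernel combination follow the description above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicConv2d(nn.Module):
    """Weighted mixture of K parallel kernels; the mixing weights are predicted
    per input sample by a Sigmoid routing function over global average pooling."""
    def __init__(self, in_ch, out_ch, kernel_size=3, num_kernels=4, padding=1):
        super().__init__()
        self.K, self.out_ch, self.padding = num_kernels, out_ch, padding
        self.weight = nn.Parameter(
            torch.randn(num_kernels, out_ch, in_ch, kernel_size, kernel_size) * 0.02)
        self.bias = nn.Parameter(torch.zeros(num_kernels, out_ch))
        self.route = nn.Linear(in_ch, num_kernels)   # stands in for the learnable weight F

    def forward(self, x):                             # x: (B, C_in, H, W)
        B, C, H, W = x.shape
        k = torch.sigmoid(self.route(x.mean(dim=(2, 3))))          # (B, K) routing weights
        w = torch.einsum('bk,koihw->boihw', k, self.weight)        # per-sample mixed kernels
        b = torch.einsum('bk,ko->bo', k, self.bias).reshape(-1)    # per-sample mixed bias
        out = F.conv2d(x.reshape(1, B * C, H, W),
                       w.reshape(B * self.out_ch, C, *w.shape[-2:]),
                       bias=b, padding=self.padding, groups=B)     # one group per sample
        return out.reshape(B, self.out_ch, out.shape[-2], out.shape[-1])
```

With num_kernels=4 and kernel_size=3 this matches the four 3×3 kernels mentioned in step c2 below.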
Step c2, constructing the residual joint feature network (DCResNet) containing dynamic convolution, consisting of conv1, conv2_x, conv3_x, conv4_x and conv5_x, where conv4_x and conv5_x are the convolution layers containing dynamic convolution; the processed data of size 9×9×band are input into the improved residual network, and a 64×3×3 feature map is output.
The network consists of the five parts conv1, conv2_x, conv3_x, conv4_x and conv5_x, with the parameter settings of each part shown in table 2. The model input size is 9×9 and the final conv5_x output size is 1×1. First, the processed data of size 9×9×band are input into the ordinary convolution layer of the residual network, with kernel size 7×7, stride 2 and padding 3; this layer extracts the initial image features. The result is input into a dynamic convolution layer of size 64×5×5; the convolution kernels in the dynamic convolution are 3×3 with stride 1 and padding 1. The dynamic convolution kernel is obtained by multiplying the attention with the weights and fusing over the batch; 4 convolution kernels of size 3×3 are used as the dynamic convolution kernels for the weighting, and a feature map of size 64×3×3 is output.
U(x) = W_2 σ(W_1 x)
where σ denotes the improved linear function Mish, and W_1 and W_2 are the weights of weight layer 1 and weight layer 2, respectively; the result then passes through a shortcut and the second ReLU to obtain the output H(x):
H(x) = U(x, {W_i}) + x
However, when the input and output dimensions need to change (for example, a change in the number of channels), a linear transformation W_s can be applied to x in the shortcut operation:
H(x) = U(x, {W_i}) + W_s x
Step c3, adding the fusion features of DCResNet element by element to the original input data after a linear convolution transformation to obtain the refined features, capturing the spatial and spectral information of the input data.
The spatial size of the feature fusion data X_{q+1} and X_{q+2} remains unchanged at 64×3×3. The feature fusion data and the original input data after the linear convolution transformation are weighted element by element to obtain the post-fusion features.
X_{q+2} = X_q + F(X_q; ξ)
where ξ = {H_{q+1}, H_{q+2}, b_{q+1}, b_{q+2}}; X_{q+2} denotes the input feature volume of layer (q+1); H_{q+1} and b_{q+1} denote the n spatial convolution kernels of layer (q+1); F(X_q; ξ) is the residual function constructed from the two convolution layers, rather than a direct skip-connection mapping of X_q; σ denotes the improved linear function Mish; and X is the post-fusion feature obtained after the weighting.
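Building on the equations above, one way such an identity residual block could look in PyTorch is sketched below; it reuses the illustrative DynamicConv2d class from the earlier sketch, and the placement of batch normalization is an assumption, since the patent only specifies U(x) = W_2 σ(W_1 x) and H(x) = U(x, {W_i}) + W_s x.

```python
import torch.nn as nn

class DCResidualBlock(nn.Module):
    """Identity residual block whose two convolutions are dynamic (DConv);
    the shortcut uses a 1x1 linear convolution W_s when the channel count changes."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv1 = DynamicConv2d(in_ch, out_ch)           # weight layer 1 (W_1)
        self.bn1 = nn.BatchNorm2d(out_ch)
        self.conv2 = DynamicConv2d(out_ch, out_ch)           # weight layer 2 (W_2)
        self.bn2 = nn.BatchNorm2d(out_ch)
        self.act = nn.Mish()                                  # improved linear function sigma
        self.shortcut = (nn.Conv2d(in_ch, out_ch, 1)          # linear transform W_s
                         if in_ch != out_ch else nn.Identity())

    def forward(self, x):
        u = self.bn2(self.conv2(self.act(self.bn1(self.conv1(x)))))  # U(x) = W_2 sigma(W_1 x)
        return self.act(u + self.shortcut(x))                        # H(x) = U(x) + W_s x
```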
Step d, combining the spatial feature pyramid network with the coordinate attention mechanism to construct the SPC module, and stacking the SPC module 3 times. The SPCS module acquires and aggregates the multi-scale spatial features and the local and global information. The SPCE module then adaptively enhances the spectral features in the different spaces, realizing joint attention to the spectral-spatial features.
Step d1, merging the feature maps (FM) extracted by DCResNet with the SPCS module to generate PF: FM is pooled into the joint feature vectors PFW and PFH along the two directions, FM is further aggregated by two-dimensional adaptive pooling into the three feature maps PF1, PF2 and PF3 of sizes 3×3, 6×6 and 8×8, and PF is flattened and reshaped to generate the feature RF. The multi-scale spatial features and the local and global information RFW, RFH, RF1, RF2 and RF3 are thus aggregated.
The SPCS module and the SPCE module are combined to construct the spatial pyramid coordinate attention combination module SPC, and the SPC module is stacked 3 times, as shown in FIG. 4.
The structure of the 3-times-stacked SPC module is: SPCS module, SPCE module, batch normalization, 1×1 convolution, Sigmoid, 5×5 convolution, concatenation layer, SPCS module, SPCE module, batch normalization, 1×1 convolution, Sigmoid, 5×5 convolution, concatenation layer.
and d2, sending the feature map SF after SPCS aggregation into SPCE. The spectrum characteristics under different obtained spaces are adaptively enhanced by using SPCE, so that the joint attention to spectrum-space information characteristics is realized, and the structure of the space pyramid coordinate extrusion module is shown in fig. 6:
First, feature Maps (FM) are combined to generate PF. Specifically, one-dimensional global pooling is performed along the vertical and horizontal directions over FM. The spatial features of the local and global information can be effectively aggregated. The pooling operation for the c-th channel feature is specifically as follows:
where H and W represent the height and width of the pooling feature,and->Features of the c-th channels of PFW, PFH, PF, PF2, and PF3 generated by pooled FM are shown. AdapteveAvgPool (3), adapteveAvgPool (6) and AdaptaveAvgPool (8) represent adaptive average pooling with output sizes of 3×3, 6×6 and 8×8.
The cascading equation for the c-th channel of the feature RF in the spatial dimension is as follows:
wherein f sc Representing the output characteristics of the c-th channel after SPCS []Representing a join operation in the spatial dimension.And->Features RFW, RFH, RF, RF2 and RF3 after shaping in the c-th pass are shown, respectively.
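A minimal sketch of this squeeze stage is given below; the module name SPCSqueeze, the use of mean pooling for the directional vectors PFH and PFW, and the concatenation order are assumptions made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SPCSqueeze(nn.Module):
    """Spatial pyramid coordinate squeeze: pools the feature map FM along H and W
    (PFH, PFW) and at three pyramid scales (PF1, PF2, PF3), then flattens and
    concatenates everything along the spatial dimension."""
    def forward(self, fm):                                   # fm: (B, C, H, W)
        pfh = fm.mean(dim=3)                                 # PFH: pool over width  -> (B, C, H)
        pfw = fm.mean(dim=2)                                 # PFW: pool over height -> (B, C, W)
        rf1 = F.adaptive_avg_pool2d(fm, 3).flatten(2)        # PF1 -> RF1, 3x3 = 9
        rf2 = F.adaptive_avg_pool2d(fm, 6).flatten(2)        # PF2 -> RF2, 6x6 = 36
        rf3 = F.adaptive_avg_pool2d(fm, 8).flatten(2)        # PF3 -> RF3, 8x8 = 64
        return torch.cat([pfh, pfw, rf1, rf2, rf3], dim=2)   # SF: concat in the spatial dim
```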
Step d3, combining SPCS and SPCE to construct the spatial pyramid coordinate attention combination module (Spatial Pyramid Coordinate Attention, SPC), and stacking the SPC module 3 times to effectively enhance the joint feature information. The SPCE structure is shown in FIG. 7.
The feature map SF aggregated by SPCS is sent to SPCE to obtain the spectral-spatial attention features. SPCE can be decomposed into two steps: first, the channels of SF are reduced to 1/r of their number with a C×1×1 convolution; the result is then split into two groups along the spatial dimension while the feature channels are expanded back to C. After a nonlinear operation on the channels, channel weights with different spatial characteristics are obtained, and in turn the spectral attention features fused with different spatial information are captured.
Specifically, the aggregated feature SF is first processed by a 1×1 convolution and batch normalization:
f_m = δ(F_1(f_s))
where δ is the Mish activation function and F_1 denotes a convolution operation with kernel size C/r×C×1×1, where C/r and C denote the numbers of output and input channels, respectively, and r denotes the reduction ratio. f_s is the feature SF, and after this operation the feature MF has size C/r×s×1. The features are then split and reshaped along the spatial dimension and separated into two groups: one group is the row feature MFH and the column feature MFW carrying position information; the other features are MF1, MF2 and MF3, of sizes 3×3, 6×6 and 8×8, respectively, containing multi-scale information.
To obtain spectral attention features with position information from the row feature MFH of size (C/r, H, 1) and the column feature MFW of size (C/r, W, 1), two convolutions are used to automatically learn the importance of each channel in the position features and generate g_h and g_w, computed as follows:
where F_h and F_w denote convolution operations with kernel size C×C/r×1×1, and their inputs are MFH and MFW, respectively. After g_h and g_w are applied to the feature map FM, the combination of g_h, g_w and FM attends to the spectral information in the precise location region:
where f_c(h, w) is the feature of FM in the c-th channel, and the superscript co denotes the coordinate feature.
To obtain the spectral attention features of the multi-scale information, the multi-scale features MF1, MF2 and MF3 are first convolved with kernels of sizes C×C/r×3×3, C×C/r×6×6 and C×C/r×8×8, respectively, to obtain the spectral attention features at different scales, g_1 ∈ R^{C×1×1}, g_2 ∈ R^{C×1×1} and g_3 ∈ R^{C×1×1}:
where the corresponding inputs denote the three scale-space features MF1, MF2 and MF3; g_s ∈ R^{C×1×1} is the sum of g_1, g_2 and g_3. The spatial pyramid coordinate attention feature of the c-th channel is:
where the output is the c-th channel feature and the superscript sp denotes the multi-scale feature; finally, the pooled feature is combined with the spectral attention feature through a skip connection to generate the spatial pyramid coordinate attention feature.
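The excitation stage is more involved; the sketch below illustrates only the coordinate (row/column) branch that produces g_h and g_w and omits the multi-scale branch g_1, g_2, g_3, so it is a simplified reading of the mechanism rather than the patented module. The class name CoordinateExcitation, the shared reduction layer and the residual combination are assumptions.

```python
import torch
import torch.nn as nn

class CoordinateExcitation(nn.Module):
    """Simplified excitation path: the pooled row/column features are reduced to
    C/r channels, passed through Mish, expanded back to C channels to produce the
    attention vectors g_h and g_w, and applied to FM with a skip connection."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        mid = channels // reduction
        self.reduce = nn.Sequential(nn.Conv1d(channels, mid, 1),
                                    nn.BatchNorm1d(mid), nn.Mish())
        self.expand_h = nn.Conv1d(mid, channels, 1)           # F_h
        self.expand_w = nn.Conv1d(mid, channels, 1)           # F_w

    def forward(self, fm):                                    # fm: (B, C, H, W)
        mfh = self.reduce(fm.mean(dim=3))                     # row feature    (B, C/r, H)
        mfw = self.reduce(fm.mean(dim=2))                     # column feature (B, C/r, W)
        g_h = torch.sigmoid(self.expand_h(mfh)).unsqueeze(3)  # (B, C, H, 1)
        g_w = torch.sigmoid(self.expand_w(mfw)).unsqueeze(2)  # (B, C, 1, W)
        return fm + fm * g_h * g_w                            # skip connection
```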
Step d4, setting the activation function in the pyramid coordinate attention module (i.e., the improved linear function denoted by σ(·)) to the Mish function.
The Mish activation function has a regularizing effect and avoids saturation. Compared with the hard zero boundary of the ReLU function, Mish allows a small negative margin, which improves gradient flow and gives the model better classification accuracy and generalization. Applying the Mish function in SPCE strengthens the regularization of the model and prevents overfitting; it is implemented as:
f(v) = v·tanh(softplus(v)) = v·tanh(ln(1 + e^v))
where v is the input data, tanh is the hyperbolic tangent function, and softplus is an activation function that can be regarded as a smooth version of ReLU.
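The formula above corresponds directly to the following one-line implementation (equivalent in behaviour to torch.nn.Mish):

```python
import torch
import torch.nn.functional as F

def mish(v):
    """Mish activation: f(v) = v * tanh(softplus(v)) = v * tanh(ln(1 + e^v))."""
    return v * torch.tanh(F.softplus(v))
```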
Step e: the joint feature map is input into the SPP layer and converted into a fixed-size matrix; the matrix is input into the Softmax-based fully connected layer, the extracted features are linearly weighted, and the classification result is finally obtained from the output layer.
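A generic spatial pyramid pooling layer of this kind can be sketched as follows; the pyramid levels (1, 2, 4) are illustrative assumptions, as the patent does not state the SPP output sizes.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialPyramidPooling(nn.Module):
    """Pools an arbitrary-size feature map at several fixed output sizes and
    concatenates the flattened results, so the following fully connected layer
    always receives a vector of the same length."""
    def __init__(self, levels=(1, 2, 4)):
        super().__init__()
        self.levels = levels

    def forward(self, x):                                     # x: (B, C, H, W)
        return torch.cat([F.adaptive_max_pool2d(x, s).flatten(1)
                          for s in self.levels], dim=1)       # (B, C * sum(s*s))
```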
After the feature map is converted by the SPP layer, the extracted features are linearly weighted to obtain the predicted output, and the difference between the predicted output and the actual result is computed with a loss function, formulated as follows:
where y is the input sample, n is the number of classes, p(y_i) is the label of the training sample, and q(y_i) is the class predicted by the model; after the loss is computed with the above formula, back-propagation yields the hyperspectral image feature representation.
The classification result is obtained by passing the hyperspectral image feature representation through the Softmax output layer, where the Softmax function is:
where z_g is the output value of the g-th node and L is the number of output nodes, i.e., the total number of classes of the classifier.
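A sketch of this training step with a softmax cross-entropy loss is shown below; the stand-in classifier head, the Adam optimizer, the learning rate and the dummy batch are all assumptions used only to make the example runnable.

```python
import torch
import torch.nn as nn

# Illustrative training step: softmax cross-entropy loss and back-propagation.
model = nn.Sequential(nn.Flatten(), nn.Linear(64 * 3 * 3, 11))   # stand-in classifier head
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()            # cross-entropy over softmax probabilities

patches = torch.randn(8, 64, 3, 3)           # a dummy batch of fused feature maps
labels = torch.randint(0, 11, (8,))          # dummy class labels

optimizer.zero_grad()
logits = model(patches)                      # linear weighting by the FC layer
loss = criterion(logits, labels)             # loss = -sum_i p(y_i) log q(y_i)
loss.backward()                              # back-propagation
optimizer.step()
pred = logits.softmax(dim=1).argmax(dim=1)   # softmax output layer -> predicted class
```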
Tables 3 and 4 compare the proposed method with five other methods, namely the radial basis function-support vector machine (Radial Basis Function, RBF-SVM), the extended morphological profile-support vector machine (Extended Morphological Profile, EMP-SVM), the convolutional neural network (Convolutional Neural Networks, CNN), the residual network (ResNet) and the spectral-spatial residual network (SSRN), for classifying the MUUFL and Trento datasets. The method of the present invention clearly has better classification performance than all comparison methods and achieves the highest classification indices on both datasets.
For a subjective evaluation of the classification results, figs. 2-8 and figs. 9-15 show the ground-truth maps of the two datasets and the classification results of the various methods. Compared with the RBF-SVM, EMP-SVM, CNN, ResNet and SSRN methods, the proposed method is closer to the real ground-object distribution and has smaller misclassified regions, further demonstrating its effectiveness for hyperspectral data classification, as detailed in Tables 3 and 4.
Table 3 Classification accuracy comparison on the MUUFL dataset (%)
Class RBF-SVM EMP-SVM CNN ResNet SSRN Ours
1 89.89±3.96 93.23±2.17 93.01±3.85 97.11±0.86 96.61±0.25 97.32±0.13
2 70.20±3.62 68.08±2.99 75.12±6.13 92.67±2.56 91.28±1.75 91.25±0.07
3 65.44±7.82 76.76±1.63 79.65±6.12 76.69±4.36 84.48±0.88 90.01±1.45
4 79.35±7.68 88.26±4.27 75.08±9.63 91.96±0.21 92.86±0.65 91.78±2.01
5 84.44±2.52 85.08±2.83 90.15±1.08 90.69±1.02 90.97±0.56 93.03±1.02
6 97.95±2.23 81.88±1.53 53.09±9.11 86.99±3.60 94.33±0.96 95.41±1.16
7 64.38±5.21 69.45±0.53 66.38±1.23 92.18±3.90 92.29±0.12 92.58±3.83
8 90.30±1.43 93.85±2.05 94.12±1.51 94.82±0.48 94.74±0.49 96.72±0.53
9 53.84±1.51 46.78±20.50 67.64±0.06 65.40±0.34 71.77±5.22 72.22±3.22
10 74.04±9.76 76.61±2.12 92.46±0.46 100.0±0.00 72.22±2.22 73.79±5.91
11 54.32±0.82 44.56±1.62 94.91±0.49 91.73±0.91 90.62±0.50 93.96±0.28
OA(%) 81.90±2.00 83.73±0.02 86.94±1.43 91.36±0.67 92.81±1.74 94.08±0.13
AA(%) 74.92±4.23 74.96±3.84 80.11±3.58 89.11±1.70 88.38±2.80 89.82±1.78
K×100 75.64±3.06 78.54±2.82 82.59±2.09 88.57±3.01 90.45±0.80 92.17±0.17
Table 4 Classification accuracy comparison on the Trento dataset (%)
Class RBF-SVM EMP-SVM CNN ResNet SSRN Ours
1 82.59±8.53 82.59±8.53 96.53±3.02 94.85±2.16 97.60±2.76 98.90±0.06
2 83.56±3.92 83.56±3.92 82.95±0.68 83.52±6.41 88.45±6.95 89.50±5.01
3 96.89±2.20 96.89±2.20 97.61±3.03 92.91±1.00 94.91±0.72 98.48±1.22
4 95.37±2.10 95.37±2.10 89.01±0.08 99.48±0.45 99.15±1.17 99.98±0.01
5 92.40±2.76 92.40±2.76 94.92±0.35 99.25±0.01 98.45±1.78 98.68±0.13
6 77.43±4.61 77.43±4.61 87.41±6.37 81.47±0.93 88.52±0.71 95.99±1.73
OA(%) 89.52±1.25 89.52±1.25 93.17±1.27 94.84±0.09 96.19±0.56 98.32±0.41
AA(%) 88.04±0.70 88.04±0.70 91.37±2.25 91.91±1.01 94.51±0.89 96.92±1.36
K×100 85.91±1.72 85.91±1.72 90.80±1.73 93.11±1.25 94.91±0.76 96.29±0.35
Although the invention herein has been described with reference to particular embodiments, it is to be understood that these embodiments are merely illustrative of the principles and applications of the present invention. It is therefore to be understood that numerous modifications may be made to the illustrative embodiments and that other arrangements may be devised without departing from the spirit and scope of the present invention as defined by the appended claims. It should be understood that the different dependent claims and the features described herein may be combined in ways other than as described in the original claims. It is also to be understood that features described in connection with separate embodiments may be used in other described embodiments.

Claims (10)

1. A hyperspectral image classification method combining a spatial pyramid attention mechanism, characterized by comprising the following steps:
step one, removing spectral redundancy from the hyperspectral image to be classified by a principal component analysis method;
step two, sampling the hyperspectral image with spectral redundancy removed by a sliding-window sampling method to obtain a sample set;
step three, performing multi-scale feature extraction and fusion on the sample set with a ResNet34 shallow feature extraction network combined with dynamic convolution to obtain primary fusion features;
step four, performing a linear convolution transformation on the sample set, and performing weighted fusion of the linearly convolved samples with the primary fusion features to obtain pooled fusion features;
step five, combining the spatial feature pyramid network with a coordinate squeeze attention mechanism SPCS and a coordinate excitation attention mechanism SPCE to construct SPC modules, stacking three SPC modules, and extracting multi-scale spatial features, local information and global information from the pooled fusion features respectively; combining the extracted information to obtain spectral-spatial joint features;
and step six, converting the spectral-spatial joint features into a fixed-size matrix with the pooling layer of the spatial feature pyramid, and linearly weighting the fixed-size matrix with a Softmax-based fully connected layer to obtain the classification result.
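To make the data flow of claim 1 concrete, the following is a minimal, runnable PyTorch sketch of steps three to six only. The dynamic-convolution ResNet34 backbone and the SPC attention modules are replaced by plain convolutional stand-ins, and every layer width, patch size and class count is an assumption made for the example rather than part of the claimed method.

```python
import torch
import torch.nn as nn

class HSIClassifierSketch(nn.Module):
    """Sketch of steps 3-6 of claim 1; sub-modules are simplified stand-ins."""
    def __init__(self, n_pc=3, n_classes=11, width=64):
        super().__init__()
        # stand-in for the dynamic-convolution ResNet34 shallow feature extractor (step three)
        self.backbone = nn.Sequential(
            nn.Conv2d(n_pc, width, 3, padding=1), nn.ReLU(),
            nn.Conv2d(width, width, 3, padding=1), nn.ReLU())
        # linear convolution branch used for the weighted fusion (step four)
        self.lin_conv = nn.Conv2d(n_pc, width, kernel_size=1)
        # stand-in for the three stacked SPC attention modules (step five)
        self.spc = nn.Sequential(
            nn.Conv2d(width, width, 3, padding=1), nn.ReLU(),
            nn.Conv2d(width, width, 3, padding=1), nn.ReLU(),
            nn.Conv2d(width, width, 3, padding=1), nn.ReLU())
        # spatial-pyramid pooling to a fixed-size matrix, then the Softmax FC layer (step six)
        self.spp = nn.AdaptiveAvgPool2d((4, 4))
        self.fc = nn.Linear(width * 16, n_classes)

    def forward(self, x):
        # x: PCA-reduced sliding-window patches, shape (batch, n_pc, h, w); steps one-two done offline
        primary = self.backbone(x)                  # primary fusion features
        pooled = primary + self.lin_conv(x)         # pooled fusion features (weighted fusion)
        joint = self.spc(pooled)                    # spectral-spatial joint features
        flat = torch.flatten(self.spp(joint), 1)    # fixed-size matrix
        return torch.softmax(self.fc(flat), dim=-1) # class probabilities

patch = torch.randn(8, 3, 15, 15)                   # 8 patches, 3 principal components, 15x15 window
print(HSIClassifierSketch()(patch).shape)           # torch.Size([8, 11])
```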
2. The method for classifying hyperspectral images in combination with a spatial pyramid attention mechanism as claimed in claim 1, wherein in step one, the process of removing spectral redundancy in hyperspectral images to be classified by using a principal component analysis method is as follows:
step one-one, calculating the covariance matrix, eigenvalues and eigenvectors of the hyperspectral image data to be classified;
step one-two, arranging the eigenvectors in descending order of their eigenvalues; using the eigenvectors as weighting coefficients, forming an eigenvector matrix from the n eigenvectors with the largest eigenvalues, projecting the hyperspectral image data to be classified onto the eigenvector matrix, and, with the principal component analysis method, computing the principal components of the feature matrix from the weighting coefficients to obtain B principal component components and the dimension-reduced feature matrix, thereby removing the redundancy of the hyperspectral image to be classified, wherein n=3 and B is the number of bands of the hyperspectral image.
3. The method for classifying hyperspectral images in combination with a spatial pyramid attention mechanism as claimed in claim 2, wherein in step one-one, the covariance matrix of the hyperspectral image data to be classified is calculated as follows:
σ(q_j, q_k) = E[(q_j - E(q_j))(q_k - E(q_k))]
wherein σ(q_j, q_k) represents the covariance between q_j and q_k, j, k = 1, …, m, m represents the number of columns of the eigenvalue matrix, E represents the matrix expectation, A represents the covariance matrix, and q_j and q_k respectively represent the j-th and k-th normalized random vectors.
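As an illustration of claims 2 and 3, the following NumPy sketch computes the covariance matrix, sorts the eigenvectors by descending eigenvalue, and projects the image onto the first n=3 principal components; the cube size, band count and the helper name pca_reduce are assumptions made for the example.

```python
import numpy as np

def pca_reduce(cube, n_pc=3):
    """Reduce a hyperspectral cube of shape (H, W, B bands) to its first n_pc principal components."""
    h, w, b = cube.shape
    x = cube.reshape(-1, b).astype(np.float64)
    x = x - x.mean(axis=0)                     # center each band: q - E(q)
    cov = np.cov(x, rowvar=False)              # covariance matrix sigma(q_j, q_k)
    vals, vecs = np.linalg.eigh(cov)           # eigenvalues and eigenvectors
    order = np.argsort(vals)[::-1]             # descending eigenvalue order
    top = vecs[:, order[:n_pc]]                # eigenvector matrix of the n largest eigenvalues
    return (x @ top).reshape(h, w, n_pc)       # project onto the principal components

cube = np.random.rand(50, 50, 103)             # assumed 103-band toy scene
print(pca_reduce(cube).shape)                  # (50, 50, 3)
```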
4. The hyperspectral image classification method combining a spatial pyramid attention mechanism according to claim 1, 2 or 3, wherein in the third step, a ResNet34 shallow feature extraction network combined with dynamic convolution is adopted to extract and fuse multi-scale features of a sample set, and the specific method for acquiring primary fusion features is as follows:
step three-one, adopting a dynamic convolution matching method to obtain the weights of the convolution kernels in the identity residual blocks of the ResNet34 network;
step three-two, constructing a residual joint feature network containing dynamic convolution with the convolution kernel weights, and performing multi-scale feature extraction on the sample set to obtain a feature map;
and step three-three, fusing the features in the feature map with a linear function to obtain the primary fusion features.
5. The method for classifying hyperspectral images in combination with a spatial pyramid attention mechanism as claimed in claim 4, wherein in step three-one, the weights of the convolution kernels in the identity residual blocks of the ResNet34 network are obtained by the dynamic convolution matching method as follows:
Using the formula:
y(x_i) = α[(k_1 w_1 + k_2 w_2 + ··· + k_n w_n) * x_i]
calculating the weights of the convolution kernel, wherein x represents the data-dependent dynamic convolution kernel weight value obtained through the attention module, k_n = s_n(x) is the n-th scalar weight dependent on the input sample, n is the number of kernels of the convolution kernel operation, w_d and b_d respectively represent the convolution kernel parameters and bias parameters of the d networks, α represents the activation function, P represents the P networks, x_i represents the i-th total dynamic convolution kernel weight value, and w_n represents the convolution kernel weights.
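The following PyTorch sketch illustrates the dynamic convolution of claim 5: a small attention head produces the input-dependent scalar weights k_n = s_n(x), the n parallel kernels w_n are mixed by these weights, the mixed kernel is convolved with the input, and an activation α is applied. The attention head design, the kernel count and the use of ReLU are assumptions made for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicConv2d(nn.Module):
    """Dynamic convolution: n parallel kernels mixed by input-dependent weights k_n."""
    def __init__(self, in_ch, out_ch, kernel_size=3, n_kernels=4):
        super().__init__()
        self.weight = nn.Parameter(
            torch.randn(n_kernels, out_ch, in_ch, kernel_size, kernel_size) * 0.01)
        self.bias = nn.Parameter(torch.zeros(n_kernels, out_ch))
        self.attn = nn.Sequential(                       # assumed attention head s_n(x)
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(in_ch, n_kernels), nn.Softmax(dim=-1))
        self.pad = kernel_size // 2

    def forward(self, x):
        k = self.attn(x)                                 # (batch, n_kernels): scalar weights k_n
        out = []
        for i in range(x.size(0)):                       # per-sample kernel mixing
            w = torch.einsum('n,noihw->oihw', k[i], self.weight)   # k_1 w_1 + ... + k_n w_n
            b = k[i] @ self.bias
            out.append(F.conv2d(x[i:i + 1], w, b, padding=self.pad))
        return F.relu(torch.cat(out, dim=0))             # activation alpha

x = torch.randn(4, 64, 15, 15)
print(DynamicConv2d(64, 64)(x).shape)                    # torch.Size([4, 64, 15, 15])
```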
6. The method for classifying hyperspectral images in combination with a spatial pyramid attention mechanism as claimed in claim 4, wherein in step three-two, the feature map is obtained as follows:
U(x) = W_2 σ(W_1 x)
wherein U(x) represents the feature map, σ(·) represents the rectified linear function, and W_1 and W_2 are the weights of weight layer 1 and weight layer 2, respectively.
7. The method for classifying hyperspectral images in combination with a spatial pyramid attention mechanism as claimed in claim 6, wherein in step three-three, the primary fusion features are obtained as follows:
H(x) = U(x, {W_e}) + W_s x
wherein W_s represents a linear transformation function, W_e represents W_1 or W_2, and H(x) represents the primary fusion features.
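A minimal PyTorch sketch of the residual mapping in claims 6 and 7 is given below: two weight layers produce U(x) = W_2 σ(W_1 x) and a 1×1 projection plays the role of the linear transformation W_s on the shortcut, giving H(x) = U(x, {W_e}) + W_s x; the same additive pattern underlies the pooled fusion of claim 8. The channel counts and the choice of 3×3 convolutions as weight layers are assumptions.

```python
import torch
import torch.nn as nn

class ResidualFusionBlock(nn.Module):
    """Residual block: H(x) = W2 * sigma(W1 * x) + Ws * x."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.w1 = nn.Conv2d(in_ch, out_ch, 3, padding=1)   # weight layer 1 (W1)
        self.w2 = nn.Conv2d(out_ch, out_ch, 3, padding=1)  # weight layer 2 (W2)
        self.ws = nn.Conv2d(in_ch, out_ch, 1)              # linear transformation Ws on the shortcut
        self.sigma = nn.ReLU()                             # rectified linear function sigma

    def forward(self, x):
        u = self.w2(self.sigma(self.w1(x)))   # feature map U(x)
        return u + self.ws(x)                 # primary fusion feature H(x)

print(ResidualFusionBlock(3, 64)(torch.randn(2, 3, 15, 15)).shape)  # torch.Size([2, 64, 15, 15])
```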
8. The hyperspectral image classification method combining a spatial pyramid attention mechanism according to claim 3, wherein in the fourth step, the samples after the linear convolution transformation and the primary fusion features are weighted and fused, and the pooled fusion features are obtained as follows:
X_{q+2} = X_q + F(X_q; ξ)
wherein ξ = {H_{q+1}, H_{q+2}, b_{q+1}, b_{q+2}}, X_q and X_{q+2} respectively represent the input feature volumes of the q-th and (q+2)-th layers, the input feature volume of the (q+1)-th layer is obtained after the dynamic convolution operation, H_{q+1} and b_{q+1} respectively represent the features of the n spatial convolution kernels of the (q+1)-th layer and the linear transformation of the (q+1)-th layer, F(X_q; ξ) is the residual function constructed by the two convolution layers of the (q+1)-th and (q+2)-th layers, and X_{q+2} is the pooled fusion feature.
9. The method for classifying hyperspectral images in combination with a spatial pyramid attention mechanism as claimed in claim 3, wherein in the fifth step, the spectral-spatial joint features are obtained as follows:
step five-one, aggregating the pooled fusion features into the two directional joint feature vectors PFW and PFH with the SPCS module, and performing a two-dimensional adaptive average pooling operation on the pooled fusion features to aggregate them into three feature maps PF1, PF2 and PF3;
step five-two, flattening and reshaping the three feature maps PF1, PF2 and PF3 to generate the multi-scale spatial features RF1, the local information RF2 and the global information RF3; then aggregating PFW, PFH, RF1, RF2 and RF3 with the SPCE module to obtain the aggregated feature information;
and step five-three, combining the pooled fusion features with the aggregated feature information by a skip connection to obtain the spectral-spatial joint features.
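The following PyTorch sketch gives one plausible reading of claim 9's squeeze-and-excite flow: directional averages stand in for PFW and PFH, three-scale adaptive average pooling produces PF1-PF3 (flattened into RF1-RF3), a small MLP stands in for the SPCE excitation, and a skip connection adds the re-weighted features back to the input. The patch size, the pyramid sizes (1, 2, 4) and the MLP design are all assumptions, since the claim does not fix them.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SPCAggregationSketch(nn.Module):
    """Rough stand-in for the SPCS squeeze and SPCE excitation of claim 9."""
    def __init__(self, ch, patch=15, sizes=(1, 2, 4)):
        super().__init__()
        self.sizes = sizes
        squeeze_len = 2 * patch + sum(s * s for s in sizes)   # PFH + PFW + RF1 + RF2 + RF3 per channel
        self.excite = nn.Sequential(nn.Linear(squeeze_len, 8), nn.ReLU(),
                                    nn.Linear(8, 1), nn.Sigmoid())

    def forward(self, x):                                      # x: pooled fusion features (b, c, patch, patch)
        pfh = x.mean(dim=3)                                    # PFH: average over width  -> (b, c, h)
        pfw = x.mean(dim=2)                                    # PFW: average over height -> (b, c, w)
        rfs = [F.adaptive_avg_pool2d(x, s).flatten(2) for s in self.sizes]  # RF1, RF2, RF3
        squeeze = torch.cat([pfh, pfw] + rfs, dim=2)           # joint descriptor per channel
        attn = self.excite(squeeze).unsqueeze(-1)              # per-channel weights (b, c, 1, 1)
        return x + x * attn                                    # skip connection with the input features

x = torch.randn(2, 64, 15, 15)
print(SPCAggregationSketch(64)(x).shape)                       # torch.Size([2, 64, 15, 15])
```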
10. The method for classifying hyperspectral images in combination with a spatial pyramid attention mechanism as claimed in claim 3, wherein in step six, the Softmax-based fully connected layer linearly weights the fixed-size matrix and the classification result is obtained as follows:
the Softmax-based fully connected layer is used to linearly weight the spectral-spatial joint features to obtain the predicted output, and the difference between the predicted output and the actual result is calculated by the loss function:
Loss = -Σ_{i=1}^{n} p(y_i) · log q(y_i)
then a back-propagation algorithm is used to obtain the hyperspectral image features, and the classification result is obtained from the hyperspectral image features through a Softmax output layer, wherein y is the input sample, n is the number of classes, p(y_i) is the label of the training sample, and q(y_i) is the class predicted by the model.
CN202310838661.0A 2023-07-10 2023-07-10 Hyperspectral image classification method combined with spatial pyramid attention mechanism Pending CN116843975A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310838661.0A CN116843975A (en) 2023-07-10 2023-07-10 Hyperspectral image classification method combined with spatial pyramid attention mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310838661.0A CN116843975A (en) 2023-07-10 2023-07-10 Hyperspectral image classification method combined with spatial pyramid attention mechanism

Publications (1)

Publication Number Publication Date
CN116843975A true CN116843975A (en) 2023-10-03

Family

ID=88161397

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310838661.0A Pending CN116843975A (en) 2023-07-10 2023-07-10 Hyperspectral image classification method combined with spatial pyramid attention mechanism

Country Status (1)

Country Link
CN (1) CN116843975A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117611930A (en) * 2024-01-23 2024-02-27 中国海洋大学 Fine granularity classification method of medical image based on CLIP
CN117611930B (en) * 2024-01-23 2024-04-26 中国海洋大学 Fine granularity classification method of medical image based on CLIP
CN117765402A (en) * 2024-02-21 2024-03-26 山东科技大学 Hyperspectral image matching detection method based on attention mechanism
CN117765402B (en) * 2024-02-21 2024-05-17 山东科技大学 Hyperspectral image matching detection method based on attention mechanism

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination