CN116843975A - Hyperspectral image classification method combined with spatial pyramid attention mechanism


Publication number: CN116843975A
Authority: CN (China)
Legal status: Pending
Application number: CN202310838661.0A
Other languages: Chinese (zh)
Inventors: 刘和, 胡紫林, 韩啸, 赵妍, 张洋, 王维英, 王东
Assignee: State Grid Heilongjiang Electric Power Co Ltd Harbin Power Supply Co; State Grid Corp of China SGCC

Classifications

    • G06V10/764 Image or video recognition or understanding using pattern recognition or machine learning; classification, e.g. of video objects
    • G06V10/806 Fusion of extracted features at the sensor, preprocessing, feature extraction or classification level
    • G06V10/82 Image or video recognition or understanding using neural networks
    • G06V20/10 Scenes; scene-specific elements; terrestrial scenes
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G06N3/084 Learning methods; backpropagation, e.g. using gradient descent


Abstract

A hyperspectral image classification method combining a spatial pyramid attention mechanism, belonging to the field of hyperspectral image classification. The invention addresses the low classification accuracy of existing classification methods. The method removes spectral redundancy from the hyperspectral image to be classified by principal component analysis; builds a sample set; performs multi-scale feature extraction and preliminary fusion on the sample set with a ResNet34 shallow feature extraction network combined with dynamic convolution, and weights and fuses the linearly convolved samples with the preliminary fusion features to obtain pooled fusion features; constructs an SPC module and extracts multi-scale spatial features and local and global information from the pooled fusion features; combines the extracted information to obtain joint spectral-spatial features; converts the joint spectral-spatial features into a fixed-size matrix with the pooling layer of a spatial feature pyramid, and linearly weights the fixed-size matrix with a Softmax-based fully connected layer to obtain the classification result. The method is suitable for hyperspectral image classification.

Description

Hyperspectral image classification method combined with spatial pyramid attention mechanism
Technical Field
The invention belongs to the field of hyperspectral image classification.
Background
A hyperspectral remote sensing image (Hyperspectral Imagery, HSI) is an image acquired by a hyperspectral imager and is rich in both spatial and spectral information. Compared with ordinary images, hyperspectral remote sensing images have many more bands and very high spectral resolution, integrating image and spectrum in a single data cube. Hyperspectral remote sensing earth observation is widely applied, for example in precision agriculture, land cover analysis, marine hydrological detection and geological exploration, and plays an increasingly important role in the economy, agriculture, environmental monitoring and other areas.
At present, many deep-learning-based classification algorithms still suffer from data redundancy, insufficient fine-feature extraction, and loss of global and local spectral information, which leads to poor classification accuracy.
Disclosure of Invention
The invention aims to solve the problem of low classification accuracy of existing classification methods, and provides a hyperspectral image classification method combining a spatial pyramid attention mechanism.
The hyperspectral image classification method combining a spatial pyramid attention mechanism according to the invention comprises the following steps:
Step one, removing spectral redundancy from the hyperspectral image to be classified by principal component analysis;
Step two, sampling the hyperspectral image with the spectral redundancy removed by a sliding-window sampling method to obtain a sample set;
Step three, performing multi-scale feature extraction and fusion on the sample set with a ResNet34 shallow feature extraction network combined with dynamic convolution to obtain preliminary fusion features;
Step four, applying a linear convolution transformation to the sample set, and weighting and fusing the linearly convolved samples with the preliminary fusion features to obtain pooled fusion features;
Step five, combining the spatial feature pyramid network with the coordinate squeeze attention mechanism SPCS and the coordinate attention excitation mechanism SPCE to construct the SPC module, stacking three SPC modules, and extracting multi-scale spatial features, local information and global information from the pooled fusion features; combining the extracted information to obtain the joint spectral-spatial features;
Step six, converting the joint spectral-spatial features into a fixed-size matrix with the pooling layer of the spatial feature pyramid, and linearly weighting the fixed-size matrix with a Softmax-based fully connected layer to obtain the classification result.
Further, in step one of the present invention, the spectral redundancy in the hyperspectral image to be classified is removed by principal component analysis as follows:
Step 1.1, computing the covariance matrix, eigenvalues and eigenvectors of the hyperspectral image data to be classified;
Step 1.2, sorting the eigenvectors in descending order of their eigenvalues; using the eigenvectors as weighting coefficients, forming an eigenvector matrix from the n eigenvectors with the largest eigenvalues, projecting the hyperspectral image data to be classified onto the eigenvector matrix, and computing the principal components of the feature matrix with the weighting coefficients to obtain B principal components and the dimension-reduced feature matrix, thereby removing the redundancy of the hyperspectral image to be classified, where n = 3 and B is the number of bands of the hyperspectral image.
Further, in the present invention, in step 1.1 the covariance matrix of the hyperspectral image data to be classified is computed as:
σ(q_j, q_k) = E[(q_j - E(q_j))(q_k - E(q_k))]
where σ(q_j, q_k) denotes the covariance between q_j and q_k, j, k = 1 ... m, m denotes the number of columns of the eigenvalue matrix, E denotes the matrix expectation, A denotes the covariance matrix, and q_j and q_k denote the normalized j-th and k-th random vectors, respectively.
Further, in step three of the present invention, the specific method for multi-scale feature extraction and fusion of the sample set with the ResNet34 shallow feature extraction network combined with dynamic convolution is as follows:
Step 3.1, obtaining the weights of the convolution kernels in the identity residual blocks of the ResNet34 network by dynamic convolution matching;
Step 3.2, constructing a residual joint feature network containing dynamic convolution from the convolution-kernel weights, and performing multi-scale feature extraction on the sample set to obtain a feature map;
Step 3.3, fusing the features in the feature map with a linear function to obtain the preliminary fusion features.
Further, in step 3.1 of the present invention, the weights of the convolution kernels in the identity residual blocks of the ResNet34 network are obtained by dynamic convolution matching using the formula:
y(x_i) = α[(k_1 w_1 + k_2 w_2 + ··· + k_n w_n) * x_i]
where x denotes the input from which the attention module derives the data-dependent dynamic convolution kernel weights, k_n = s_n(x) is the n-th scalar weight depending on the input sample, n is the number of base kernels in the convolution operation, w_1, ..., w_n and the corresponding bias terms denote the convolution kernel parameters and bias parameters of the individual kernels, α denotes the activation function, and x_i denotes the i-th input to the dynamic convolution.
Further, in step 3.2 of the present invention, the feature map is obtained as:
U(x) = W_2 σ(W_1 x)
where U(x) denotes the feature map, σ(·) denotes the improved linear function, and W_1 and W_2 are the weights of weight layer 1 and weight layer 2, respectively.
Further, in step 3.3 of the present invention, the preliminary fusion features are obtained as:
H(x) = U(x, {W_e}) + W_s x
where W_s denotes the linear transformation, W_e denotes W_1 or W_2, and H(x) denotes the features after preliminary fusion.
Further, in step four of the present invention, the linearly convolved samples and the preliminary fusion features are weighted and fused, and the pooled fusion features are obtained as follows:
X_{q+2} = X_q + F(X_q; ξ)
where ξ = {H_{q+1}, H_{q+2}, b_{q+1}, b_{q+2}}; X_q and X_{q+2} denote the input feature volumes of the q-th and (q+2)-th layers after the dynamic convolution operation, respectively; H_{q+1} and b_{q+1} denote the n spatial convolution kernels of the (q+1)-th layer and the linear transformation of the (q+1)-th layer, respectively; F(X_q; ξ) is the residual function constructed from the two convolution layers (q+1) and (q+2); σ(·) denotes the improved linear function, here the Mish function; and the weighted result is the pooled fusion feature.
Further, in step five of the present invention, the joint spectral-spatial features are obtained as follows:
Step 5.1, aggregating the pooled fusion features into the two directional joint feature vectors PFW and PFH with the SPCS module, and applying two-dimensional adaptive average pooling to the pooled fusion features to aggregate them into the three feature maps PF1, PF2 and PF3;
Step 5.2, flattening and reshaping the three feature maps PF1, PF2 and PF3 to generate the multi-scale spatial feature RF1, local information RF2 and global information RF3, and then aggregating PFW, PFH, RF1, RF2 and RF3 with the SPCE module to obtain the aggregated feature information;
Step 5.3, combining the pooled fusion features with the aggregated feature information through a skip connection to obtain the joint spectral-spatial features.
Further, in step 5.2 of the present invention, PFW, PFH, RF1, RF2 and RF3 are aggregated with the SPCE module, and the aggregated feature information is obtained as follows:
First, the channel features of every spectral dimension are aggregated in the same way; the aggregation of the c-th channel is described as:
where H and W denote the height and width of the pooled fusion feature, and the aggregated quantities denote the features of the c-th channels of PFW, PFH, PF1, PF2 and PF3 obtained from the pooled fusion features; c ranges over (0, 32); AdaptiveAvgPool(3), AdaptiveAvgPool(6) and AdaptiveAvgPool(8) denote adaptive average pooling with output sizes of 3×3, 6×6 and 8×8, respectively.
The concatenation of the c-th channels of the features RF1, RF2 and RF3 in the spatial dimension is:
where f_sc denotes the output feature of the c-th channel after SPCS aggregation, [ ] denotes the concatenation operation in the spatial dimension, and the remaining terms denote the reshaped features RFW, RFH, RF1, RF2 and RF3 in the c-th channel, respectively.
Further, in step 5.3 of the present invention, the pooled fusion features are combined with the aggregated feature information through a skip connection, and the joint spectral-spatial features are obtained as follows:
where the first term is the feature output after aggregation of the c-th channel and the superscript sp denotes the multi-scale feature, and the second term is the sum of the spectral attention features g_1, g_2 and g_3 of the c-th channel at different scales, with g_1 ∈ R^{C×1×1}, g_2 ∈ R^{C×1×1} and g_3 ∈ R^{C×1×1};
where the corresponding inputs denote the three scale-space features MF1, MF2 and MF3 derived from the pooled fusion features;
where f_c(h, w) is the feature of the pooled fusion feature in the c-th channel and the superscript co denotes the coordinate feature;
where F_h and F_w denote convolution operations with a kernel size of E×E/r×1, their inputs are the feature values of MFH and MFW, respectively, MFH denotes a row feature of size (C/r, H, 1), MFW denotes a column feature of size (C/r, W, 1), g_h and g_w are the per-channel importance features learned automatically by convolution, C denotes the number of convolution kernels, and E denotes the convolution kernel size.
Further, in step six of the present invention, the classification result is obtained by linearly weighting the fixed-size matrix with the Softmax-based fully connected layer as follows:
The Softmax fully connected layer linearly weights the joint spectral-spatial features to obtain the predicted output, and the difference between the predicted output and the actual result is computed with a loss function to obtain the loss:
The back-propagation algorithm is then applied to obtain the hyperspectral image features, and the classification result is obtained by passing the hyperspectral image feature representation through the Softmax output layer, where y is the input sample, n is the number of classes, p(y_i) is the label of the training sample, and q(y_i) is the class predicted by the model.
Further, in the present invention, the Softmax output layer function is:
where z_g is the output value of the g-th node and L is the number of output nodes, i.e., the total number of classes of the classifier.
According to the invention, the hyperspectral image classification model based on a residual network with a spatial pyramid attention mechanism attends to useful information and suppresses useless information, thereby effectively realizing multi-scale joint feature attention, improving the sensitivity to joint features, and achieving information interaction by effectively emphasizing and attending to spatial and spectral information. Classification accuracy is thus effectively improved.
Drawings
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 is the ground-truth map of the MUUFL dataset;
FIG. 3 is a result image of classifying the MUUFL dataset with a radial basis function-support vector machine;
FIG. 4 is a result image of classifying the MUUFL dataset with an extended morphological profile-support vector machine;
FIG. 5 is a result image of classifying the MUUFL dataset with a convolutional neural network;
FIG. 6 is a result image of classifying the MUUFL dataset with a residual network;
FIG. 7 is a result image of classifying the MUUFL dataset with a spectral-spatial residual network;
FIG. 8 is a result image of classifying the MUUFL dataset with the method of the present invention;
FIG. 9 is the ground-truth map of the Trento dataset;
FIG. 10 is a result image of classifying the Trento dataset with a radial basis function-support vector machine;
FIG. 11 is a result image of classifying the Trento dataset with an extended morphological profile-support vector machine;
FIG. 12 is a result image of classifying the Trento dataset with a convolutional neural network;
FIG. 13 is a result image of classifying the Trento dataset with a residual network;
FIG. 14 is a result image of classifying the Trento dataset with a spectral-spatial residual network;
FIG. 15 is a result image of classifying the Trento dataset with the method of the present invention;
FIG. 16 is a network schematic diagram of the hyperspectral image classification method of the present invention combining a spatial pyramid attention mechanism to improve the residual network;
FIG. 17 is a schematic diagram of the spatial pyramid coordinate excitation mechanism in the method of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention. It should be noted that, without conflict, the embodiments of the present invention and features of the embodiments may be combined with each other.
Embodiment 1: this embodiment describes a hyperspectral image classification method combining a spatial pyramid attention mechanism, comprising:
Step one, removing spectral redundancy from the hyperspectral image to be classified by principal component analysis;
Step two, sampling the hyperspectral image with the spectral redundancy removed by a sliding-window sampling method to obtain a sample set;
Step three, performing multi-scale feature extraction and fusion on the sample set with a ResNet34 shallow feature extraction network combined with dynamic convolution to obtain preliminary fusion features;
Step four, applying a linear convolution transformation to the sample set, and weighting and fusing the linearly convolved samples with the preliminary fusion features to obtain pooled fusion features;
Step five, combining the spatial feature pyramid network with the coordinate squeeze attention mechanism SPCS and the coordinate attention excitation mechanism SPCE to construct the SPC module, stacking three SPC modules, and extracting multi-scale spatial features, local information and global information from the pooled fusion features; combining the extracted information to obtain the joint spectral-spatial features;
Step six, converting the joint spectral-spatial features into a fixed-size matrix with the pooling layer of the spatial feature pyramid, and linearly weighting the fixed-size matrix with a Softmax-based fully connected layer to obtain the classification result.
In this embodiment, the input image is sampled with sliding windows of different sizes, the overlap ratio is set to 50%, windows of size 3×3, 6×6 and 8×8 are generated, and the samples are cut and divided into training samples (10%), validation samples (10%) and test samples (80%). The training samples are input into a ResNet34 shallow feature extraction network improved with dynamic convolution (Dynamic Convolution ResNet, DCResNet); several convolution kernels are placed in the last two convolution layers of the identity residual blocks of the network, and the data-dependent dynamic convolution kernels are obtained by weighted combination, realizing dynamic convolution matching. The obtained fusion features are added element by element to the original input data after a linear convolution transformation to obtain refined features. The spatial feature pyramid network is combined with the coordinate attention mechanism to construct the spatial pyramid coordinate attention combination module (Spatial Pyramid Coordinate Attention, SPC), and the SPC module is stacked 3 times. The spatial pyramid coordinate squeeze module embedded in the SPC module acquires and aggregates the multi-scale spatial features and the local and global information. The spatial pyramid coordinate excitation module embedded in the SPC module then adaptively enhances the spectral features in the different spaces, realizing joint attention to the spectral-spatial features. The joint feature map is input into the SPP layer and converted into a fixed-size matrix, which is input into the Softmax-based fully connected layer; the extracted features are linearly weighted and the classification result is finally obtained from the output layer. With the hyperspectral image classification model based on the residual network with the spatial pyramid attention mechanism, the network attends more to useful information and suppresses useless information, thereby effectively realizing multi-scale joint feature attention, improving the sensitivity to joint features, and effectively emphasizing and attending to spatial and spectral information to realize information interaction. The classification accuracy of the HSI is effectively improved.
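As an illustration of this sampling step only, a minimal NumPy sketch is given below; the function names extract_patches and split_samples, the 9×9 window, the centre-pixel labelling rule and the random split are assumptions made for the example and are not taken from the patent text.

```python
import numpy as np

def extract_patches(cube, labels, patch=9, stride=None):
    """Cut labelled patches from a (H, W, B) hyperspectral cube with a sliding window.
    A stride of patch // 2 roughly corresponds to the 50% overlap mentioned above."""
    if stride is None:
        stride = patch // 2  # ~50% overlap
    H, W, _ = cube.shape
    patches, targets = [], []
    for r in range(0, H - patch + 1, stride):
        for c in range(0, W - patch + 1, stride):
            center = labels[r + patch // 2, c + patch // 2]
            if center > 0:  # keep only windows whose centre pixel is labelled
                patches.append(cube[r:r + patch, c:c + patch, :])
                targets.append(center - 1)
    return np.stack(patches), np.array(targets)

def split_samples(x, y, train=0.1, val=0.1, seed=0):
    """Randomly divide samples into 10% train, 10% validation, 80% test."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    n_tr, n_va = int(train * len(y)), int(val * len(y))
    tr, va, te = idx[:n_tr], idx[n_tr:n_tr + n_va], idx[n_tr + n_va:]
    return (x[tr], y[tr]), (x[va], y[va]), (x[te], y[te])
```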
Further, in step one of the present invention, the spectral redundancy in the hyperspectral image to be classified is removed by principal component analysis as follows:
Step 1.1, computing the covariance matrix, eigenvalues and eigenvectors of the hyperspectral image data to be classified;
Step 1.2, sorting the eigenvectors in descending order of their eigenvalues; using the eigenvectors as weighting coefficients, forming an eigenvector matrix from the n eigenvectors with the largest eigenvalues, projecting the hyperspectral image data to be classified onto the eigenvector matrix, and computing the principal components of the feature matrix with the weighting coefficients to obtain B principal components and the dimension-reduced feature matrix, thereby removing the redundancy of the hyperspectral image to be classified, where n = 3 and B is the number of bands of the hyperspectral image.
Further, in the present invention, in step 1.1 the covariance matrix of the hyperspectral image data to be classified is computed as:
σ(q_j, q_k) = E[(q_j - E(q_j))(q_k - E(q_k))]
where σ(q_j, q_k) denotes the covariance between q_j and q_k, j, k = 1 ... m, m denotes the number of columns of the eigenvalue matrix, E denotes the matrix expectation, A denotes the covariance matrix, and q_j and q_k denote the normalized j-th and k-th random vectors, respectively.
Further, in step three of the present invention, the specific method for feature extraction and fusion of the sample set with the ResNet34 shallow feature extraction network combined with dynamic convolution is as follows:
Step 3.1, obtaining the weights of the convolution kernels in the identity residual blocks of the ResNet34 network by dynamic convolution matching;
Step 3.2, constructing a residual joint feature network containing dynamic convolution from the convolution-kernel weights, and performing multi-scale feature extraction on the sample set to obtain a feature map;
Step 3.3, fusing the features in the feature map with a linear function to obtain the preliminary fusion features.
Further, in step 3.1 of the present invention, the weights of the convolution kernels in the identity residual blocks of the ResNet34 network are obtained by dynamic convolution matching using the formula:
y(x_i) = α[(k_1 w_1 + k_2 w_2 + ··· + k_n w_n) * x_i]
where x denotes the input from which the attention module derives the data-dependent dynamic convolution kernel weights, k_n = s_n(x) is the n-th scalar weight depending on the input sample, n is the number of base kernels in the convolution operation, w_1, ..., w_n and the corresponding bias terms denote the convolution kernel parameters and bias parameters of the individual kernels, α denotes the activation function, and x_i denotes the i-th input to the dynamic convolution.
Further, in step 3.2 of the present invention, the feature map is obtained as:
U(x) = W_2 σ(W_1 x)
where U(x) denotes the feature map, σ(·) denotes the improved linear function, and W_1 and W_2 are the weights of weight layer 1 and weight layer 2, respectively.
Further, in step 3.3 of the present invention, the preliminary fusion features are obtained as:
H(x) = U(x, {W_e}) + W_s x
where W_s denotes the linear transformation, W_e denotes W_1 or W_2, and H(x) denotes the features after preliminary fusion.
Further, in step four of the present invention, the linearly convolved samples and the preliminary fusion features are weighted and fused, and the pooled fusion features are obtained as follows:
X_{q+2} = X_q + F(X_q; ξ)
where ξ = {H_{q+1}, H_{q+2}, b_{q+1}, b_{q+2}}; X_q and X_{q+2} denote the input feature volumes of the q-th and (q+2)-th layers after the dynamic convolution operation, respectively; H_{q+1} and b_{q+1} denote the n spatial convolution kernels of the (q+1)-th layer and the linear transformation of the (q+1)-th layer, respectively; F(X_q; ξ) is the residual function constructed from the two convolution layers (q+1) and (q+2); σ(·) denotes the improved linear function, specifically the Mish function; and the weighted result is the pooled fusion feature.
Further, in step five of the present invention, the joint spectral-spatial features are obtained as follows:
Step 5.1, aggregating the pooled fusion features into the two directional joint feature vectors PFW and PFH with the SPCS module, and applying two-dimensional adaptive average pooling to the pooled fusion features to aggregate them into the three feature maps PF1, PF2 and PF3;
Step 5.2, flattening and reshaping the three feature maps PF1, PF2 and PF3 to generate the multi-scale spatial feature RF1, local information RF2 and global information RF3, and then aggregating PFW, PFH, RF1, RF2 and RF3 with the SPCE module to obtain the aggregated feature information;
Step 5.3, combining the pooled fusion features with the aggregated feature information through a skip connection to obtain the joint spectral-spatial features.
Further, in step 5.2 of the present invention, PFW, PFH, RF1, RF2 and RF3 are aggregated, and the aggregated feature information is obtained as follows:
First, the pooling of the channel features of every spectral dimension is the same; the fusion of the c-th channel is described as:
where H and W denote the height and width of the pooled fusion feature, and the pooled quantities denote the features of the c-th channels of PFW, PFH, PF1, PF2 and PF3 fused from the pooled fusion features; c ranges over (0, 32); AdaptiveAvgPool(3), AdaptiveAvgPool(6) and AdaptiveAvgPool(8) denote adaptive average pooling with output sizes of 3×3, 6×6 and 8×8, respectively.
The concatenation of the c-th channels of the features RF1, RF2 and RF3 in the spatial dimension is:
where f_sc denotes the output feature of the c-th channel after SPCS aggregation, [ ] denotes the concatenation operation in the spatial dimension, and the remaining terms denote the reshaped features RFW, RFH, RF1, RF2 and RF3 in the c-th channel, respectively.
Further, in step 5.3 of the present invention, the pooled fusion features are combined with the aggregated feature information through a skip connection, and the joint spectral-spatial features are obtained as follows:
where the first term is the feature output after aggregation of the c-th channel and the superscript sp denotes the multi-scale feature, and the second term is the sum of the spectral attention features g_1, g_2 and g_3 of the c-th channel at different scales, with g_1 ∈ R^{C×1×1}, g_2 ∈ R^{C×1×1} and g_3 ∈ R^{C×1×1};
where the corresponding inputs denote the three scale-space features MF1, MF2 and MF3 derived from the pooled fusion features;
where f_c(h, w) is the feature of the pooled fusion feature in the c-th channel and the superscript co denotes the coordinate feature;
where F_h and F_w denote convolution operations with a kernel size of C×C/r×1, their inputs are the feature values of MFH and MFW, respectively, MFH denotes a row feature of size (C/r, H, 1), MFW denotes a column feature of size (C/r, W, 1), g_h and g_w are the per-channel importance features learned automatically by convolution, and C denotes the number of convolution kernels.
Further, in step six of the present invention, the classification result is obtained by linearly weighting the fixed-size matrix with the Softmax-based fully connected layer as follows:
The Softmax fully connected layer linearly weights the joint spectral-spatial features to obtain the predicted output, and the difference between the predicted output and the actual result is computed with a loss function to obtain the loss:
The back-propagation algorithm is then applied to obtain the hyperspectral image features, and the classification result is obtained from the hyperspectral image features through the Softmax output layer, where y is the input sample, n is the number of classes, p(y_i) is the label of the training sample, and q(y_i) is the class predicted by the model.
Further, in the present invention, the Softmax output layer function is:
where z_i is the output value of the i-th node and b is the number of output nodes, i.e., the number of classes of the classifier.
The hyperspectral image classification method combining the spatial pyramid attention mechanism to improve the residual network in the specific embodiment is shown in fig. 1, the network principle schematic diagram is shown in fig. 4, and the method comprises the following steps:
Step a1, inputting the hyperspectral image to be classified.
In this embodiment, two public datasets are used: the MUUFL dataset and the Trento dataset.
(1) MUUFL dataset: the MUUFL dataset was acquired over the Gulfport campus of the University of Southern Mississippi in November 2010; it has 64 bands, a spatial resolution of 1 m and 11 classes, with details shown in Table 1.
Table 1 MUUFL dataset details
No. Class name Number of samples
1 Trees 23246
2 Mostly grass 4270
3 Mixed ground surface 6882
4 Dirt and sand 1826
5 Road 6687
6 Water 466
7 Building shadow 2233
8 Building 6240
9 Sidewalk 1385
10 Yellow curb 183
11 Cloth panels 269
Total 53687
(2) Trento dataset: the Trento dataset was acquired over a rural area south of Trento, Italy; it is 166×600 pixels with 63 bands and 6 classes, at a spatial resolution of 1 m, with details shown in Table 2.
Table 2 Trento dataset details
No. Class name Number of samples
1 Apple trees 4034
2 Buildings 2903
3 Ground 479
4 Woods 9123
5 Vineyard 10591
6 Roads 3174
Total 30304
Step a2, removing spectral redundancy with principal component analysis (Principal Component Analysis, PCA).
The principle of principal component analysis is to represent most of the energy of the original data with fewer components: components of the original data with high correlation are converted into new, mutually uncorrelated components, called principal components. Redundancy is removed from the hyperspectral data with each band treated as a vector. Specifically, the 200-dimensional spectral channel of each pixel in the hyperspectral image matrix is expanded into a 1×200 feature matrix. The elements of the feature matrix are averaged by column, and the mean of the corresponding column is subtracted from each element of the feature matrix.
Step a3, computing the covariance matrix of the three-dimensional original hyperspectral data and solving for its eigenvalues and eigenvectors.
The covariance of every pair of columns of the feature matrix is computed to construct the covariance matrix of the feature matrix, calculated in turn according to the following two formulas:
σ(x_j, x_k) = E[(x_j - E(x_j))(x_k - E(x_k))]
where σ(x_j, x_k) denotes the covariance between x_j and x_k, j, k = 1 ... m, m denotes the number of columns of the feature matrix, E denotes the matrix expectation, and A denotes the covariance matrix; the eigenvalues and eigenvectors are then obtained from the covariance matrix.
Step b, sorting the eigenvectors in descending order of eigenvalue, and computing B principal components with the eigenvectors as weighting coefficients, where B is the number of bands of the hyperspectral image.
All eigenvalues are sorted from largest to smallest, the first 3 are selected, and the eigenvectors corresponding to these 3 eigenvalues are arranged by column to form an eigenvector matrix. Using the eigenvectors as weighting coefficients, B principal components are computed, where B is the number of bands of the hyperspectral image. The hyperspectral image matrix is projected onto the selected eigenvector matrix to obtain the dimension-reduced feature matrix.
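The covariance, eigendecomposition and projection of steps a2-b can be sketched as follows; the function name pca_reduce and the use of NumPy are illustrative assumptions rather than part of the patent.

```python
import numpy as np

def pca_reduce(cube, n_components=3):
    """Project a (H, W, B) hyperspectral cube onto its leading principal components."""
    H, W, B = cube.shape
    x = cube.reshape(-1, B).astype(np.float64)        # one 1 x B spectrum per pixel
    x -= x.mean(axis=0, keepdims=True)                # subtract the column means
    cov = np.cov(x, rowvar=False)                     # B x B covariance matrix A
    eigvals, eigvecs = np.linalg.eigh(cov)            # eigenvalues and eigenvectors
    order = np.argsort(eigvals)[::-1][:n_components]  # largest eigenvalues first
    proj = eigvecs[:, order]                          # eigenvector matrix (B x n)
    return (x @ proj).reshape(H, W, n_components)     # dimension-reduced cube
```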
Step c, placing several convolution kernels in the last two convolution layers of the ResNet34 identity residual blocks, determining the weight of each convolution kernel from the input of the convolution layer, weighting the convolution kernels to obtain dynamic convolution kernels, and finally performing the convolution operation to realize DConv matching. A residual joint feature network (DCResNet) containing dynamic convolution is constructed. The obtained feature fusion data are added element by element to the original input data after a linear convolution transformation to obtain the post-fusion features.
Step c1, placing several convolution kernels in the last two convolution layers of the ResNet34 identity residual blocks, determining the weight of each convolution kernel from the input of the convolution layer, weighting the convolution kernels to obtain dynamic convolution kernels, and finally performing the convolution operation to realize dynamic convolution matching.
A conventional convolution layer has a static convolution kernel applied to all input samples. In this model, DConv dynamically weights a given input over a four-dimensional convolution kernel space, so that the convolution operation depends on the model input, which enhances the model's ability to capture input information and rich features. Mathematically, the dynamic convolution operation can be defined as:
y(x_i) = α[(k_1 w_1 + k_2 w_2 + ··· + k_n w_n) * x_i]
Data-dependent dynamic convolution kernel weights are obtained for x through the attention module. For a specific input x_i, the dynamic convolution kernel can be generated as above, where k_i = s_i(x) is a scalar weight depending on the input sample, computed with a routing function containing learnable parameters; w_1, ..., w_n and the corresponding bias terms denote the convolution kernel parameters and bias parameters of the individual kernels; n is the number of base kernels in the convolution operation; and α denotes the activation function. The linear mixing process of the dynamic convolution layer is:
α[(k_1 w_1 + k_2 w_2 + ··· + k_n w_n) * x_i] = α(k_1 (w_1 * x_i) + ··· + k_n (w_n * x_i))
After passing through the attention module, x yields k attention parameters with different weights; these are multiplied with k initialized convolution kernels to obtain k weighted kernels, which are combined to form the data-dependent dynamic convolution kernel. Thus, dynamic convolution has the same capacity as a linear mixture of n static convolutions, and each routing function contributes to the dynamic convolution performance; the routing weights are computed as:
s(x) = Sigmoid(GlobalAveragePool(x) * F)
where F is a learnable weight.
Dynamic convolution introduces only two additional computations, the attention module and the weighted combination of convolution kernels, so the network gains a notable capacity improvement at a small extra computational cost.
The goal of the DConv network is to increase the representation and expression capability of a lightweight neural network without increasing its depth or width. Unlike the single kernel of a conventional convolution, dynamic convolution combines several parallel convolution kernels into one dynamic kernel; this dynamic kernel is data dependent, its parameters adjust to differences in the data, and the representation and expression capability of the network is effectively improved. The dynamic convolution flow is shown in fig. 5.
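A possible PyTorch sketch of such a dynamic convolution layer is shown below. The class name DynamicConv2d, the Linear routing layer standing in for the learnable weight F, and the grouped-convolution trick for applying a different mixed kernel to every sample are implementation assumptions; only the Sigmoid routing over global average pooling and the weighted kernel combination follow the description above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicConv2d(nn.Module):
    """Weighted mixture of K parallel kernels; the mixing weights are predicted
    per input sample by a Sigmoid routing function over global average pooling."""
    def __init__(self, in_ch, out_ch, kernel_size=3, num_kernels=4, padding=1):
        super().__init__()
        self.K, self.out_ch, self.padding = num_kernels, out_ch, padding
        self.weight = nn.Parameter(
            torch.randn(num_kernels, out_ch, in_ch, kernel_size, kernel_size) * 0.02)
        self.bias = nn.Parameter(torch.zeros(num_kernels, out_ch))
        self.route = nn.Linear(in_ch, num_kernels)   # stands in for the learnable weight F

    def forward(self, x):                             # x: (B, C_in, H, W)
        B, C, H, W = x.shape
        k = torch.sigmoid(self.route(x.mean(dim=(2, 3))))          # (B, K) routing weights
        w = torch.einsum('bk,koihw->boihw', k, self.weight)        # per-sample mixed kernels
        b = torch.einsum('bk,ko->bo', k, self.bias).reshape(-1)    # per-sample mixed bias
        out = F.conv2d(x.reshape(1, B * C, H, W),
                       w.reshape(B * self.out_ch, C, *w.shape[-2:]),
                       bias=b, padding=self.padding, groups=B)     # one group per sample
        return out.reshape(B, self.out_ch, out.shape[-2], out.shape[-1])
```

With num_kernels=4 and kernel_size=3 this matches the four 3×3 kernels mentioned in step c2 below.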
Step c2, constructing the residual joint feature network (DCResNet) containing dynamic convolution, consisting of conv1, conv2_x, conv3_x, conv4_x and conv5_x, where conv4_x and conv5_x are the convolution layers containing dynamic convolution; the processed data of size 9×9×band are input into the improved residual network, and a 64×3×3 feature map is output.
The network consists of the five parts conv1, conv2_x, conv3_x, conv4_x and conv5_x, with the parameter settings of each part shown in table 2. The model input size is 9×9 and the final conv5_x output size is 1×1. First, the processed data of size 9×9×band are input into the ordinary convolution layer of the residual network, with kernel size 7×7, stride 2 and padding 3; this layer extracts the initial image features. The result is input into a dynamic convolution layer of size 64×5×5; the convolution kernels in the dynamic convolution are 3×3 with stride 1 and padding 1. The dynamic convolution kernel is obtained by multiplying the attention with the weights and fusing over the batch; 4 convolution kernels of size 3×3 are used as the dynamic convolution kernels for the weighting, and a feature map of size 64×3×3 is output.
U(x) = W_2 σ(W_1 x)
where σ denotes the improved linear function Mish, and W_1 and W_2 are the weights of weight layer 1 and weight layer 2, respectively; the result then passes through a shortcut and the second ReLU to obtain the output H(x):
H(x) = U(x, {W_i}) + x
However, when the input and output dimensions need to change (for example, a change in the number of channels), a linear transformation W_s can be applied to x in the shortcut operation:
H(x) = U(x, {W_i}) + W_s x
Step c3, adding the fusion features of DCResNet element by element to the original input data after a linear convolution transformation to obtain the refined features, capturing the spatial and spectral information of the input data.
The spatial size of the feature fusion data X_{q+1} and X_{q+2} remains unchanged at 64×3×3. The feature fusion data and the original input data after the linear convolution transformation are weighted element by element to obtain the post-fusion features.
X_{q+2} = X_q + F(X_q; ξ)
where ξ = {H_{q+1}, H_{q+2}, b_{q+1}, b_{q+2}}; X_{q+2} denotes the input feature volume of layer (q+1); H_{q+1} and b_{q+1} denote the n spatial convolution kernels of layer (q+1); F(X_q; ξ) is the residual function constructed from the two convolution layers, rather than a direct skip-connection mapping of X_q; σ denotes the improved linear function Mish; and X is the post-fusion feature obtained after the weighting.
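Building on the equations above, one way such an identity residual block could look in PyTorch is sketched below; it reuses the illustrative DynamicConv2d class from the earlier sketch, and the placement of batch normalization is an assumption, since the patent only specifies U(x) = W_2 σ(W_1 x) and H(x) = U(x, {W_i}) + W_s x.

```python
import torch.nn as nn

class DCResidualBlock(nn.Module):
    """Identity residual block whose two convolutions are dynamic (DConv);
    the shortcut uses a 1x1 linear convolution W_s when the channel count changes."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv1 = DynamicConv2d(in_ch, out_ch)           # weight layer 1 (W_1)
        self.bn1 = nn.BatchNorm2d(out_ch)
        self.conv2 = DynamicConv2d(out_ch, out_ch)           # weight layer 2 (W_2)
        self.bn2 = nn.BatchNorm2d(out_ch)
        self.act = nn.Mish()                                  # improved linear function sigma
        self.shortcut = (nn.Conv2d(in_ch, out_ch, 1)          # linear transform W_s
                         if in_ch != out_ch else nn.Identity())

    def forward(self, x):
        u = self.bn2(self.conv2(self.act(self.bn1(self.conv1(x)))))  # U(x) = W_2 sigma(W_1 x)
        return self.act(u + self.shortcut(x))                        # H(x) = U(x) + W_s x
```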
Step d, combining the spatial feature pyramid network with the coordinate attention mechanism to construct the SPC module, and stacking the SPC module 3 times. The SPCS module acquires and aggregates the multi-scale spatial features and the local and global information. The SPCE module then adaptively enhances the spectral features in the different spaces, realizing joint attention to the spectral-spatial features.
Step d1, merging the feature maps (FM) extracted by DCResNet with the SPCS module to generate PF: FM is pooled into the joint feature vectors PFW and PFH along the two directions, FM is further aggregated by two-dimensional adaptive pooling into the three feature maps PF1, PF2 and PF3 of sizes 3×3, 6×6 and 8×8, and PF is flattened and reshaped to generate the feature RF. The multi-scale spatial features and the local and global information RFW, RFH, RF1, RF2 and RF3 are thus aggregated.
The SPCS module and the SPCE module are combined to construct the spatial pyramid coordinate attention combination module SPC, and the SPC module is stacked 3 times, as shown in FIG. 4.
The structure of the 3-times-stacked SPC module is: SPCS module, SPCE module, batch normalization, 1×1 convolution, Sigmoid, 5×5 convolution, concatenation layer, SPCS module, SPCE module, batch normalization, 1×1 convolution, Sigmoid, 5×5 convolution, concatenation layer.
and d2, sending the feature map SF after SPCS aggregation into SPCE. The spectrum characteristics under different obtained spaces are adaptively enhanced by using SPCE, so that the joint attention to spectrum-space information characteristics is realized, and the structure of the space pyramid coordinate extrusion module is shown in fig. 6:
First, feature Maps (FM) are combined to generate PF. Specifically, one-dimensional global pooling is performed along the vertical and horizontal directions over FM. The spatial features of the local and global information can be effectively aggregated. The pooling operation for the c-th channel feature is specifically as follows:
where H and W represent the height and width of the pooling feature,and->Features of the c-th channels of PFW, PFH, PF, PF2, and PF3 generated by pooled FM are shown. AdapteveAvgPool (3), adapteveAvgPool (6) and AdaptaveAvgPool (8) represent adaptive average pooling with output sizes of 3×3, 6×6 and 8×8.
The cascading equation for the c-th channel of the feature RF in the spatial dimension is as follows:
wherein f sc Representing the output characteristics of the c-th channel after SPCS []Representing a join operation in the spatial dimension.And->Features RFW, RFH, RF, RF2 and RF3 after shaping in the c-th pass are shown, respectively.
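A minimal sketch of this squeeze stage is given below; the module name SPCSqueeze, the use of mean pooling for the directional vectors PFH and PFW, and the concatenation order are assumptions made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SPCSqueeze(nn.Module):
    """Spatial pyramid coordinate squeeze: pools the feature map FM along H and W
    (PFH, PFW) and at three pyramid scales (PF1, PF2, PF3), then flattens and
    concatenates everything along the spatial dimension."""
    def forward(self, fm):                                   # fm: (B, C, H, W)
        pfh = fm.mean(dim=3)                                 # PFH: pool over width  -> (B, C, H)
        pfw = fm.mean(dim=2)                                 # PFW: pool over height -> (B, C, W)
        rf1 = F.adaptive_avg_pool2d(fm, 3).flatten(2)        # PF1 -> RF1, 3x3 = 9
        rf2 = F.adaptive_avg_pool2d(fm, 6).flatten(2)        # PF2 -> RF2, 6x6 = 36
        rf3 = F.adaptive_avg_pool2d(fm, 8).flatten(2)        # PF3 -> RF3, 8x8 = 64
        return torch.cat([pfh, pfw, rf1, rf2, rf3], dim=2)   # SF: concat in the spatial dim
```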
Step d3, combining SPCS and SPCE to construct the spatial pyramid coordinate attention combination module (Spatial Pyramid Coordinate Attention, SPC), and stacking the SPC module 3 times to effectively enhance the joint feature information. The SPCE structure is shown in FIG. 7.
The feature map SF aggregated by SPCS is sent to SPCE to obtain the spectral-spatial attention features. SPCE can be decomposed into two steps: first, the channels of SF are reduced to 1/r of their number with a C×1×1 convolution; the result is then split into two groups along the spatial dimension while the feature channels are expanded back to C. After a nonlinear operation on the channels, channel weights with different spatial characteristics are obtained, and in turn the spectral attention features fused with different spatial information are captured.
Specifically, the aggregated feature SF is first processed by a 1×1 convolution and batch normalization:
f_m = δ(F_1(f_s))
where δ is the Mish activation function and F_1 denotes a convolution operation with kernel size C/r×C×1×1, where C/r and C denote the numbers of output and input channels, respectively, and r denotes the reduction ratio. f_s is the feature SF, and after this operation the feature MF has size C/r×s×1. The features are then split and reshaped along the spatial dimension and separated into two groups: one group is the row feature MFH and the column feature MFW carrying position information; the other features are MF1, MF2 and MF3, of sizes 3×3, 6×6 and 8×8, respectively, containing multi-scale information.
To obtain spectral attention features with position information from the row feature MFH of size (C/r, H, 1) and the column feature MFW of size (C/r, W, 1), two convolutions are used to automatically learn the importance of each channel in the position features and generate g_h and g_w, computed as follows:
where F_h and F_w denote convolution operations with kernel size C×C/r×1×1, and their inputs are MFH and MFW, respectively. After g_h and g_w are applied to the feature map FM, the combination of g_h, g_w and FM attends to the spectral information in the precise location region:
where f_c(h, w) is the feature of FM in the c-th channel, and the superscript co denotes the coordinate feature.
To obtain the spectral attention features of the multi-scale information, the multi-scale features MF1, MF2 and MF3 are first convolved with kernels of sizes C×C/r×3×3, C×C/r×6×6 and C×C/r×8×8, respectively, to obtain the spectral attention features at different scales, g_1 ∈ R^{C×1×1}, g_2 ∈ R^{C×1×1} and g_3 ∈ R^{C×1×1}:
where the corresponding inputs denote the three scale-space features MF1, MF2 and MF3; g_s ∈ R^{C×1×1} is the sum of g_1, g_2 and g_3. The spatial pyramid coordinate attention feature of the c-th channel is:
where the output is the c-th channel feature and the superscript sp denotes the multi-scale feature; finally, the pooled feature is combined with the spectral attention feature through a skip connection to generate the spatial pyramid coordinate attention feature.
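The excitation stage is more involved; the sketch below illustrates only the coordinate (row/column) branch that produces g_h and g_w and omits the multi-scale branch g_1, g_2, g_3, so it is a simplified reading of the mechanism rather than the patented module. The class name CoordinateExcitation, the shared reduction layer and the residual combination are assumptions.

```python
import torch
import torch.nn as nn

class CoordinateExcitation(nn.Module):
    """Simplified excitation path: the pooled row/column features are reduced to
    C/r channels, passed through Mish, expanded back to C channels to produce the
    attention vectors g_h and g_w, and applied to FM with a skip connection."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        mid = channels // reduction
        self.reduce = nn.Sequential(nn.Conv1d(channels, mid, 1),
                                    nn.BatchNorm1d(mid), nn.Mish())
        self.expand_h = nn.Conv1d(mid, channels, 1)           # F_h
        self.expand_w = nn.Conv1d(mid, channels, 1)           # F_w

    def forward(self, fm):                                    # fm: (B, C, H, W)
        mfh = self.reduce(fm.mean(dim=3))                     # row feature    (B, C/r, H)
        mfw = self.reduce(fm.mean(dim=2))                     # column feature (B, C/r, W)
        g_h = torch.sigmoid(self.expand_h(mfh)).unsqueeze(3)  # (B, C, H, 1)
        g_w = torch.sigmoid(self.expand_w(mfw)).unsqueeze(2)  # (B, C, 1, W)
        return fm + fm * g_h * g_w                            # skip connection
```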
Step d4, setting the activation function in the pyramid coordinate attention module (i.e., the improved linear function denoted by σ(·)) to the Mish function.
The Mish activation function has a regularizing effect and avoids saturation. Compared with the hard zero boundary of the ReLU function, Mish allows a small negative margin, which improves gradient flow and gives the model better classification accuracy and generalization. Applying the Mish function in SPCE strengthens the regularization of the model and prevents overfitting; it is implemented as:
f(v) = v·tanh(softplus(v)) = v·tanh(ln(1 + e^v))
where v is the input data, tanh is the hyperbolic tangent function, and softplus is an activation function that can be regarded as a smooth version of ReLU.
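The formula above corresponds directly to the following one-line implementation (equivalent in behaviour to torch.nn.Mish):

```python
import torch
import torch.nn.functional as F

def mish(v):
    """Mish activation: f(v) = v * tanh(softplus(v)) = v * tanh(ln(1 + e^v))."""
    return v * torch.tanh(F.softplus(v))
```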
Step e: the joint feature map is input into the SPP layer and converted into a fixed-size matrix; the matrix is input into the Softmax-based fully connected layer, the extracted features are linearly weighted, and the classification result is finally obtained from the output layer.
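A generic spatial pyramid pooling layer of this kind can be sketched as follows; the pyramid levels (1, 2, 4) are illustrative assumptions, as the patent does not state the SPP output sizes.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialPyramidPooling(nn.Module):
    """Pools an arbitrary-size feature map at several fixed output sizes and
    concatenates the flattened results, so the following fully connected layer
    always receives a vector of the same length."""
    def __init__(self, levels=(1, 2, 4)):
        super().__init__()
        self.levels = levels

    def forward(self, x):                                     # x: (B, C, H, W)
        return torch.cat([F.adaptive_max_pool2d(x, s).flatten(1)
                          for s in self.levels], dim=1)       # (B, C * sum(s*s))
```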
After the feature map is converted by the SPP layer, the extracted features are linearly weighted to obtain the predicted output, and the difference between the predicted output and the actual result is computed with a loss function, formulated as follows:
where y is the input sample, n is the number of classes, p(y_i) is the label of the training sample, and q(y_i) is the class predicted by the model; after the loss is computed with the above formula, back-propagation yields the hyperspectral image feature representation.
The classification result is obtained by passing the hyperspectral image feature representation through the Softmax output layer, where the Softmax function is:
where z_g is the output value of the g-th node and L is the number of output nodes, i.e., the total number of classes of the classifier.
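A sketch of this training step with a softmax cross-entropy loss is shown below; the stand-in classifier head, the Adam optimizer, the learning rate and the dummy batch are all assumptions used only to make the example runnable.

```python
import torch
import torch.nn as nn

# Illustrative training step: softmax cross-entropy loss and back-propagation.
model = nn.Sequential(nn.Flatten(), nn.Linear(64 * 3 * 3, 11))   # stand-in classifier head
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()            # cross-entropy over softmax probabilities

patches = torch.randn(8, 64, 3, 3)           # a dummy batch of fused feature maps
labels = torch.randint(0, 11, (8,))          # dummy class labels

optimizer.zero_grad()
logits = model(patches)                      # linear weighting by the FC layer
loss = criterion(logits, labels)             # loss = -sum_i p(y_i) log q(y_i)
loss.backward()                              # back-propagation
optimizer.step()
pred = logits.softmax(dim=1).argmax(dim=1)   # softmax output layer -> predicted class
```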
Tables 3 and 4 compare the proposed method with five other methods, namely the radial basis function-support vector machine (Radial Basis Function, RBF-SVM), the extended morphological profile-support vector machine (Extended Morphological Profile, EMP-SVM), the convolutional neural network (Convolutional Neural Networks, CNN), the residual network (ResNet) and the spectral-spatial residual network (SSRN), for classifying the MUUFL and Trento datasets. The method of the present invention clearly has better classification performance than all comparison methods and achieves the highest classification indices on both datasets.
For a subjective evaluation of the classification results, figs. 2-8 and figs. 9-15 show the ground-truth maps of the two datasets and the classification results of the various methods. Compared with the RBF-SVM, EMP-SVM, CNN, ResNet and SSRN methods, the proposed method is closer to the real ground-object distribution and has smaller misclassified regions, further demonstrating its effectiveness for hyperspectral data classification, as detailed in Tables 3 and 4.
Table 3 Classification accuracy comparison on the MUUFL dataset (%)
Class RBF-SVM EMP-SVM CNN ResNet SSRN Ours
1 89.89±3.96 93.23±2.17 93.01±3.85 97.11±0.86 96.61±0.25 97.32±0.13
2 70.20±3.62 68.08±2.99 75.12±6.13 92.67±2.56 91.28±1.75 91.25±0.07
3 65.44±7.82 76.76±1.63 79.65±6.12 76.69±4.36 84.48±0.88 90.01±1.45
4 79.35±7.68 88.26±4.27 75.08±9.63 91.96±0.21 92.86±0.65 91.78±2.01
5 84.44±2.52 85.08±2.83 90.15±1.08 90.69±1.02 90.97±0.56 93.03±1.02
6 97.95±2.23 81.88±1.53 53.09±9.11 86.99±3.60 94.33±0.96 95.41±1.16
7 64.38±5.21 69.45±0.53 66.38±1.23 92.18±3.90 92.29±0.12 92.58±3.83
8 90.30±1.43 93.85±2.05 94.12±1.51 94.82±0.48 94.74±0.49 96.72±0.53
9 53.84±1.51 46.78±20.50 67.64±0.06 65.40±0.34 71.77±5.22 72.22±3.22
10 74.04±9.76 76.61±2.12 92.46±0.46 100.0±0.00 72.22±2.22 73.79±5.91
11 54.32±0.82 44.56±1.62 94.91±0.49 91.73±0.91 90.62±0.50 93.96±0.28
OA(%) 81.90±2.00 83.73±0.02 86.94±1.43 91.36±0.67 92.81±1.74 94.08±0.13
AA(%) 74.92±4.23 74.96±3.84 80.11±3.58 89.11±1.70 88.38±2.80 89.82±1.78
K×100 75.64±3.06 78.54±2.82 82.59±2.09 88.57±3.01 90.45±0.80 92.17±0.17
Table 4 Classification accuracy comparison on the Trento dataset (%)
Class RBF-SVM EMP-SVM CNN ResNet SSRN Ours
1 82.59±8.53 82.59±8.53 96.53±3.02 94.85±2.16 97.60±2.76 98.90±0.06
2 83.56±3.92 83.56±3.92 82.95±0.68 83.52±6.41 88.45±6.95 89.50±5.01
3 96.89±2.20 96.89±2.20 97.61±3.03 92.91±1.00 94.91±0.72 98.48±1.22
4 95.37±2.10 95.37±2.10 89.01±0.08 99.48±0.45 99.15±1.17 99.98±0.01
5 92.40±2.76 92.40±2.76 94.92±0.35 99.25±0.01 98.45±1.78 98.68±0.13
6 77.43±4.61 77.43±4.61 87.41±6.37 81.47±0.93 88.52±0.71 95.99±1.73
OA(%) 89.52±1.25 89.52±1.25 93.17±1.27 94.84±0.09 96.19±0.56 98.32±0.41
AA(%) 88.04±0.70 88.04±0.70 91.37±2.25 91.91±1.01 94.51±0.89 96.92±1.36
K×100 85.91±1.72 85.91±1.72 90.80±1.73 93.11±1.25 94.91±0.76 96.29±0.35
Although the invention herein has been described with reference to particular embodiments, it is to be understood that these embodiments are merely illustrative of the principles and applications of the present invention. It is therefore to be understood that numerous modifications may be made to the illustrative embodiments and that other arrangements may be devised without departing from the spirit and scope of the present invention as defined by the appended claims. It should be understood that the different dependent claims and the features described herein may be combined in ways other than as described in the original claims. It is also to be understood that features described in connection with separate embodiments may be used in other described embodiments.

Claims (10)

1. A hyperspectral image classification method combining a spatial pyramid attention mechanism, characterized by comprising the following steps:
step one, removing spectral redundancy from the hyperspectral image to be classified by a principal component analysis method;
step two, sampling the hyperspectral image with spectral redundancy removed by a sliding-window sampling method to obtain a sample set;
step three, performing multi-scale feature extraction and fusion on the sample set with a ResNet34 shallow feature extraction network combined with dynamic convolution to obtain primary fusion features;
step four, performing a linear convolution transformation on the sample set, and performing weighted fusion of the linearly convolved samples with the primary fusion features to obtain pooled fusion features;
step five, combining the spatial feature pyramid network with a coordinate squeeze attention mechanism SPCS and a coordinate excitation attention mechanism SPCE to construct SPC modules, stacking three SPC modules, and extracting multi-scale spatial features, local information and global information from the pooled fusion features respectively; combining the extracted information to obtain spectral-spatial joint features;
and step six, converting the spectral-spatial joint features into a fixed-size matrix with the pooling layer of the spatial feature pyramid, and linearly weighting the fixed-size matrix with a Softmax-based fully connected layer to obtain the classification result.
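To make the data flow of claim 1 concrete, the following is a minimal, runnable PyTorch sketch of steps three to six only. The dynamic-convolution ResNet34 backbone and the SPC attention modules are replaced by plain convolutional stand-ins, and every layer width, patch size and class count is an assumption made for the example rather than part of the claimed method.

```python
import torch
import torch.nn as nn

class HSIClassifierSketch(nn.Module):
    """Sketch of steps 3-6 of claim 1; sub-modules are simplified stand-ins."""
    def __init__(self, n_pc=3, n_classes=11, width=64):
        super().__init__()
        # stand-in for the dynamic-convolution ResNet34 shallow feature extractor (step three)
        self.backbone = nn.Sequential(
            nn.Conv2d(n_pc, width, 3, padding=1), nn.ReLU(),
            nn.Conv2d(width, width, 3, padding=1), nn.ReLU())
        # linear convolution branch used for the weighted fusion (step four)
        self.lin_conv = nn.Conv2d(n_pc, width, kernel_size=1)
        # stand-in for the three stacked SPC attention modules (step five)
        self.spc = nn.Sequential(
            nn.Conv2d(width, width, 3, padding=1), nn.ReLU(),
            nn.Conv2d(width, width, 3, padding=1), nn.ReLU(),
            nn.Conv2d(width, width, 3, padding=1), nn.ReLU())
        # spatial-pyramid pooling to a fixed-size matrix, then the Softmax FC layer (step six)
        self.spp = nn.AdaptiveAvgPool2d((4, 4))
        self.fc = nn.Linear(width * 16, n_classes)

    def forward(self, x):
        # x: PCA-reduced sliding-window patches, shape (batch, n_pc, h, w); steps one-two done offline
        primary = self.backbone(x)                  # primary fusion features
        pooled = primary + self.lin_conv(x)         # pooled fusion features (weighted fusion)
        joint = self.spc(pooled)                    # spectral-spatial joint features
        flat = torch.flatten(self.spp(joint), 1)    # fixed-size matrix
        return torch.softmax(self.fc(flat), dim=-1) # class probabilities

patch = torch.randn(8, 3, 15, 15)                   # 8 patches, 3 principal components, 15x15 window
print(HSIClassifierSketch()(patch).shape)           # torch.Size([8, 11])
```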
2. The method for classifying hyperspectral images in combination with a spatial pyramid attention mechanism as claimed in claim 1, wherein in step one, the process of removing spectral redundancy in hyperspectral images to be classified by using a principal component analysis method is as follows:
step one-one, calculating the covariance matrix, eigenvalues and eigenvectors of the hyperspectral image data to be classified;
step one-two, arranging the eigenvectors in descending order of their eigenvalues; using the eigenvectors as weighting coefficients, forming an eigenvector matrix from the n eigenvectors with the largest eigenvalues, projecting the hyperspectral image data to be classified onto the eigenvector matrix, and, with the principal component analysis method, computing the principal components of the feature matrix from the weighting coefficients to obtain B principal component components and the dimension-reduced feature matrix, thereby removing the redundancy of the hyperspectral image to be classified, wherein n=3 and B is the number of bands of the hyperspectral image.
3. The method for classifying hyperspectral images in combination with a spatial pyramid attention mechanism as claimed in claim 2, wherein in step one-one, the covariance matrix of the hyperspectral image data to be classified is calculated as follows:
σ(q_j, q_k) = E[(q_j - E(q_j))(q_k - E(q_k))]
wherein σ(q_j, q_k) represents the covariance between q_j and q_k, j, k = 1, …, m, m represents the number of columns of the eigenvalue matrix, E represents the matrix expectation, A represents the covariance matrix, and q_j and q_k respectively represent the j-th and k-th normalized random vectors.
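As an illustration of claims 2 and 3, the following NumPy sketch computes the covariance matrix, sorts the eigenvectors by descending eigenvalue, and projects the image onto the first n=3 principal components; the cube size, band count and the helper name pca_reduce are assumptions made for the example.

```python
import numpy as np

def pca_reduce(cube, n_pc=3):
    """Reduce a hyperspectral cube of shape (H, W, B bands) to its first n_pc principal components."""
    h, w, b = cube.shape
    x = cube.reshape(-1, b).astype(np.float64)
    x = x - x.mean(axis=0)                     # center each band: q - E(q)
    cov = np.cov(x, rowvar=False)              # covariance matrix sigma(q_j, q_k)
    vals, vecs = np.linalg.eigh(cov)           # eigenvalues and eigenvectors
    order = np.argsort(vals)[::-1]             # descending eigenvalue order
    top = vecs[:, order[:n_pc]]                # eigenvector matrix of the n largest eigenvalues
    return (x @ top).reshape(h, w, n_pc)       # project onto the principal components

cube = np.random.rand(50, 50, 103)             # assumed 103-band toy scene
print(pca_reduce(cube).shape)                  # (50, 50, 3)
```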
4. The hyperspectral image classification method combining a spatial pyramid attention mechanism according to claim 1, 2 or 3, wherein in the third step, a ResNet34 shallow feature extraction network combined with dynamic convolution is adopted to extract and fuse multi-scale features of a sample set, and the specific method for acquiring primary fusion features is as follows:
step three-one, adopting a dynamic convolution matching method to obtain the weights of the convolution kernels in the identity residual blocks of the ResNet34 network;
step three-two, constructing a residual joint feature network containing dynamic convolution with the convolution kernel weights, and performing multi-scale feature extraction on the sample set to obtain a feature map;
and step three-three, fusing the features in the feature map with a linear function to obtain the primary fusion features.
5. The method for classifying hyperspectral images in combination with a spatial pyramid attention mechanism as claimed in claim 4, wherein in step three-one, the weights of the convolution kernels in the identity residual blocks of the ResNet34 network are obtained by the dynamic convolution matching method as follows:
Using the formula:
y(x_i) = α[(k_1 w_1 + k_2 w_2 + ··· + k_n w_n) * x_i]
calculating the weights of the convolution kernel, wherein x represents the data-dependent dynamic convolution kernel weight value obtained through the attention module, k_n = s_n(x) is the n-th scalar weight dependent on the input sample, n is the number of kernels of the convolution kernel operation, w_d and b_d respectively represent the convolution kernel parameters and bias parameters of the d networks, α represents the activation function, P represents the P networks, x_i represents the i-th total dynamic convolution kernel weight value, and w_n represents the convolution kernel weights.
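The following PyTorch sketch illustrates the dynamic convolution of claim 5: a small attention head produces the input-dependent scalar weights k_n = s_n(x), the n parallel kernels w_n are mixed by these weights, the mixed kernel is convolved with the input, and an activation α is applied. The attention head design, the kernel count and the use of ReLU are assumptions made for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicConv2d(nn.Module):
    """Dynamic convolution: n parallel kernels mixed by input-dependent weights k_n."""
    def __init__(self, in_ch, out_ch, kernel_size=3, n_kernels=4):
        super().__init__()
        self.weight = nn.Parameter(
            torch.randn(n_kernels, out_ch, in_ch, kernel_size, kernel_size) * 0.01)
        self.bias = nn.Parameter(torch.zeros(n_kernels, out_ch))
        self.attn = nn.Sequential(                       # assumed attention head s_n(x)
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(in_ch, n_kernels), nn.Softmax(dim=-1))
        self.pad = kernel_size // 2

    def forward(self, x):
        k = self.attn(x)                                 # (batch, n_kernels): scalar weights k_n
        out = []
        for i in range(x.size(0)):                       # per-sample kernel mixing
            w = torch.einsum('n,noihw->oihw', k[i], self.weight)   # k_1 w_1 + ... + k_n w_n
            b = k[i] @ self.bias
            out.append(F.conv2d(x[i:i + 1], w, b, padding=self.pad))
        return F.relu(torch.cat(out, dim=0))             # activation alpha

x = torch.randn(4, 64, 15, 15)
print(DynamicConv2d(64, 64)(x).shape)                    # torch.Size([4, 64, 15, 15])
```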
6. The method for classifying hyperspectral images in combination with a spatial pyramid attention mechanism as claimed in claim 4, wherein in step three-two, the feature map is obtained as follows:
U(x) = W_2 σ(W_1 x)
wherein U(x) represents the feature map, σ(·) represents the rectified linear function, and W_1 and W_2 are the weights of weight layer 1 and weight layer 2, respectively.
7. The method for classifying hyperspectral images in combination with a spatial pyramid attention mechanism as claimed in claim 6, wherein in step three-three, the primary fusion features are obtained as follows:
H(x) = U(x, {W_e}) + W_s x
wherein W_s represents a linear transformation function, W_e represents W_1 or W_2, and H(x) represents the primary fusion features.
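A minimal PyTorch sketch of the residual mapping in claims 6 and 7 is given below: two weight layers produce U(x) = W_2 σ(W_1 x) and a 1×1 projection plays the role of the linear transformation W_s on the shortcut, giving H(x) = U(x, {W_e}) + W_s x; the same additive pattern underlies the pooled fusion of claim 8. The channel counts and the choice of 3×3 convolutions as weight layers are assumptions.

```python
import torch
import torch.nn as nn

class ResidualFusionBlock(nn.Module):
    """Residual block: H(x) = W2 * sigma(W1 * x) + Ws * x."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.w1 = nn.Conv2d(in_ch, out_ch, 3, padding=1)   # weight layer 1 (W1)
        self.w2 = nn.Conv2d(out_ch, out_ch, 3, padding=1)  # weight layer 2 (W2)
        self.ws = nn.Conv2d(in_ch, out_ch, 1)              # linear transformation Ws on the shortcut
        self.sigma = nn.ReLU()                             # rectified linear function sigma

    def forward(self, x):
        u = self.w2(self.sigma(self.w1(x)))   # feature map U(x)
        return u + self.ws(x)                 # primary fusion feature H(x)

print(ResidualFusionBlock(3, 64)(torch.randn(2, 3, 15, 15)).shape)  # torch.Size([2, 64, 15, 15])
```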
8. The hyperspectral image classification method combining a spatial pyramid attention mechanism according to claim 3, wherein in the fourth step, the samples after the linear convolution transformation and the primary fusion features are weighted and fused, and the pooled fusion features are obtained as follows:
X_{q+2} = X_q + F(X_q; ξ)
wherein ξ = {H_{q+1}, H_{q+2}, b_{q+1}, b_{q+2}}, X_q and X_{q+2} respectively represent the input feature volumes of the q-th and (q+2)-th layers, the input feature volume of the (q+1)-th layer is obtained after the dynamic convolution operation, H_{q+1} and b_{q+1} respectively represent the features of the n spatial convolution kernels of the (q+1)-th layer and the linear transformation of the (q+1)-th layer, F(X_q; ξ) is the residual function constructed by the two convolution layers of the (q+1)-th and (q+2)-th layers, and X_{q+2} is the pooled fusion feature.
9. The method for classifying hyperspectral images in combination with a spatial pyramid attention mechanism as claimed in claim 3, wherein in the fifth step, the spectral-spatial joint features are obtained as follows:
step five-one, aggregating the pooled fusion features into the two directional joint feature vectors PFW and PFH with the SPCS module, and performing a two-dimensional adaptive average pooling operation on the pooled fusion features to aggregate them into three feature maps PF1, PF2 and PF3;
step five-two, flattening and reshaping the three feature maps PF1, PF2 and PF3 to generate the multi-scale spatial features RF1, the local information RF2 and the global information RF3; then aggregating PFW, PFH, RF1, RF2 and RF3 with the SPCE module to obtain the aggregated feature information;
and step five-three, combining the pooled fusion features with the aggregated feature information by a skip connection to obtain the spectral-spatial joint features.
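The following PyTorch sketch gives one plausible reading of claim 9's squeeze-and-excite flow: directional averages stand in for PFW and PFH, three-scale adaptive average pooling produces PF1-PF3 (flattened into RF1-RF3), a small MLP stands in for the SPCE excitation, and a skip connection adds the re-weighted features back to the input. The patch size, the pyramid sizes (1, 2, 4) and the MLP design are all assumptions, since the claim does not fix them.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SPCAggregationSketch(nn.Module):
    """Rough stand-in for the SPCS squeeze and SPCE excitation of claim 9."""
    def __init__(self, ch, patch=15, sizes=(1, 2, 4)):
        super().__init__()
        self.sizes = sizes
        squeeze_len = 2 * patch + sum(s * s for s in sizes)   # PFH + PFW + RF1 + RF2 + RF3 per channel
        self.excite = nn.Sequential(nn.Linear(squeeze_len, 8), nn.ReLU(),
                                    nn.Linear(8, 1), nn.Sigmoid())

    def forward(self, x):                                      # x: pooled fusion features (b, c, patch, patch)
        pfh = x.mean(dim=3)                                    # PFH: average over width  -> (b, c, h)
        pfw = x.mean(dim=2)                                    # PFW: average over height -> (b, c, w)
        rfs = [F.adaptive_avg_pool2d(x, s).flatten(2) for s in self.sizes]  # RF1, RF2, RF3
        squeeze = torch.cat([pfh, pfw] + rfs, dim=2)           # joint descriptor per channel
        attn = self.excite(squeeze).unsqueeze(-1)              # per-channel weights (b, c, 1, 1)
        return x + x * attn                                    # skip connection with the input features

x = torch.randn(2, 64, 15, 15)
print(SPCAggregationSketch(64)(x).shape)                       # torch.Size([2, 64, 15, 15])
```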
10. The method for classifying hyperspectral images in combination with a spatial pyramid attention mechanism as claimed in claim 3, wherein in step six, the Softmax-based fully connected layer linearly weights the fixed-size matrix and the classification result is obtained as follows:
the Softmax-based fully connected layer is used to linearly weight the spectral-spatial joint features to obtain the predicted output, and the difference between the predicted output and the actual result is calculated by the loss function:
Loss = -Σ_{i=1}^{n} p(y_i) · log q(y_i)
then a back-propagation algorithm is used to obtain the hyperspectral image features, and the classification result is obtained from the hyperspectral image features through a Softmax output layer, wherein y is the input sample, n is the number of classes, p(y_i) is the label of the training sample, and q(y_i) is the class predicted by the model.
CN202310838661.0A 2023-07-10 2023-07-10 Hyperspectral image classification method combined with spatial pyramid attention mechanism Pending CN116843975A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310838661.0A CN116843975A (en) 2023-07-10 2023-07-10 Hyperspectral image classification method combined with spatial pyramid attention mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310838661.0A CN116843975A (en) 2023-07-10 2023-07-10 Hyperspectral image classification method combined with spatial pyramid attention mechanism

Publications (1)

Publication Number Publication Date
CN116843975A true CN116843975A (en) 2023-10-03

Family

ID=88161397

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310838661.0A Pending CN116843975A (en) 2023-07-10 2023-07-10 Hyperspectral image classification method combined with spatial pyramid attention mechanism

Country Status (1)

Country Link
CN (1) CN116843975A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117611930A (en) * 2024-01-23 2024-02-27 中国海洋大学 Fine granularity classification method of medical image based on CLIP
CN117611930B (en) * 2024-01-23 2024-04-26 中国海洋大学 Fine granularity classification method of medical image based on CLIP
CN117765402A (en) * 2024-02-21 2024-03-26 山东科技大学 Hyperspectral image matching detection method based on attention mechanism
CN117765402B (en) * 2024-02-21 2024-05-17 山东科技大学 Hyperspectral image matching detection method based on attention mechanism

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination