CN113470036B - Hyperspectral image unsupervised waveband selection method and system based on knowledge distillation - Google Patents

Hyperspectral image unsupervised waveband selection method and system based on knowledge distillation Download PDF

Info

Publication number
CN113470036B
CN113470036B
Authority
CN
China
Prior art keywords
image
hyperspectral image
layer
module
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111023434.XA
Other languages
Chinese (zh)
Other versions
CN113470036A (en)
Inventor
李树涛
胡耀宸
卢婷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hunan University
Original Assignee
Hunan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hunan University filed Critical Hunan University
Priority to CN202111023434.XA priority Critical patent/CN113470036B/en
Publication of CN113470036A publication Critical patent/CN113470036A/en
Application granted granted Critical
Publication of CN113470036B publication Critical patent/CN113470036B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10032 Satellite or aerial image; Remote sensing
    • G06T2207/10036 Multispectral image; Hyperspectral image
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20021 Dividing image into blocks, subimages or windows
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a knowledge distillation-based unsupervised band selection method and system for hyperspectral images. The method comprises: dividing a hyperspectral image into image blocks; training a teacher network to extract spatial-spectral features from the image blocks of the hyperspectral image; training a student network to estimate the band weight vector corresponding to each image block, wherein the student network has a simpler structure than the teacher network and models the global nonlinear relationship among bands through a channel attention module; and calculating the importance weight W of each band from the band weights of all image blocks, sorting the importance weights W, and taking the top L bands as the obtained optimal band subset. The invention introduces the idea of knowledge distillation in deep neural networks into band selection: the teacher network guides the training of the structurally simpler student network, so that errors are easier to back-propagate, the importance weights of the bands are learned, and an optimal selection of a small number of representative bands is achieved.

Description

Hyperspectral image unsupervised waveband selection method and system based on knowledge distillation
Technical Field
The invention relates to hyperspectral image processing technology, in particular to a knowledge distillation-based unsupervised band selection method and system for hyperspectral images.
Background
Hyperspectral imaging is a novel imaging technology that integrates image and spectrum: hyperspectral sensors (i.e. imaging spectrometers) carried on different space platforms image a target area in dozens to hundreds of continuous, finely divided spectral bands across the ultraviolet, visible, near-infrared and mid-infrared ranges, simultaneously acquiring rich spatial and spectral information. A hyperspectral remote sensing image has a three-dimensional data structure; its spectrum is characterized by continuous spectral channels and a large number of bands, and the spectral resolution can reach the nanometer level. This greatly enhances remote sensing earth observation and the ability to discriminate different ground objects, so hyperspectral remote sensing images are widely applied in fields such as mineral exploration, land-cover classification and target detection.
Due to complex factors such as the imaging equipment, the atmospheric environment and the transmission medium, hyperspectral images inevitably contain noise that affects subsequent image processing and interpretation. In addition, hyperspectral images have many bands, a high spectral dimension and a large data volume, which challenges practical applications. First, the strong correlation among the bands of a hyperspectral image leads to redundant information in the band images. Second, because of the high dimensionality and massive size of hyperspectral images, the computational load during transmission, storage and processing is heavy. Finally, high-dimensional hyperspectral image classification suffers from the curse of dimensionality, the well-known Hughes phenomenon: as the number of bands keeps increasing, classification performance first improves and then declines. Therefore, how to reduce the data dimensionality, remove redundant and noisy information from hyperspectral images, and improve the effectiveness and efficiency of data utilization while retaining the important spatial-spectral information has become an important problem to be solved urgently.
Band selection methods can be divided into two categories, supervised and unsupervised, according to whether label information is required. Supervised band selection methods generally take the land-cover class information of the image as label information and select bands that improve class separability to form a band subset, whereas unsupervised band selection methods design algorithms that select bands based on the characteristics of the data themselves. In practical applications, however, obtaining a large amount of accurately labeled sample information from hyperspectral remote sensing images is time-consuming and labor-intensive, so unsupervised algorithms have been widely studied. For example, Chang et al. in "Constrained band selection for hyperspectral imagery, IEEE Transactions on Geoscience and Remote Sensing, 2006, 44(6): 1575-1585" propose a linearly constrained minimum variance method based on band dependence to select important bands. Zhu et al. in "Unsupervised hyperspectral band selection by dominant set extraction, IEEE Transactions on Geoscience and Remote Sensing, 2015, 54(1): 227-" propose a band selection method based on dominant set extraction. Wei et al. in "Scalable one-pass self-representation learning for hyperspectral band selection, IEEE Transactions on Geoscience and Remote Sensing, 2019, 57(7): 4360-4374" propose a hyperspectral image band selection algorithm based on self-representation learning. Wang et al. in "Hyperspectral band selection via optimal neighborhood reconstruction, IEEE Transactions on Geoscience and Remote Sensing, 2020, 58(12): 8465-" propose an optimal neighborhood reconstruction algorithm to solve the band selection problem. The drawback of these methods is that the global nonlinear relationship between the spectral bands of the image is not fully exploited.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: in view of the problems in the prior art, the invention provides a knowledge distillation-based unsupervised band selection method and system for hyperspectral images. The aim is to establish a global nonlinear mapping model over the spectral bands (the student network) by combining the spatial and spectral characteristics of the hyperspectral image, to guide the student network through a teacher network to select an important band subset with low correlation and a large amount of information, and thereby to improve subsequent land-cover classification performance.
In order to solve the technical problems, the invention adopts the technical scheme that:
a knowledge distillation-based hyperspectral image unsupervised waveband selection method comprises the following steps:
1) dividing the hyperspectral image into image blocks;
2) training a teacher network to extract spatial-spectral features from the image blocks of the hyperspectral image;
3) training a student network for estimating the band weight vector corresponding to each image block, based on the spatial-spectral features extracted by the teacher network from the image blocks of the hyperspectral image samples, wherein the student network has a simpler structure than the teacher network and models the global nonlinear relationship between bands through a channel attention module;
4) calculating the importance weight W of each band from the band weights of all image blocks, sorting the resulting importance weights W, and taking the top L bands as the obtained optimal band subset.
Optionally, the teacher network in step 2) is a three-dimensional convolutional auto-encoder network, and the teacher network extracts spatial-spectral features according to:
$$h_i^k = f\left(f\left(x_i \omega^{(k-1)} + b^{(k-1)}\right)\omega^{(k)} + b^{(k)}\right)$$
In the above formula, h_i^k denotes the spatial-spectral feature output by the k-th layer of the teacher network for the i-th image block x_i of the hyperspectral image, f is the linear rectification activation function, ω^(k-1) and b^(k-1) are the convolution kernel parameters and bias of the (k-1)-th layer of the teacher network, and ω^(k) and b^(k) are the convolution kernel parameters and bias of the k-th layer of the teacher network.
Optionally, the three-dimensional convolutional auto-encoder network comprises a first encoding layer, a second encoding layer, a pooling layer, a first decoding layer and a second decoding layer connected in sequence. The first encoding layer and the second encoding layer each comprise a convolution module, a batch normalization module and a nonlinear activation function module; the first decoding layer and the second decoding layer each comprise a deconvolution module, a batch normalization module and a nonlinear activation function module. An input image block of the hyperspectral image is mapped to a new feature space by the first and second encoding layers, the spatial-spectral feature of the image block is obtained through the pooling layer, and the reconstructed image block is obtained from the spatial-spectral feature through the first and second decoding layers.
Optionally, the student network in step 3) comprises a channel attention module, a weighting module and a nonlinear mapping module connected in sequence. The channel attention module is used to obtain the band weight vector w_i of the i-th image block x_i of the hyperspectral image by modeling the global nonlinear relationship between bands; the weighting module is used to weight the i-th image block x_i of the hyperspectral image with the corresponding band weight vector w_i to obtain the weighted image block; and the nonlinear mapping module is used to map the weighted image block to a new feature space and extract features.
Optionally, the channel attention module obtains the band weight vector w_i of the i-th image block x_i of the hyperspectral image by modeling the global nonlinear relationship between bands as follows: a global average pooling operation avgpool and a global max pooling operation maxpool are applied to the i-th image block x_i along the spatial axes, giving the mean and the maximum of each channel feature, i.e. two vectors of size 1×D, where D is the number of bands; each 1×D vector is forwarded to a shared multi-layer perceptron MLP, the outputs of the shared multi-layer perceptron MLP are summed, and the sum is passed through a Sigmoid activation function to obtain the band weight vector w_i of the i-th image block x_i of the hyperspectral image.
Optionally, the nonlinear mapping module comprises a first feature extraction layer, a second feature extraction layer and a pooling layer connected in sequence, the first feature extraction layer and the second feature extraction layer each comprise a convolution module, a batch normalization module and a nonlinear activation function module connected in sequence, and the nonlinear mapping module extracts features according to:
$$y_i^j = f\left(f\left(x_i \omega^{(j-1)} + b^{(j-1)}\right)\omega^{(j)} + b^{(j)}\right)$$
In the above formula, y_i^j denotes the feature output by the j-th layer of the nonlinear mapping module for the i-th image block x_i of the hyperspectral image, f is the linear rectification activation function, ω^(j-1) and b^(j-1) are the convolution kernel parameters and bias of the (j-1)-th layer of the nonlinear mapping module, and ω^(j) and b^(j) are the convolution kernel parameters and bias of the j-th layer of the nonlinear mapping module.
Optionally, the loss function used in step 3) when training the student network for estimating the band weight vector corresponding to each image block is:
$$Loss_s = \frac{1}{MN}\sum_{i=1}^{M\times N}\left(\left\|h_i^3 - y_i^3\right\|_2^2 + \lambda\left\|w_i\right\|_1\right)$$
In the above formula, Loss_s denotes the loss function, M×N is the spatial size of the hyperspectral image, h_i^3 is the spatial-spectral feature output by the teacher network, y_i^3 is the feature output by the student network, λ is the regularization coefficient, and w_i is the band weight vector of the i-th image block x_i of the hyperspectral image.
Optionally, the band importance weight W of each band in step 4) is calculated according to:
$$W = \frac{1}{MN}\sum_{i=1}^{M\times N} w_i$$
In the above formula, M×N is the spatial size of the hyperspectral image and w_i is the band weight vector of the i-th image block x_i of the hyperspectral image.
In addition, the invention also provides a knowledge distillation-based hyperspectral image unsupervised waveband selection system which comprises a microprocessor and a memory which are connected with each other, wherein the microprocessor is programmed or configured to execute the steps of the knowledge distillation-based hyperspectral image unsupervised waveband selection method.
Furthermore, the invention also provides a computer readable storage medium having stored therein a computer program programmed or configured to execute the knowledge distillation based hyperspectral image unsupervised waveband selection method.
Compared with the prior art, the method has the following advantages:
1. The invention introduces the idea of knowledge distillation in deep neural networks into band selection: the teacher network, which has a more complex network structure, learns an effective feature representation of the hyperspectral image; by having the teacher network guide the training of the structurally simpler student network, errors are easier to back-propagate, so that the importance weights of the bands are learned and an optimal selection of a small number of representative bands is achieved.
2. The student network of the invention models the global nonlinear relationship between bands through the channel attention module, obtains importance weights that reflect the interrelations of the bands, and uses them to select bands.
Drawings
FIG. 1 is a schematic diagram of a basic flow of a method according to an embodiment of the present invention.
Fig. 2 is a schematic diagram of a training process of a teacher network and a student network in the embodiment of the invention.
Fig. 3 is a schematic structural diagram of a channel attention module according to an embodiment of the present invention.
FIG. 4 shows the overall accuracy comparison curves of the method of this embodiment and the existing methods as the number of selected bands L increases from 5 to 30.
FIG. 5 shows the Kappa coefficient comparison curves of the method of this embodiment and the existing methods as the number of selected bands L increases from 5 to 30.
FIG. 6 shows the average accuracy comparison curves of the method of this embodiment and the existing methods as the number of selected bands L increases from 5 to 30.
Detailed Description
As shown in FIG. 1, the knowledge distillation-based unsupervised band selection method for hyperspectral images of this embodiment includes:
1) dividing the hyperspectral image into image blocks;
2) training a teacher network to extract spatial-spectral features from the image blocks of the hyperspectral image;
3) training a student network for estimating the band weight vector corresponding to each image block, based on the spatial-spectral features extracted by the teacher network from the image blocks of the hyperspectral image samples, wherein the student network has a simpler structure than the teacher network and models the global nonlinear relationship between bands through a channel attention module;
4) calculating the importance weight W of each band from the band weights of all image blocks, sorting the resulting importance weights W, and taking the top L bands as the obtained optimal band subset.
Referring to FIG. 2, the teacher network and the student network in this embodiment form a Teacher-Student Network; that is, this embodiment uses a teacher-student network framework to implement band selection based on the idea of knowledge distillation. For convenience of description, the method of this embodiment is referred to as the Teacher-Student network based Band Selection method (TSBS).
In this embodiment, the hyperspectral image is denoted X = {x_1, x_2, …, x_{M×N}} ∈ R^{M×N×D}, where M×N is its spatial size and D is the number of bands, and x_i denotes an image block of size n×n. As an optional implementation, n is set to 5 in this embodiment, i.e. the hyperspectral image is divided into 5×5 image blocks centered on each pixel. The purpose of band selection is to select a band subset S ∈ R^{M×N×L} that retains the critical information, where L is much smaller than D.
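As a concrete illustration of step 1), the following NumPy sketch extracts 5×5 blocks centered on every pixel. The function name and the reflect padding used at the image border are assumptions for illustration only; they are not specified in the patent.

```python
# A minimal sketch of step 1): splitting a hyperspectral cube into n x n image
# blocks centred on each pixel (n = 5 in this embodiment).
import numpy as np

def extract_blocks(hsi, n=5):
    """hsi: array of shape (M, N, D). Returns blocks of shape (M*N, n, n, D)."""
    M, N, D = hsi.shape
    r = n // 2
    padded = np.pad(hsi, ((r, r), (r, r), (0, 0)), mode="reflect")
    blocks = np.empty((M * N, n, n, D), dtype=hsi.dtype)
    idx = 0
    for i in range(M):
        for j in range(N):
            blocks[idx] = padded[i:i + n, j:j + n, :]
            idx += 1
    return blocks

# Example: a small random cube with 103 bands, as in the denoised PaviaU image.
blocks = extract_blocks(np.random.rand(20, 20, 103).astype(np.float32))
print(blocks.shape)  # (400, 5, 5, 103)
```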
As an optional implementation, the teacher network in step 2) of this embodiment is a three-dimensional convolutional auto-encoder network, which can make maximal use of the spatial and spectral information of the hyperspectral image to extract effective features.
The teacher network extracts spatial-spectral features according to:
$$h_i^k = f\left(f\left(x_i \omega^{(k-1)} + b^{(k-1)}\right)\omega^{(k)} + b^{(k)}\right)$$
In the above formula, h_i^k denotes the spatial-spectral feature output by the k-th layer of the teacher network for the i-th image block x_i of the hyperspectral image, f is the linear rectification activation function (ReLU for short), ω^(k-1) and b^(k-1) are the convolution kernel parameters and bias of the (k-1)-th layer of the teacher network, and ω^(k) and b^(k) are the convolution kernel parameters and bias of the k-th layer of the teacher network. As an optional implementation, k = {1, 2, …, 5} in this embodiment.
Referring to FIG. 2, in this embodiment the three-dimensional convolutional auto-encoder network comprises a first encoding layer, a second encoding layer, a pooling layer, a first decoding layer and a second decoding layer connected in sequence. The first encoding layer and the second encoding layer each comprise a Convolution module, a Batch Normalization module and a nonlinear activation function (ReLU) module; the first decoding layer and the second decoding layer each comprise a Deconvolution module, a Batch Normalization module and a nonlinear activation function (ReLU) module. An image block of the input hyperspectral image is mapped to a new feature space by the first and second encoding layers, the spatial-spectral feature of the image block is obtained through the pooling layer, and the reconstructed image block is obtained from the spatial-spectral feature through the first and second decoding layers. In this embodiment the spatial-spectral feature is output by the pooling layer and k takes the value 3, so the spatial-spectral feature output by the teacher network is denoted h_i^3. The three-dimensional convolutional auto-encoder network consists of two parts: an encoder that maps the input to features, and a decoder that maps the features back to a reconstruction of the original input; in general, by minimizing the difference between the input and its reconstruction, features that retain the valid information can be obtained. In addition, the three-dimensional convolutional auto-encoder network automatically learns the spatial-spectral features of the i-th input image block x_i along the spatial dimension and along the spectral dimension, without manual design.
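To make this structure concrete, the following is a minimal PyTorch sketch of such a three-dimensional convolutional auto-encoder, treating the band axis of the 5×5×D block as the 3D depth. The channel counts (16, 32), kernel sizes and pooling choice are illustrative assumptions; the patent does not fix them.

```python
# Sketch of the teacher network: two encoding layers (Conv3d + BatchNorm + ReLU),
# a pooling layer yielding the spatial-spectral feature h_i^3, and two decoding
# layers (ConvTranspose3d + BatchNorm + ReLU) reconstructing the block.
import torch
import torch.nn as nn

class TeacherAutoencoder3D(nn.Module):
    def __init__(self, n=5, c1=16, c2=32):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv3d(1, c1, 3, padding=1),
                                  nn.BatchNorm3d(c1), nn.ReLU(inplace=True))
        self.enc2 = nn.Sequential(nn.Conv3d(c1, c2, 3, padding=1),
                                  nn.BatchNorm3d(c2), nn.ReLU(inplace=True))
        # Pool over the n x n spatial window; the band (depth) axis is kept.
        self.pool = nn.AvgPool3d(kernel_size=(1, n, n))
        self.dec1 = nn.Sequential(nn.ConvTranspose3d(c2, c1, (1, n, n)),
                                  nn.BatchNorm3d(c1), nn.ReLU(inplace=True))
        self.dec2 = nn.Sequential(nn.ConvTranspose3d(c1, 1, 3, padding=1),
                                  nn.BatchNorm3d(1), nn.ReLU(inplace=True))

    def forward(self, x):                        # x: (B, 1, D, n, n)
        h = self.pool(self.enc2(self.enc1(x)))   # spatial-spectral feature h_i^3
        x_hat = self.dec2(self.dec1(h))          # reconstructed image block
        return h, x_hat

teacher = TeacherAutoencoder3D()
h, x_hat = teacher(torch.rand(4, 1, 103, 5, 5))
print(h.shape, x_hat.shape)  # (4, 32, 103, 1, 1) and (4, 1, 103, 5, 5)
```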
In this embodiment, the mean square error is used as the loss function when training the teacher network in step 2); its expression is:
$$Loss_t = \frac{1}{MN}\sum_{i=1}^{M\times N}\left\|x_i - \hat{x}_i\right\|_2^2$$
In the above formula, Loss_t denotes the loss function (reconstruction error), M×N is the spatial size of the hyperspectral image, x_i denotes the i-th image block of the hyperspectral image, which has a size of n×n, and x̂_i denotes the image block reconstructed by the teacher network from the i-th image block of the hyperspectral image. In this embodiment, the teacher network performs feature extraction on the hyperspectral image along the spatial dimension and along the spectral dimension, and the three-dimensional convolutional auto-encoder network is trained iteratively by minimizing the reconstruction error of the original hyperspectral image (the loss function shown above), yielding the effective spatial-spectral features h_i^3.
Referring to FIG. 2, the student network in step 3) of this embodiment comprises a channel attention module, a weighting module and a nonlinear mapping module connected in sequence. The channel attention module is used to obtain the band weight vector w_i of the i-th image block x_i of the hyperspectral image by modeling the global nonlinear relationship between bands; the weighting module is used to weight the i-th image block x_i of the hyperspectral image with the corresponding band weight vector w_i to obtain the weighted image block; and the nonlinear mapping module is used to map the weighted image block to a new feature space and extract features. The student network obtains the band importance weight vector by modeling the nonlinear relationship between bands with the channel attention module; in this process, bands containing richer effective information obtain higher weights. The input image block is then multiplied by the corresponding band weights, and the nonlinear mapping module maps the re-weighted image block to a new feature space.
As shown in FIG. 3, in this embodiment the channel attention module obtains the band weight vector w_i of the i-th image block x_i of the hyperspectral image by modeling the global nonlinear relationship between bands as follows: a global average pooling operation avgpool and a global max pooling operation maxpool are applied to the i-th image block x_i along the spatial axes, giving the mean and the maximum of each channel feature, i.e. two vectors of size 1×D, where D is the number of bands; each 1×D vector is forwarded to a shared multi-layer perceptron MLP, the outputs of the shared multi-layer perceptron MLP are summed, and the sum is passed through a Sigmoid activation function to obtain the band weight vector w_i of the i-th image block x_i of the hyperspectral image. Expressed as a formula:
$$w_i = \mathrm{Sigmoid}\left(MLP(avgpool(x_i)) + MLP(maxpool(x_i))\right)$$
The weighting module weights the i-th image block x_i of the hyperspectral image with the corresponding band weight vector w_i to obtain the weighted image block, which can be expressed as:
$$\tilde{x}_i = w_i \odot x_i$$
In the above formula, x̃_i denotes the weighted image block and ⊙ denotes element-wise (band-wise) multiplication.
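The following PyTorch sketch illustrates this channel attention step together with the subsequent band-wise weighting. The hidden-layer reduction ratio r of the shared MLP is an illustrative assumption; the patent does not state it.

```python
# Sketch of the channel attention and weighting modules in FIG. 3: spatial
# average and max pooling, a shared MLP, summation and a Sigmoid give the band
# weight vector w_i, which then re-weights the bands of the image block.
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, num_bands, r=8):
        super().__init__()
        self.mlp = nn.Sequential(                 # shared multi-layer perceptron MLP
            nn.Linear(num_bands, num_bands // r),
            nn.ReLU(inplace=True),
            nn.Linear(num_bands // r, num_bands))

    def forward(self, x):                         # x: (B, D, n, n), bands as channels
        avg = self.mlp(x.mean(dim=(2, 3)))        # avgpool along the spatial axes
        mx = self.mlp(x.amax(dim=(2, 3)))         # maxpool along the spatial axes
        return torch.sigmoid(avg + mx)            # band weight vector w_i, shape (B, D)

attn = ChannelAttention(num_bands=103)
x = torch.rand(4, 103, 5, 5)                      # a batch of 5x5 image blocks
w = attn(x)
x_weighted = x * w[:, :, None, None]              # weighting module: w_i ⊙ x_i
print(w.shape, x_weighted.shape)                  # (4, 103) and (4, 103, 5, 5)
```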
Referring to FIG. 2, the nonlinear mapping module in this embodiment comprises a first feature extraction layer, a second feature extraction layer and a pooling layer connected in sequence. The first feature extraction layer and the second feature extraction layer each comprise a Convolution module, a Batch Normalization module and a nonlinear activation function (ReLU) module connected in sequence, and the nonlinear mapping module extracts features according to:
$$y_i^j = f\left(f\left(x_i \omega^{(j-1)} + b^{(j-1)}\right)\omega^{(j)} + b^{(j)}\right)$$
In the above formula, y_i^j denotes the feature output by the j-th layer of the nonlinear mapping module for the i-th image block x_i of the hyperspectral image, f is the linear rectification activation function, ω^(j-1) and b^(j-1) are the convolution kernel parameters and bias of the (j-1)-th layer of the nonlinear mapping module, and ω^(j) and b^(j) are the convolution kernel parameters and bias of the j-th layer of the nonlinear mapping module, where j < k; in this example j = {1, 2}.
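A minimal sketch of such a nonlinear mapping module is given below, treating the D bands of the weighted block as input channels. The layer widths and the adaptive average pooling are assumptions; in practice the output feature must have the same size as the teacher feature h_i^3 used in the distillation loss.

```python
# Sketch of the nonlinear mapping module: two feature extraction layers
# (Conv2d + BatchNorm + ReLU) followed by a pooling layer, applied to the
# band-weighted image block.
import torch
import torch.nn as nn

class NonlinearMapping(nn.Module):
    def __init__(self, num_bands, feat_dim=32):
        super().__init__()
        self.layer1 = nn.Sequential(nn.Conv2d(num_bands, 64, 3, padding=1),
                                    nn.BatchNorm2d(64), nn.ReLU(inplace=True))
        self.layer2 = nn.Sequential(nn.Conv2d(64, feat_dim, 3, padding=1),
                                    nn.BatchNorm2d(feat_dim), nn.ReLU(inplace=True))
        self.pool = nn.AdaptiveAvgPool2d(1)       # pooling layer

    def forward(self, x_weighted):                # x_weighted: (B, D, n, n) = w_i ⊙ x_i
        y = self.pool(self.layer2(self.layer1(x_weighted)))
        return y.flatten(1)                       # student feature y_i

mapper = NonlinearMapping(num_bands=103)
x_weighted = torch.rand(4, 103, 5, 5)             # already re-weighted image blocks
print(mapper(x_weighted).shape)                   # (4, 32)
```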
Referring to FIG. 2, in step 3) of this embodiment the loss function used when training the student network for estimating the band weight vector corresponding to each image block is:
$$Loss_s = \frac{1}{MN}\sum_{i=1}^{M\times N}\left(\left\|h_i^3 - y_i^3\right\|_2^2 + \lambda\left\|w_i\right\|_1\right)$$
In the above formula, Loss_s denotes the loss function, M×N is the spatial size of the hyperspectral image, h_i^3 is the spatial-spectral feature output by the teacher network, y_i^3 is the feature output by the student network, λ is the regularization coefficient, and w_i is the band weight vector of the i-th image block x_i of the hyperspectral image. In this embodiment, this loss function measures the reconstruction error of the effective features; the student network is trained by minimizing it, and the band weight vectors are updated. The first term on the right-hand side is the reconstruction error between the spatial-spectral feature of the teacher network and the feature output by the student network, and the second term is a sparsity constraint on the band weight vector w_i of the i-th image block x_i of the hyperspectral image. As an optional implementation, the regularization coefficient λ is set to 0.01 in this embodiment. A large value of the loss function means that the selected bands cannot be well mapped to the effective features of the teacher network. The channel attention module is iteratively optimized by minimizing the loss value during training, and the optimal band weight vectors are finally obtained.
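The following sketch computes a loss of this form, assuming a squared-error reconstruction term and an L1 sparsity term as suggested by the description; the exact norms are assumptions, since the text only names the two terms.

```python
# Sketch of the student loss: reconstruction error between the teacher feature
# h_i^3 and the student feature y_i^3, plus a lambda-weighted sparsity term on
# the band weight vector w_i (lambda = 0.01 in this embodiment).
import torch

def student_loss(h_teacher, y_student, w, lam=0.01):
    """h_teacher, y_student: (B, F) features; w: (B, D) band weight vectors."""
    recon = torch.mean(torch.sum((h_teacher - y_student) ** 2, dim=1))
    sparsity = torch.mean(torch.sum(w.abs(), dim=1))
    return recon + lam * sparsity

loss = student_loss(torch.randn(8, 32), torch.randn(8, 32), torch.rand(8, 103))
print(loss.item())
```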
In this embodiment, the band importance weight W of each band in step 4) is calculated according to:
$$W = \frac{1}{MN}\sum_{i=1}^{M\times N} w_i$$
In the above formula, M×N is the spatial size of the hyperspectral image and w_i is the band weight vector of the i-th image block x_i of the hyperspectral image. The band importance weights W are sorted in descending order; the larger the importance weight W of a band, the more effective information the band contains. The top L bands are selected to form the band subset, i.e. the optimal band selection result S ∈ R^{M×N×L}.
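A sketch of this averaging, sorting and top-L selection follows; the function name is illustrative.

```python
# Sketch of step 4): average the per-block band weight vectors w_i into the
# band importance weight W, sort in descending order, and keep the top-L bands.
import numpy as np

def select_bands(block_weights, L):
    """block_weights: array of shape (M*N, D) holding w_i for every image block."""
    W = block_weights.mean(axis=0)          # W = (1 / (M*N)) * sum_i w_i
    order = np.argsort(W)[::-1]             # descending band importance
    return np.sort(order[:L]), W            # indices of the L selected bands

selected, W = select_bands(np.random.rand(400, 103), L=20)
print(selected)
```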
To verify the effectiveness of the method of this embodiment (TSBS), its performance was evaluated on the public PaviaU data set and compared with other existing methods. The PaviaU data set covers the area of the University of Pavia, Italy, acquired by the ROSIS sensor in 2003; its spatial size is 610×340 and it contains 115 bands, 12 of which were removed because of noise, so the denoised 103-band image is used in this embodiment. The image contains 9 classes of ground objects, and the number of labeled samples is 42766. For the teacher network, the feature extraction network was trained using 10% of the data set. For the student network, the whole data set was used for training and testing. Optionally, in this embodiment the Adam optimizer is used to optimize the network parameters, the batch size is set to 64, the learning rate is set to 1e-5, and the number of training epochs is 100.
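A hypothetical training loop with these settings is sketched below; the tiny linear auto-encoder and the random tensors are placeholders standing in for the actual teacher/student networks and the PaviaU image blocks.

```python
# Sketch of the training configuration in this embodiment: Adam optimizer,
# batch size 64, learning rate 1e-5 and 100 training epochs.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

model = nn.Sequential(nn.Flatten(),               # placeholder network
                      nn.Linear(103 * 5 * 5, 256),
                      nn.ReLU(inplace=True),
                      nn.Linear(256, 103 * 5 * 5))
blocks = torch.rand(1024, 103, 5, 5)              # placeholder image blocks
loader = DataLoader(TensorDataset(blocks), batch_size=64, shuffle=True)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)
criterion = nn.MSELoss()                          # reconstruction error

for epoch in range(100):                          # 100 training epochs
    for (x,) in loader:
        loss = criterion(model(x).view_as(x), x)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```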
In this embodiment, a support vector machine is selected as the classifier (SVM classifier for short) to evaluate the performance of the various methods. The parameters C and γ of the SVM classifier are determined by cross validation, the kernel function is a radial basis function kernel, and 50 samples of each class are randomly selected to train the SVM classifier. Three objective indices are used to evaluate the classification accuracy, namely the overall accuracy (OA), the Kappa coefficient (Kappa) and the average accuracy (AA). The number of selected bands ranges from 5 to 30 with a step of 5. For a fair comparison, the three evaluation indices are averaged over 10 classification runs. The algorithm proposed in this embodiment is compared with several unsupervised band selection methods, including the linearly constrained minimum variance band selection method (LCMV), the dominant set extraction based band selection method (DESBS), the scalable self-representation learning band selection method (SOP-SRL) and the deep learning based feature selection method (TSFS). The specific test results are shown in Tables 1 to 3 and FIGS. 4 to 6, where Tables 1 to 3 compare the classification results of the method of the invention with those of the existing methods, and FIGS. 4, 5 and 6 plot the corresponding comparison curves.
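The following scikit-learn sketch mirrors this evaluation protocol on synthetic data standing in for the labeled PaviaU pixels; the concrete values of C and γ are placeholders, since the embodiment determines them by cross validation.

```python
# Sketch of the evaluation protocol: train an RBF-kernel SVM on 50 randomly
# chosen samples per class over the selected bands, then report overall
# accuracy (OA), Kappa coefficient and average accuracy (AA).
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, cohen_kappa_score, confusion_matrix

rng = np.random.default_rng(0)
X = rng.random((2000, 20))                 # pixels restricted to the L selected bands
y = rng.integers(0, 9, size=2000)          # 9 land-cover classes as in PaviaU

train_idx = np.concatenate([rng.choice(np.where(y == c)[0], 50, replace=False)
                            for c in np.unique(y)])   # 50 training samples per class
test_idx = np.setdiff1d(np.arange(len(y)), train_idx)

clf = SVC(kernel="rbf", C=100, gamma="scale")          # C, gamma: placeholder values
clf.fit(X[train_idx], y[train_idx])
pred = clf.predict(X[test_idx])

oa = accuracy_score(y[test_idx], pred)
kappa = cohen_kappa_score(y[test_idx], pred)
cm = confusion_matrix(y[test_idx], pred)
aa = np.mean(np.diag(cm) / cm.sum(axis=1))             # mean of per-class accuracies
print(f"OA={oa:.4f}  Kappa={kappa:.4f}  AA={aa:.4f}")
```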
Table 1: comparison of the overall accuracy (OA) of the method of this embodiment and the existing methods as the number of selected bands L increases from 5 to 30.
Table 2: comparison of the Kappa coefficients of the method of this embodiment and the existing methods as the number of selected bands L increases from 5 to 30.
Table 3: comparison of the average accuracy (AA) of the method of this embodiment and the existing methods as the number of selected bands L increases from 5 to 30.
As can be seen from Tables 1 to 3 and FIGS. 4 to 6, as the number of selected bands L increases from 5 to 30, the classification accuracy of all band selection methods improves. The proposed method is superior to LCMV, DESBS and TSFS on all three objective indices. Only when L is set to 20 does SOP-SRL achieve slightly higher OA and Kappa coefficients; the method of this embodiment performs better in most cases, i.e. for L = 5, 15, 25 and 30. It is particularly notable from the numerical comparison in FIG. 4 that the proposed method has an obvious advantage when L is small. For example, when 5 bands are selected, the proposed method TSBS achieves an OA that is 24.25%, 18.2%, 9.72% and 5.9% higher than LCMV, TSFS, SOP-SRL and DESBS, respectively. This is consistent with the purpose of band selection: selecting as few bands as possible while retaining the key information of the original data and removing redundant information.
In summary, this embodiment utilizes the idea of knowledge distillation: a teacher network with a more complex structure and better performance is introduced to learn a more effective image feature representation, and it guides the training of a student network with a simpler structure and lower complexity, so that errors are easier to back-propagate, thereby achieving the optimal selection of a small number of representative bands. The band selection method proposed in this embodiment mainly comprises three stages. First, the teacher network extracts the spatial-spectral features of the hyperspectral image through a three-dimensional auto-encoder to obtain an effective feature representation of the image. Second, the student network is trained under the guidance of the teacher network; specifically, it combines the channel attention module and the nonlinear mapping module and learns the importance weights of the bands by minimizing the reconstruction error of the effective features. Finally, the band weights are sorted to obtain a small number of optimal selected bands. The band selection performance is verified by the hyperspectral image classification results, which show that the method can effectively select a representative subset of characteristic bands.
In addition, this embodiment also provides a knowledge distillation-based unsupervised band selection system for hyperspectral images, which comprises a microprocessor and a memory connected with each other, wherein the microprocessor is programmed or configured to perform the steps of the aforementioned knowledge distillation-based unsupervised band selection method for hyperspectral images.
Furthermore, this embodiment also provides a computer-readable storage medium in which a computer program is stored, the computer program being programmed or configured to perform the aforementioned knowledge distillation-based unsupervised band selection method for hyperspectral images.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-readable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein. The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks. These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks. These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description is only a preferred embodiment of the present invention, and the protection scope of the present invention is not limited to the above embodiments, and all technical solutions belonging to the idea of the present invention belong to the protection scope of the present invention. It should be noted that modifications and embellishments within the scope of the invention may occur to those skilled in the art without departing from the principle of the invention, and are considered to be within the scope of the invention.

Claims (8)

1. A knowledge distillation-based hyperspectral image unsupervised waveband selection method is characterized by comprising the following steps:
1) dividing the hyperspectral image into image blocks;
2) training a teacher network to extract spatial-spectral features from the image blocks of the hyperspectral image;
3) training a student network for estimating the band weight vector corresponding to each image block, based on the spatial-spectral features extracted by the teacher network from the image blocks of the hyperspectral image samples, wherein the student network has a simpler structure than the teacher network and models the global nonlinear relationship between bands through a channel attention module;
4) calculating the importance weight W of each band from the band weights of all image blocks, sorting the resulting importance weights W, and taking the top L bands as the obtained optimal band subset;
the student network in step 3) comprises a channel attention module, a weighting module and a nonlinear mapping module connected in sequence, wherein the channel attention module is used to obtain the band weight vector w_i of the i-th image block x_i of the hyperspectral image by modeling the global nonlinear relationship between bands; the weighting module is used to weight the i-th image block x_i of the hyperspectral image with the corresponding band weight vector w_i to obtain the weighted image block; and the nonlinear mapping module is used to map the weighted image block to a new feature space and extract features; the channel attention module obtains the band weight vector w_i of the i-th image block x_i of the hyperspectral image by modeling the global nonlinear relationship between bands as follows: a global average pooling operation avgpool and a global max pooling operation maxpool are applied to the i-th image block x_i along the spatial axes, giving the mean and the maximum of each channel feature, i.e. two vectors of size 1×D, where D is the number of bands; each 1×D vector is forwarded to a shared multi-layer perceptron MLP, the outputs of the shared multi-layer perceptron MLP are summed, and the sum is passed through a Sigmoid activation function to obtain the band weight vector w_i of the i-th image block x_i of the hyperspectral image.
2. The knowledge distillation-based hyperspectral image unsupervised waveband selection method according to claim 1, wherein the teacher network in step 2) is a three-dimensional convolutional auto-encoder network, and the teacher network extracts spatial-spectral features according to:
$$h_i^k = f\left(f\left(x_i \omega^{(k-1)} + b^{(k-1)}\right)\omega^{(k)} + b^{(k)}\right)$$
In the above formula, h_i^k denotes the spatial-spectral feature output by the k-th layer of the teacher network for the i-th image block x_i of the hyperspectral image, f is the linear rectification activation function, ω^(k-1) and b^(k-1) are the convolution kernel parameters and bias of the (k-1)-th layer of the teacher network, and ω^(k) and b^(k) are the convolution kernel parameters and bias of the k-th layer of the teacher network.
3. The knowledge distillation-based hyperspectral image unsupervised waveband selection method according to claim 2, wherein the three-dimensional convolutional auto-encoder network comprises a first encoding layer, a second encoding layer, a pooling layer, a first decoding layer and a second decoding layer connected in sequence, wherein the first encoding layer and the second encoding layer each comprise a convolution module, a batch normalization module and a nonlinear activation function module, the first decoding layer and the second decoding layer each comprise a deconvolution module, a batch normalization module and a nonlinear activation function module, an image block of the input hyperspectral image is mapped to a new feature space through the first encoding layer and the second encoding layer, the spatial-spectral feature of the image block is obtained through the pooling layer, and the reconstructed image block is obtained from the spatial-spectral feature through the first decoding layer and the second decoding layer.
4. The knowledge distillation-based hyperspectral image unsupervised waveband selection method according to claim 1, wherein the nonlinear mapping module comprises a first feature extraction layer, a second feature extraction layer and a pooling layer connected in sequence, the first feature extraction layer and the second feature extraction layer each comprise a convolution module, a batch normalization module and a nonlinear activation function module connected in sequence, and the nonlinear mapping module extracts features according to:
$$y_i^j = f\left(f\left(x_i \omega^{(j-1)} + b^{(j-1)}\right)\omega^{(j)} + b^{(j)}\right)$$
In the above formula, y_i^j denotes the feature output by the j-th layer of the nonlinear mapping module for the i-th image block x_i of the hyperspectral image, f is the linear rectification activation function, ω^(j-1) and b^(j-1) are the convolution kernel parameters and bias of the (j-1)-th layer of the nonlinear mapping module, and ω^(j) and b^(j) are the convolution kernel parameters and bias of the j-th layer of the nonlinear mapping module.
5. The knowledge distillation-based hyperspectral image unsupervised waveband selection method according to claim 4, wherein the function expression of the loss function adopted in training the student network for estimating the waveband weight vector corresponding to each image block in the step 3) is as follows:
$$Loss_s = \frac{1}{MN}\sum_{i=1}^{M\times N}\left(\left\|h_i^3 - y_i^3\right\|_2^2 + \lambda\left\|w_i\right\|_1\right)$$
In the above formula, Loss_s denotes the loss function, M×N is the spatial size of the hyperspectral image, h_i^3 is the spatial-spectral feature output by the teacher network, y_i^3 is the feature output by the student network, λ is the regularization coefficient, and w_i is the band weight vector of the i-th image block x_i of the hyperspectral image.
6. The knowledge distillation-based hyperspectral image unsupervised waveband selection method according to claim 5, wherein the function expression for calculating the waveband importance weight W of each waveband in the step 4) is as follows:
$$W = \frac{1}{MN}\sum_{i=1}^{M\times N} w_i$$
In the above formula, M×N is the spatial size of the hyperspectral image and w_i is the band weight vector of the i-th image block x_i of the hyperspectral image.
7. A knowledge distillation based hyperspectral image unsupervised waveband selection system comprising a microprocessor and a memory connected with each other, characterized in that the microprocessor is programmed or configured to perform the steps of the knowledge distillation based hyperspectral image unsupervised waveband selection method of any of claims 1 to 6.
8. A computer-readable storage medium, wherein a computer program is stored in the computer-readable storage medium, the computer program being programmed or configured to perform the unsupervised waveband selection method for hyperspectral image based knowledge distillation according to any of claims 1 to 6.
CN202111023434.XA 2021-09-02 2021-09-02 Hyperspectral image unsupervised waveband selection method and system based on knowledge distillation Active CN113470036B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111023434.XA CN113470036B (en) 2021-09-02 2021-09-02 Hyperspectral image unsupervised waveband selection method and system based on knowledge distillation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111023434.XA CN113470036B (en) 2021-09-02 2021-09-02 Hyperspectral image unsupervised waveband selection method and system based on knowledge distillation

Publications (2)

Publication Number Publication Date
CN113470036A CN113470036A (en) 2021-10-01
CN113470036B true CN113470036B (en) 2021-11-23

Family

ID=77867382

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111023434.XA Active CN113470036B (en) 2021-09-02 2021-09-02 Hyperspectral image unsupervised waveband selection method and system based on knowledge distillation

Country Status (1)

Country Link
CN (1) CN113470036B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113963022B (en) * 2021-10-20 2023-08-18 哈尔滨工业大学 Multi-outlet full convolution network target tracking method based on knowledge distillation
WO2024121999A1 (en) * 2022-12-07 2024-06-13 日本電信電話株式会社 Learning device, learning method, and learning program

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106228186A (en) * 2016-07-20 2016-12-14 湖南大学 Classification hyperspectral imagery apparatus and method
CN108764462A (en) * 2018-05-29 2018-11-06 成都视观天下科技有限公司 A kind of convolutional neural networks optimization method of knowledge based distillation
CN109784192A (en) * 2018-12-20 2019-05-21 西安电子科技大学 Hyperspectral Image Classification method based on super-pixel feature extraction neural network algorithm
CN111191514A (en) * 2019-12-04 2020-05-22 中国地质大学(武汉) Hyperspectral image band selection method based on deep learning
CN111402182A (en) * 2020-03-18 2020-07-10 中国资源卫星应用中心 Land-coverage-information-based midsplit image synthesis method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6932947B2 (en) * 2017-03-02 2021-09-08 コニカミノルタ株式会社 Defective image occurrence prediction system and defective image occurrence prediction program

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106228186A (en) * 2016-07-20 2016-12-14 湖南大学 Classification hyperspectral imagery apparatus and method
CN108764462A (en) * 2018-05-29 2018-11-06 成都视观天下科技有限公司 A kind of convolutional neural networks optimization method of knowledge based distillation
CN109784192A (en) * 2018-12-20 2019-05-21 西安电子科技大学 Hyperspectral Image Classification method based on super-pixel feature extraction neural network algorithm
CN111191514A (en) * 2019-12-04 2020-05-22 中国地质大学(武汉) Hyperspectral image band selection method based on deep learning
CN111402182A (en) * 2020-03-18 2020-07-10 中国资源卫星应用中心 Land-coverage-information-based midsplit image synthesis method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
"Deep feature selection using a teacher-student network";Ali Mirzaei,et.al;《Neurocomputing》;20200331;第396-408页 *

Also Published As

Publication number Publication date
CN113470036A (en) 2021-10-01

Similar Documents

Publication Publication Date Title
Lemhadri et al. Lassonet: Neural networks with feature sparsity
Peng et al. Self-paced nonnegative matrix factorization for hyperspectral unmixing
CN111369487B (en) Hyperspectral and multispectral image fusion method, system and medium
CN111738124A (en) Remote sensing image cloud detection method based on Gabor transformation and attention
CN111583285B (en) Liver image semantic segmentation method based on edge attention strategy
CN113470036B (en) Hyperspectral image unsupervised waveband selection method and system based on knowledge distillation
CN111191514A (en) Hyperspectral image band selection method based on deep learning
JP2023512140A (en) Anomaly detectors, methods of anomaly detection, and methods of training anomaly detectors
CN111415323B (en) Image detection method and device and neural network training method and device
CN113723255A (en) Hyperspectral image classification method and storage medium
CN113421216B (en) Hyperspectral fusion calculation imaging method and system
CN112818920B (en) Double-temporal hyperspectral image space spectrum joint change detection method
CN116665065B (en) Cross attention-based high-resolution remote sensing image change detection method
CN114937173A (en) Hyperspectral image rapid classification method based on dynamic graph convolution network
CN113496221B (en) Point supervision remote sensing image semantic segmentation method and system based on depth bilateral filtering
Das et al. Sparsity regularized deep subspace clustering for multicriterion-based hyperspectral band selection
CN113208641A (en) Pulmonary nodule auxiliary diagnosis method based on three-dimensional multi-resolution attention capsule network
Pal Margin-based feature selection for hyperspectral data
CN116310851A (en) Remote sensing image change detection method
Trevino-Sanchez et al. Hybrid pooling with wavelets for convolutional neural networks
CN116363469A (en) Method, device and system for detecting infrared target with few samples
Islam et al. Subgrouping-based nmf with imbalanced class handling for hyperspectral image classification
CN115546638A (en) Change detection method based on Siamese cascade differential neural network
CN115375966A (en) Image countermeasure sample generation method and system based on joint loss function
CN113902973A (en) Hyperspectral anomaly detection method based on self-encoder and low-dimensional manifold modeling

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant