CN114782246A - Hyperspectral super-resolution method using asymmetric attention and wavelet sub-band injection


Info

Publication number
CN114782246A
CN114782246A (application CN202210273806.2A)
Authority
CN
China
Prior art keywords
image
attention
depth
asymmetric
encoder
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210273806.2A
Other languages
Chinese (zh)
Inventor
肖亮 (Xiao Liang)
方健 (Fang Jian)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Science and Technology
Original Assignee
Nanjing University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Science and Technology
Priority to CN202210273806.2A
Publication of CN114782246A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 - Geometric image transformations in the plane of the image
    • G06T3/40 - Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4053 - Scaling of whole images or parts thereof based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G06N3/047 - Probabilistic or stochastic networks
    • G06N3/08 - Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a hyperspectral super-resolution method using asymmetric attention and wavelet sub-band injection, which mainly comprises the following steps: extracting multispectral multi-resolution high-frequency detail features by discrete wavelet transform; constructing a detail extraction network encoder; constructing a detail extraction network decoder; preprocessing the low-resolution hyperspectral image; constructing a space-spectrum fusion network encoder; constructing an asymmetric feature selection attention module; constructing a space-spectrum fusion network decoder; and training the network with the L1 loss function. According to the invention, multispectral depth multi-resolution details are extracted by a wavelet network and injected into a U-Net, so that the reconstructed hyperspectral image retains better space-spectrum detail information.

Description

Hyperspectral super-resolution method using asymmetric attention and wavelet sub-band injection
Technical Field
The invention relates to a hyperspectral super-resolution method, and in particular to a hyperspectral super-resolution method using asymmetric attention and wavelet sub-band injection.
Background
A hyperspectral (HS) image has dozens to hundreds of bands, so ground objects can be distinguished far more clearly than in a multispectral (MS) image. In practice, however, hardware and budget constraints make it difficult to acquire HS images with high spatial resolution. A common remedy is therefore to fuse a low-resolution hyperspectral (LR-HS) image with a high-resolution multispectral (HR-MS) image to obtain a high-resolution hyperspectral (HR-HS) image.
With the development of deep learning, more and more deep models have been applied to hyperspectral super-resolution, such as the spatial-spectral fusion network (SSF), the deep two-branch convolutional neural network (Two-CNN), the deep recursive network (DRN), the progressive zero-centric residual network (PZRes-Net), and the deep spatial-spectral attention convolutional neural network (DSSA-CNN). Some methods use a deep network to learn the degradation model and reconstruct the HS image, such as deep hyperspectral image sharpening (DHSIS) and deep blind hyperspectral image fusion (DBIN). To improve the interpretability of deep learning, researchers have proposed deep unfolding networks for the HS fusion problem, such as the multispectral/hyperspectral fusion network (MHF-Net), the variational fusion network (VaFuNet), and model-guided hyperspectral image super-resolution (MOG-DCN). However, these deep learning methods only concatenate the HS and MS images along the spectral channel as the network input and do not fully exploit the latent multi-scale spatial information.
Researchers have therefore proposed fusing HS and MS images at different scales. For example, Zhou et al. [Zhou, Feng, et al., "Pyramid fully convolutional network for hyperspectral and multispectral image fusion," IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 12.5 (2019): 1549-] proposed a pyramid fusion network. The network consists mainly of two sub-networks: the first extracts spectral information from the LR-HS image with convolution kernels and encodes it into depth features, and the second integrates an HR-MS image pyramid with the encoded depth features to obtain the HR-HS image. Xu et al. [Xu, Shuang, et al., "HAM-MFN: Hyperspectral and multispectral image multiscale fusion network with RAP loss," IEEE Transactions on Geoscience and Remote Sensing 58.7 (2020): 4618-] proposed a multi-scale fusion network that first enlarges the depth features of the LR-HS image step by step with deconvolution, then fuses the LR-HS depth features with the MS image at different scales, and finally obtains the HR-HS image. This structure, however, ignores the basic shallow features of the MS image. Thus, Xiao et al. [Xiao, J., Li, J., Yuan, Q., et al., "A Dual-UNet with Multistage Details Injection for Hyperspectral Image Fusion," IEEE Transactions on Geoscience and Remote Sensing, 2021] proposed a dual U-Net fusion method that first extracts multispectral spatial features at different scales with an encoder-decoder network and then injects these multi-scale features into a U-Net to reconstruct the HS image. However, all of these networks simply use convolution to extract multi-scale depth features of the MS image and inject them into the HS image to enhance resolution, whereas the goal of fusion should be to inject the spatial detail features of the MS image into the HS image. Simply extracting multi-scale MS features by convolution cannot acquire detail features in a targeted way.
Disclosure of Invention
The invention aims to provide a hyperspectral super-resolution method using asymmetric attention and wavelet sub-band injection.
The technical solution for realizing the invention is as follows: a hyperspectral super-resolution method using asymmetric attention and wavelet sub-band injection, comprising the following steps:
firstly, extracting multispectral multi-resolution high-frequency detail features by discrete wavelet transform;
secondly, constructing a detail extraction network encoder from the high-frequency details extracted by the discrete wavelet transform and convolution operations;
thirdly, constructing a detail extraction network decoder from the depth features of the detail extraction encoder and deconvolution operations;
fourthly, preprocessing the low-resolution hyperspectral image, i.e. up-sampling it to the same spatial size as the multispectral image;
fifthly, constructing a space-spectrum fusion network encoder from the up-sampled hyperspectral image, the high-resolution multispectral image and the depth features extracted by the detail extraction network encoder;
sixthly, selecting the depth multi-resolution features output by the space-spectrum fusion network encoder with an asymmetric feature selection attention module;
seventhly, constructing a space-spectrum fusion network decoder from the depth features output by the asymmetric feature selection attention module, the space-spectrum fusion network encoder and the detail extraction network decoder, and obtaining the fusion result with a Relu activation function;
in an eighth step, the network is trained using the L1 loss function.
An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the above hyperspectral super-resolution method using asymmetric attention and wavelet sub-band injection.
The method of the invention has a simple structure. Compared with the prior art, its notable features are: (1) multi-resolution wavelet detail features are first extracted by discrete wavelet transform, then converted into depth detail features by an encoder-decoder and injected into the HS image to enhance resolution; (2) an asymmetric feature selection attention module selects among features of different resolutions; (3) the network structure is simple and the computational complexity is low.
The present invention is described in further detail below with reference to the attached drawing figures.
Drawings
FIG. 1 is a block diagram of the process of the present invention.
FIG. 2 is a block diagram of an asymmetric feature selection attention module.
FIG. 3 shows pseudo-color composites (bands 30, 20, 10) and the corresponding mean-square-error images of the fusion results of different methods on the real_and_fake_apples data: (a) CNMF, (b) HySure, (c) ICCV15, (d) DHSIS, (e) DBIN, (f) CNN-Fus, (g) MOG-DCN, (h) DMW-UNet.
FIG. 4 compares the per-band MPSNR curves of the fusion results of the different methods on the real_and_fake_apples data.
FIG. 5 compares the per-band MSSIM curves of the fusion results of the different methods on the real_and_fake_apples data.
FIG. 6 shows pseudo-color composites (bands 30, 20, 10) and the corresponding mean-square-error images of the fusion results of the different methods on the img1 data: (a) CNMF, (b) HySure, (c) ICCV15, (d) DHSIS, (e) DBIN, (f) CNN-Fus, (g) MOG-DCN, (h) DMW-UNet.
FIG. 7 compares the per-band MPSNR curves of the fusion results of the different methods on the img1 data.
FIG. 8 compares the per-band MSSIM curves of the fusion results of the different methods on the img1 data.
Detailed Description
To address the problems in the prior art, the method first extracts multi-scale detail features of the MS image by discrete wavelet transform, then learns these detail features by convolution to obtain multi-scale depth detail features, and finally injects the features into a U-Net to enhance the resolution of the HS image. The method can extract the space-spectrum features of the hyperspectral image in Euclidean space and topological space at the same time and performs excellently when applied to hyperspectral super-resolution. The experiments show an average peak signal-to-noise ratio (MPSNR) of 47.0245 on the CAVE dataset and 45.8118 on the Harvard dataset, demonstrating that the proposed network structure recovers the spatial and spectral information of the HS image well.
The implementation of the invention is described in detail below with reference to FIG. 1. The steps of the method are as follows:
In the first step, multispectral multi-resolution high-frequency detail features are extracted by discrete wavelet transform. Denote by M ∈ R^{H×W×b} the high-resolution multispectral image, where H, W and b are its height, width and number of bands, respectively. Multi-resolution high-frequency detail information of the multispectral image is extracted with the Haar discrete wavelet transform (filter bank 'db1'), where φ denotes the low-pass filter and ψ the high-pass filter. The low-frequency sub-band image produced by the low-pass filter at the d-th scale is expressed as

C_d = (φ̄φ̄ * C_{d-1})↓2

and the high-frequency sub-band images in the three directions are expressed as

W_d^1 = (φ̄ψ̄ * C_{d-1})↓2
W_d^2 = (ψ̄φ̄ * C_{d-1})↓2
W_d^3 = (ψ̄ψ̄ * C_{d-1})↓2

where φ̄ and ψ̄ denote the conjugates of φ and ψ respectively, φ̄ψ̄ * C_{d-1} denotes convolution of C_{d-1} with the separable filter at the d-th scale (rows filtered with φ̄, columns with ψ̄), ↓2 denotes dyadic down-sampling, C_{d-1} is the low-pass sub-band image at scale d-1 (with C_0 = M), and W_d^1, W_d^2 and W_d^3 are the high-frequency sub-band images in the horizontal, vertical and diagonal directions at the d-th scale. Applying the discrete wavelet transform to the multispectral image yields the low-frequency sub-band image C_1 and the high-frequency sub-band images W_1^1, W_1^2 and W_1^3; splicing the three high-frequency sub-band images along the channel dimension gives the high-frequency image W_1, expressed as

W_1 = Concat(W_1^1, W_1^2, W_1^3)

where Concat(·) denotes the channel-dimension splicing operation. Since the low-frequency sub-band image C_1 still contains high-frequency information, the discrete wavelet transform is applied to it as well, yielding the low-frequency sub-band image C_2 and the high-frequency sub-band images W_2^1, W_2^2 and W_2^3, which are spliced along the channel dimension into the high-frequency image

W_2 = Concat(W_2^1, W_2^2, W_2^3)
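For illustration, the two-level Haar decomposition above can be sketched in PyTorch as follows. This is a minimal sketch under the 'db1' convention; the function name, toy shapes and filter sign conventions are illustrative assumptions, not the patent's reference implementation:

```python
import torch
import torch.nn.functional as F

def haar_dwt(x):
    """One level of the 2D Haar ('db1') DWT, applied per channel.

    x: (B, C, H, W) with even H, W. Returns the low-pass band and the
    horizontal/vertical/diagonal high-pass bands, each (B, C, H/2, W/2).
    """
    ll = x.new_tensor([[0.5, 0.5], [0.5, 0.5]])    # separable phi-phi
    lh = x.new_tensor([[0.5, 0.5], [-0.5, -0.5]])  # phi-psi (horizontal)
    hl = x.new_tensor([[0.5, -0.5], [0.5, -0.5]])  # psi-phi (vertical)
    hh = x.new_tensor([[0.5, -0.5], [-0.5, 0.5]])  # psi-psi (diagonal)
    k = torch.stack([ll, lh, hl, hh]).unsqueeze(1)         # (4, 1, 2, 2)
    b, c = x.shape[:2]
    out = F.conv2d(x, k.repeat(c, 1, 1, 1), stride=2, groups=c)
    out = out.view(b, c, 4, out.shape[-2], out.shape[-1])
    return out.unbind(dim=2)                               # C, W^1, W^2, W^3

# Two-level decomposition as in the first step: W1 and W2 collect the
# concatenated high-frequency sub-bands at scales 1 and 2.
M = torch.randn(1, 3, 64, 64)              # toy HR-MS image, b = 3 bands
C1, *hp1 = haar_dwt(M)
W1 = torch.cat(hp1, dim=1)                 # W1 = Concat(W1^1, W1^2, W1^3)
C2, *hp2 = haar_dwt(C1)
W2 = torch.cat(hp2, dim=1)                 # W2 = Concat(W2^1, W2^2, W2^3)
```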
In the second step, the detail extraction network encoder is constructed from the high-frequency images extracted by the discrete wavelet transform and convolution operations. First, the high-frequency image W_1 is passed through a 3 × 3 convolution to obtain the first output depth feature of the detail extraction encoder, expressed as

WE_1 = Conv3×3(W_1)

where Conv3×3(·) denotes a convolution with a 3 × 3 kernel. Since a direct pooling operation would cause a large loss of high-frequency information, the discrete wavelet transform is used in place of down-sampling: applying it to WE_1 yields depth detail information, which is spliced along the channel dimension into W_3. The high-frequency images W_2 and W_3 are then spliced along the channel dimension and passed through a 3 × 3 convolution to obtain the second output depth feature of the detail extraction encoder, expressed as

WE_2 = Conv3×3(Concat(W_2, W_3))
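A corresponding sketch of this encoder, reusing haar_dwt from the snippet above, might look as follows. The channel widths are assumptions, and W_3 is taken here to be the three high-pass bands of the DWT of WE_1 (the text does not state whether the low-pass band is also included):

```python
import torch
import torch.nn as nn

class DetailEncoder(nn.Module):
    """Sketch of the detail-extraction encoder (channel widths assumed)."""
    def __init__(self, ms_bands=3, feats=64):
        super().__init__()
        self.conv1 = nn.Conv2d(3 * ms_bands, feats, 3, padding=1)
        self.conv2 = nn.Conv2d(3 * ms_bands + 3 * feats, feats, 3, padding=1)

    def forward(self, W1, W2):
        WE1 = self.conv1(W1)                    # WE1 = Conv3x3(W1)
        _, H1, H2, H3 = haar_dwt(WE1)           # DWT replaces pooling
        W3 = torch.cat([H1, H2, H3], dim=1)     # depth details at scale 2
        WE2 = self.conv2(torch.cat([W2, W3], dim=1))
        return WE1, WE2                         # features at H/2 and H/4
```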
In the third step, the detail extraction network decoder is constructed from the depth features of the detail extraction encoder and deconvolution operations. The purpose of this decoder is to supplement the space-spectrum fusion network decoder with detail information; it consists of deconvolutions with stride 2 and 3 × 3 convolutions. The depth features of its three outputs are expressed as

WD_n = Conv3×3(Deconv(WD_{n-1})), n = 1, 2, 3, with WD_0 = WE_2

where Deconv(·) denotes deconvolution and WD_n denotes the output depth feature of the n-th stage of the detail extraction network decoder.
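A matching sketch of this decoder, under the assumption WD_0 = WE_2 and three stages:

```python
import torch.nn as nn

class DetailDecoder(nn.Module):
    """Sketch of the detail-extraction decoder: each stage is a stride-2
    deconvolution followed by a 3x3 convolution (stage count assumed)."""
    def __init__(self, feats=64, stages=3):
        super().__init__()
        self.stages = nn.ModuleList(
            nn.Sequential(
                nn.ConvTranspose2d(feats, feats, 2, stride=2),  # Deconv
                nn.Conv2d(feats, feats, 3, padding=1),          # Conv3x3
            )
            for _ in range(stages))

    def forward(self, WE2):
        WD, x = [], WE2
        for stage in self.stages:
            x = stage(x)         # WD_n = Conv3x3(Deconv(WD_{n-1}))
            WD.append(x)
        return WD                # [WD_1, WD_2, WD_3]
```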
In the fourth step, the low-resolution hyperspectral image is pre-processed, i.e. up-sampled to the same spatial size as the multispectral image. The transformation is

X_up = Up(X)

where X denotes the low-resolution hyperspectral image and Up(·) denotes spatial up-sampling by bilinear interpolation with a scale factor of 8.
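In PyTorch this preprocessing amounts to a single interpolation call (toy shapes used for illustration):

```python
import torch
import torch.nn.functional as F

# Bilinear up-sampling of the LR-HS cube by a factor of 8 so that it
# matches the HR-MS spatial size.
X = torch.randn(1, 31, 8, 8)                   # toy LR-HS image, 31 bands
X_up = F.interpolate(X, scale_factor=8, mode='bilinear', align_corners=False)
print(X_up.shape)                              # torch.Size([1, 31, 64, 64])
```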
In the fifth step, the space-spectrum fusion network encoder is constructed from the up-sampled hyperspectral image, the high-resolution multispectral image, and the depth features extracted by the detail extraction network encoder. Residual blocks are used to extract depth features, so the residual block is briefly described first: it consists of two 3 × 3 convolutions and a Relu activation function,

RB(X_in) = Conv3×3(Relu(Conv3×3(X_in))) + X_in

where X_in denotes the input feature, Relu(·) the Relu function, and RB(·) the residual block operation. For the first output feature of the encoder, the up-sampled HS image and the MS image are spliced along the channel dimension as the original input, and the spliced features are then learned with a 3 × 3 convolution and a residual block. For stage n, the max-pooled output of stage n-1 is spliced with the depth detail feature WE_n, and the spliced feature is learned with a 3 × 3 convolution and a residual block to obtain the stage-n output. The process is expressed as

MUe_0 = RB(Conv3×3(Concat(X_up, M)))
MUe_n = RB(Conv3×3(Concat(Maxpool(MUe_{n-1}), WE_n))), n = 1, 2

where MUe_n denotes the output feature of the n-th stage encoder, RB(·) the residual block operation, and Maxpool(·) the max pooling operation.
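The residual block and a size-consistent sketch of the fusion encoder could be written as follows; the channel widths and the two-stage depth are assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    """RB(X) = Conv3x3(Relu(Conv3x3(X))) + X, as defined in the fifth step."""
    def __init__(self, feats=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(feats, feats, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(feats, feats, 3, padding=1))

    def forward(self, x):
        return self.body(x) + x

class FusionEncoder(nn.Module):
    """Sketch of the space-spectrum fusion encoder: stage 0 fuses
    Concat(X_up, M); stage n fuses the max-pooled previous output with
    the detail-encoder feature WE_n."""
    def __init__(self, hs_bands=31, ms_bands=3, feats=64):
        super().__init__()
        self.head = nn.Sequential(
            nn.Conv2d(hs_bands + ms_bands, feats, 3, padding=1),
            ResidualBlock(feats))
        self.stages = nn.ModuleList(
            nn.Sequential(nn.Conv2d(2 * feats, feats, 3, padding=1),
                          ResidualBlock(feats))
            for _ in range(2))

    def forward(self, X_up, M, WEs):                 # WEs = [WE1, WE2]
        MUe = [self.head(torch.cat([X_up, M], dim=1))]
        for stage, WE in zip(self.stages, WEs):
            pooled = F.max_pool2d(MUe[-1], 2)        # Maxpool(MUe_{n-1})
            MUe.append(stage(torch.cat([pooled, WE], dim=1)))
        return MUe                                   # [MUe_0, MUe_1, MUe_2]
```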
In the sixth step, as shown in FIG. 2, the features output by the space-spectrum fusion network encoder are selected with an asymmetric feature selection attention module. The module with MUe_0 as the reference is taken as an example. First, deconvolution and convolution are used to bring the spatial size and channel number of the input depth features to those of MUe_0:

Su_0 = MUe_0, Su_1 = Conv3×3(Deconv(MUe_1)), Su_2 = Conv3×3(Deconv(MUe_2))

Then the three depth features are added element-wise to obtain the depth feature Su:

Su = Su_0 + Su_1 + Su_2

Spatial and channel attention mechanisms are then combined to select important features from the multi-resolution features, as follows.

To obtain the channel attention coefficient of the depth feature Su, the global receptive field of Su is first extracted with a global average pooling operation, abstracting each feature channel into one feature point:

Cs = SAvgpool(Su)

where SAvgpool(·) denotes average pooling over the spatial dimensions. A two-layer perceptron network then performs a non-linear feature transformation to build the correlation between feature maps:

Cz = fc(Relu(fc(Cs)))

where fc(·) denotes a fully connected layer.

To obtain the spatial attention coefficient of the depth feature, average pooling and max pooling are first applied along the channel dimension of Su, yielding two depth features with the same spatial size as Su and channel dimension 1. The two depth features are then spliced along the channel dimension:

Ss = Concat(CAvgpool(Su), CMaxpool(Su))

where CAvgpool(·) denotes channel-dimension average pooling and CMaxpool(·) denotes channel-dimension max pooling. A 7 × 7 convolution on Ss then yields the spatial attention coefficient:

Sz = Conv7×7(Ss)

The spatial and channel attention coefficients are multiplied to obtain the space-spectrum attention coefficient

Sc = Cz * Sz

Three 1 × 1 convolutions produce three space-spectrum attention coefficients:

SC_i = Conv1×1(Sc), i ∈ {0, 1, 2}

where Conv1×1 denotes a 1 × 1 convolution. The softmax function is then applied across the three coefficients so that Sa + Sb + Sc = 1:

[Sa, Sb, Sc] = Softmax(SC_0, SC_1, SC_2)

where Sa, Sb and Sc denote the attention coefficients of Su_0, Su_1 and Su_2 respectively. Each attention coefficient is multiplied with its depth feature, and the three weighted features are added pixel-wise to give the module output:

AFSSC_1 = Sa·Su_0 + Sb·Su_1 + Sc·Su_2

Here AFSSC_1 takes MUe_0 as the reference: deconvolution and convolution operations bring MUe_1 and MUe_2 to the same size as MUe_0, after which the sum gives Su. Likewise, AFSSC_0 takes MUe_1 as the reference: deconvolution, or pooling, together with convolution bring MUe_0 and MUe_2 to the same size as MUe_1, after which the sum gives Su.
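Putting the pieces together, a sketch of the MUe_0-referenced module could look as follows; the resampling layers, the fully connected reduction ratio and the class name are our assumptions:

```python
import torch
import torch.nn as nn

class AFSSAttention(nn.Module):
    """Sketch of the asymmetric feature-selection attention module with
    MUe_0 as the reference scale."""
    def __init__(self, feats=64, reduction=4):
        super().__init__()
        # Bring MUe_1 (1/2 resolution) and MUe_2 (1/4) up to MUe_0's size.
        self.up1 = nn.Sequential(nn.ConvTranspose2d(feats, feats, 2, stride=2),
                                 nn.Conv2d(feats, feats, 3, padding=1))
        self.up2 = nn.Sequential(nn.ConvTranspose2d(feats, feats, 4, stride=4),
                                 nn.Conv2d(feats, feats, 3, padding=1))
        # Channel attention: Cz = fc(Relu(fc(Cs))).
        self.fc = nn.Sequential(nn.Linear(feats, feats // reduction),
                                nn.ReLU(inplace=True),
                                nn.Linear(feats // reduction, feats))
        # Spatial attention: 7x7 conv on [CAvgpool, CMaxpool].
        self.conv7 = nn.Conv2d(2, 1, 7, padding=3)
        self.heads = nn.ModuleList(nn.Conv2d(feats, feats, 1) for _ in range(3))

    def forward(self, MUe0, MUe1, MUe2):
        Su0, Su1, Su2 = MUe0, self.up1(MUe1), self.up2(MUe2)
        Su = Su0 + Su1 + Su2                                  # element-wise sum
        Cz = self.fc(Su.mean(dim=(2, 3)))[:, :, None, None]   # channel coeff.
        Ss = torch.cat([Su.mean(1, keepdim=True),             # CAvgpool
                        Su.max(1, keepdim=True).values], 1)   # CMaxpool
        Sz = self.conv7(Ss)                                   # spatial coeff.
        Sc = Cz * Sz                                          # joint coefficient
        SC = torch.stack([head(Sc) for head in self.heads])   # three 1x1 convs
        Sa, Sb, Sc3 = torch.softmax(SC, dim=0).unbind(0)      # Sa + Sb + Sc3 = 1
        return Sa * Su0 + Sb * Su1 + Sc3 * Su2                # AFSSC output
```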
In the seventh step, the space-spectrum fusion network decoder is constructed from the depth features output by the asymmetric feature selection attention module, the space-spectrum fusion network encoder, and the detail extraction network decoder, and the fusion result is obtained with a Relu activation function. A convolution, a residual block and a deconvolution are applied to MUe_2 to extract depth features and obtain the first output of the decoder. For the n-th output, the asymmetric feature selection attention module output AFSSC_{n-1}, the space-spectrum fusion network decoder output MUd_{n-1} and the detail extraction network decoder output WD_{n-1} are spliced, passed through a 3 × 3 convolution and a residual block, and finally a deconvolution. The process is expressed as

MUd_1 = Deconv(RB(Conv3×3(MUe_2)))
MUd_n = Deconv(RB(Conv3×3(Concat(AFSSC_{n-1}, MUd_{n-1}, WD_{n-1}))))

where MUd_n denotes the n-th stage of the decoder. The decoder output MUd_2 and the detail extraction decoder output WD_2 are spliced along the channel dimension and features are extracted with a 3 × 3 convolution; the up-sampled HS image is added pixel-wise to the extracted features, and a Relu activation on the sum finally gives the fused image Ẑ:

Ẑ = Relu(Conv3×3(Concat(MUd_2, WD_2)) + X_up)
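A size-consistent sketch of this decoder (reusing ResidualBlock from above) is given below; the stage wiring, in particular which attention output feeds which stage, is one plausible reading of the condensed description and is an assumption:

```python
import torch
import torch.nn as nn

class FusionDecoder(nn.Module):
    """Sketch of the space-spectrum fusion decoder: the MUe_1-referenced
    attention output pairs with MUd_1 and WD_1 at half resolution, and the
    final fusion happens at full resolution."""
    def __init__(self, feats=64, hs_bands=31):
        super().__init__()
        self.head = nn.Sequential(                  # MUd_1 from MUe_2
            nn.Conv2d(feats, feats, 3, padding=1), ResidualBlock(feats),
            nn.ConvTranspose2d(feats, feats, 2, stride=2))
        self.stage = nn.Sequential(                 # MUd_2 from the concat
            nn.Conv2d(3 * feats, feats, 3, padding=1), ResidualBlock(feats),
            nn.ConvTranspose2d(feats, feats, 2, stride=2))
        self.tail = nn.Conv2d(2 * feats, hs_bands, 3, padding=1)

    def forward(self, MUe2, AFSSC0, WD1, WD2, X_up):
        MUd1 = self.head(MUe2)                                # H/4 -> H/2
        MUd2 = self.stage(torch.cat([AFSSC0, MUd1, WD1], 1))  # H/2 -> H
        out = self.tail(torch.cat([MUd2, WD2], 1))            # inject details
        return torch.relu(out + X_up)                         # fused image
```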
In the eighth step, the network is trained using the L1 loss function, specifically

L1 = ||Ẑ - Z||_1

where Ẑ denotes the HR-HSI reconstructed by the network and Z denotes the reference HR-HSI.
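A minimal training step consistent with this setup might be as follows; the assembled model and its optimizer are assumed to exist (ADAM with learning rate 1e-4, per the experiment section):

```python
import torch
import torch.nn.functional as F

def l1_train_step(model, optimizer, X_lr, M, Z):
    """One optimization step with the L1 loss (model assembly assumed)."""
    X_up = F.interpolate(X_lr, scale_factor=8, mode='bilinear',
                         align_corners=False)   # fourth-step preprocessing
    Z_hat = model(X_up, M)                      # reconstructed HR-HSI
    loss = F.l1_loss(Z_hat, Z)                  # mean of |Z_hat - Z|
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```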
The effect of the present invention can be further illustrated by the following simulation experiments:
simulation conditions
The simulation experiments use two groups of real hyperspectral data: the CAVE dataset and the Harvard dataset. The CAVE dataset consists of 32 indoor hyperspectral images of spatial size 512 × 512, wavelength range 400nm-700nm, spectral resolution 10nm, and 31 bands. The Harvard dataset consists of 50 indoor and outdoor hyperspectral images of spatial size 1040 × 1392, wavelength range 420nm-720nm, spectral resolution 10nm, and 31 bands. To collect training blocks conveniently, a 1024 × 1024 region is cropped from the top left of each image as the training and testing set.
The LR-HS images are simulated with an 8 × 8 Gaussian filter (mean 0, standard deviation 2) followed by down-sampling each band of the reference image by a factor of 8 in the vertical and horizontal directions, i.e. a decimation factor of 8 × 8. The HR-MS image of the same scene is simulated by spectrally down-sampling the HR-HS image; the spectral down-sampling matrix R uses the response function of a Nikon D700 camera. For the CAVE dataset, the first 20 hyperspectral images are used for training and the remaining 12 for testing. For the Harvard dataset, the first 30 hyperspectral images are used for training and the remaining 20 for testing. Since deep learning requires a large amount of training data, the training hyperspectral images are divided into blocks: each LR-HSI block is 4 × 4 × 31, each HR-MSI block is 32 × 32 × 3, and each HR-HSI block is 32 × 32 × 31.
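A sketch of this degradation pipeline is shown below; the Gaussian kernel construction and padding choices are our assumptions, and the response matrix R is assumed to be supplied separately:

```python
import torch
import torch.nn.functional as F

def simulate_lr_hs(Z, sigma=2.0, factor=8):
    """Blur every band with an 8x8 Gaussian (mean 0, std 2), then
    decimate by 8 in both spatial directions."""
    coords = torch.arange(8, dtype=torch.float32) - 3.5   # centered taps
    g = torch.exp(-coords ** 2 / (2 * sigma ** 2))
    k = torch.outer(g, g)
    k = (k / k.sum()).view(1, 1, 8, 8).repeat(Z.shape[1], 1, 1, 1)
    blurred = F.conv2d(F.pad(Z, (3, 4, 3, 4), mode='replicate'),
                       k, groups=Z.shape[1])              # per-band blur
    return blurred[..., ::factor, ::factor]               # 8x8 decimation

def simulate_hr_ms(Z, R):
    """Spectral down-sampling with a response matrix R of shape (3, 31),
    e.g. one derived from the Nikon D700 response curves."""
    return torch.einsum('cb,nbhw->nchw', R, Z)
```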
The simulation experiments were run under the Windows 10 operating system with Python 3.8 and PyTorch 1.10. The activation functions are uniformly Relu, the network is trained with the ADAM optimizer, the learning rate is set to 0.0001, and the number of iterations is fixed at 1000; the remaining hyper-parameter settings of the network structure are summarized in Table 1.
Analysis of simulation experiment results
Table 2 shows the super-resolution results of the method of the present invention and the comparative methods on the CAVE dataset, and Table 3 shows the corresponding results on the Harvard dataset.
TABLE 1 network hyper-parameter configuration
(Table content available only as an image in the original document.)
TABLE 2 super-resolution of CAVE data sets by the method of the present invention and the method of comparing the same
Method MPSNR RMSE ERGAS SAM UIQI MSSIM
CNMF 34.3027 5.4723 2.6006 7.8920 0.7710 0.9388
HySure 34.7822 5.3232 2.4181 11.5451 0.8043 0.9107
ICCV15 35.6888 4.7694 2.2064 7.8787 0.7986 0.9531
DHSIS 46.2977 1.4653 0.6641 3.8452 0.9242 0.9904
DBIN 45.7831 1.5233 0.6781 3.6035 0.9260 0.9925
CNN-Fus 44.5789 1.8994 0.8689 5.4241 0.8735 0.9850
MOG-DCN 46.3167 1.4217 0.6450 3.5967 0.9278 0.9924
DMW-UNet 47.0245 1.3564 0.6091 3.4703 0.9336 0.9927
TABLE 3 results of super-resolution of Harvard data sets by the method of the present invention and its comparative method
(Table content available only as an image in the original document.)
As can be seen from Table 2, DMW-UNet is the largest on the MPSNR, UIQI and MSSIM indices and the smallest on RMSE, ERGAS and SAM. FIG. 3 shows that the reconstruction error of the method of the invention is the smallest, and FIGS. 4 and 5 show that its MPSNR and MSSIM values are the highest in every band.
As can be seen from Table 3, DMW-UNet is likewise the largest on MPSNR, UIQI and MSSIM and the smallest on RMSE, ERGAS and SAM for the Harvard dataset. FIG. 6 shows that the reconstruction error of the method of the invention is the smallest, and FIGS. 7 and 8 show that its MPSNR and MSSIM values are the highest in every band.
These experimental results show that the method of the invention has better spatial and spectral reconstruction capability.

Claims (10)

1. A hyperspectral super-resolution method using asymmetric attention and wavelet subband injection is characterized by comprising the following steps of:
firstly, extracting multispectral multi-resolution high-frequency detail features by discrete wavelet transform;
secondly, constructing a detail extraction network encoder from the high-frequency details extracted by the discrete wavelet transform and convolution operations;
thirdly, constructing a detail extraction network decoder from the depth features of the detail extraction encoder and deconvolution operations;
fourthly, preprocessing the low-resolution hyperspectral image, i.e. up-sampling it to the same spatial size as the multispectral image;
fifthly, constructing a space-spectrum fusion network encoder from the up-sampled hyperspectral image, the high-resolution multispectral image and the depth features extracted by the detail extraction network encoder;
sixthly, selecting the depth multi-resolution features output by the space-spectrum fusion network encoder with an asymmetric feature selection attention module;
seventhly, constructing a space-spectrum fusion network decoder from the depth features output by the asymmetric feature selection attention module, the space-spectrum fusion network encoder and the detail extraction network decoder, and obtaining the fusion result with a Relu activation function;
in an eighth step, the network is trained using the L1 loss function.
2. The hyperspectral super-resolution method using asymmetric attention and wavelet sub-band injection as claimed in claim 1, wherein in the first step the multispectral multi-resolution high-frequency detail features are extracted by discrete wavelet transform as follows:

denote by M ∈ R^{H×W×b} the high-resolution multispectral image, where H, W and b are its height, width and number of bands, respectively; multi-resolution high-frequency detail information of the multispectral image is extracted with the Haar discrete wavelet transform, where φ denotes the low-pass filter and ψ the high-pass filter; the low-frequency sub-band image produced by the low-pass filter at the d-th scale is expressed as

C_d = (φ̄φ̄ * C_{d-1})↓2

and the high-frequency sub-band images in the three directions are expressed as

W_d^1 = (φ̄ψ̄ * C_{d-1})↓2
W_d^2 = (ψ̄φ̄ * C_{d-1})↓2
W_d^3 = (ψ̄ψ̄ * C_{d-1})↓2

where φ̄ and ψ̄ denote the conjugates of φ and ψ respectively, φ̄ψ̄ * C_{d-1} denotes convolution of C_{d-1} with the separable filter at the d-th scale, ↓2 denotes dyadic down-sampling, C_{d-1} denotes the low-pass sub-band image at scale d-1, and W_d^1, W_d^2 and W_d^3 denote the high-frequency sub-band images in the horizontal, vertical and diagonal directions at the d-th scale; the multispectral image is subjected to the discrete wavelet transform to obtain the low-frequency sub-band image C_1 and the high-frequency sub-band images W_1^1, W_1^2 and W_1^3, and the three high-frequency sub-band images are spliced along the channel dimension to obtain the high-frequency image W_1, expressed as

W_1 = Concat(W_1^1, W_1^2, W_1^3)

where Concat(·) denotes the channel-dimension splicing operation; the discrete wavelet transform is applied to the multispectral low-frequency sub-band image C_1 to obtain the low-frequency sub-band image C_2 and the high-frequency sub-band images W_2^1, W_2^2 and W_2^3, and the three high-frequency sub-band images are spliced along the channel dimension to obtain the high-frequency image W_2, expressed as

W_2 = Concat(W_2^1, W_2^2, W_2^3).
3. The hyperspectral super-resolution method using asymmetric attention and wavelet sub-band injection according to claim 1, wherein in the second step the detail extraction network encoder is constructed from the high-frequency images extracted by the discrete wavelet transform and convolution operations, specifically as follows:

first, the high-frequency image W_1 is passed through a 3 × 3 convolution to obtain the first output depth feature of the detail extraction encoder, expressed as WE_1 = Conv3×3(W_1), where Conv3×3(·) denotes a convolution with a 3 × 3 kernel; the discrete wavelet transform is used instead of a down-sampling operation: it is applied to WE_1 to obtain depth detail information, which is spliced along the channel dimension to obtain W_3; the high-frequency images W_2 and W_3 are then spliced along the channel dimension and passed through a 3 × 3 convolution to obtain the second output depth feature of the detail extraction encoder, expressed as WE_2 = Conv3×3(Concat(W_2, W_3)).
4. The hyperspectral super-resolution method using asymmetric attention and wavelet sub-band injection according to claim 1, wherein in the third step the detail extraction network decoder is constructed from the depth features extracted by the encoder and deconvolution operations, specifically as follows:

the detail extraction network decoder supplements the information of the space-spectrum fusion network decoder and consists of deconvolutions with stride 2 and 3 × 3 convolutions; the depth features of its three outputs are expressed as

WD_n = Conv3×3(Deconv(WD_{n-1})), n = 1, 2, 3, with WD_0 = WE_2

where Deconv(·) denotes deconvolution and WD_n denotes the output depth feature of the n-th stage of the detail extraction network decoder.
5. The hyperspectral super-resolution method using asymmetric attention and wavelet sub-band injection as claimed in claim 1, wherein the fourth step pre-processes the low-resolution hyperspectral image by up-sampling it to the same spatial size as the multispectral image, the transformation being

X_up = Up(X)

where X denotes the low-resolution hyperspectral image and Up(·) denotes spatial up-sampling by bilinear interpolation with a scale factor of 8.
6. The hyperspectral super-resolution method using asymmetric attention and wavelet sub-band injection according to claim 1, wherein the fifth step of constructing the space-spectrum fusion network encoder from the up-sampled hyperspectral image, the high-resolution multispectral image and the depth features extracted by the detail extraction network encoder is specifically: depth features are extracted with residual blocks, each consisting of two 3 × 3 convolutions and a Relu activation function, expressed as RB(X_in) = Conv3×3(Relu(Conv3×3(X_in))) + X_in, where X_in denotes the input feature, Relu(·) the Relu function and RB(·) the residual block operation; for the first output feature of the encoder, the up-sampled HS image and the MS image are spliced along the channel dimension as the original input, and the spliced features are learned with a 3 × 3 convolution and a residual block; for stage n, the max-pooled output of stage n-1 is spliced with the depth detail feature WE_n, and the spliced feature is learned with a 3 × 3 convolution and a residual block to obtain the stage-n output; the process is expressed as

MUe_0 = RB(Conv3×3(Concat(X_up, M)))
MUe_n = RB(Conv3×3(Concat(Maxpool(MUe_{n-1}), WE_n))), n = 1, 2

where MUe_n denotes the output feature of the n-th stage encoder, RB(·) the residual operation and Maxpool(·) the max pooling operation.
7. The hyperspectral super-resolution method using asymmetric attention and wavelet sub-band injection according to claim 1, wherein the sixth step selects the features output by the space-spectrum fusion network encoder with an asymmetric feature selection attention module:

for the attention module with MUe_0 as the reference, deconvolution and convolution first bring the spatial size and channel number of the input depth features to those of MUe_0:

Su_0 = MUe_0, Su_1 = Conv3×3(Deconv(MUe_1)), Su_2 = Conv3×3(Deconv(MUe_2))

then the three depth features are added element-wise to obtain the depth feature Su:

Su = Su_0 + Su_1 + Su_2

to obtain the channel attention coefficient of the depth feature Su, its global receptive field is first extracted with a global average pooling operation, abstracting each feature channel into one feature point:

Cs = SAvgpool(Su)

where SAvgpool(·) denotes spatial-dimension average pooling; a two-layer perceptron network then performs a non-linear feature transformation to build the correlation between feature maps:

Cz = fc(Relu(fc(Cs)))

where fc(·) denotes a fully connected layer;

to obtain the spatial attention coefficient of the depth feature, average pooling and max pooling are applied along the channel dimension of Su, yielding two depth features with the same spatial size as Su and channel dimension 1, which are then spliced along the channel dimension:

Ss = Concat(CAvgpool(Su), CMaxpool(Su))

where CAvgpool(·) denotes channel-dimension average pooling and CMaxpool(·) denotes channel-dimension max pooling; a 7 × 7 convolution on Ss then yields the spatial attention coefficient:

Sz = Conv7×7(Ss)

the spatial and channel attention coefficients are multiplied to obtain the space-spectrum attention coefficient

Sc = Cz * Sz

three 1 × 1 convolutions give three space-spectrum attention coefficients:

SC_i = Conv1×1(Sc), i ∈ {0, 1, 2}

where Conv1×1 denotes a 1 × 1 convolution; the softmax function is then applied across the three coefficients so that Sa + Sb + Sc = 1:

[Sa, Sb, Sc] = Softmax(SC_0, SC_1, SC_2)

where Sa, Sb and Sc denote the attention coefficients of Su_0, Su_1 and Su_2 respectively; each attention coefficient is multiplied with its depth feature and the three weighted features are added pixel-wise to give the module output:

AFSSC_1 = Sa·Su_0 + Sb·Su_1 + Sc·Su_2

here AFSSC_1 takes MUe_0 as the reference, with deconvolution and convolution bringing MUe_1 and MUe_2 to the same size as MUe_0 before the sum gives Su; AFSSC_0 takes MUe_1 as the reference, with deconvolution or pooling together with convolution bringing MUe_0 and MUe_2 to the same size as MUe_1 before the sum gives Su.
8. The hyperspectral super-resolution method using asymmetric attention and wavelet sub-band injection according to claim 1, wherein the seventh step constructs the space-spectrum fusion network decoder from the depth features of the asymmetric feature selection attention module, the space-spectrum fusion network encoder and the detail extraction network decoder, and obtains the fusion result with a Relu activation function: a convolution, a residual block and a deconvolution are applied to MUe_2 to extract depth features and obtain the first output of the decoder; the asymmetric feature selection attention module output AFSSC_{n-1}, the space-spectrum fusion network decoder output MUd_{n-1} and the detail extraction network decoder output WD_{n-1} are then spliced, passed through a 3 × 3 convolution and a residual block, and finally a deconvolution to obtain the n-th output of the space-spectrum fusion module decoder; the process is expressed as

MUd_1 = Deconv(RB(Conv3×3(MUe_2)))
MUd_n = Deconv(RB(Conv3×3(Concat(AFSSC_{n-1}, MUd_{n-1}, WD_{n-1}))))

where MUd_n denotes the n-th stage of the decoder; the decoder output MUd_2 and the detail extraction decoder output WD_2 are spliced along the channel dimension, features are extracted with a 3 × 3 convolution, the up-sampled HS image is added pixel-wise to the extracted features, and a Relu activation on the sum finally gives the fused image Ẑ; the process is expressed as

Ẑ = Relu(Conv3×3(Concat(MUd_2, WD_2)) + X_up).
9. The hyperspectral super-resolution method using asymmetric attention and wavelet sub-band injection according to claim 1, wherein in the eighth step the network is trained using the L1 loss function, specifically

L1 = ||Ẑ - Z||_1

where Ẑ denotes the HR-HSI reconstructed by the network and Z denotes the reference HR-HSI.
10. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the hyperspectral super-resolution method using asymmetric attention and wavelet sub-band injection as claimed in any one of claims 1-9.
CN202210273806.2A 2022-03-19 2022-03-19 Hyperspectral super-resolution method using asymmetric attention and wavelet sub-band injection Pending CN114782246A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210273806.2A CN114782246A (en) 2022-03-19 2022-03-19 Hyperspectral super-resolution method using asymmetric attention and wavelet sub-band injection

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210273806.2A CN114782246A (en) 2022-03-19 2022-03-19 Hyperspectral super-resolution method using asymmetric attention and wavelet sub-band injection

Publications (1)

Publication Number Publication Date
CN114782246A (zh) 2022-07-22

Family

ID=82426152

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210273806.2A Pending CN114782246A (en) 2022-03-19 2022-03-19 Hyperspectral super-resolution method using asymmetric attention and wavelet sub-band injection

Country Status (1)

Country Link
CN (1) CN114782246A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117726916A (en) * 2024-02-18 2024-03-19 电子科技大学 Implicit fusion method for enhancing image resolution fusion
CN117726916B (en) * 2024-02-18 2024-04-19 电子科技大学 Implicit fusion method for enhancing image resolution fusion

Similar Documents

Publication Publication Date Title
CN111047515B (en) Attention mechanism-based cavity convolutional neural network image super-resolution reconstruction method
Xu et al. Deep gradient projection networks for pan-sharpening
CN110660038B (en) Multispectral image and full-color image fusion method based on generation countermeasure network
CN114119444B (en) Multi-source remote sensing image fusion method based on deep neural network
CN109272452B (en) Method for learning super-resolution network based on group structure sub-band in wavelet domain
Huang et al. Deep hyperspectral image fusion network with iterative spatio-spectral regularization
CN106952228A (en) The super resolution ratio reconstruction method of single image based on the non local self-similarity of image
CN113763299B (en) Panchromatic and multispectral image fusion method and device and application thereof
CN106920214B (en) Super-resolution reconstruction method for space target image
CN109447898B (en) Hyperspectral super-resolution calculation imaging system based on compressed sensing
Moustafa et al. Satellite imagery super-resolution using squeeze-and-excitation-based GAN
CN115272078A (en) Hyperspectral image super-resolution reconstruction method based on multi-scale space-spectrum feature learning
CN114998167B (en) High-spectrum and multi-spectrum image fusion method based on space-spectrum combined low rank
Li et al. HyperNet: A deep network for hyperspectral, multispectral, and panchromatic image fusion
CN104899835A (en) Super-resolution processing method for image based on blind fuzzy estimation and anchoring space mapping
CN117252761A (en) Cross-sensor remote sensing image super-resolution enhancement method
Zhai et al. An effective deep network using target vector update modules for image restoration
CN114782246A (en) Hyperspectral super-resolution method using asymmetric attention and wavelet sub-band injection
Ren et al. Remote sensing image recovery via enhanced residual learning and dual-luminance scheme
Deshpande et al. SURVEY OF SUPER RESOLUTION TECHNIQUES.
Qu et al. An interpretable unsupervised unrolling network for hyperspectral pansharpening
Fang et al. A multiresolution details enhanced attentive dual-UNet for hyperspectral and multispectral image fusion
Deng et al. Multiple frame splicing and degradation learning for hyperspectral imagery super-resolution
CN116188272B (en) Two-stage depth network image super-resolution reconstruction method suitable for multiple fuzzy cores
CN115861749A (en) Remote sensing image fusion method based on window cross attention

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination