CN114782246A - Hyperspectral super-resolution method using asymmetric attention and wavelet sub-band injection


Info

Publication number
CN114782246A
CN114782246A (application CN202210273806.2A)
Authority
CN
China
Prior art keywords
image
attention
depth
asymmetric
encoder
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210273806.2A
Other languages
Chinese (zh)
Inventor
肖亮 (Xiao Liang)
方健 (Fang Jian)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Science and Technology
Original Assignee
Nanjing University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Science and Technology
Priority to CN202210273806.2A
Publication of CN114782246A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 - Geometric image transformations in the plane of the image
    • G06T3/40 - Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4053 - Scaling of whole images or parts thereof based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G06N3/047 - Probabilistic or stochastic networks
    • G06N3/08 - Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a hyperspectral super-resolution method using asymmetric attention and wavelet sub-band injection, which mainly comprises the following steps: extracting multispectral multi-resolution high-frequency detail features by discrete wavelet transform; constructing a detail extraction network encoder; constructing a detail extraction network decoder; preprocessing the low-resolution hyperspectral image; constructing a space-spectrum fusion network encoder; constructing an asymmetric feature selection attention module; constructing a space-spectrum fusion network decoder; and training the network with the L1 loss function. According to the invention, multispectral depth multi-resolution details are extracted by a wavelet network and injected into a U-Net, so that the reconstructed hyperspectral image retains better space-spectrum detail information.

Description

Hyperspectral super-resolution method using asymmetric attention and wavelet sub-band injection
Technical Field
The invention relates to a hyperspectral super-resolution method, and in particular to a hyperspectral super-resolution method using asymmetric attention and wavelet sub-band injection.
Background
A hyperspectral (HS) image has dozens to hundreds of bands, so ground objects can be distinguished far more clearly than in a multispectral (MS) image. In practice, however, hardware and budget constraints make it difficult to acquire HS images with high spatial resolution. A common remedy is therefore to fuse a low-resolution hyperspectral (LR-HS) image with a high-resolution multispectral (HR-MS) image to obtain a high-resolution hyperspectral (HR-HS) image.
With the development of deep learning, more and more deep models have been applied to hyperspectral super-resolution, such as the spatial-spectral fusion network (SSF), the deep two-branch convolutional neural network (Two-CNN), the deep recursive network (DRN), the progressive zero-centric residual network (PZRes-Net), and the deep spatial-spectral attention convolutional neural network (DSSA-CNN). Some methods use a deep network to learn the degradation model and reconstruct the HS image, such as deep hyperspectral image sharpening (DHSIS) and deep blind hyperspectral image fusion (DBIN). To improve the interpretability of deep learning, researchers have proposed deep unfolding networks for the HS fusion problem, such as the multispectral/hyperspectral fusion network (MHF-Net), the variational fusion network (VaFuNet), and model-guided hyperspectral image super-resolution (MOG-DCN). However, these deep learning methods only concatenate the HS and MS images along the spectral channel as the network input and do not fully exploit the latent multi-scale spatial information.
Researchers have therefore proposed fusing HS and MS images at different scales. For example, Zhou et al. [Zhou, Feng, et al., "Pyramid fully convolutional network for hyperspectral and multispectral image fusion," IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 12.5 (2019): 1549-] proposed a pyramid fusion network. The network consists mainly of two sub-networks: the first extracts spectral information from the LR-HS image with convolution kernels and encodes it into depth features, and the second integrates an HR-MS image pyramid with the encoded depth features to obtain the HR-HS image. Xu et al. [Xu, Shuang, et al., "HAM-MFN: Hyperspectral and multispectral image multiscale fusion network with RAP loss," IEEE Transactions on Geoscience and Remote Sensing 58.7 (2020): 4618-] proposed a multi-scale fusion network that first enlarges the depth features of the LR-HS image step by step with deconvolution, then fuses the LR-HS depth features with the MS image at different scales, and finally obtains the HR-HS image. This structure, however, ignores the basic shallow features of the MS image. Thus, Xiao et al. [Xiao, J., Li, J., Yuan, Q., et al., "A Dual-UNet with Multistage Details Injection for Hyperspectral Image Fusion," IEEE Transactions on Geoscience and Remote Sensing, 2021] proposed a dual U-Net fusion method that first extracts multispectral spatial features at different scales with an encoder-decoder network and then injects these multi-scale features into a U-Net to reconstruct the HS image. However, all of these networks simply use convolution to extract multi-scale depth features of the MS image and inject them into the HS image to enhance resolution, whereas the goal of fusion should be to inject the spatial detail features of the MS image into the HS image. Simply extracting multi-scale MS features by convolution cannot acquire detail features in a targeted way.
Disclosure of Invention
The invention aims to provide a hyperspectral super-resolution method using asymmetric attention and wavelet sub-band injection.
The technical solution for realizing the invention is as follows: a hyperspectral super-resolution method using asymmetric attention and wavelet sub-band injection, comprising the following steps:
firstly, extracting multispectral multi-resolution high-frequency detail features by discrete wavelet transform;
secondly, constructing a detail extraction network encoder from the high-frequency details extracted by the discrete wavelet transform and convolution operations;
thirdly, constructing a detail extraction network decoder from the depth features of the detail extraction encoder and deconvolution operations;
fourthly, preprocessing the low-resolution hyperspectral image, i.e. up-sampling it to the same spatial size as the multispectral image;
fifthly, constructing a space-spectrum fusion network encoder from the up-sampled hyperspectral image, the high-resolution multispectral image and the depth features extracted by the detail extraction network encoder;
sixthly, selecting the depth multi-resolution features output by the space-spectrum fusion network encoder with an asymmetric feature selection attention module;
seventhly, constructing a space-spectrum fusion network decoder from the depth features output by the asymmetric feature selection attention module, the space-spectrum fusion network encoder and the detail extraction network decoder, and obtaining the fusion result with a Relu activation function;
in an eighth step, the network is trained using the L1 loss function.
An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the above hyperspectral super-resolution method using asymmetric attention and wavelet sub-band injection.
The method of the invention has a simple structure. Compared with the prior art, its notable features are: (1) multi-resolution wavelet detail features are first extracted by discrete wavelet transform, then converted into depth detail features by an encoder-decoder and injected into the HS image to enhance resolution; (2) an asymmetric feature selection attention module selects among features of different resolutions; (3) the network structure is simple and the computational complexity is low.
The present invention is described in further detail below with reference to the attached drawing figures.
Drawings
FIG. 1 is a block diagram of the process of the present invention.
FIG. 2 is a block diagram of an asymmetric feature selection attention module.
FIG. 3 shows pseudo-color composites (bands 30, 20, 10) and the corresponding mean-square-error images of the fusion results of different methods on the real_and_fake_apples data: (a) CNMF, (b) HySure, (c) ICCV15, (d) DHSIS, (e) DBIN, (f) CNN-Fus, (g) MOG-DCN, (h) DMW-UNet.
FIG. 4 compares the per-band MPSNR curves of the fusion results of the different methods on the real_and_fake_apples data.
FIG. 5 compares the per-band MSSIM curves of the fusion results of the different methods on the real_and_fake_apples data.
FIG. 6 shows pseudo-color composites (bands 30, 20, 10) and the corresponding mean-square-error images of the fusion results of the different methods on the img1 data: (a) CNMF, (b) HySure, (c) ICCV15, (d) DHSIS, (e) DBIN, (f) CNN-Fus, (g) MOG-DCN, (h) DMW-UNet.
FIG. 7 compares the per-band MPSNR curves of the fusion results of the different methods on the img1 data.
FIG. 8 compares the per-band MSSIM curves of the fusion results of the different methods on the img1 data.
Detailed Description
To address the problems in the prior art, the method first extracts multi-scale detail features of the MS image by discrete wavelet transform, then learns these detail features by convolution to obtain multi-scale depth detail features, and finally injects the features into a U-Net to enhance the resolution of the HS image. The method can extract the space-spectrum features of the hyperspectral image in Euclidean space and topological space at the same time and performs excellently when applied to hyperspectral super-resolution. The experiments show an average peak signal-to-noise ratio (MPSNR) of 47.0245 on the CAVE dataset and 45.8118 on the Harvard dataset, demonstrating that the proposed network structure recovers the spatial and spectral information of the HS image well.
The implementation of the invention is described in detail below with reference to FIG. 1. The steps of the method are as follows:
In the first step, multispectral multi-resolution high-frequency detail features are extracted by discrete wavelet transform. Denote by M ∈ R^{H×W×b} the high-resolution multispectral image, where H, W and b are its height, width and number of bands, respectively. Multi-resolution high-frequency detail information of the multispectral image is extracted with the Haar discrete wavelet transform (filter bank 'db1'), where φ denotes the low-pass filter and ψ the high-pass filter. The low-frequency sub-band image produced by the low-pass filter at the d-th scale is expressed as

C_d = (φ̄φ̄ * C_{d-1})↓2

and the high-frequency sub-band images in the three directions are expressed as

W_d^1 = (φ̄ψ̄ * C_{d-1})↓2
W_d^2 = (ψ̄φ̄ * C_{d-1})↓2
W_d^3 = (ψ̄ψ̄ * C_{d-1})↓2

where φ̄ and ψ̄ denote the conjugates of φ and ψ respectively, φ̄ψ̄ * C_{d-1} denotes convolution of C_{d-1} with the separable filter at the d-th scale (rows filtered with φ̄, columns with ψ̄), ↓2 denotes dyadic down-sampling, C_{d-1} is the low-pass sub-band image at scale d-1 (with C_0 = M), and W_d^1, W_d^2 and W_d^3 are the high-frequency sub-band images in the horizontal, vertical and diagonal directions at the d-th scale. Applying the discrete wavelet transform to the multispectral image yields the low-frequency sub-band image C_1 and the high-frequency sub-band images W_1^1, W_1^2 and W_1^3; splicing the three high-frequency sub-band images along the channel dimension gives the high-frequency image W_1, expressed as

W_1 = Concat(W_1^1, W_1^2, W_1^3)

where Concat(·) denotes the channel-dimension splicing operation. Since the low-frequency sub-band image C_1 still contains high-frequency information, the discrete wavelet transform is applied to it as well, yielding the low-frequency sub-band image C_2 and the high-frequency sub-band images W_2^1, W_2^2 and W_2^3, which are spliced along the channel dimension into the high-frequency image

W_2 = Concat(W_2^1, W_2^2, W_2^3)
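For illustration, the two-level Haar decomposition above can be sketched in PyTorch as follows. This is a minimal sketch under the 'db1' convention; the function name, toy shapes and filter sign conventions are illustrative assumptions, not the patent's reference implementation:

```python
import torch
import torch.nn.functional as F

def haar_dwt(x):
    """One level of the 2D Haar ('db1') DWT, applied per channel.

    x: (B, C, H, W) with even H, W. Returns the low-pass band and the
    horizontal/vertical/diagonal high-pass bands, each (B, C, H/2, W/2).
    """
    ll = x.new_tensor([[0.5, 0.5], [0.5, 0.5]])    # separable phi-phi
    lh = x.new_tensor([[0.5, 0.5], [-0.5, -0.5]])  # phi-psi (horizontal)
    hl = x.new_tensor([[0.5, -0.5], [0.5, -0.5]])  # psi-phi (vertical)
    hh = x.new_tensor([[0.5, -0.5], [-0.5, 0.5]])  # psi-psi (diagonal)
    k = torch.stack([ll, lh, hl, hh]).unsqueeze(1)         # (4, 1, 2, 2)
    b, c = x.shape[:2]
    out = F.conv2d(x, k.repeat(c, 1, 1, 1), stride=2, groups=c)
    out = out.view(b, c, 4, out.shape[-2], out.shape[-1])
    return out.unbind(dim=2)                               # C, W^1, W^2, W^3

# Two-level decomposition as in the first step: W1 and W2 collect the
# concatenated high-frequency sub-bands at scales 1 and 2.
M = torch.randn(1, 3, 64, 64)              # toy HR-MS image, b = 3 bands
C1, *hp1 = haar_dwt(M)
W1 = torch.cat(hp1, dim=1)                 # W1 = Concat(W1^1, W1^2, W1^3)
C2, *hp2 = haar_dwt(C1)
W2 = torch.cat(hp2, dim=1)                 # W2 = Concat(W2^1, W2^2, W2^3)
```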
In the second step, the detail extraction network encoder is constructed from the high-frequency images extracted by the discrete wavelet transform and convolution operations. First, the high-frequency image W_1 is passed through a 3 × 3 convolution to obtain the first output depth feature of the detail extraction encoder, expressed as

WE_1 = Conv3×3(W_1)

where Conv3×3(·) denotes a convolution with a 3 × 3 kernel. Since a direct pooling operation would cause a large loss of high-frequency information, the discrete wavelet transform is used in place of down-sampling: applying it to WE_1 yields depth detail information, which is spliced along the channel dimension into W_3. The high-frequency images W_2 and W_3 are then spliced along the channel dimension and passed through a 3 × 3 convolution to obtain the second output depth feature of the detail extraction encoder, expressed as

WE_2 = Conv3×3(Concat(W_2, W_3))
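A corresponding sketch of this encoder, reusing haar_dwt from the snippet above, might look as follows. The channel widths are assumptions, and W_3 is taken here to be the three high-pass bands of the DWT of WE_1 (the text does not state whether the low-pass band is also included):

```python
import torch
import torch.nn as nn

class DetailEncoder(nn.Module):
    """Sketch of the detail-extraction encoder (channel widths assumed)."""
    def __init__(self, ms_bands=3, feats=64):
        super().__init__()
        self.conv1 = nn.Conv2d(3 * ms_bands, feats, 3, padding=1)
        self.conv2 = nn.Conv2d(3 * ms_bands + 3 * feats, feats, 3, padding=1)

    def forward(self, W1, W2):
        WE1 = self.conv1(W1)                    # WE1 = Conv3x3(W1)
        _, H1, H2, H3 = haar_dwt(WE1)           # DWT replaces pooling
        W3 = torch.cat([H1, H2, H3], dim=1)     # depth details at scale 2
        WE2 = self.conv2(torch.cat([W2, W3], dim=1))
        return WE1, WE2                         # features at H/2 and H/4
```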
In the third step, the detail extraction network decoder is constructed from the depth features of the detail extraction encoder and deconvolution operations. The purpose of this decoder is to supplement the space-spectrum fusion network decoder with detail information; it consists of deconvolutions with stride 2 and 3 × 3 convolutions. The depth features of its three outputs are expressed as

WD_n = Conv3×3(Deconv(WD_{n-1})), n = 1, 2, 3, with WD_0 = WE_2

where Deconv(·) denotes deconvolution and WD_n denotes the output depth feature of the n-th stage of the detail extraction network decoder.
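A matching sketch of this decoder, under the assumption WD_0 = WE_2 and three stages:

```python
import torch.nn as nn

class DetailDecoder(nn.Module):
    """Sketch of the detail-extraction decoder: each stage is a stride-2
    deconvolution followed by a 3x3 convolution (stage count assumed)."""
    def __init__(self, feats=64, stages=3):
        super().__init__()
        self.stages = nn.ModuleList(
            nn.Sequential(
                nn.ConvTranspose2d(feats, feats, 2, stride=2),  # Deconv
                nn.Conv2d(feats, feats, 3, padding=1),          # Conv3x3
            )
            for _ in range(stages))

    def forward(self, WE2):
        WD, x = [], WE2
        for stage in self.stages:
            x = stage(x)         # WD_n = Conv3x3(Deconv(WD_{n-1}))
            WD.append(x)
        return WD                # [WD_1, WD_2, WD_3]
```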
In the fourth step, the low-resolution hyperspectral image is pre-processed, i.e. up-sampled to the same spatial size as the multispectral image. The transformation is

X_up = Up(X)

where X denotes the low-resolution hyperspectral image and Up(·) denotes spatial up-sampling by bilinear interpolation with a scale factor of 8.
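In PyTorch this preprocessing amounts to a single interpolation call (toy shapes used for illustration):

```python
import torch
import torch.nn.functional as F

# Bilinear up-sampling of the LR-HS cube by a factor of 8 so that it
# matches the HR-MS spatial size.
X = torch.randn(1, 31, 8, 8)                   # toy LR-HS image, 31 bands
X_up = F.interpolate(X, scale_factor=8, mode='bilinear', align_corners=False)
print(X_up.shape)                              # torch.Size([1, 31, 64, 64])
```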
In the fifth step, the space-spectrum fusion network encoder is constructed from the up-sampled hyperspectral image, the high-resolution multispectral image, and the depth features extracted by the detail extraction network encoder. Residual blocks are used to extract depth features, so the residual block is briefly described first: it consists of two 3 × 3 convolutions and a Relu activation function,

RB(X_in) = Conv3×3(Relu(Conv3×3(X_in))) + X_in

where X_in denotes the input feature, Relu(·) the Relu function, and RB(·) the residual block operation. For the first output feature of the encoder, the up-sampled HS image and the MS image are spliced along the channel dimension as the original input, and the spliced features are then learned with a 3 × 3 convolution and a residual block. For stage n, the max-pooled output of stage n-1 is spliced with the depth detail feature WE_n, and the spliced feature is learned with a 3 × 3 convolution and a residual block to obtain the stage-n output. The process is expressed as

MUe_0 = RB(Conv3×3(Concat(X_up, M)))
MUe_n = RB(Conv3×3(Concat(Maxpool(MUe_{n-1}), WE_n))), n = 1, 2

where MUe_n denotes the output feature of the n-th stage encoder, RB(·) the residual block operation, and Maxpool(·) the max pooling operation.
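The residual block and a size-consistent sketch of the fusion encoder could be written as follows; the channel widths and the two-stage depth are assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    """RB(X) = Conv3x3(Relu(Conv3x3(X))) + X, as defined in the fifth step."""
    def __init__(self, feats=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(feats, feats, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(feats, feats, 3, padding=1))

    def forward(self, x):
        return self.body(x) + x

class FusionEncoder(nn.Module):
    """Sketch of the space-spectrum fusion encoder: stage 0 fuses
    Concat(X_up, M); stage n fuses the max-pooled previous output with
    the detail-encoder feature WE_n."""
    def __init__(self, hs_bands=31, ms_bands=3, feats=64):
        super().__init__()
        self.head = nn.Sequential(
            nn.Conv2d(hs_bands + ms_bands, feats, 3, padding=1),
            ResidualBlock(feats))
        self.stages = nn.ModuleList(
            nn.Sequential(nn.Conv2d(2 * feats, feats, 3, padding=1),
                          ResidualBlock(feats))
            for _ in range(2))

    def forward(self, X_up, M, WEs):                 # WEs = [WE1, WE2]
        MUe = [self.head(torch.cat([X_up, M], dim=1))]
        for stage, WE in zip(self.stages, WEs):
            pooled = F.max_pool2d(MUe[-1], 2)        # Maxpool(MUe_{n-1})
            MUe.append(stage(torch.cat([pooled, WE], dim=1)))
        return MUe                                   # [MUe_0, MUe_1, MUe_2]
```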
In the sixth step, as shown in FIG. 2, the features output by the space-spectrum fusion network encoder are selected with an asymmetric feature selection attention module. The module with MUe_0 as the reference is taken as an example. First, deconvolution and convolution are used to bring the spatial size and channel number of the input depth features to those of MUe_0:

Su_0 = MUe_0, Su_1 = Conv3×3(Deconv(MUe_1)), Su_2 = Conv3×3(Deconv(MUe_2))

Then the three depth features are added element-wise to obtain the depth feature Su:

Su = Su_0 + Su_1 + Su_2

Spatial and channel attention mechanisms are then combined to select important features from the multi-resolution features, as follows.

To obtain the channel attention coefficient of the depth feature Su, the global receptive field of Su is first extracted with a global average pooling operation, abstracting each feature channel into one feature point:

Cs = SAvgpool(Su)

where SAvgpool(·) denotes average pooling over the spatial dimensions. A two-layer perceptron network then performs a non-linear feature transformation to build the correlation between feature maps:

Cz = fc(Relu(fc(Cs)))

where fc(·) denotes a fully connected layer.

To obtain the spatial attention coefficient of the depth feature, average pooling and max pooling are first applied along the channel dimension of Su, yielding two depth features with the same spatial size as Su and channel dimension 1. The two depth features are then spliced along the channel dimension:

Ss = Concat(CAvgpool(Su), CMaxpool(Su))

where CAvgpool(·) denotes channel-dimension average pooling and CMaxpool(·) denotes channel-dimension max pooling. A 7 × 7 convolution on Ss then yields the spatial attention coefficient:

Sz = Conv7×7(Ss)

The spatial and channel attention coefficients are multiplied to obtain the space-spectrum attention coefficient

Sc = Cz * Sz

Three 1 × 1 convolutions produce three space-spectrum attention coefficients:

SC_i = Conv1×1(Sc), i ∈ {0, 1, 2}

where Conv1×1 denotes a 1 × 1 convolution. The softmax function is then applied across the three coefficients so that Sa + Sb + Sc = 1:

[Sa, Sb, Sc] = Softmax(SC_0, SC_1, SC_2)

where Sa, Sb and Sc denote the attention coefficients of Su_0, Su_1 and Su_2 respectively. Each attention coefficient is multiplied with its depth feature, and the three weighted features are added pixel-wise to give the module output:

AFSSC_1 = Sa·Su_0 + Sb·Su_1 + Sc·Su_2

Here AFSSC_1 takes MUe_0 as the reference: deconvolution and convolution operations bring MUe_1 and MUe_2 to the same size as MUe_0, after which the sum gives Su. Likewise, AFSSC_0 takes MUe_1 as the reference: deconvolution, or pooling, together with convolution bring MUe_0 and MUe_2 to the same size as MUe_1, after which the sum gives Su.
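Putting the pieces together, a sketch of the MUe_0-referenced module could look as follows; the resampling layers, the fully connected reduction ratio and the class name are our assumptions:

```python
import torch
import torch.nn as nn

class AFSSAttention(nn.Module):
    """Sketch of the asymmetric feature-selection attention module with
    MUe_0 as the reference scale."""
    def __init__(self, feats=64, reduction=4):
        super().__init__()
        # Bring MUe_1 (1/2 resolution) and MUe_2 (1/4) up to MUe_0's size.
        self.up1 = nn.Sequential(nn.ConvTranspose2d(feats, feats, 2, stride=2),
                                 nn.Conv2d(feats, feats, 3, padding=1))
        self.up2 = nn.Sequential(nn.ConvTranspose2d(feats, feats, 4, stride=4),
                                 nn.Conv2d(feats, feats, 3, padding=1))
        # Channel attention: Cz = fc(Relu(fc(Cs))).
        self.fc = nn.Sequential(nn.Linear(feats, feats // reduction),
                                nn.ReLU(inplace=True),
                                nn.Linear(feats // reduction, feats))
        # Spatial attention: 7x7 conv on [CAvgpool, CMaxpool].
        self.conv7 = nn.Conv2d(2, 1, 7, padding=3)
        self.heads = nn.ModuleList(nn.Conv2d(feats, feats, 1) for _ in range(3))

    def forward(self, MUe0, MUe1, MUe2):
        Su0, Su1, Su2 = MUe0, self.up1(MUe1), self.up2(MUe2)
        Su = Su0 + Su1 + Su2                                  # element-wise sum
        Cz = self.fc(Su.mean(dim=(2, 3)))[:, :, None, None]   # channel coeff.
        Ss = torch.cat([Su.mean(1, keepdim=True),             # CAvgpool
                        Su.max(1, keepdim=True).values], 1)   # CMaxpool
        Sz = self.conv7(Ss)                                   # spatial coeff.
        Sc = Cz * Sz                                          # joint coefficient
        SC = torch.stack([head(Sc) for head in self.heads])   # three 1x1 convs
        Sa, Sb, Sc3 = torch.softmax(SC, dim=0).unbind(0)      # Sa + Sb + Sc3 = 1
        return Sa * Su0 + Sb * Su1 + Sc3 * Su2                # AFSSC output
```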
In the seventh step, the space-spectrum fusion network decoder is constructed from the depth features output by the asymmetric feature selection attention module, the space-spectrum fusion network encoder, and the detail extraction network decoder, and the fusion result is obtained with a Relu activation function. A convolution, a residual block and a deconvolution are applied to MUe_2 to extract depth features and obtain the first output of the decoder. For the n-th output, the asymmetric feature selection attention module output AFSSC_{n-1}, the space-spectrum fusion network decoder output MUd_{n-1} and the detail extraction network decoder output WD_{n-1} are spliced, passed through a 3 × 3 convolution and a residual block, and finally a deconvolution. The process is expressed as

MUd_1 = Deconv(RB(Conv3×3(MUe_2)))
MUd_n = Deconv(RB(Conv3×3(Concat(AFSSC_{n-1}, MUd_{n-1}, WD_{n-1}))))

where MUd_n denotes the n-th stage of the decoder. The decoder output MUd_2 and the detail extraction decoder output WD_2 are spliced along the channel dimension and features are extracted with a 3 × 3 convolution; the up-sampled HS image is added pixel-wise to the extracted features, and a Relu activation on the sum finally gives the fused image Ẑ:

Ẑ = Relu(Conv3×3(Concat(MUd_2, WD_2)) + X_up)
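A size-consistent sketch of this decoder (reusing ResidualBlock from above) is given below; the stage wiring, in particular which attention output feeds which stage, is one plausible reading of the condensed description and is an assumption:

```python
import torch
import torch.nn as nn

class FusionDecoder(nn.Module):
    """Sketch of the space-spectrum fusion decoder: the MUe_1-referenced
    attention output pairs with MUd_1 and WD_1 at half resolution, and the
    final fusion happens at full resolution."""
    def __init__(self, feats=64, hs_bands=31):
        super().__init__()
        self.head = nn.Sequential(                  # MUd_1 from MUe_2
            nn.Conv2d(feats, feats, 3, padding=1), ResidualBlock(feats),
            nn.ConvTranspose2d(feats, feats, 2, stride=2))
        self.stage = nn.Sequential(                 # MUd_2 from the concat
            nn.Conv2d(3 * feats, feats, 3, padding=1), ResidualBlock(feats),
            nn.ConvTranspose2d(feats, feats, 2, stride=2))
        self.tail = nn.Conv2d(2 * feats, hs_bands, 3, padding=1)

    def forward(self, MUe2, AFSSC0, WD1, WD2, X_up):
        MUd1 = self.head(MUe2)                                # H/4 -> H/2
        MUd2 = self.stage(torch.cat([AFSSC0, MUd1, WD1], 1))  # H/2 -> H
        out = self.tail(torch.cat([MUd2, WD2], 1))            # inject details
        return torch.relu(out + X_up)                         # fused image
```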
In the eighth step, the network is trained using the L1 loss function, specifically

L1 = ||Ẑ - Z||_1

where Ẑ denotes the HR-HSI reconstructed by the network and Z denotes the reference HR-HSI.
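A minimal training step consistent with this setup might be as follows; the assembled model and its optimizer are assumed to exist (ADAM with learning rate 1e-4, per the experiment section):

```python
import torch
import torch.nn.functional as F

def l1_train_step(model, optimizer, X_lr, M, Z):
    """One optimization step with the L1 loss (model assembly assumed)."""
    X_up = F.interpolate(X_lr, scale_factor=8, mode='bilinear',
                         align_corners=False)   # fourth-step preprocessing
    Z_hat = model(X_up, M)                      # reconstructed HR-HSI
    loss = F.l1_loss(Z_hat, Z)                  # mean of |Z_hat - Z|
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```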
The effect of the present invention can be further illustrated by the following simulation experiments:
simulation conditions
The simulation experiments use two groups of real hyperspectral data: the CAVE dataset and the Harvard dataset. The CAVE dataset consists of 32 indoor hyperspectral images of spatial size 512 × 512, wavelength range 400nm-700nm, spectral resolution 10nm, and 31 bands. The Harvard dataset consists of 50 indoor and outdoor hyperspectral images of spatial size 1040 × 1392, wavelength range 420nm-720nm, spectral resolution 10nm, and 31 bands. To collect training blocks conveniently, a 1024 × 1024 region is cropped from the top left of each image as the training and testing set.
The LR-HS images are simulated with an 8 × 8 Gaussian filter (mean 0, standard deviation 2) followed by down-sampling each band of the reference image by a factor of 8 in the vertical and horizontal directions, i.e. a decimation factor of 8 × 8. The HR-MS image of the same scene is simulated by spectrally down-sampling the HR-HS image; the spectral down-sampling matrix R uses the response function of a Nikon D700 camera. For the CAVE dataset, the first 20 hyperspectral images are used for training and the remaining 12 for testing. For the Harvard dataset, the first 30 hyperspectral images are used for training and the remaining 20 for testing. Since deep learning requires a large amount of training data, the training hyperspectral images are divided into blocks: each LR-HSI block is 4 × 4 × 31, each HR-MSI block is 32 × 32 × 3, and each HR-HSI block is 32 × 32 × 31.
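A sketch of this degradation pipeline is shown below; the Gaussian kernel construction and padding choices are our assumptions, and the response matrix R is assumed to be supplied separately:

```python
import torch
import torch.nn.functional as F

def simulate_lr_hs(Z, sigma=2.0, factor=8):
    """Blur every band with an 8x8 Gaussian (mean 0, std 2), then
    decimate by 8 in both spatial directions."""
    coords = torch.arange(8, dtype=torch.float32) - 3.5   # centered taps
    g = torch.exp(-coords ** 2 / (2 * sigma ** 2))
    k = torch.outer(g, g)
    k = (k / k.sum()).view(1, 1, 8, 8).repeat(Z.shape[1], 1, 1, 1)
    blurred = F.conv2d(F.pad(Z, (3, 4, 3, 4), mode='replicate'),
                       k, groups=Z.shape[1])              # per-band blur
    return blurred[..., ::factor, ::factor]               # 8x8 decimation

def simulate_hr_ms(Z, R):
    """Spectral down-sampling with a response matrix R of shape (3, 31),
    e.g. one derived from the Nikon D700 response curves."""
    return torch.einsum('cb,nbhw->nchw', R, Z)
```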
The simulation experiments were run under the Windows 10 operating system with Python 3.8 and PyTorch 1.10. The activation functions are uniformly Relu, the network is trained with the ADAM optimizer, the learning rate is set to 0.0001, and the number of iterations is fixed at 1000; the remaining hyper-parameter settings of the network structure are summarized in Table 1.
Analysis of simulation experiment results
Table 2 shows the super-resolution results of the method of the present invention and the comparative methods on the CAVE dataset, and Table 3 shows the corresponding results on the Harvard dataset.
TABLE 1 network hyper-parameter configuration
(Table content available only as an image in the original document.)
TABLE 2 super-resolution of CAVE data sets by the method of the present invention and the method of comparing the same
Method MPSNR RMSE ERGAS SAM UIQI MSSIM
CNMF 34.3027 5.4723 2.6006 7.8920 0.7710 0.9388
HySure 34.7822 5.3232 2.4181 11.5451 0.8043 0.9107
ICCV15 35.6888 4.7694 2.2064 7.8787 0.7986 0.9531
DHSIS 46.2977 1.4653 0.6641 3.8452 0.9242 0.9904
DBIN 45.7831 1.5233 0.6781 3.6035 0.9260 0.9925
CNN-Fus 44.5789 1.8994 0.8689 5.4241 0.8735 0.9850
MOG-DCN 46.3167 1.4217 0.6450 3.5967 0.9278 0.9924
DMW-UNet 47.0245 1.3564 0.6091 3.4703 0.9336 0.9927
TABLE 3 results of super-resolution of Harvard data sets by the method of the present invention and its comparative method
(Table content available only as an image in the original document.)
As can be seen from Table 2, DMW-UNet is the largest on the MPSNR, UIQI and MSSIM indices and the smallest on RMSE, ERGAS and SAM. FIG. 3 shows that the reconstruction error of the method of the invention is the smallest, and FIGS. 4 and 5 show that its MPSNR and MSSIM values are the highest in every band.
As can be seen from Table 3, DMW-UNet is likewise the largest on MPSNR, UIQI and MSSIM and the smallest on RMSE, ERGAS and SAM for the Harvard dataset. FIG. 6 shows that the reconstruction error of the method of the invention is the smallest, and FIGS. 7 and 8 show that its MPSNR and MSSIM values are the highest in every band.
These experimental results show that the method of the invention has better spatial and spectral reconstruction capability.

Claims (10)

1. A hyperspectral super-resolution method using asymmetric attention and wavelet subband injection is characterized by comprising the following steps of:
firstly, extracting multispectral multi-resolution high-frequency detail features by discrete wavelet transform;
secondly, constructing a detail extraction network encoder from the high-frequency details extracted by the discrete wavelet transform and convolution operations;
thirdly, constructing a detail extraction network decoder from the depth features of the detail extraction encoder and deconvolution operations;
fourthly, preprocessing the low-resolution hyperspectral image, i.e. up-sampling it to the same spatial size as the multispectral image;
fifthly, constructing a space-spectrum fusion network encoder from the up-sampled hyperspectral image, the high-resolution multispectral image and the depth features extracted by the detail extraction network encoder;
sixthly, selecting the depth multi-resolution features output by the space-spectrum fusion network encoder with an asymmetric feature selection attention module;
seventhly, constructing a space-spectrum fusion network decoder from the depth features output by the asymmetric feature selection attention module, the space-spectrum fusion network encoder and the detail extraction network decoder, and obtaining the fusion result with a Relu activation function;
in an eighth step, the network is trained using the L1 loss function.
2. The hyperspectral super-resolution method using asymmetric attention and wavelet sub-band injection as claimed in claim 1, wherein in the first step the multispectral multi-resolution high-frequency detail features are extracted by discrete wavelet transform as follows:

denote by M ∈ R^{H×W×b} the high-resolution multispectral image, where H, W and b are its height, width and number of bands, respectively; multi-resolution high-frequency detail information of the multispectral image is extracted with the Haar discrete wavelet transform, where φ denotes the low-pass filter and ψ the high-pass filter; the low-frequency sub-band image produced by the low-pass filter at the d-th scale is expressed as

C_d = (φ̄φ̄ * C_{d-1})↓2

and the high-frequency sub-band images in the three directions are expressed as

W_d^1 = (φ̄ψ̄ * C_{d-1})↓2
W_d^2 = (ψ̄φ̄ * C_{d-1})↓2
W_d^3 = (ψ̄ψ̄ * C_{d-1})↓2

where φ̄ and ψ̄ denote the conjugates of φ and ψ respectively, φ̄ψ̄ * C_{d-1} denotes convolution of C_{d-1} with the separable filter at the d-th scale, ↓2 denotes dyadic down-sampling, C_{d-1} denotes the low-pass sub-band image at scale d-1, and W_d^1, W_d^2 and W_d^3 denote the high-frequency sub-band images in the horizontal, vertical and diagonal directions at the d-th scale; the multispectral image is subjected to the discrete wavelet transform to obtain the low-frequency sub-band image C_1 and the high-frequency sub-band images W_1^1, W_1^2 and W_1^3, and the three high-frequency sub-band images are spliced along the channel dimension to obtain the high-frequency image W_1, expressed as

W_1 = Concat(W_1^1, W_1^2, W_1^3)

where Concat(·) denotes the channel-dimension splicing operation; the discrete wavelet transform is applied to the multispectral low-frequency sub-band image C_1 to obtain the low-frequency sub-band image C_2 and the high-frequency sub-band images W_2^1, W_2^2 and W_2^3, and the three high-frequency sub-band images are spliced along the channel dimension to obtain the high-frequency image W_2, expressed as

W_2 = Concat(W_2^1, W_2^2, W_2^3).
3. The hyperspectral super-resolution method using asymmetric attention and wavelet sub-band injection according to claim 1, wherein in the second step the detail extraction network encoder is constructed from the high-frequency images extracted by the discrete wavelet transform and convolution operations, specifically as follows:

first, the high-frequency image W_1 is passed through a 3 × 3 convolution to obtain the first output depth feature of the detail extraction encoder, expressed as WE_1 = Conv3×3(W_1), where Conv3×3(·) denotes a convolution with a 3 × 3 kernel; the discrete wavelet transform is used instead of a down-sampling operation: it is applied to WE_1 to obtain depth detail information, which is spliced along the channel dimension to obtain W_3; the high-frequency images W_2 and W_3 are then spliced along the channel dimension and passed through a 3 × 3 convolution to obtain the second output depth feature of the detail extraction encoder, expressed as WE_2 = Conv3×3(Concat(W_2, W_3)).
4. The hyperspectral super-resolution method using asymmetric attention and wavelet sub-band injection according to claim 1, wherein in the third step the detail extraction network decoder is constructed from the depth features extracted by the encoder and deconvolution operations, specifically as follows:

the detail extraction network decoder supplements the information of the space-spectrum fusion network decoder and consists of deconvolutions with stride 2 and 3 × 3 convolutions; the depth features of its three outputs are expressed as

WD_n = Conv3×3(Deconv(WD_{n-1})), n = 1, 2, 3, with WD_0 = WE_2

where Deconv(·) denotes deconvolution and WD_n denotes the output depth feature of the n-th stage of the detail extraction network decoder.
5. The hyperspectral super-resolution method using asymmetric attention and wavelet sub-band injection as claimed in claim 1, wherein the fourth step pre-processes the low-resolution hyperspectral image by up-sampling it to the same spatial size as the multispectral image, the transformation being

X_up = Up(X)

where X denotes the low-resolution hyperspectral image and Up(·) denotes spatial up-sampling by bilinear interpolation with a scale factor of 8.
6. The hyperspectral super-resolution method using asymmetric attention and wavelet sub-band injection according to claim 1, wherein the fifth step of constructing the space-spectrum fusion network encoder from the up-sampled hyperspectral image, the high-resolution multispectral image and the depth features extracted by the detail extraction network encoder is specifically: depth features are extracted with residual blocks, each consisting of two 3 × 3 convolutions and a Relu activation function, expressed as RB(X_in) = Conv3×3(Relu(Conv3×3(X_in))) + X_in, where X_in denotes the input feature, Relu(·) the Relu function and RB(·) the residual block operation; for the first output feature of the encoder, the up-sampled HS image and the MS image are spliced along the channel dimension as the original input, and the spliced features are learned with a 3 × 3 convolution and a residual block; for stage n, the max-pooled output of stage n-1 is spliced with the depth detail feature WE_n, and the spliced feature is learned with a 3 × 3 convolution and a residual block to obtain the stage-n output; the process is expressed as

MUe_0 = RB(Conv3×3(Concat(X_up, M)))
MUe_n = RB(Conv3×3(Concat(Maxpool(MUe_{n-1}), WE_n))), n = 1, 2

where MUe_n denotes the output feature of the n-th stage encoder, RB(·) the residual operation and Maxpool(·) the max pooling operation.
7. The hyperspectral super-resolution method using asymmetric attention and wavelet sub-band injection according to claim 1, wherein the sixth step selects the features output by the space-spectrum fusion network encoder with an asymmetric feature selection attention module:

for the attention module with MUe_0 as the reference, deconvolution and convolution first bring the spatial size and channel number of the input depth features to those of MUe_0:

Su_0 = MUe_0, Su_1 = Conv3×3(Deconv(MUe_1)), Su_2 = Conv3×3(Deconv(MUe_2))

then the three depth features are added element-wise to obtain the depth feature Su:

Su = Su_0 + Su_1 + Su_2

to obtain the channel attention coefficient of the depth feature Su, its global receptive field is first extracted with a global average pooling operation, abstracting each feature channel into one feature point:

Cs = SAvgpool(Su)

where SAvgpool(·) denotes spatial-dimension average pooling; a two-layer perceptron network then performs a non-linear feature transformation to build the correlation between feature maps:

Cz = fc(Relu(fc(Cs)))

where fc(·) denotes a fully connected layer;

to obtain the spatial attention coefficient of the depth feature, average pooling and max pooling are applied along the channel dimension of Su, yielding two depth features with the same spatial size as Su and channel dimension 1, which are then spliced along the channel dimension:

Ss = Concat(CAvgpool(Su), CMaxpool(Su))

where CAvgpool(·) denotes channel-dimension average pooling and CMaxpool(·) denotes channel-dimension max pooling; a 7 × 7 convolution on Ss then yields the spatial attention coefficient:

Sz = Conv7×7(Ss)

the spatial and channel attention coefficients are multiplied to obtain the space-spectrum attention coefficient

Sc = Cz * Sz

three 1 × 1 convolutions give three space-spectrum attention coefficients:

SC_i = Conv1×1(Sc), i ∈ {0, 1, 2}

where Conv1×1 denotes a 1 × 1 convolution; the softmax function is then applied across the three coefficients so that Sa + Sb + Sc = 1:

[Sa, Sb, Sc] = Softmax(SC_0, SC_1, SC_2)

where Sa, Sb and Sc denote the attention coefficients of Su_0, Su_1 and Su_2 respectively; each attention coefficient is multiplied with its depth feature and the three weighted features are added pixel-wise to give the module output:

AFSSC_1 = Sa·Su_0 + Sb·Su_1 + Sc·Su_2

here AFSSC_1 takes MUe_0 as the reference, with deconvolution and convolution bringing MUe_1 and MUe_2 to the same size as MUe_0 before the sum gives Su; AFSSC_0 takes MUe_1 as the reference, with deconvolution or pooling together with convolution bringing MUe_0 and MUe_2 to the same size as MUe_1 before the sum gives Su.
8. The hyperspectral super-resolution method using asymmetric attention and wavelet sub-band injection according to claim 1, wherein the seventh step constructs the space-spectrum fusion network decoder from the depth features of the asymmetric feature selection attention module, the space-spectrum fusion network encoder and the detail extraction network decoder, and obtains the fusion result with a Relu activation function: a convolution, a residual block and a deconvolution are applied to MUe_2 to extract depth features and obtain the first output of the decoder; the asymmetric feature selection attention module output AFSSC_{n-1}, the space-spectrum fusion network decoder output MUd_{n-1} and the detail extraction network decoder output WD_{n-1} are then spliced, passed through a 3 × 3 convolution and a residual block, and finally a deconvolution to obtain the n-th output of the space-spectrum fusion module decoder; the process is expressed as

MUd_1 = Deconv(RB(Conv3×3(MUe_2)))
MUd_n = Deconv(RB(Conv3×3(Concat(AFSSC_{n-1}, MUd_{n-1}, WD_{n-1}))))

where MUd_n denotes the n-th stage of the decoder; the decoder output MUd_2 and the detail extraction decoder output WD_2 are spliced along the channel dimension, features are extracted with a 3 × 3 convolution, the up-sampled HS image is added pixel-wise to the extracted features, and a Relu activation on the sum finally gives the fused image Ẑ; the process is expressed as

Ẑ = Relu(Conv3×3(Concat(MUd_2, WD_2)) + X_up).
9. The hyperspectral super-resolution method using asymmetric attention and wavelet sub-band injection according to claim 1, wherein in the eighth step the network is trained using the L1 loss function, specifically

L1 = ||Ẑ - Z||_1

where Ẑ denotes the HR-HSI reconstructed by the network and Z denotes the reference HR-HSI.
10. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the hyperspectral super-resolution method using asymmetric attention and wavelet sub-band injection as claimed in any one of claims 1-9.
CN202210273806.2A 2022-03-19 2022-03-19 Hyperspectral super-resolution method using asymmetric attention and wavelet sub-band injection Pending CN114782246A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210273806.2A CN114782246A (en) 2022-03-19 2022-03-19 Hyperspectral super-resolution method using asymmetric attention and wavelet sub-band injection

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210273806.2A CN114782246A (en) 2022-03-19 2022-03-19 Hyperspectral super-resolution method using asymmetric attention and wavelet sub-band injection

Publications (1)

Publication Number Publication Date
CN114782246A (zh) 2022-07-22

Family

ID=82426152

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210273806.2A Pending CN114782246A (en) 2022-03-19 2022-03-19 Hyperspectral super-resolution method using asymmetric attention and wavelet sub-band injection

Country Status (1)

Country Link
CN (1) CN114782246A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117726916A (en) * 2024-02-18 2024-03-19 电子科技大学 Implicit fusion method for enhancing image resolution fusion
CN117726916B (en) * 2024-02-18 2024-04-19 电子科技大学 Implicit fusion method for enhancing image resolution fusion

Similar Documents

Publication Publication Date Title
CN111047515B (en) Attention mechanism-based cavity convolutional neural network image super-resolution reconstruction method
Xu et al. Deep gradient projection networks for pan-sharpening
CN110660038B (en) Multispectral image and full-color image fusion method based on generation countermeasure network
CN114119444B (en) Multi-source remote sensing image fusion method based on deep neural network
CN109272452B (en) Method for learning super-resolution network based on group structure sub-band in wavelet domain
Huang et al. Deep hyperspectral image fusion network with iterative spatio-spectral regularization
CN106952228A (en) The super resolution ratio reconstruction method of single image based on the non local self-similarity of image
CN113763299B (en) Panchromatic and multispectral image fusion method and device and application thereof
CN106920214B (en) Super-resolution reconstruction method for space target image
CN109447898B (en) Hyperspectral super-resolution calculation imaging system based on compressed sensing
Moustafa et al. Satellite imagery super-resolution using squeeze-and-excitation-based GAN
CN115272078A (en) Hyperspectral image super-resolution reconstruction method based on multi-scale space-spectrum feature learning
CN114998167B (en) High-spectrum and multi-spectrum image fusion method based on space-spectrum combined low rank
Li et al. HyperNet: A deep network for hyperspectral, multispectral, and panchromatic image fusion
CN104899835A (en) Super-resolution processing method for image based on blind fuzzy estimation and anchoring space mapping
CN117252761A (en) Cross-sensor remote sensing image super-resolution enhancement method
Zhai et al. An effective deep network using target vector update modules for image restoration
CN114782246A (en) Hyperspectral super-resolution method using asymmetric attention and wavelet sub-band injection
Ren et al. Remote sensing image recovery via enhanced residual learning and dual-luminance scheme
Deshpande et al. SURVEY OF SUPER RESOLUTION TECHNIQUES.
Qu et al. An interpretable unsupervised unrolling network for hyperspectral pansharpening
Fang et al. A multiresolution details enhanced attentive dual-UNet for hyperspectral and multispectral image fusion
Deng et al. Multiple frame splicing and degradation learning for hyperspectral imagery super-resolution
CN116188272B (en) Two-stage depth network image super-resolution reconstruction method suitable for multiple fuzzy cores
CN115861749A (en) Remote sensing image fusion method based on window cross attention

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination