CN116309070A - Super-resolution reconstruction method and device for hyperspectral remote sensing image and computer equipment - Google Patents

Super-resolution reconstruction method and device for hyperspectral remote sensing image and computer equipment

Info

Publication number: CN116309070A
Application number: CN202310298937.0A
Authority: CN (China)
Legal status: Pending
Other languages: Chinese (zh)
Inventors: 张倩, 左世祥, 张晓锋, 李志军, 金箑, 余娟, 唐冬梅, 王炳乾, 陈建华
Current assignee / original assignee: Aba Natural Resources And Science And Technology Information Research Institute
Application filed by Aba Natural Resources And Science And Technology Information Research Institute, with priority to CN202310298937.0A.

Classifications

    • G06T3/4053: Scaling of whole images or parts thereof based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
    • G06T3/4046: Scaling of whole images or parts thereof using neural networks
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G06V10/774: Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V10/82: Image or video recognition or understanding using neural networks
    • Y02A40/10: Adaptation technologies in agriculture, forestry, livestock or agroalimentary production in agriculture


Abstract

The invention discloses a hyperspectral remote sensing image super-resolution reconstruction method, a hyperspectral remote sensing image super-resolution reconstruction device and computer equipment, and relates to the technical field of image super-resolution reconstruction. The method first takes a plurality of low-resolution images as input items and a plurality of high-resolution images as output items, and imports them into a spectral-spatial Transformer model comprising a linear embedding layer, K residual blocks, a merging layer, an upsampling layer, a global skip connection and an anti-embedding layer which are sequentially connected, training it to obtain an image super-resolution reconstruction model; the low-resolution hyperspectral remote sensing image to be reconstructed is then input into the image super-resolution reconstruction model, which outputs a high-resolution hyperspectral remote sensing image. Because the spatial features and spectral features of the input images are extracted separately and then integrated for super-resolution reconstruction of the hyperspectral remote sensing image, dependence on CNN is avoided, and experimental comparison shows that the super-resolution effect is greatly improved.

Description

Super-resolution reconstruction method and device for hyperspectral remote sensing image and computer equipment
Technical Field
The invention belongs to the technical field of image super-resolution reconstruction, and particularly relates to a hyperspectral remote sensing image super-resolution reconstruction method, a hyperspectral remote sensing image super-resolution reconstruction device and computer equipment.
Background
Hyperspectral remote sensing images (Hyperspectral Images, HSI) are digital images captured by sensors on airborne or satellite platforms that measure the electromagnetic reflection of the earth's surface in multiple narrow and continuous bands, producing data of high spectral resolution. The information collected from these images provides detailed information about the chemical and physical characteristics of ground objects, so that different ground objects can be accurately classified and identified. Hyperspectral remote sensing images are widely used, for example, for mineral exploration (Govil et al., 2021; Bedini, 2017), agriculture (Nguyen et al., 2021; Zhang et al., 2016), urban planning (Navin et al., 2020; Weber et al., 2018), and environmental monitoring (Stuart et al., 2021; Niu et al., 2019), among others. Due to factors such as sensors, camera height and cost, hyperspectral remote sensing images have to sacrifice spatial resolution in pursuit of more detailed spectral resolution (Fu et al., 2021), which poses great difficulties for their application; therefore, how to increase spatial resolution while maintaining spectral resolution is a very challenging problem.
Image Super-Resolution (SR) reconstruction is an ideal method to solve the above problems. The goal of image super-resolution is to generate a high-resolution image from a low-resolution input image, typically by synthesizing the high-frequency details absent from the input using information from the low-resolution image together with additional information from prior knowledge or other sources. Image super-resolution can be classified into multi-image super-resolution (Dian et al., 2021; Li et al., 2018) and single-image super-resolution (Jiang et al., 2020) according to the number of input images in the process. Multi-image super-resolution is generally realized by a fusion method, i.e., a high-resolution panchromatic or multispectral image is used to improve the spatial resolution of a low-resolution hyperspectral image. In recent years researchers have proposed a number of novel algorithms to achieve this goal, such as: super-resolution processing of multiple images using a progressive multi-scale deformable residual network (Liu et al., 2022a); multi-image super-resolution tasks performed using end-to-end deep neural networks (Arefin et al., 2020). These efforts are of great importance to further advancing multi-image super-resolution technology. However, the multi-image super-resolution method has a significant disadvantage in that a high-resolution auxiliary image must be input during processing; where an auxiliary image is difficult to obtain, multi-image super-resolution will not work. In addition, the multi-image super-resolution process consumes more computing resources and time (Li et al., 2020; Chen et al., 2022).
Single Image Super-Resolution (SISR) is a technique for generating a high-resolution image by interpolating and recovering a single low-resolution image. It is more widely used because it is not limited by the availability of auxiliary images. SISR is a popular research direction in the field of computer vision and can be applied to various scenes, such as video processing (Liu et al., 2022), medical image processing (Chen et al., 2021b), satellite image processing (Zhang et al., 2020; Haut et al., 2018), and the like. The basic flow includes image preprocessing, feature extraction, high-resolution image generation and post-processing. Currently, mainstream SISR algorithms include interpolation-based methods (Park et al., 2003), reconstruction-based methods (Irani et al., 1991), deep learning-based methods, and the like. Among them, deep learning-based methods have made great progress and have become the best-performing approach in current SISR research. Common deep learning models include the super-resolution convolutional network (Dong et al., 2016) based on convolutional neural networks (Convolutional Neural Networks, CNN; LeCun et al., 1998), the deep super-resolution model (Kim et al., 2016), the super-resolution generative adversarial network (Ledig et al., 2017) based on generative adversarial networks (Generative Adversarial Networks, GAN; Goodfellow et al., 2014), the Enhanced Encoder-Decoder Generative Adversarial Network (EEGAN; Jiang et al., 2019), the balanced two-stage residual network (Fan et al., 2017), and the sliding-window Transformer image restoration model (SwinIR; Liang et al., 2021) based on the Transformer (Vaswani et al., 2017); these models can learn the mapping relationship between low-resolution and high-resolution images and thereby generate high-resolution images. Among these methods, CNN-based models achieve good results in the image super-resolution field but have obvious algorithmic shortcomings for this task: first, the same convolution kernel is applied across different image regions, which is ill-suited to the super-resolution task; second, convolution kernels mostly extract local features, which cannot meet a super-resolution model's need to handle large-scale features. GAN-based approaches also perform well in the super-resolution field, but they suffer from an unstable training process and require substantial computational resources and time, making them unsuitable for small teams or individuals (Chai et al., 2022).
The Transformer (Vaswani et al., 2017) is a neural network model based on self-attention mechanisms that was originally widely used in the field of natural language processing, for tasks such as machine translation (Vaswani et al., 2017) and text generation (Zhu et al., 2019; Gong et al., 2019; Dong et al., 2017). Unlike conventional recurrent neural networks (Recurrent Neural Networks, RNN; Hochreiter et al., 1997) or convolutional neural networks (CNN), the Transformer does not process sequences by recurrence or convolution, but weight-aggregates the elements in a sequence through a self-attention mechanism to achieve sequence-to-sequence modeling. In recent years, the Transformer has also exhibited excellent performance in image classification (Dosovitskiy et al., 2021; Touvron et al., 2021), object detection (Carion et al., 2020), image reconstruction (Chen et al., 2021b), image segmentation (Chen et al., 2021a), and the like. In the field of image super-resolution, (Liu et al., 2018) opened the way for the Transformer by learning texture features, and algorithms such as the efficient super-resolution Transformer (Lu et al., 2022) and SwinIR have successively achieved surprising results. However, many scholars, including us, consider that conventional deep learning super-resolution methods cannot be directly applied to hyperspectral remote sensing images (Liu et al., 2022b; Hu et al., 2022; Tu et al., 2022; Lei et al., 2021). An ordinary image has only 3 color bands (red, green and blue), so these methods focus on feature extraction between pixels, whereas the number of bands in a remote sensing image can reach hundreds: the interaction between channels becomes extremely important and cannot be ignored. More band data increases the workload of remote sensing image super-resolution, but at the same time the relationships among the many spectral channels promote image super-resolution, since the information they cover is richer and more features can be extracted. To address this, (Jiang et al., 2020) performed spatial-spectral prior learning with CNN for hyperspectral image super-resolution; (Liu et al., 2022b) adopted a method combining CNN with a Transformer to interactively extract the spatial and spectral features of the remote sensing image for hyperspectral image super-resolution. Although these methods consider the spectral feature extraction of remote sensing images, they are still completely based on CNN or depend on it heavily, and their super-resolution effect still has room for improvement.
Disclosure of Invention
The invention aims to provide a hyperspectral remote sensing image super-resolution reconstruction method, a hyperspectral remote sensing image super-resolution reconstruction device, computer equipment and a computer readable storage medium, to solve the problem that the existing single-image super-resolution reconstruction technology, when applied to hyperspectral remote sensing images, is completely based on CNN or depends on it heavily, leaving the super-resolution effect on hyperspectral remote sensing images in need of improvement.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
In a first aspect, a hyperspectral remote sensing image super-resolution reconstruction method is provided, including:
acquiring a real hyperspectral remote sensing image data set;
cropping partial regions of the hyperspectral remote sensing image data set to obtain a plurality of square images each of size $\hat{H}\times\hat{W}$, used as a plurality of high-resolution images in the training set, where $\hat{H}$ and $\hat{W}$ respectively represent positive integers and the square images correspond one-to-one with the high-resolution images;
downsampling each of the plurality of high-resolution images with a scale factor of $2^{\kappa}$ to obtain a plurality of low-resolution images of size $H\times W$ in one-to-one correspondence with the plurality of high-resolution images, where $H$ represents a positive integer with $H=\hat{H}/2^{\kappa}$, $W$ represents a positive integer with $W=\hat{W}/2^{\kappa}$, and $\kappa$ represents a positive integer;
importing the plurality of low-resolution images as input items and the plurality of high-resolution images as output items into a spectral-spatial Transformer model for training to obtain an image super-resolution reconstruction model, wherein the network structure of the spectral-spatial Transformer model comprises a linear embedding layer, K residual blocks, a merging layer, an upsampling layer, a global skip connection and an anti-embedding layer which are sequentially connected; each residual block is formed by sequentially connecting M integrated Transformer modules and an intra-block merging layer in series, with a first residual connection added between the input of the first of the M integrated Transformer modules and the output of the intra-block merging layer; each integrated Transformer module comprises either a 3D Transformer sub-module or two multi-head self-attention (MSA) sub-modules, the 3D Transformer sub-module being used to extract self-attention features among all spectral bands within a local window, and the two MSA sub-modules comprising a spectral Transformer sub-module acting on the spectral domain and a spatial Transformer sub-module acting on the spatial domain, the spectral Transformer sub-module being used to extract self-attention features among different spectral bands within a single pixel and the spatial Transformer sub-module being used to extract self-attention features among different pixels within a local window; K and M each represent a positive integer greater than or equal to 2;
the linear embedding layer is used to map the number of spectral bands of a low-resolution input image $I_{LR}\in\mathbb{R}^{H\times W\times C_{in}}$ to a higher-dimensional feature space and to extract shallow features of the low-resolution input image $I_{LR}$, obtaining an embedded feature image $F_0\in\mathbb{R}^{H\times W\times C}$, $F_0=H_E(I_{LR})$, where $C_{in}$ represents the number of spectral bands of the low-resolution input image $I_{LR}$, $C$ represents the number of channels of the embedded feature image $F_0$, $C$ and $C_{in}$ are positive integers, $\mathbb{R}$ denotes the real numbers, and $H_E(\cdot)$ represents the processing function of the linear embedding layer;
the K residual blocks are used for extracting the embedded characteristic image F 0 Is used for obtaining deep feature images
Figure BDA0004144223260000039
Figure BDA00041442232600000310
Wherein k represents a value in the interval [1, K ]]Internal value positive integer ++>
Figure BDA00041442232600000311
A processing function representing a kth residual block among the K residual blocks;
the merging layer is used for merging the deep characteristic images F K Feature integration is carried out to obtain a deeper feature image with deeper features extracted
Figure BDA0004144223260000041
F M =H M (F K ) Wherein H is M () A processing function representing the merge layer;
the up-sampling layer is used for processing the embedded characteristic image F 0 And the deeper feature image F M Performing upsampling processing on the feature image to obtain an upsampled feature image
Figure BDA0004144223260000042
F HF =H U (F 0 +F M ) Wherein H is U () A processing function representing the upsampling layer;
the global jump connection is used for firstly embedding the characteristic image F by a bicubic interpolation method 0 Performing up-sampling processing, performing feature extraction on the up-sampling processing result through a convolution layer with a boundary of 1, and finally, extracting the extracted feature image and the up-sampled feature image F HF Adding to obtain combined characteristic image
Figure BDA0004144223260000043
Figure BDA0004144223260000044
Wherein H is cov () Processing functions representing the convolution layers, +.>
Figure BDA0004144223260000045
Representing an upsampling processing function based on bicubic interpolation;
the anti-embedding layer is used for combining the characteristic images F SK Is reduced to the number of spectral bands C in Obtaining a high-resolution output image as an image super-resolution reconstruction result
Figure BDA0004144223260000046
I HR =H UE (F SK ) Wherein H is UE () A processing function representing the anti-embedding layer;
and inputting the low-resolution hyperspectral remote sensing image to be reconstructed into the image super-resolution reconstruction model and outputting a high-resolution hyperspectral remote sensing image.
Based on the above, a hyperspectral remote sensing image super-resolution reconstruction scheme based on a spatial Transformer and a spectral Transformer is provided: a plurality of low-resolution images are taken as input items and a plurality of high-resolution images as output items, and a spectral-spatial Transformer model comprising a linear embedding layer, K residual blocks, a merging layer, an upsampling layer, a global skip connection and an anti-embedding layer which are sequentially connected is imported for training, obtaining an image super-resolution reconstruction model; the low-resolution hyperspectral remote sensing image to be reconstructed is then input into the image super-resolution reconstruction model to obtain a high-resolution hyperspectral remote sensing image. Because the K residual blocks adopt spatial Transformer and spectral Transformer structures to extract the spatial and spectral features of the input images separately and integrate the two for super-resolution reconstruction of the hyperspectral remote sensing image, dependence on CNN can be avoided, and experimental comparison shows that the super-resolution effect is greatly improved, which facilitates practical application and popularization.
In a second aspect, a hyperspectral remote sensing image super-resolution reconstruction device is provided, comprising a data acquisition unit, an image cropping unit, a downsampling processing unit, a reconstruction model training unit and a reconstruction model application unit which are sequentially connected in communication;
the data acquisition unit is used for acquiring a real hyperspectral remote sensing image data set;
the image cropping unit is used for cropping partial regions of the hyperspectral remote sensing image data set to obtain a plurality of square images each of size $\hat{H}\times\hat{W}$, used as a plurality of high-resolution images in the training set, where $\hat{H}$ and $\hat{W}$ respectively represent positive integers and the square images correspond one-to-one with the high-resolution images;
the downsampling processing unit is used for downsampling each of the plurality of high-resolution images with a scale factor of $2^{\kappa}$ to obtain a plurality of low-resolution images of size $H\times W$ in one-to-one correspondence with the plurality of high-resolution images, where $H$ represents a positive integer with $H=\hat{H}/2^{\kappa}$, $W$ represents a positive integer with $W=\hat{W}/2^{\kappa}$, and $\kappa$ represents a positive integer;
the reconstruction model training unit is also in communication connection with the image cropping unit, and is used for importing the plurality of low-resolution images as input items and the plurality of high-resolution images as output items into a spectral-spatial Transformer model for training to obtain an image super-resolution reconstruction model, wherein the network structure of the spectral-spatial Transformer model comprises a linear embedding layer, K residual blocks, a merging layer, an upsampling layer, a global skip connection and an anti-embedding layer which are sequentially connected; each residual block is formed by sequentially connecting M integrated Transformer modules and an intra-block merging layer in series, with a first residual connection added between the input of the first of the M integrated Transformer modules and the output of the intra-block merging layer; each integrated Transformer module comprises either a 3D Transformer sub-module or two multi-head self-attention (MSA) sub-modules, the 3D Transformer sub-module being used to extract self-attention features among all spectral bands within a local window, and the two MSA sub-modules comprising a spectral Transformer sub-module acting on the spectral domain and a spatial Transformer sub-module acting on the spatial domain, the spectral Transformer sub-module being used to extract self-attention features among different spectral bands within a single pixel and the spatial Transformer sub-module being used to extract self-attention features among different pixels within a local window; K and M each represent a positive integer greater than or equal to 2;
the linear embedding layer is used to map the number of spectral bands of a low-resolution input image $I_{LR}\in\mathbb{R}^{H\times W\times C_{in}}$ to a higher-dimensional feature space and to extract shallow features of the low-resolution input image $I_{LR}$, obtaining an embedded feature image $F_0\in\mathbb{R}^{H\times W\times C}$, $F_0=H_E(I_{LR})$, where $C_{in}$ represents the number of spectral bands of the low-resolution input image $I_{LR}$, $C$ represents the number of channels of the embedded feature image $F_0$, $C$ and $C_{in}$ are positive integers, $\mathbb{R}$ denotes the real numbers, and $H_E(\cdot)$ represents the processing function of the linear embedding layer;
the K residual blocks are used for extracting the embedded characteristic image F 0 Is used for obtaining deep feature images
Figure BDA0004144223260000056
Figure BDA0004144223260000057
Wherein k represents a value in the interval [1, K ]]Internal value positive integer ++>
Figure BDA0004144223260000058
A processing function representing a kth residual block among the K residual blocks;
the merging layer is used for merging the deep characteristic images F K Feature integration is carried out to obtain a deeper feature image with deeper features extracted
Figure BDA0004144223260000059
F M =H M (F K ) Wherein H is M () A processing function representing the merge layer;
the up-sampling layer is used for processing the embedded characteristic image F 0 And the deeper feature image F M Performing upsampling processing on the feature image to obtain an upsampled feature image
Figure BDA00041442232600000510
F HF =H U (F 0 +F M ) Wherein H is U () A processing function representing the upsampling layer;
the global jump connection is used for firstly embedding the characteristic image F by a bicubic interpolation method 0 Performing up-sampling processing, performing feature extraction on the up-sampling processing result through a convolution layer with a boundary of 1, and finally, extracting the extracted feature image and the up-sampled feature image F HF Adding to obtain combined characteristic image
Figure BDA0004144223260000061
Figure BDA0004144223260000062
Wherein H is cov () Processing functions representing the convolution layers, +.>
Figure BDA0004144223260000063
Representing an upsampling processing function based on bicubic interpolation;
the anti-embedding layer is used for combining the characteristic images F SK Is reduced to the number of spectral bands C in Obtaining a high-resolution output image as an image super-resolution reconstruction result
Figure BDA0004144223260000064
I HR =H UE (F SK ) Wherein H is UE () A processing function representing the anti-embedding layer;
the reconstruction model application unit is used for inputting the hyperspectral remote sensing image to be reconstructed and with low resolution into the image super-resolution reconstruction model and outputting the hyperspectral remote sensing image with high resolution.
In a third aspect, the present invention provides a computer device comprising a memory, a processor and a transceiver which are sequentially connected in communication, wherein the memory is configured to store a computer program, the transceiver is configured to send and receive messages, and the processor is configured to read the computer program and execute the hyperspectral remote sensing image super-resolution reconstruction method as described in the first aspect or any of the possible designs of the first aspect.
In a fourth aspect, the present invention provides a computer readable storage medium having instructions stored thereon which, when executed on a computer, perform the hyperspectral remote sensing image super resolution reconstruction method as described in the first aspect or any of the possible designs of the first aspect.
In a fifth aspect, the present invention provides a computer program product comprising instructions which, when run on a computer, cause the computer to perform the hyperspectral remote sensing image super resolution reconstruction method as described in the first aspect or any of the possible designs of the first aspect.
Beneficial effects of the above scheme:
(1) The invention creatively provides a hyperspectral remote sensing image super-resolution reconstruction scheme based on a spatial Transformer and a spectral Transformer: a plurality of low-resolution images are used as input items and a plurality of high-resolution images as output items, and a spectral-spatial Transformer model comprising a linear embedding layer, K residual blocks, a merging layer, an upsampling layer, a global skip connection and an anti-embedding layer which are sequentially connected is imported for training, obtaining an image super-resolution reconstruction model; the low-resolution hyperspectral remote sensing image to be reconstructed is then input into the image super-resolution reconstruction model, which outputs a high-resolution hyperspectral remote sensing image.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings required in the embodiments or the description of the prior art are briefly introduced below. It is obvious that the drawings in the following description are only some embodiments of the invention, and other drawings may be obtained from these drawings without inventive effort by a person skilled in the art.
Fig. 1 is a schematic flow chart of a super-resolution reconstruction method of a hyperspectral remote sensing image according to an embodiment of the present application.
Fig. 2 is a schematic diagram of the network structure of the spectral-spatial Transformer model according to an embodiment of the present application.

Fig. 3 is a schematic diagram of the network structure of the integrated Transformer module according to an embodiment of the present application.

Fig. 4 is a schematic diagram illustrating the working mode of the spectral Transformer sub-module according to an embodiment of the present application.

Fig. 5 is an exemplary diagram of the window division method in the spatial Transformer sub-module according to an embodiment of the present application.

Fig. 6 is a diagram illustrating the relationship between a local window and the spectral band feature images in the 3D Transformer sub-module according to an embodiment of the present application.
Fig. 7 is a schematic layer structure of a merging layer according to an embodiment of the present application.
Fig. 8 is an experimental result (scale factor of 4) of the hyperspectral remote sensing image super-resolution reconstruction method provided in the embodiment of the present application on the Houston dataset under different parameters, where fig. 8 (a) shows the experimental result for the number σ of bands at embedding time, fig. 8 (b) shows the experimental result for the window size value H, fig. 8 (c) shows the experimental result for the number K of residual blocks, and fig. 8 (d) shows the experimental result for the number M of integrated Transformer modules in each residual block.

Fig. 9 is a super-resolution result (scale factor of 4, red: 90, green: 60, blue: 30) on the Pavia Centre dataset provided in the embodiment of the present application, where fig. 9 (a) shows the Ground-Truth reference, fig. 9 (b) shows the Bicubic-based super-resolution result, fig. 9 (c) the DDRN-based result, fig. 9 (d) the SSPSR-based result, fig. 9 (e) the Interactformer-based result, fig. 9 (f) the SwinIR-based result, fig. 9 (g) the EEGAN-based result, and fig. 9 (h) the SST-based result.

Fig. 10 is the error result (scale factor of 4) between the real image and the reconstructed image on the Pavia Centre dataset provided in the embodiment of the present application, where fig. 10 (a) shows the error result for Ground Truth, fig. 10 (b) for Bicubic, fig. 10 (c) for DDRN, fig. 10 (d) for SSPSR, fig. 10 (e) for Interactformer, fig. 10 (f) for SwinIR, fig. 10 (g) for EEGAN, and fig. 10 (h) for SST.

Fig. 11 is a super-resolution result (scale factor of 4, red: 120, green: 80, blue: 60) on the Houston dataset provided in the embodiment of the present application, where fig. 11 (a) shows the Ground-Truth reference, fig. 11 (b) shows the Bicubic-based super-resolution result, fig. 11 (c) the DDRN-based result, fig. 11 (d) the SSPSR-based result, fig. 11 (e) the Interactformer-based result, fig. 11 (f) the SwinIR-based result, fig. 11 (g) the EEGAN-based result, and fig. 11 (h) the SST-based result.

Fig. 12 is the error result (scale factor of 4) between the real image and the reconstructed image on the Houston dataset provided in the embodiment of the present application, where fig. 12 (a) shows the error result for Ground Truth, fig. 12 (b) for Bicubic, fig. 12 (c) for DDRN, fig. 12 (d) for SSPSR, fig. 12 (e) for Interactformer, fig. 12 (f) for SwinIR, fig. 12 (g) for EEGAN, and fig. 12 (h) for SST.

Fig. 13 is a super-resolution result (scale factor of 4, red: 90, green: 60, blue: 30) on the Chikusei dataset provided in the embodiment of the present application, where fig. 13 (a) shows the Ground-Truth reference, fig. 13 (b) shows the Bicubic-based super-resolution result, fig. 13 (c) the DDRN-based result, fig. 13 (d) the SSPSR-based result, fig. 13 (e) the Interactformer-based result, fig. 13 (f) the SwinIR-based result, fig. 13 (g) the EEGAN-based result, and fig. 13 (h) the SST-based result.

Fig. 14 is the error result (scale factor of 4) between the real image and the reconstructed image on the Chikusei dataset provided in the embodiment of the present application, where fig. 14 (a) shows the error result for Ground Truth, fig. 14 (b) for Bicubic, fig. 14 (c) for DDRN, fig. 14 (d) for SSPSR, fig. 14 (e) for Interactformer, fig. 14 (f) for SwinIR, fig. 14 (g) for EEGAN, and fig. 14 (h) for SST.

Fig. 15 is the absolute value of the difference between the calculated pixel value and the true value provided in the embodiment of the present application, where fig. 15 (a) shows the result on the Pavia dataset, fig. 15 (b) on the Houston dataset, and fig. 15 (c) on the Chikusei dataset.

Fig. 16 shows hyperspectral image super-resolution results under a real scenario provided in the embodiment of the present application, where fig. 16 (a) shows the result for Natural Color (red: 4, green: 3, blue: 2), fig. 16 (b) for Color Infrared (red: 8, green: 4, blue: 3), fig. 16 (c) for Short-Wave Infrared (red: 12, green: 8, blue: 4), fig. 16 (d) for Agriculture (red: 11, green: 8, blue: 2), fig. 16 (e) for NDVI, and fig. 16 (f) for NDWI.
Fig. 17 is a schematic structural diagram of a super-resolution reconstruction device for hyperspectral remote sensing images according to an embodiment of the present application.
Fig. 18 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the present invention is briefly described below with reference to the accompanying drawings and the description of the embodiments or the prior art; it is obvious that the following description and drawings cover only some embodiments of the present invention, and other drawings can be obtained from them without inventive effort by a person skilled in the art. It should be noted that the description of these examples is intended to aid understanding of the present invention, not to limit it.
It should be understood that although the terms first and second, etc. may be used herein to describe various objects, these objects should not be limited by these terms. These terms are only used to distinguish one object from another. For example, a first object may be referred to as a second object, and similarly a second object may be referred to as a first object, without departing from the scope of example embodiments of the invention.
It should be understood that the term "and/or", where it appears herein, merely describes an association relationship between associated objects, indicating that three relationships may exist; for example, "A and/or B" may represent three cases: A alone, B alone, or both A and B. As another example, "A, B and/or C" may represent any one of A, B and C or any combination thereof. The term "/and", where it appears herein, describes another association relationship, indicating that two relationships may exist; for example, "A /and B" may represent: A alone, or A and B together. In addition, the character "/" herein generally indicates that the associated objects are in an "or" relationship.
Examples:
As shown in fig. 1, the hyperspectral remote sensing image super-resolution reconstruction method provided in the first aspect of this embodiment may be performed by, but is not limited to, a computer device with certain computing resources, for example a server, a personal computer (Personal Computer, PC, i.e., a multipurpose computer whose size, price and performance make it suitable for personal use; desktop computers, notebook computers, small notebooks, tablet computers, ultrabooks and the like are all personal computers), a smartphone, a personal digital assistant (Personal Digital Assistant, PDA), or an electronic device such as a wearable device. As shown in fig. 1, the hyperspectral remote sensing image super-resolution reconstruction method may include, but is not limited to, the following steps S1 to S5.
S1, acquiring a real hyperspectral remote sensing image dataset.
In step S1, the hyperspectral remote sensing image data set may be a widely used HSI data set, for example: the Pavia Centre dataset (Huang and Zhang, 2009), the Houston dataset (Debes et al., 2014), or the Chikusei dataset (Yokoya and Iwasaki, 2016). The Pavia Centre dataset is a hyperspectral dataset acquired by a ROSIS sensor during a flight over Pavia in northern Italy; the ground sampling distance is 1.3 m, the number of spectral bands is 102, and the image size is 1096×1096×102. Since some non-informative areas are included in the scene, they can be removed in the experiment, leaving high-resolution images with an effective area of 1096×700×102. The Houston dataset was acquired over the University of Houston campus and the adjacent urban area and consists of 144 spectral bands in the 380 nm to 1050 nm region, with a ground sampling distance of 2.5 m and an image size of 349×1905×144; after excluding the non-informative edge areas the size is 1900×340×144. The Chikusei dataset was captured by a Headwall Hyperspec-VNIR-C sensor over rural and urban areas of Chikusei, Japan, with the scene center at coordinates 36.294N, 140.008E; the dataset has 128 bands in the spectral range 363 nm to 1018 nm, consists of 2517×2335 pixels, and has a ground sampling distance of 2.5 m. Because of missing information at the image edges, the image can be cut to 2304×2048 pixels when used. The hyperspectral remote sensing image data set can also be a data set made by cropping an area of a Sentinel-2 satellite image with the Google Earth Engine; this data set is 1356×545 pixels with 13 bands (after removing other non-electromagnetic bands), has a ground sampling distance of 10 m, and is geographically located in a Guangdong national park in western China, with image content including grassland, lakes, roads, cultivated land and the like.
S2, cropping partial regions of the hyperspectral remote sensing image data set to obtain a plurality of square images each of size $\hat{H}\times\hat{W}$, used as a plurality of high-resolution images in the training set, where $\hat{H}$ and $\hat{W}$ respectively represent positive integers and the square images correspond one-to-one with the high-resolution images.
In step S2, the partial region may be, for example, a 70% region; $\hat{H}\times\hat{W}$ is, for example, 48×48, and the images may be randomly flipped or rotated during training, as sketched below. In addition, a further plurality of square images (whose size may be $\hat{H}\times\hat{W}$ or, alternatively, for example 128×128) may be used as a plurality of high-resolution images in the test set (i.e., as output items during testing).
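The random flip/rotation augmentation mentioned above could look like the following minimal PyTorch sketch; the exact augmentation policy (the flip probability and quarter-turn choice) is an assumption, not specified by the patent.

```python
import random
import torch

def augment(patch: torch.Tensor) -> torch.Tensor:
    """patch: (C, H, W) square training crop, e.g. C x 48 x 48."""
    if random.random() < 0.5:
        patch = torch.flip(patch, dims=[-1])      # horizontal flip
    k = random.randint(0, 3)                      # 0..3 quarter turns
    return torch.rot90(patch, k, dims=[-2, -1])   # random 90-degree rotation
```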
S3, downsampling each of the plurality of high-resolution images with a scale factor of $2^{\kappa}$ to obtain a plurality of low-resolution images of size $H\times W$ in one-to-one correspondence with the plurality of high-resolution images, where $H$ represents a positive integer with $H=\hat{H}/2^{\kappa}$, $W$ represents a positive integer with $W=\hat{W}/2^{\kappa}$, and $\kappa$ represents a positive integer.
In step S3, κ may be set according to the specific requirements of image super-resolution reconstruction, for example to 1, 2 or 3. In addition, the scale factor $2^{\kappa}$ may also be used to downsample each high-resolution image in the test set separately, obtaining a plurality of additional low-resolution images used as input items during testing.
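The low-resolution training inputs can be produced as in the sketch below; bicubic resampling is an assumption here, since the text only fixes the scale factor $2^{\kappa}$.

```python
import torch
import torch.nn.functional as F

def make_lr(hr: torch.Tensor, kappa: int = 2) -> torch.Tensor:
    """hr: (B, C_in, 2^kappa * H, 2^kappa * W) -> lr: (B, C_in, H, W)."""
    return F.interpolate(hr, scale_factor=1.0 / 2 ** kappa,
                         mode="bicubic", align_corners=False)
```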
S4, taking the plurality of low-resolution images as input items and the plurality of high-resolution images as output items, and importing them into a spectral-spatial Transformer model for training to obtain an image super-resolution reconstruction model, wherein the network structure of the spectral-spatial Transformer model includes, but is not limited to, a linear embedding layer, K residual blocks, a merging layer, an upsampling layer, a global skip connection, an anti-embedding layer and the like which are sequentially connected; each residual block is formed by sequentially connecting M integrated Transformer modules and an intra-block merging layer in series, with a first residual connection added between the input of the first of the M integrated Transformer modules and the output of the intra-block merging layer; each integrated Transformer module includes, but is not limited to, either a 3D Transformer sub-module or two multi-head self-attention (MSA, Multi-head Self-Attention) sub-modules, the 3D Transformer sub-module being used to extract self-attention features among all spectral bands within a local window, and the two MSA sub-modules comprising a spectral Transformer sub-module acting on the spectral domain and a spatial Transformer sub-module acting on the spatial domain, the spectral Transformer sub-module being used to extract self-attention features among different spectral bands within a single pixel and the spatial Transformer sub-module being used to extract self-attention features among different pixels within a local window; K and M each represent a positive integer greater than or equal to 2.
In step S4, the network structure of the spectral-spatial Transformer model is shown in fig. 2, and the network structure of the integrated Transformer module is shown in fig. 3. The integrated Transformer module in this embodiment has two improvements relative to the standard Transformer: (1) the conventional multi-head self-attention (MSA) mechanism is replaced with a serial form of spectral multi-head self-attention (spectral-MSA) and spatial multi-head self-attention (spatial-MSA), i.e., the serial structure of the spectral Transformer sub-module and the spatial Transformer sub-module; (2) three-dimensional multi-head attention is adopted to combine the spectral and spatial multi-head self-attention, i.e., the 3D Transformer sub-module. Specifically, the integrated Transformer module further includes, but is not limited to, a first normalization layer, a second normalization layer, and a multilayer perceptron (Multilayer Perceptron, MLP) sub-module (which may consist of one GELU activation function sandwiched between 2 MLP layers); the first normalization layer is located before the 3D Transformer sub-module or the two multi-head self-attention MSA sub-modules, with a second residual connection added between the input of the first normalization layer and the output of the 3D Transformer sub-module or the two MSA sub-modules; the second normalization layer is located after the addition node of the second residual connection and before the MLP sub-module, with a third residual connection added between the input of the second normalization layer and the output of the MLP sub-module.
The processing of the integrated Transformer module shown in fig. 3 can be expressed as:

$$\hat{X}=\mathrm{MSA}_{spatial}\big(\mathrm{MSA}_{spectral}(\mathrm{LN}(X))\big)+X \quad\text{or}\quad \hat{X}=\mathrm{MSA}_{3D}(\mathrm{LN}(X))+X,$$

$$Z=\mathrm{MLP}(\mathrm{LN}(\hat{X}))+\hat{X},$$

where $X$ represents the input feature image, $\mathrm{LN}(\cdot)$ the processing function of the normalization layer, $\mathrm{MSA}_{spectral}(\cdot)$ the processing function of the spectral Transformer sub-module, $\mathrm{MSA}_{spatial}(\cdot)$ the processing function of the spatial Transformer sub-module, $\mathrm{MSA}_{3D}(\cdot)$ the processing function of the 3D Transformer sub-module, $\hat{X}$ the feature image resulting from the addition of the second residual connection, $\mathrm{MLP}(\cdot)$ the processing function of the multilayer perceptron MLP sub-module, and $Z$ the feature image resulting from the addition of the third residual connection, i.e., the output feature image of the integrated Transformer module. In computer vision tasks an ordinary image has 3 channels corresponding to the three gray values of red, green and blue, which differs markedly from multispectral or hyperspectral remote sensing images: the latter often have far more than 3 spectral bands (i.e., channels), and the different bands carry special meaning for the remote sensing image. Based on this difference, the integrated Transformer module in this embodiment is designed for the remote sensing image super-resolution task and serves as the basic building block of the residual block; it can be used to extract depth features of the image, covering both spectral and spatial features, which is why both the 3D Transformer sub-module and the two multi-head self-attention MSA sub-modules are adopted, and this has particular significance for the remote sensing image super-resolution task.
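A minimal PyTorch sketch of the two-residual structure expressed by the formulas above follows; the attention body is passed in as an abstract module (either the serial spectral+spatial MSA or the 3D MSA), and the class names and the MLP expansion ratio are assumptions.

```python
import torch
import torch.nn as nn

class IntegratedTransformerBlock(nn.Module):
    def __init__(self, dim: int, attn: nn.Module, mlp_ratio: int = 2):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = attn                     # spectral+spatial MSA or 3D MSA
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(            # 2 MLP layers around one GELU
            nn.Linear(dim, mlp_ratio * dim),
            nn.GELU(),
            nn.Linear(mlp_ratio * dim, dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        """x: (B, L, C) tokens; X_hat = MSA(LN(X)) + X, Z = MLP(LN(X_hat)) + X_hat."""
        x_hat = self.attn(self.norm1(x)) + x          # second residual connection
        return self.mlp(self.norm2(x_hat)) + x_hat    # third residual connection
```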
In the network structure of the integrated Transformer module, specifically, as shown in fig. 4, the spectral Transformer sub-module works with the multi-head attention mechanism of the standard Transformer: when an input feature image passes through the multi-head attention layer, for the embedded element $x_{i,j}\in\mathbb{R}^{1\times C}$ of each input pixel in the input feature image, the corresponding query matrix $Q_{i,j}\in\mathbb{R}^{1\times d}$, key matrix $K_{i,j}\in\mathbb{R}^{1\times d}$ and value matrix $V_{i,j}\in\mathbb{R}^{1\times d}$ are calculated in each head according to the following formula:

$$Q_{i,j}=x_{i,j}W^{Q},\qquad K_{i,j}=x_{i,j}W^{K},\qquad V_{i,j}=x_{i,j}W^{V},$$

where $W^{Q}$ represents the projection matrix of the query matrix, $W^{K}$ the projection matrix of the key matrix, and $W^{V}$ the projection matrix of the value matrix; these three projection matrices are shared between different pixels, $i$ represents the row index of each input pixel in the input feature image, $j$ represents its column index, and $d$ represents the dimension of the query, key and value matrices. For each input pixel, N-head computation results are obtained by executing the Attention function N times in parallel; after concatenation they are multiplied by a weight matrix $W^{O}$ to obtain the corresponding self-attention features between different spectral bands. The Attention function is calculated as:

$$\mathrm{Attention}(Q,K,V)=\mathrm{SoftMax}\!\left(\frac{QK^{T}}{\sqrt{d}}+B\right)V,$$

where $\mathrm{SoftMax}(\cdot)$ represents the normalized exponential function, $T$ the matrix transpose, and $B$ the relative position encoding, a set of learnable parameters. Based on the foregoing working mode, the spectral Transformer sub-module can extract self-attention features between different spectral bands within a single pixel.
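The spectral self-attention described above might be sketched as follows; lifting each band scalar to a token of dimension `dim`, the head count, and the omission of the learnable relative position bias $B$ are all simplifying assumptions made so the sketch stays compact.

```python
import torch
import torch.nn as nn

class SpectralMSA(nn.Module):
    """Self-attention across the C spectral channels of each pixel."""
    def __init__(self, dim: int = 32, heads: int = 4):
        super().__init__()
        self.embed = nn.Linear(1, dim)                  # band scalar -> token
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.proj = nn.Linear(dim, 1)                   # token -> band scalar

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        """x: (B, C, H, W) -> same shape; one C-token sequence per pixel."""
        B, C, H, W = x.shape
        t = x.permute(0, 2, 3, 1).reshape(B * H * W, C, 1)  # pixels as batches
        t = self.embed(t)                                   # (B*H*W, C, dim)
        t, _ = self.attn(t, t, t)                           # attention over bands
        t = self.proj(t).reshape(B, H, W, C)
        return t.permute(0, 3, 1, 2)
```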
In the network structure of the integrated Transformer module, since the process of remote sensing image super-resolution can be regarded as mapping the number of pixels from 1 to 4 (for the ×2 super-resolution task) or more, the task is more sensitive to local features than image classification is. The spatial Transformer sub-module therefore works as follows: an input feature image of size $H\times W\times C$ is divided into $(H\times W)/m^{2}$ mutually non-overlapping local windows, each of size $m\times m\times C$, where $m$ represents a positive integer greater than or equal to 2; for each local window, the corresponding self-attention features between different pixels are calculated independently. The specific manner of image partitioning can be seen in fig. 5. Since the self-attention calculation is confined to separate windows, attention between windows would otherwise not be considered; cross-window connection can therefore be implemented by alternating the local windows with a limited sliding window. That is, preferably, when dividing the input feature image of size $H\times W\times C$ into the $(H\times W)/m^{2}$ mutually non-overlapping local windows, the input feature image is moved down and to the right by $m/2$ pixels before each window division, as in the sketch after this paragraph. In addition, the manner of calculating the self-attention features between different pixels is similar to that of the spectral Transformer sub-module and can be derived directly, so it is not repeated here.
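The window partition and the $m/2$ cyclic shift could be sketched as below; the function names are illustrative, and $H$ and $W$ are assumed divisible by $m$.

```python
import torch

def window_partition(x: torch.Tensor, m: int) -> torch.Tensor:
    """x: (B, H, W, C) -> (B*(H*W)/m^2, m*m, C) non-overlapping m x m windows."""
    B, H, W, C = x.shape
    x = x.view(B, H // m, m, W // m, m, C)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, m * m, C)

def shift_windows(x: torch.Tensor, m: int) -> torch.Tensor:
    """Cyclic shift down and right by m//2 pixels before partitioning,
    alternated with the unshifted layout for cross-window connection."""
    return torch.roll(x, shifts=(m // 2, m // 2), dims=(1, 2))
```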
In the network structure of the integrated Transformer module, as shown in fig. 6, the window division in the 3D Transformer sub-module is the same as that in the spatial Transformer sub-module, i.e., each divided local window has shape $m\times m\times C$; the key point is that the $C$ channels after embedding are grouped according to the number of spectral bands $C_{in}$ of the original HSI, that is, each divided local window contains $m\times m\times C_{in}$ 3D spectral band feature images, and the self-attention values of these spectral bands relative to each other are then calculated. Specifically, the 3D Transformer sub-module works as follows: an input feature image of size $H\times W\times C$ is divided into $(H\times W)/m^{2}$ mutually non-overlapping local windows, each of size $m\times m\times C$ and containing $m\times m\times C_{in}$ spectral band feature images, where $m$ represents a positive integer greater than or equal to 2; for each local window, the corresponding self-attention features between different spectral bands are calculated independently. Similarly, when dividing the input feature image of size $H\times W\times C$ into the $(H\times W)/m^{2}$ mutually non-overlapping local windows, the input feature image is moved down and to the right by $m/2$ pixels before each window division. Furthermore, the specific calculation of self-attention features between different spectral bands is similar to that in the spectral Transformer sub-module and can be derived directly, so it is not repeated here; a sketch follows below.
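A sketch of the 3D window attention under the assumption that the $C$ embedded channels split evenly into $C_{in}$ contiguous per-band groups of size $C/C_{in}$; a single head is used here so the head count always divides the token dimension, which is a simplification.

```python
import torch
import torch.nn as nn

class Window3DMSA(nn.Module):
    """Attention over all m*m*C_in band tokens of one local window."""
    def __init__(self, C: int, C_in: int, heads: int = 1):
        super().__init__()
        assert C % C_in == 0
        self.C_in, self.d = C_in, C // C_in
        self.attn = nn.MultiheadAttention(self.d, heads, batch_first=True)

    def forward(self, win: torch.Tensor) -> torch.Tensor:
        """win: (num_windows, m*m, C), e.g. from window_partition above."""
        n, mm, C = win.shape
        t = win.reshape(n, mm * self.C_in, self.d)   # m*m*C_in band tokens
        t, _ = self.attn(t, t, t)                    # self-attention across bands
        return t.reshape(n, mm, C)
```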
In the network structure of the spectral-spatial Transformer model, specifically, the linear embedding layer is used for mapping the number of spectral bands of the low-resolution input image $I_{LR} \in \mathbb{R}^{H \times W \times C_{in}}$ into a higher-dimensional feature space and extracting the shallow features of the low-resolution input image $I_{LR}$, obtaining the embedded feature image $F_0 \in \mathbb{R}^{H \times W \times C}$, $F_0 = H_E(I_{LR})$, where $C_{in}$ represents the number of spectral bands of the low-resolution input image $I_{LR}$, $C$ represents the number of channels of the embedded feature image $F_0$, $C$ and $C_{in}$ are positive integers, $\mathbb{R}$ denotes the set of real numbers, and $H_E(\cdot)$ represents the processing function of the linear embedding layer. In detail, the embedding in the linear embedding layer may be performed with a 3×3 convolution whose padding is set to 1 so as to preserve the image size.
In the network structure of the spectral-spatial Transformer model, specifically, the K residual blocks are used for extracting the deep features of the embedded feature image $F_0$, obtaining the deep feature image $F_K \in \mathbb{R}^{H \times W \times C}$ via $F_k = H_{R_k}(F_{k-1})$, $k = 1, 2, \dots, K$, where $k$ represents a positive integer taking values in the interval $[1, K]$ and $H_{R_k}(\cdot)$ represents the processing function of the $k$-th residual block among the K residual blocks.
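The residual-block chain can be sketched as below; the internals of the integrated Transformer modules and of the intra-block merge layer are stand-ins (`nn.Identity` and a 3×3 convolution), since only the wiring described elsewhere in this document, M modules in series, an intra-block merge layer, and the first residual connection, is being illustrated:

```python
import torch.nn as nn

class ResidualBlockSketch(nn.Module):
    """One residual block: M integrated Transformer modules in series,
    an intra-block merge layer, and a residual connection from the block
    input to the merge-layer output. Module internals are placeholders."""
    def __init__(self, M: int, C: int):
        super().__init__()
        self.itms = nn.Sequential(*[nn.Identity() for _ in range(M)])  # stand-in ITMs
        self.intra_merge = nn.Conv2d(C, C, kernel_size=3, padding=1)   # stand-in merge

    def forward(self, x):
        return x + self.intra_merge(self.itms(x))  # the first residual connection

blocks = nn.Sequential(*[ResidualBlockSketch(M=6, C=27) for _ in range(6)])  # K = 6
```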
In the network structure of the spectral-spatial Transformer model, specifically, the merge layer is used for integrating the features of the deep feature image $F_K$, obtaining the deeper feature image $F_M \in \mathbb{R}^{H \times W \times C}$, $F_M = H_M(F_K)$, where $H_M(\cdot)$ represents the processing function of the merge layer. The purpose of the merge layer is to enhance the fitting ability of the spectral-spatial Transformer model; as shown in fig. 7, it may specifically consist of three convolutions alternating with two activation functions, where the padding of the 3×3 convolution layers is set to 1 so as to ensure that the image is not downsampled. In addition, the purpose of the intra-block merge layer is the same as that of the merge layer, and it may also adopt the layer structure shown in fig. 7.
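Assuming the kernel sizes reported in the parameter settings further below (3×3, 1×1 and 3×3, with LeakyReLU activations), a minimal sketch of the merge layer $H_M$ is:

```python
import torch.nn as nn

C = 27  # illustrative channel width
merge = nn.Sequential(
    nn.Conv2d(C, C, kernel_size=3, padding=1),  # padding 1 preserves H x W
    nn.LeakyReLU(inplace=True),
    nn.Conv2d(C, C, kernel_size=1),             # 1x1 needs no padding
    nn.LeakyReLU(inplace=True),
    nn.Conv2d(C, C, kernel_size=3, padding=1),
)
```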
In the network structure of the spectral-spatial Transformer model, specifically, the upsampling layer is used for upsampling the sum of the embedded feature image $F_0$ and the deeper feature image $F_M$, obtaining the upsampled feature image $F_{HF} \in \mathbb{R}^{2^{\kappa}H \times 2^{\kappa}W \times C}$, $F_{HF} = H_U(F_0 + F_M)$, where $H_U(\cdot)$ represents the processing function of the upsampling layer. In general, shallow features focus on the low-frequency content of the remote sensing image while deep features focus on its high-frequency content, and the two may be aggregated through a skip connection. In this embodiment, a sub-pixel convolution layer may specifically be used to perform the upsampling.
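A minimal sketch of sub-pixel upsampling for $H_U$, assuming a ×2 scale factor:

```python
import torch
import torch.nn as nn

C, scale = 27, 2
upsample = nn.Sequential(
    nn.Conv2d(C, C * scale ** 2, kernel_size=3, padding=1),
    nn.PixelShuffle(scale),   # (C*s^2, H, W) -> (C, s*H, s*W)
)

y = upsample(torch.randn(1, C, 64, 64))
print(y.shape)  # torch.Size([1, 27, 128, 128])
```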
In the network structure of the spectral-spatial Transformer model, specifically, the global skip connection first upsamples the embedded feature image $F_0$ by bicubic interpolation, then extracts features from the upsampling result through a convolution layer with padding 1, and finally adds the extracted feature image to the upsampled feature image $F_{HF}$ to obtain the combined feature image $F_{SK} \in \mathbb{R}^{2^{\kappa}H \times 2^{\kappa}W \times C}$, $F_{SK} = H_{cov}\!\left(H_{UP}^{bic}(F_0)\right) + F_{HF}$, where $H_{cov}(\cdot)$ represents the processing function of the convolution layer and $H_{UP}^{bic}(\cdot)$ represents the upsampling function based on bicubic interpolation. The purpose of the global skip connection is to avoid model degradation and to obtain higher accuracy and faster convergence.
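A sketch of this global skip connection follows; the function name and signature are illustrative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def global_skip(f0: torch.Tensor, f_hf: torch.Tensor,
                conv: nn.Module, scale: int) -> torch.Tensor:
    """F_SK = H_cov(bicubic_up(F_0)) + F_HF, per the description above."""
    up = F.interpolate(f0, scale_factor=scale, mode="bicubic",
                       align_corners=False)
    return conv(up) + f_hf
```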
In the network structure of the spectral-spatial Transformer model, specifically, the anti-embedding layer is used for reducing the number of channels of the combined feature image $F_{SK}$ back to the number of spectral bands $C_{in}$, obtaining the high-resolution output image $I_{HR} \in \mathbb{R}^{2^{\kappa}H \times 2^{\kappa}W \times C_{in}}$ as the image super-resolution reconstruction result, $I_{HR} = H_{UE}(F_{SK})$, where $H_{UE}(\cdot)$ represents the processing function of the anti-embedding layer.
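Tying the pieces together, the overall data flow can be sketched end to end; every sub-module below is a simplified stand-in (the real residual blocks are Transformer-based), so this only illustrates the wiring described in this section:

```python
import torch.nn as nn
import torch.nn.functional as F

class SSTSketch(nn.Module):
    """Wiring sketch: embed -> K residual blocks -> merge -> upsampling of
    F_0 + F_M -> global skip -> anti-embedding back to C_in bands."""
    def __init__(self, c_in=3, sigma=9, K=6, scale=2):
        super().__init__()
        C = c_in * sigma
        self.embed = nn.Conv2d(c_in, C, 3, padding=1)                 # H_E
        self.blocks = nn.Sequential(                                  # K residual blocks (stand-ins)
            *[nn.Conv2d(C, C, 3, padding=1) for _ in range(K)])
        self.merge = nn.Conv2d(C, C, 3, padding=1)                    # H_M (stand-in)
        self.up = nn.Sequential(                                      # H_U: sub-pixel upsampling
            nn.Conv2d(C, C * scale ** 2, 3, padding=1),
            nn.PixelShuffle(scale))
        self.skip_conv = nn.Conv2d(C, C, 3, padding=1)                # H_cov in the global skip
        self.unembed = nn.Conv2d(C, c_in, 3, padding=1)               # H_UE
        self.scale = scale

    def forward(self, x):
        f0 = self.embed(x)
        fm = self.merge(self.blocks(f0))
        fhf = self.up(f0 + fm)
        up0 = F.interpolate(f0, scale_factor=self.scale,
                            mode="bicubic", align_corners=False)
        return self.unembed(self.skip_conv(up0) + fhf)
```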
In the step S4, the specific training process of the spectral-spatial Transformer model may follow a conventional model training procedure, for example with the commonly used adaptive moment estimation as the optimizer. Since the loss function measures the quality of the remote sensing image reconstruction result, it is the optimization target of the model; preferably, the loss function $L_{total}$ of the spectral-spatial Transformer model is calculated according to the following formula:

$$L_{total} = \lambda_1 L_1\!\left(I_{SR}, I_{GT}\right) + \lambda_2 L_G\!\left(\nabla I_{SR}, \nabla I_{GT}\right)$$

where $L_1$ represents the loss function based on the mean absolute error, $L_G$ represents the spectral gradient loss function, $I_{GT}$ represents the real high-resolution image, $\nabla I_{GT}$ represents the spectral gradient of the real high-resolution image, $I_{SR}$ represents the reconstructed high-resolution output image, $\nabla I_{SR}$ represents the spectral gradient of the reconstructed high-resolution output image, and $\lambda_1$ and $\lambda_2$ represent a set of hyper-parameters with $\lambda_1 + \lambda_2 = 1$. That is, for the remote sensing image super-resolution task, not only the spatial-domain effect is required, but the fidelity of the spectral domain must also be guaranteed, so in this embodiment two loss functions are used to measure the difference between the reconstructed high-resolution image and the real image: when measuring the accuracy of the spatial domain, the commonly used mean-absolute-error loss is selected, since compared with a mean-squared-error loss it does not over-penalize large errors and converges better; when measuring the accuracy of the spectral domain, the spectral gradient loss function is selected, which is commonly used to measure the degree to which spectral information is recovered.
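A minimal sketch of the combined loss, assuming the spectral gradient is the finite difference between adjacent bands along the channel axis (the exact definition is not spelled out above):

```python
import torch

def sst_loss(i_sr: torch.Tensor, i_gt: torch.Tensor,
             lam1: float = 0.5, lam2: float = 0.5) -> torch.Tensor:
    """Mean-absolute-error term plus an L1 penalty on the spectral
    gradient; lam1 + lam2 = 1. Tensors are assumed (N, C_in, H, W)."""
    l1 = (i_sr - i_gt).abs().mean()
    grad_sr = i_sr[:, 1:] - i_sr[:, :-1]   # band-to-band differences
    grad_gt = i_gt[:, 1:] - i_gt[:, :-1]
    lg = (grad_sr - grad_gt).abs().mean()
    return lam1 * l1 + lam2 * lg
```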
S5, inputting the hyperspectral remote sensing image to be reconstructed and with low resolution into the image super-resolution reconstruction model, and outputting to obtain the hyperspectral remote sensing image with high resolution.
In the step S5, the size of the low-resolution hyperspectral remote sensing image to be reconstructed may or may not be H×W; for example, when its size is 128×128 and the scale factor $2^{\kappa}$ equals 4, the reconstruction yields a high-resolution hyperspectral remote sensing image of size 512×512.
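For illustration, using the `SSTSketch` stand-in defined in the sketch earlier in this document (a placeholder, not the trained model), the ×4 case looks like:

```python
import torch

model = SSTSketch(c_in=3, sigma=9, K=6, scale=4).eval()
lr = torch.randn(1, 3, 128, 128)   # low-resolution HSI, C_in = 3 bands assumed
with torch.no_grad():
    hr = model(lr)
print(hr.shape)                    # torch.Size([1, 3, 512, 512])
```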
Based on the hyperspectral remote sensing image super-resolution reconstruction method described in detail in steps S1 to S5, this embodiment further provides verification experiments based on three data sets, the Pavia Centre Dataset, the Houston Dataset and the Chikusei Dataset: (A) parameter setting and selection of evaluation indices for the experiments; (B) ablation experiments analyzing the strengths and weaknesses of the method; (C) comparison of the experimental results with the current best methods, together with analysis.
(A) Parameter setting and evaluation index selection in experiment:
(A1) Parameter setting: in the method of this embodiment (named Spectral and Spatial Transformer, abbreviated SST), the number of channels of the feature image after linear embedding is set to 9 times the number of channels of the input HSI (i.e., C/C_in equals 9); the number of residual blocks is set to 6 (i.e., K equals 6); the number of integrated Transformer modules in each residual block is set to 6 (i.e., M equals 6); the sizes of the 3 convolution kernels in the merge layer are 3×3, 1×1 and 3×3 in sequence, with LeakyReLU as the activation function; the number of heads of multi-head self-attention is 3 in both the spectral Transformer sub-module and the spatial Transformer sub-module, and the size of a single window in the spatial Transformer sub-module is set to 8×8; the network optimizer is the commonly used adaptive moment estimation (Adaptive Moment Estimation, Adam; Kingma and Ba, 2014), with the initial learning rate set to 0.0005; the weights of the spatial and spectral loss functions are distributed uniformly as 0.5 (i.e., λ1 and λ2 each equal 0.5). In addition, due to the memory limitation of the computer graphics processor, the batch size during training is set to 4; all experiments were performed using the PyTorch framework, with an NVIDIA GeForce RTX 3090 graphics processor with 24 GB of memory.
(A2) Evaluation indices: 4 widely used image quality evaluation indices, namely the mean peak signal-to-noise ratio (MPSNR; Huynh-Thu, 2008), the mean structural similarity (MSSIM; Wang et al., 2004), the spectral angle mapper (SAM; Square et al., 2018) and the correlation coefficient (cross correlation, CC; Loncan et al., 1988), are selected to verify the performance of the proposed SST method for HSI super-resolution. MPSNR and MSSIM calculate, respectively, the similarity and the structural consistency between the real image and the generated image, measuring spatial visual quality averaged over all spectral bands; a higher MPSNR value means better visual quality, and the best MSSIM value is 1. SAM and CC reflect the spectral fidelity of the image, with the best SAM value being 0 and the best CC value being 1.
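Sketch definitions of two of these indices (MPSNR and SAM) under the stated conventions are given below; MSSIM and CC are omitted for brevity, and the exact normalizations used in the experiments may differ:

```python
import torch

def mpsnr(sr: torch.Tensor, gt: torch.Tensor, data_range: float = 1.0) -> torch.Tensor:
    """Mean PSNR over spectral bands; tensors (C, H, W) in [0, data_range]."""
    mse = ((sr - gt) ** 2).mean(dim=(1, 2)).clamp_min(1e-12)
    return (10 * torch.log10(data_range ** 2 / mse)).mean()

def sam(sr: torch.Tensor, gt: torch.Tensor) -> torch.Tensor:
    """Mean spectral angle (degrees) between per-pixel spectra; (C, H, W)."""
    dot = (sr * gt).sum(dim=0)
    denom = sr.norm(dim=0) * gt.norm(dim=0) + 1e-12
    angle = torch.acos((dot / denom).clamp(-1.0, 1.0))
    return torch.rad2deg(angle).mean()
```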
(B) Ablation experiments: to check the effectiveness of the proposed algorithm, a series of ablation experiments were performed on the Houston Dataset.
First, with a 2-fold magnification (i.e., scale factor $2^{\kappa}$ equal to 2), three groups of control experiments were performed to explore the contribution of each sub-module in the integrated Transformer module: using only the spatial Transformer sub-module (Spatial Transformer Model, STM) to extract the spatial features among HSI pixels without considering the spectral features among the spectral bands, which is also the approach adopted by generic image super-resolution tasks; using a serial structure of the spectral Transformer sub-module and the spatial Transformer sub-module (Spectral and Spatial Transformer Concatenated Model, SSTM), in which both spectral and spatial features are considered but are computed separately by the two modules in series; and using the 3D Transformer sub-module (3DTM) to compute the spectral and spatial features jointly in three dimensions. It should be noted that, due to the limitation of GPU memory, the window size in the 3DTM experiment was set to 3. The experimental results are shown in Table 1.
TABLE 1. Experimental results of different network structures on the Houston dataset (scale factor of 2)

        MPSNR    MSSIM     SAM     CC
STM     38.09    0.9322    4.32    0.9223
SSTM    39.12    0.9417    3.79    0.9324
3DTM    39.04    0.9354    3.84    0.9348
From the values of the four evaluation indices in Table 1, it can be seen that SSTM has the best overall performance and STM the worst. Judging by SAM and CC, the indices reflecting spectral fidelity, the experimental effect improves clearly once the computation of spectral features is added, which shows that spectral features are important for hyperspectral remote sensing. The performance of 3DTM is slightly inferior to SSTM, probably because the window size was set to 3 (due to the limitation of GPU memory); the window size is also one of the important parameters affecting the experimental effect, and as can be anticipated from the analysis of fig. 8(b), after increasing the window size the performance of 3DTM is likely to exceed that of SSTM. Considering that computational cost is also an important criterion for evaluating a model, the subsequent comparison experiments use SSTM as the representative model of SST.
Next, still using SSTM but with a 4-fold magnification (i.e., scale factor $2^{\kappa}$ equal to 4), the influence on the HSI super-resolution effect of the band-number multiplier σ, the window size H, the number K of residual blocks and the number M of integrated Transformer modules per residual block in the SST method was explored; the experimental results are shown in fig. 8. For the embedded band-number multiplier (fig. 8a), as σ increases the super-resolution effect also improves gradually; SAM and MPSNR improve particularly markedly over the interval σ = 3 to 9 and only slowly over σ = 9 to 15, which indicates that enlarging the embedding dimension improves the super-resolution effect of the model, but an overly large σ also increases the complexity and computational cost of the model, so σ = 9 is set in the SST method. For the window size (fig. 8b), the model effect increases with the window size H, and H = 8 is set in SST after weighing computational cost against model effect. For the number of residual blocks (fig. 8c), the model effect first increases and then decreases as K grows, with the turning point at K = 6; the decline may be caused by network degradation or by overfitting the training set as the network becomes too deep, so K = 6 is set in the SST method. For the number of integrated Transformer modules (fig. 8d), the model effect increases progressively with M, but the improvement slows down beyond M = 6, so M = 6 is set in the SST method to save computational cost.
(C) Comparison experiments: based on the 3 data sets mentioned above, six existing image super-resolution reconstruction methods were also selected for comparison in this embodiment, with scale factors set to 2, 4 and 8. The six methods are: bicubic interpolation (Bicubic), the deep distillation recursive network (DDRN; Jiang et al., 2018), the spatial-spectral prior network (SSPSR; Jiang et al., 2020), Interactformer (Liu et al., 2022b), SwinIR (Wang et al., 2021) and EEGAN (Jiang et al., 2019). Bicubic is a classical image processing method, which during the experiments has to be applied to each band of the remote sensing image to achieve HSI super-resolution; DDRN and SSPSR are CNN-based hyperspectral remote sensing image super-resolution methods, and since the SSPSR model adopts progressive upsampling, only scale factors 4 and 8 are used in the comparison with SSPSR; Interactformer and SwinIR are both Transformer-based image super-resolution methods; EEGAN is a GAN-based HSI super-resolution method. In addition, among the three methods STM, SSTM and 3DTM, this embodiment selected the best-performing SSTM (Table 1) for the comparison experiments.
(C1) Comparison experiment based on the Pavia Centre Dataset: Table 2 shows the evaluation of experimental results on the Pavia Centre Dataset using different super-resolution algorithms with different scale factors. From Table 2 it can be seen that the SST method achieves the best effect at every scale factor, while bicubic interpolation performs worst, which suggests that deep learning methods are more effective than conventional interpolation. Among the machine learning algorithms, the Interactformer and SwinIR algorithms using the Transformer structure and the EEGAN algorithm using a GAN give MPSNR and MSSIM indices that are overall higher than those of DDRN and SSPSR using the CNN structure, consistent with how related algorithms perform in other areas such as classification (Dosovitskiy et al., 2021) and segmentation (Chen et al., 2021b), probably because in principle Transformer and GAN models can extract more global features, giving these models better fitting ability.
TABLE 2 quantitative comparison of super resolution using different methods on Pavia Centre dataset (bold represents best value)
Figs. 9 and 10 show, respectively, the super-resolution results and the errors on the Pavia Centre dataset at a scale factor of 4. From them it can be seen that the SST method is better overall, whereas the Bicubic method is unsatisfactory in both overall appearance and detail; DDRN and SwinIR also show checkerboard artifacts, and SSPSR and Interactformer show some degree of distortion in details such as building boundaries, with the performance of the Interactformer method being closest to that of the SST method.
(C2) Comparison experiment based on the Houston Dataset: Table 3 shows the evaluation of experimental results on the Houston Dataset using different super-resolution reconstruction methods with different scale factors. As can be seen from Table 3, the effect of every method decreases gradually as the scale factor increases, but every evaluation index of the SST method remains better than those of the other reconstruction methods. Compared with the other machine learning methods, the SAM and CC indices of the SwinIR method are markedly worse, possibly because SwinIR is not a super-resolution reconstruction method designed for remote sensing images and lacks the model structures for extracting the spectral features needed in remote sensing image super-resolution reconstruction.
TABLE 3 quantitative comparison of super resolution using different methods on Houston dataset (bolded represents the best value)
Fig. 11 shows the super-resolution results at a scale factor of 4 on the Houston dataset. It can be seen that among the reconstruction methods tested, the super-resolution results of the SST method are comparatively good, because it restores the edge features of the ground objects better. The super-resolution image of SwinIR differs markedly in color from the ground truth, probably because its model lacks a spectral feature extraction module. All methods in the experiment fall short in the degree of detail restoration; for example, none of them restores the black spots on the roof of the building in the image. As shown in fig. 12, it can be seen from the error images that the SST method has smaller errors (more dark-blue pixels).
(C3) Comparison experiment based on the Chikusei Dataset: Table 4 shows the evaluation of experimental results on the Chikusei Dataset using different super-resolution methods with different scale factors. As can be seen from Table 4, the values of the indices are better than the results on the other two data sets, with higher MPSNR, MSSIM and CC values and lower SAM values. The reasons for this may be: (1) spatially, a large part of the Chikusei data set covers rural areas where the types of ground objects are simple, which reduces the difficulty of super-resolution reconstruction; (2) spectrally, since crops make up a large proportion of the Chikusei data set, the spectral response curves of the pixels are similar, which helps the model learn their spectral characteristics and thus yields higher spectral fidelity. It can also be seen from Table 4 that the SST method is superior to the other methods on every evaluation index, and the relative ranking of the methods is consistent with the other two data sets.
TABLE 4 quantitative comparison of super resolution using different methods on the Chikusei dataset (bold represents the best value)
Fig. 13 shows the super-resolution results at a scale factor of 4 on the Chikusei dataset. The experimental results of the several machine learning methods are all fairly close, indicating that the overall effect of the various methods on the Chikusei dataset is good, consistent with the information reflected by the evaluation indices in Table 4. However, SST, SSPSR, Interactformer and EEGAN are clearly better than the other methods in terms of detail, which is most apparent in regions of the image where the ground features change drastically, such as the upper right corner; this is also shown clearly in the error maps (fig. 14).
Fig. 15 shows, for the three data sets and for the result images generated by the different super-resolution reconstruction methods, the absolute differences between the per-pixel values in the individual bands and the pixel values of the real image; the data type of the pixel values is 16-bit unsigned integer, with a theoretical maximum of 65535. The figure reflects the spectral fidelity of each super-resolution reconstruction algorithm. The pink dotted line represents the SST method, and in the experiments on all three data sets its curve lies lowest over most wavelength intervals, consistent with the effect reflected by the SAM and CC evaluation indices.
(D) Hyperspectral remote sensing image super-resolution reconstruction experiment in a real scene: in order to check the usability of the proposed SST method in scientific research and production work and the quality of its hyperspectral remote sensing image super-resolution effect, this embodiment also used Google Earth Engine to cut a region from a Sentinel-2 satellite image and make a data set. The data set has size 1356×545 and includes 13 bands (after removing the other non-electromagnetic bands), the ground sampling distance is 10 meters, the geographic location is the Guangdong national park in western China, and the image content includes grassland, lakes, roads, cultivated land, etc. In the experiment, 70% of the area of the data set was cut into square blocks of 48×48 pixels and used as training data, and the other 30% was cut into square blocks of 128×128 pixels and used as test data. Since the experiment simulates a real usage scenario of the model, the test data has no ground-truth data, and the model with scale factor 4 is used directly to generate 512×512-pixel results.
Fig. 16 shows the results of the hyperspectral image super-resolution experiment in the real scene using the SST method, where the red frame in the upper left corner of each picture is the input low-resolution image: (a) is the true-color composite; (b) is the standard false-color composite, a band combination used to emphasize healthy and unhealthy vegetation; (c) is a short-wave infrared band combination, in which darker green indicates dense vegetation and brown indicates bare soil and built-up areas; (d) is a band combination commonly used to monitor crop health; (e) is a rendering of the computed normalized difference vegetation index (NDVI, Normalized Difference Vegetation Index), one of the important parameters reflecting crop growth and nutrition information; (f) is a rendering of the computed normalized difference water index (NDWI, Normalized Difference Water Index), which highlights the water information in the image. By comparing the low-resolution images under the above 6 common band combinations or computed indices with the corresponding super-resolution results, it can be concluded that the proposed SST method performs well in terms of both spatial restoration and spectral fidelity, and is sufficient to serve as a basis for daily remote sensing work.
In summary, the above provides a hyperspectral remote sensing image super-resolution reconstruction method based on the spatial Transformer and the spectral Transformer: first, a plurality of low-resolution images are taken as input items and a plurality of high-resolution images as output items, and they are imported into the spectral-spatial Transformer model, whose network structure comprises a linear embedding layer, K residual blocks, a merge layer, an upsampling layer, a global skip connection and an anti-embedding layer connected in sequence, for training to obtain the image super-resolution reconstruction model; then the low-resolution hyperspectral remote sensing image to be reconstructed is input into the image super-resolution reconstruction model, and the high-resolution hyperspectral remote sensing image is output.
As shown in fig. 17, a second aspect of the present embodiment provides a virtual device for implementing the hyperspectral remote sensing image super-resolution reconstruction method of the first aspect; the virtual device comprises a data acquisition unit, an image cropping unit, a downsampling processing unit, a reconstruction model training unit and a reconstruction model application unit that are sequentially communicatively connected;
the data acquisition unit is used for acquiring a real hyperspectral remote sensing image data set;
the image cropping unit is used for cutting out from the hyperspectral remote sensing image data set a plurality of square partial regions each of size $H' \times W'$ to serve as the plurality of high-resolution images in the training set, wherein $H'$ and $W'$ respectively represent positive integers and the square images correspond one-to-one with the high-resolution images;
the downsampling processing unit is used for adopting a proportion factor of 2 κ Downsampling each of the plurality of high-resolution images to obtain a plurality of low-resolution images corresponding to the plurality of high-resolution images one by one and having a size of H×W, wherein H represents a positive integer and has
Figure BDA0004144223260000204
W represents a positive integer and has->
Figure BDA0004144223260000205
Kappa represents a positive integer;
the reconstruction model training unit, which is also communicatively connected with the image cropping unit, is used for taking the plurality of low-resolution images as input items and the plurality of high-resolution images as output items and importing them into a spectral-spatial Transformer model for training to obtain an image super-resolution reconstruction model, wherein the network structure of the spectral-spatial Transformer model comprises a linear embedding layer, K residual blocks, a merge layer, an upsampling layer, a global skip connection and an anti-embedding layer connected in sequence; each residual block is formed by connecting M integrated Transformer modules and an intra-block merge layer in series, with a first residual connection added between the input of the first of the M integrated Transformer modules and the output of the intra-block merge layer; the integrated Transformer module comprises a 3D Transformer sub-module or two multi-head self-attention MSA sub-modules, the 3D Transformer sub-module being used for extracting the self-attention features among all spectral bands within a local window, and the two multi-head self-attention MSA sub-modules comprising a spectral Transformer sub-module acting on the spectral domain and a spatial Transformer sub-module acting on the spatial domain, the spectral Transformer sub-module being used for extracting the self-attention features among different spectral bands within a single pixel and the spatial Transformer sub-module being used for extracting the self-attention features among different pixels within a local window; K and M respectively represent positive integers greater than or equal to 2;
the linear embedding layer is used for mapping the number of spectral bands of the low-resolution input image $I_{LR} \in \mathbb{R}^{H \times W \times C_{in}}$ into a higher-dimensional feature space and extracting the shallow features of the low-resolution input image $I_{LR}$ to obtain the embedded feature image $F_0 \in \mathbb{R}^{H \times W \times C}$, $F_0 = H_E(I_{LR})$, wherein $C_{in}$ represents the number of spectral bands of the low-resolution input image $I_{LR}$, $C$ represents the number of channels of the embedded feature image $F_0$, $C$ and $C_{in}$ are positive integers, $\mathbb{R}$ denotes the set of real numbers, and $H_E(\cdot)$ represents the processing function of the linear embedding layer;
the K residual blocks are used for extracting the deep features of the embedded feature image $F_0$ to obtain the deep feature image $F_K \in \mathbb{R}^{H \times W \times C}$ via $F_k = H_{R_k}(F_{k-1})$, $k = 1, 2, \dots, K$, wherein $k$ represents a positive integer taking values in the interval $[1, K]$ and $H_{R_k}(\cdot)$ represents the processing function of the $k$-th residual block among the K residual blocks;
the merge layer is used for integrating the features of the deep feature image $F_K$ to obtain the deeper feature image $F_M \in \mathbb{R}^{H \times W \times C}$, $F_M = H_M(F_K)$, wherein $H_M(\cdot)$ represents the processing function of the merge layer;
the upsampling layer is used for upsampling the sum of the embedded feature image $F_0$ and the deeper feature image $F_M$ to obtain the upsampled feature image $F_{HF} \in \mathbb{R}^{2^{\kappa}H \times 2^{\kappa}W \times C}$, $F_{HF} = H_U(F_0 + F_M)$, wherein $H_U(\cdot)$ represents the processing function of the upsampling layer;
the global skip connection is used for first upsampling the embedded feature image $F_0$ by bicubic interpolation, then extracting features from the upsampling result through a convolution layer with padding 1, and finally adding the extracted feature image to the upsampled feature image $F_{HF}$ to obtain the combined feature image $F_{SK} \in \mathbb{R}^{2^{\kappa}H \times 2^{\kappa}W \times C}$, $F_{SK} = H_{cov}\!\left(H_{UP}^{bic}(F_0)\right) + F_{HF}$, wherein $H_{cov}(\cdot)$ represents the processing function of the convolution layer and $H_{UP}^{bic}(\cdot)$ represents the upsampling function based on bicubic interpolation;
the anti-embedding layer is used for reducing the number of channels of the combined feature image $F_{SK}$ back to the number of spectral bands $C_{in}$ to obtain the high-resolution output image $I_{HR} \in \mathbb{R}^{2^{\kappa}H \times 2^{\kappa}W \times C_{in}}$ as the image super-resolution reconstruction result, $I_{HR} = H_{UE}(F_{SK})$, wherein $H_{UE}(\cdot)$ represents the processing function of the anti-embedding layer;
the reconstruction model application unit is used for inputting the hyperspectral remote sensing image to be reconstructed and with low resolution into the image super-resolution reconstruction model and outputting the hyperspectral remote sensing image with high resolution.
The working process, working details and technical effects of the foregoing device provided in the second aspect of the present embodiment may refer to the super-resolution reconstruction method of the hyperspectral remote sensing image described in the first aspect, which are not described herein again.
As shown in fig. 18, a third aspect of the present embodiment provides a computer device for performing the hyperspectral remote sensing image super-resolution reconstruction method of the first aspect, comprising a memory, a processor and a transceiver communicatively connected in sequence, wherein the memory is configured to store a computer program, the transceiver is configured to send and receive messages, and the processor is configured to read the computer program and perform the hyperspectral remote sensing image super-resolution reconstruction method of the first aspect. By way of specific example, the memory may include, but is not limited to, random-access memory (RAM), read-only memory (ROM), flash memory, first-in-first-out memory (FIFO) and/or first-in-last-out memory (FILO); the processor may be, but is not limited to, a microprocessor of the STM32F105 family. In addition, the computer device may include, but is not limited to, a power module, a display screen and other necessary components.
The working process, working details and technical effects of the foregoing computer device provided in the third aspect of the present embodiment may refer to the super-resolution reconstruction method of the hyperspectral remote sensing image described in the first aspect, which are not described herein again.
A fourth aspect of the present embodiment provides a computer-readable storage medium storing instructions for the hyperspectral remote sensing image super-resolution reconstruction method of the first aspect, i.e. the computer-readable storage medium has instructions stored thereon which, when run on a computer, perform the hyperspectral remote sensing image super-resolution reconstruction method of the first aspect. The computer-readable storage medium refers to a carrier for storing data and may include, but is not limited to, floppy disks, optical disks, hard disks, flash memory and/or memory sticks (Memory Stick); the computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device.
The working process, working details and technical effects of the foregoing computer readable storage medium provided in the fourth aspect of the present embodiment may refer to the super-resolution reconstruction method of the hyperspectral remote sensing image as described in the first aspect, which are not described herein.
A fifth aspect of the present embodiment provides a computer program product comprising instructions which, when run on a computer, cause the computer to perform the hyperspectral remote sensing image super resolution reconstruction method as described in the first aspect. Wherein the computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus.
Finally, it should be noted that: the foregoing description is only of the preferred embodiments of the invention and is not intended to limit the scope of the invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A hyperspectral remote sensing image super-resolution reconstruction method, characterized by comprising the following steps:
acquiring a real hyperspectral remote sensing image data set;
cutting out from the hyperspectral remote sensing image data set a plurality of square partial regions each of size $H' \times W'$ to serve as a plurality of high-resolution images in a training set, wherein $H'$ and $W'$ respectively represent positive integers and the square images correspond one-to-one with the high-resolution images;
downsampling each of the plurality of high-resolution images with a scale factor of $2^{\kappa}$ to obtain a plurality of low-resolution images of size $H \times W$ in one-to-one correspondence with the plurality of high-resolution images, wherein $H$ represents a positive integer with $H = H'/2^{\kappa}$, $W$ represents a positive integer with $W = W'/2^{\kappa}$, and $\kappa$ represents a positive integer;
the method comprises the steps that the plurality of low-resolution images are used as input items, the plurality of high-resolution images are used as output items, a spectrum-space converter model is imported for training to obtain an image super-resolution reconstruction model, a network structure of the spectrum-space converter model comprises a linear embedded layer, K residual error modules, a merging layer, an up-sampling layer, a global jump connection and an anti-embedding layer which are sequentially connected, the residual error modules are formed by sequentially connecting M integrated converter modules and a merging layer in a block in series, a first residual error connection is added between an input end of a first integrated converter module in the M integrated converter modules and an output end of the merging layer in the block, the integrated converter modules comprise a 3D converter sub-module or two multi-head self-attention MSA sub-modules, the 3D converter sub-modules are used for extracting self-attention characteristics between all spectral bands in a local window, the two multi-head self-attention MSA sub-modules comprise self-attention MSA sub-modules which act on spectral image sub-modules in a different spectral window, and the self-attention characteristics are respectively extracted between the two sub-pixel modules in the spectral pixel window, and the spectral sub-modules are not used for extracting spectral characteristics between the spectral image sub-modules in the local window, and the spectral image sub-modules are respectively;
the linear embedding layer is used for mapping the number of spectral bands of the low-resolution input image $I_{LR} \in \mathbb{R}^{H \times W \times C_{in}}$ into a higher-dimensional feature space and extracting the shallow features of the low-resolution input image $I_{LR}$ to obtain the embedded feature image $F_0 \in \mathbb{R}^{H \times W \times C}$, $F_0 = H_E(I_{LR})$, wherein $C_{in}$ represents the number of spectral bands of the low-resolution input image $I_{LR}$, $C$ represents the number of channels of the embedded feature image $F_0$, $C$ and $C_{in}$ are positive integers, $\mathbb{R}$ denotes the set of real numbers, and $H_E(\cdot)$ represents the processing function of the linear embedding layer;
the K residual blocks are used for extracting the deep features of the embedded feature image $F_0$ to obtain the deep feature image $F_K \in \mathbb{R}^{H \times W \times C}$ via $F_k = H_{R_k}(F_{k-1})$, $k = 1, 2, \dots, K$, wherein $k$ represents a positive integer taking values in the interval $[1, K]$ and $H_{R_k}(\cdot)$ represents the processing function of the $k$-th residual block among the K residual blocks;
the merge layer is used for integrating the features of the deep feature image $F_K$ to obtain the deeper feature image $F_M \in \mathbb{R}^{H \times W \times C}$, $F_M = H_M(F_K)$, wherein $H_M(\cdot)$ represents the processing function of the merge layer;
the upsampling layer is used for upsampling the sum of the embedded feature image $F_0$ and the deeper feature image $F_M$ to obtain the upsampled feature image $F_{HF} \in \mathbb{R}^{2^{\kappa}H \times 2^{\kappa}W \times C}$, $F_{HF} = H_U(F_0 + F_M)$, wherein $H_U(\cdot)$ represents the processing function of the upsampling layer;
the global skip connection is used for first upsampling the embedded feature image $F_0$ by bicubic interpolation, then extracting features from the upsampling result through a convolution layer with padding 1, and finally adding the extracted feature image to the upsampled feature image $F_{HF}$ to obtain the combined feature image $F_{SK} \in \mathbb{R}^{2^{\kappa}H \times 2^{\kappa}W \times C}$, $F_{SK} = H_{cov}\!\left(H_{UP}^{bic}(F_0)\right) + F_{HF}$, wherein $H_{cov}(\cdot)$ represents the processing function of the convolution layer and $H_{UP}^{bic}(\cdot)$ represents the upsampling function based on bicubic interpolation;
the anti-embedding layer is used for reducing the number of channels of the combined feature image $F_{SK}$ back to the number of spectral bands $C_{in}$ to obtain the high-resolution output image $I_{HR} \in \mathbb{R}^{2^{\kappa}H \times 2^{\kappa}W \times C_{in}}$ as the image super-resolution reconstruction result, $I_{HR} = H_{UE}(F_{SK})$, wherein $H_{UE}(\cdot)$ represents the processing function of the anti-embedding layer;
and inputting the hyperspectral remote sensing image to be reconstructed and with low resolution into the image super-resolution reconstruction model, and outputting to obtain the hyperspectral remote sensing image with high resolution.
2. The hyperspectral remote sensing image super-resolution reconstruction method according to claim 1, wherein the integrated Transformer module further comprises a first normalization layer, a second normalization layer and a multi-layer perceptron MLP sub-module;

the first normalization layer is located before the 3D Transformer sub-module or the two multi-head self-attention MSA sub-modules, and a second residual connection is added between the input of the first normalization layer and the output of the 3D Transformer sub-module or of the two multi-head self-attention MSA sub-modules;

the second normalization layer is located after the addition node of the second residual connection and before the multi-layer perceptron MLP sub-module, and a third residual connection is added between the input of the second normalization layer and the output of the multi-layer perceptron MLP sub-module.
3. The hyperspectral remote sensing image super-resolution reconstruction method according to claim 1, wherein the 3D Transformer sub-module works as follows:

dividing the input feature image of size H×W×C into (H×W)/m² mutually non-overlapping local windows, wherein each local window has size m×m×C and contains m×m×C_in spectral-band feature images, and m represents a positive integer greater than or equal to 2;
for each local window, the self-attention characteristics corresponding to and between different spectrum bands are calculated independently.
4. The hyperspectral remote sensing image super-resolution reconstruction method according to claim 1, wherein the spectral Transformer sub-module works by adopting the multi-head attention mechanism of a standard Transformer:

when the input feature image passes through the multi-head attention layer, for the embedded element $x_{i,j}$ of each input pixel in the input feature image, the corresponding query matrix $Q_{i,j}$, key matrix $K_{i,j}$ and value matrix $V_{i,j}$ are calculated in each head according to the following formula:

$$Q_{i,j} = x_{i,j} W^{Q}, \quad K_{i,j} = x_{i,j} W^{K}, \quad V_{i,j} = x_{i,j} W^{V}$$

wherein $W^{Q}$ represents the projection matrix of the query matrix, $W^{K}$ represents the projection matrix of the key matrix, $W^{V}$ represents the projection matrix of the value matrix, the three projection matrices being shared among different pixels; $i$ represents the row number of each input pixel in the input feature image, $j$ represents the column number of each input pixel in the input feature image, and $d$ represents the dimension of the query, key and value matrices;

for each input pixel, N head results are obtained by executing the Attention function N times in parallel, and after concatenation they are multiplied by a weight matrix $W^{O}$ to obtain the self-attention features corresponding to and among the different spectral bands; the Attention function is calculated according to the following formula:

$$\mathrm{Attention}(Q, K, V) = \mathrm{SoftMax}\!\left(\frac{QK^{T}}{\sqrt{d}} + B\right)V$$

wherein SoftMax() represents the normalized exponential function, $T$ represents the matrix transpose, and $B$ represents the relative position encoding, a set of learnable parameters.
5. The hyperspectral remote sensing image super-resolution reconstruction method according to claim 1, wherein the spatial Transformer sub-module works as follows:

dividing the input feature image of size H×W×C into (H×W)/m² mutually non-overlapping local windows, wherein each local window has size m×m×C and m represents a positive integer greater than or equal to 2;
and for each local window, calculating the self-attention characteristic corresponding to and among different pixels independently.
6. The method of claim 3 or 5, wherein dividing the input feature image of size H×W×C into (H×W)/m² mutually non-overlapping local windows comprises: before each window partition, shifting the input feature image downwards and rightwards by m/2 pixels, wherein m represents a positive integer greater than or equal to 2.
7. The hyperspectral remote sensing image super-resolution reconstruction method according to claim 1, wherein the loss function $L_{total}$ of the spectral-spatial Transformer model is calculated according to the following formula:

$$L_{total} = \lambda_1 L_1\!\left(I_{SR}, I_{GT}\right) + \lambda_2 L_G\!\left(\nabla I_{SR}, \nabla I_{GT}\right)$$

wherein $L_1$ represents the loss function based on the mean absolute error, $L_G$ represents the spectral gradient loss function, $I_{GT}$ represents the real high-resolution image, $\nabla I_{GT}$ represents the spectral gradient of the real high-resolution image, $I_{SR}$ represents the reconstructed high-resolution output image, $\nabla I_{SR}$ represents the spectral gradient of the reconstructed high-resolution output image, and $\lambda_1$ and $\lambda_2$ represent a set of hyper-parameters with $\lambda_1 + \lambda_2 = 1$.
8. A hyperspectral remote sensing image super-resolution reconstruction device, characterized by comprising a data acquisition unit, an image cropping unit, a downsampling processing unit, a reconstruction model training unit and a reconstruction model application unit that are sequentially communicatively connected;
the data acquisition unit is used for acquiring a real hyperspectral remote sensing image data set;
the image cropping unit is used for cutting out from the hyperspectral remote sensing image data set a plurality of square partial regions each of size $H' \times W'$ to serve as a plurality of high-resolution images in a training set, wherein $H'$ and $W'$ respectively represent positive integers and the square images correspond one-to-one with the high-resolution images;
the downsampling processing unit is used for adopting a proportion factor of 2 κ Downsampling each of the plurality of high-resolution images to obtain a plurality of low-resolution images corresponding to the plurality of high-resolution images one by one and having a size of H×W, wherein H represents a positive integer and has
Figure FDA0004144223250000047
W represents a positive integer and has->
Figure FDA0004144223250000048
Kappa represents a positive integer;
the reconstruction model training unit is also in communication connection with the image shearing unit, and is used for taking the plurality of low-resolution images as input items and taking the plurality of high-resolution images as output items, leading in a spectrum-space transform model for training to obtain an image super-resolution reconstruction model, wherein the network structure of the spectrum-space transform model comprises a linear embedded layer, K residual blocks, a merging layer, an upsampling layer, a global jump connection and an anti-embedding layer which are sequentially connected, the residual blocks are formed by sequentially connecting M integrated transform modules and an intra-block merging layer in series, a first residual connection is added between the input end of a first integrated transform module in the M integrated transform modules and the output end of the intra-block merging layer, the integrated transducer module comprises a 3D transducer sub-module or two multi-head self-attention MSA sub-modules, wherein the 3D transducer sub-module is used for extracting self-attention characteristics among all spectrum bands in a local window, the two multi-head self-attention MSA sub-modules comprise a spectrum transducer sub-module acting on a spectrum domain and a space transducer sub-module acting on a space domain, the spectrum transducer sub-module is used for extracting the self-attention characteristics among different spectrum bands in a single pixel, the space transducer sub-module is used for extracting the self-attention characteristics among different pixels in the local window, and K and M respectively represent positive integers which are more than or equal to 2;
the linear embedding layer is used for mapping the number of spectral bands of the low-resolution input image $I_{LR} \in \mathbb{R}^{H \times W \times C_{in}}$ into a higher-dimensional feature space and extracting the shallow features of the low-resolution input image $I_{LR}$ to obtain the embedded feature image $F_0 \in \mathbb{R}^{H \times W \times C}$, $F_0 = H_E(I_{LR})$, wherein $C_{in}$ represents the number of spectral bands of the low-resolution input image $I_{LR}$, $C$ represents the number of channels of the embedded feature image $F_0$, $C$ and $C_{in}$ are positive integers, $\mathbb{R}$ denotes the set of real numbers, and $H_E(\cdot)$ represents the processing function of the linear embedding layer;
the K residual blocks are used for extracting the deep features of the embedded feature image $F_0$ to obtain the deep feature image $F_K \in \mathbb{R}^{H \times W \times C}$ via $F_k = H_{R_k}(F_{k-1})$, $k = 1, 2, \dots, K$, wherein $k$ represents a positive integer taking values in the interval $[1, K]$ and $H_{R_k}(\cdot)$ represents the processing function of the $k$-th residual block among the K residual blocks;
the merge layer is used for integrating the features of the deep feature image $F_K$ to obtain the deeper feature image $F_M \in \mathbb{R}^{H \times W \times C}$, $F_M = H_M(F_K)$, wherein $H_M(\cdot)$ represents the processing function of the merge layer;
the upsampling layer is used for upsampling the sum of the embedded feature image $F_0$ and the deeper feature image $F_M$ to obtain the upsampled feature image $F_{HF} \in \mathbb{R}^{2^{\kappa}H \times 2^{\kappa}W \times C}$, $F_{HF} = H_U(F_0 + F_M)$, wherein $H_U(\cdot)$ represents the processing function of the upsampling layer;
the global skip connection is used for first upsampling the embedded feature image $F_0$ by bicubic interpolation, then extracting features from the upsampling result through a convolution layer with padding 1, and finally adding the extracted feature image to the upsampled feature image $F_{HF}$ to obtain the combined feature image $F_{SK} \in \mathbb{R}^{2^{\kappa}H \times 2^{\kappa}W \times C}$, $F_{SK} = H_{cov}\!\left(H_{UP}^{bic}(F_0)\right) + F_{HF}$, wherein $H_{cov}(\cdot)$ represents the processing function of the convolution layer and $H_{UP}^{bic}(\cdot)$ represents the upsampling function based on bicubic interpolation;
the anti-embedding layer is used for reducing the number of channels of the combined feature image $F_{SK}$ back to the number of spectral bands $C_{in}$ to obtain the high-resolution output image $I_{HR} \in \mathbb{R}^{2^{\kappa}H \times 2^{\kappa}W \times C_{in}}$ as the image super-resolution reconstruction result, $I_{HR} = H_{UE}(F_{SK})$, wherein $H_{UE}(\cdot)$ represents the processing function of the anti-embedding layer;
the reconstruction model application unit is used for inputting the hyperspectral remote sensing image to be reconstructed and with low resolution into the image super-resolution reconstruction model and outputting the hyperspectral remote sensing image with high resolution.
9. A computer device comprising a memory, a processor and a transceiver in communication connection in sequence, wherein the memory is configured to store a computer program, the transceiver is configured to transmit and receive a message, and the processor is configured to read the computer program and execute the hyperspectral remote sensing image super-resolution reconstruction method according to any one of claims 1 to 7.
10. A computer readable storage medium having instructions stored thereon which, when executed on a computer, perform the hyperspectral remote sensing image super resolution reconstruction method as recited in any one of claims 1 to 7.
CN202310298937.0A 2023-03-24 2023-03-24 Super-resolution reconstruction method and device for hyperspectral remote sensing image and computer equipment Pending CN116309070A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310298937.0A CN116309070A (en) 2023-03-24 2023-03-24 Super-resolution reconstruction method and device for hyperspectral remote sensing image and computer equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310298937.0A CN116309070A (en) 2023-03-24 2023-03-24 Super-resolution reconstruction method and device for hyperspectral remote sensing image and computer equipment

Publications (1)

Publication Number Publication Date
CN116309070A true CN116309070A (en) 2023-06-23

Family

ID=86784884

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310298937.0A Pending CN116309070A (en) 2023-03-24 2023-03-24 Super-resolution reconstruction method and device for hyperspectral remote sensing image and computer equipment

Country Status (1)

Country Link
CN (1) CN116309070A (en)


Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116503292A (en) * 2023-06-27 2023-07-28 南京信息工程大学 Hyperspectral remote sensing image denoising method based on SwinIR
CN116503292B (en) * 2023-06-27 2023-09-12 南京信息工程大学 Hyperspectral remote sensing image denoising method based on SwinIR
CN116777964A (en) * 2023-08-18 2023-09-19 上海航天空间技术有限公司 Remote sensing image fusion method and system based on texture saliency weighting
CN116777964B (en) * 2023-08-18 2023-10-31 上海航天空间技术有限公司 Remote sensing image fusion method and system based on texture saliency weighting
CN117274957A (en) * 2023-11-23 2023-12-22 西南交通大学 Road traffic sign detection method and system based on deep learning
CN117274957B (en) * 2023-11-23 2024-03-01 西南交通大学 Road traffic sign detection method and system based on deep learning
CN117372564A (en) * 2023-12-04 2024-01-09 长春理工大学 Method, system and storage medium for reconstructing multispectral image
CN117372564B (en) * 2023-12-04 2024-03-08 长春理工大学 Method, system and storage medium for reconstructing multispectral image
CN118038212A (en) * 2024-04-10 2024-05-14 北京数慧时空信息技术有限公司 Training method of remote sensing image superdivision model and remote sensing image superdivision method

Similar Documents

Publication Publication Date Title
CN110119780B (en) Hyper-spectral image super-resolution reconstruction method based on generation countermeasure network
Wang et al. Ultra-dense GAN for satellite imagery super-resolution
CN116309070A (en) Super-resolution reconstruction method and device for hyperspectral remote sensing image and computer equipment
CN111369440B (en) Model training and image super-resolution processing method, device, terminal and storage medium
CN108830796B (en) Hyperspectral image super-resolution reconstruction method based on spectral-spatial combination and gradient domain loss
Rao et al. A residual convolutional neural network for pan-shaprening
Fu et al. Bidirectional 3D quasi-recurrent neural network for hyperspectral image super-resolution
CN110415199B (en) Multispectral remote sensing image fusion method and device based on residual learning
CN114119444B (en) Multi-source remote sensing image fusion method based on deep neural network
CN112819737B (en) Remote sensing image fusion method of multi-scale attention depth convolution network based on 3D convolution
CN111127374A (en) Pan-sharing method based on multi-scale dense network
Wang et al. MCT-Net: Multi-hierarchical cross transformer for hyperspectral and multispectral image fusion
CN108288256A (en) A kind of multispectral mosaic image restored method
CN113888491B (en) Multistage hyperspectral image progressive superdivision method and system based on non-local features
Li et al. HyperNet: A deep network for hyperspectral, multispectral, and panchromatic image fusion
He et al. Remote sensing image super-resolution using deep–shallow cascaded convolutional neural networks
CN113744136A (en) Image super-resolution reconstruction method and system based on channel constraint multi-feature fusion
CN115760814A (en) Remote sensing image fusion method and system based on double-coupling deep neural network
CN117788296B (en) Infrared remote sensing image super-resolution reconstruction method based on heterogeneous combined depth network
Deng et al. Multiple frame splicing and degradation learning for hyperspectral imagery super-resolution
CN115439325A (en) Low-resolution hyperspectral image processing method and device and computer program product
Xu et al. AACNet: Asymmetric attention convolution network for hyperspectral image dehazing
Zhan et al. A novel cross-scale octave network for hyperspectral and multispectral image fusion
Li et al. ConvFormerSR: Fusing transformers and convolutional neural networks for cross-sensor remote sensing imagery super-resolution
CN115082344A (en) Dual-branch network panchromatic sharpening method based on detail injection

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination