CN116563103A - Remote sensing image space-time fusion method based on self-adaptive neural network - Google Patents
- Publication number
- CN116563103A CN116563103A CN202310422272.XA CN202310422272A CN116563103A CN 116563103 A CN116563103 A CN 116563103A CN 202310422272 A CN202310422272 A CN 202310422272A CN 116563103 A CN116563103 A CN 116563103A
- Authority
- CN
- China
- Prior art keywords
- feature
- module
- fusion
- remote sensing
- image
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
- G06T3/40—Scaling of whole images or parts thereof, e.g. expanding or contracting
- G06T3/4053—Scaling of whole images or parts thereof, e.g. expanding or contracting based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/048—Activation functions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
- G06T3/40—Scaling of whole images or parts thereof, e.g. expanding or contracting
- G06T3/4038—Image mosaicing, e.g. composing plane images from plane sub-images
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10032—Satellite or aerial image; Remote sensing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A90/00—Technologies having an indirect contribution to adaptation to climate change
- Y02A90/10—Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
Abstract
The invention discloses a remote sensing image space-time fusion method based on a self-adaptive neural network. The invention provides a multi-stage remote sensing image space-time fusion model based on temporal feature enhancement and spatial texture migration for the space-time fusion task. For temporal feature enhancement, a Temporal Interaction Module (TIM) is designed to take full advantage of the temporal difference features of the different phases. This module employs a cross-time gating mechanism that emphasizes temporal information changes in the features of the different phases. For spatial texture migration, AdaIN is used to learn global spatial correlations and transfer the texture of the high-resolution image into the low-resolution image. Through space-time fusion, the invention endows the low-resolution image with high-resolution details.
Description
Technical Field
The invention relates to the field of deep learning and computer vision, in particular to a remote sensing image self-adaptive network based on space-time fusion.
Background
With the continuous development of earth observation technology, the demand for remote sensing images with both high temporal and high spatial resolution is increasing. Such images play an important role in crop and forest monitoring, surface disaster dynamics research, and land cover change detection.
However, due to unavoidable technical and budget constraints, a single satellite often cannot obtain remote sensing images with both high temporal and high spatial resolution. Satellites that provide high spatial resolution images typically have a longer revisit period, while satellites with a short revisit period typically provide only low spatial resolution images. For example, the Moderate Resolution Imaging Spectroradiometer (MODIS) acquires remote sensing images with a spatial resolution of 250-1000 meters but a revisit period of only 1 day, whereas the Landsat-8 series sensors acquire images with a spatial resolution of 30 meters but a revisit period of 16 days.
Space-time fusion combines the Landsat image and the MODIS image to obtain a remote sensing image with high spatio-temporal resolution, capturing the continual change of land objects over time while retaining their detailed characteristics.
A number of spatio-temporal fusion algorithms have been proposed in recent years, which can be broadly divided into three categories: reconstruction-based methods, bayesian-based methods, and learning-based methods.
Reconstruction-based methods can be further subdivided into weight-function-based methods and unmixing-based methods. Unmixing-based methods adopt linear spectral mixing theory and estimate the values of fine pixels by analyzing and decomposing the composition of coarse pixels, but suffer from large spectral unmixing errors and large intra-class variability. Weight-function-based methods estimate the predicted image by fusing the information of all input images through a weight function, but predict small objects and linear objects poorly.
The Bayesian-based method fuses related information in the image time series, converting the fusion problem into a probability estimation problem and obtaining the fused image using a maximum a posteriori estimator. The Bayesian framework provides more flexibility in modeling the relationship between the input images and the predicted image. Bayesian-based methods have good interpretability, but their fusion efficiency is relatively low due to the iterative solution.
The recent rise of deep learning has made it possible to bridge the vast spatial and temporal resolution gap in spatio-temporal fusion with deep neural network models, which can process more abstract image features for better performance. Furthermore, in view of their satisfactory performance in image generation, style migration, and super-resolution reconstruction, generative adversarial networks (GANs) have also been used for the spatio-temporal fusion task. GAN-STFM does not process temporal feature changes between coarse images; it ignores temporal correlations and does not fully reflect the changes in the fine images. It mitigates the resolution difference by adjusting the connection between fine-image features with the coarse image, but uses only convolution and addition operations. OPGAN captures time-varying surface coverage information directly from the difference image of the coarse images, acquires content information from the unchanged regions and texture information from the fine image, and generates the predicted image by adding the feature information of the difference image to the content information, thereby alleviating the huge spatial resolution gap. MLFF-GAN learns global distribution relations between multi-temporal images by AdaIN and learns local information weights for small region changes using an Attention Module (AM), so as to handle temporal differences and the huge resolution gap. However, its learned locally varying features are only used in the feature fusion stage and lack the ability to capture long-range dependencies.
Disclosure of Invention
The invention aims to solve the technical problem of how to process and effectively fuse image features at different temporal and spatial resolutions using techniques from the fields of deep learning and computer vision, and provides a remote sensing image space-time fusion method based on a self-adaptive neural network.
The specific technical scheme adopted by the invention is as follows:
a remote sensing image space-time fusion method based on a self-adaptive neural network, comprising the following specific steps: for two remote sensing images of different resolutions to be fused, a high-resolution image and a low-resolution image of a reference date, together with a low-resolution image of the prediction date, are input into a remote sensing image self-adaptive network model with a U-shaped structure, and the model generates the high-resolution image of the prediction date;
the remote sensing image self-adaptive network model serves as the generator and is trained in advance together with a discriminator through an adversarial training framework;
in the adversarial training framework, the remote sensing image self-adaptive network model serving as the generator comprises a symmetric Feature Refinement Module (FRM) and Feature Fusion Module (FFM), and the output of each sub-module in the feature refinement module is transmitted to the sub-module of the feature fusion module at the symmetric position, forming a U-shaped network structure; each sub-module in the feature refinement module performs feature refinement and transmission through a Time Interaction Module (TIM), and uses an AdaIN layer to migrate the spatial texture of the high-resolution reference image; the feature fusion module fuses, stage by stage, the output of the time interaction module in the last feature refinement sub-module with the outputs of the AdaIN layers in each feature refinement sub-module, finally mapping the result to the high-resolution image of the prediction date;
in the adversarial training framework, the discriminator is a convolutional neural network used to distinguish the high-resolution image generated by the generator from the real high-resolution image, thereby realizing adversarial training.
Preferably, in the remote sensing image self-adaptive network model, the feature refinement module comprises four feature refinement sub-modules which are sequentially cascaded, the feature fusion module comprises four feature fusion sub-modules which are sequentially cascaded, the four feature refinement sub-modules and the four feature fusion sub-modules are symmetrically arranged in a one-to-one correspondence manner, and the output of each feature refinement sub-module of the feature refinement module is transmitted to the feature fusion sub-module at the symmetrical position as input;
the module of the feature refinement module inputs a high-resolution image F1 of a reference date t1, a low-resolution image C1 of the reference date t1 and a low-resolution image C2 of a predicted date t 2; for any ith feature refinement sub-module, its input is the module output of the front-end cascadeAnd->Wherein->And->Original module inputs C1, C2, and F1 representing feature refinement modules; in the ith feature refinement sub-module, the input +.>And->Firstly, extracting features by a feature extractor and respectively obtaining corresponding feature graphs> And->Then willFeature map->And->Inputting into a Time Interaction Module (TIM) and generating a corresponding feature map +.>And->And then->And->As input to the first AdaIN layer, a feature map is thus obtained>Will->And->As input to the second AdaIN layer to obtain a feature map +.>Finally, feature map->And->Is a feature map->And->Is a feature map->Feature map which is finally output as the ith feature refinement submodule>And->
In the feature fusion module, the first feature fusion sub-module splices the input feature maps A_C1^(4) and A_C2^(4) with the feature map difference T_C2^(4) − T_C1^(4); the splicing result is upsampled by an upsampling layer to give the intermediate feature map U^(1), which is input to the next layer; the second feature fusion sub-module then splices the input feature maps A_C1^(3) and A_C2^(3) with the intermediate feature map U^(1), and the splicing result is upsampled by an upsampling layer to give the intermediate feature map U^(2), which is input to the next layer; the third feature fusion sub-module splices the input feature maps A_C1^(2) and A_C2^(2) with the intermediate feature map U^(2), and the splicing result is upsampled by an upsampling layer to give the intermediate feature map U^(3), which is input to the next layer; finally, the fourth feature fusion sub-module splices the input feature maps A_C1^(1) and A_C2^(1) with the intermediate feature map U^(3), and the splicing result is mapped through a 3×3 convolution to the high-resolution image F2′ finally output by the feature fusion module.
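As an illustrative sketch only, one feature fusion sub-module of the kind described above might be wired as follows in PyTorch. The 3×3 mixing convolution placed before PixelShuffle and all channel sizes are assumptions (the text specifies only splicing followed by upsampling, and PixelShuffle requires a channel count divisible by 4):

```python
import torch
import torch.nn as nn

class FusionBlock(nn.Module):
    """Splices three same-sized feature maps, then upsamples 2x with PixelShuffle."""
    def __init__(self, c_in: int, c_out: int):
        super().__init__()
        # PixelShuffle(2) consumes 4*c_out channels; a 3x3 conv provides them
        self.mix = nn.Conv2d(3 * c_in, 4 * c_out, kernel_size=3, padding=1)
        self.up = nn.PixelShuffle(2)  # (N, 4C, H, W) -> (N, C, 2H, 2W)

    def forward(self, a, b, c):
        x = torch.cat([a, b, c], dim=1)  # splice along the channel axis
        return self.up(self.mix(x))

block = FusionBlock(16, 16)
maps = [torch.randn(1, 16, 8, 8) for _ in range(3)]
out = block(*maps)
print(out.shape)  # torch.Size([1, 16, 16, 16])
```

Chaining four such blocks reproduces the step-by-step upsampling toward the output resolution, with the last block replaced by a plain 3×3 convolution mapping to the output bands.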
Preferably, in the Time Interaction Module (TIM), for the input feature maps E_C1^(i) and E_C2^(i), the map E_C1^(i) is first subtracted from E_C2^(i) to obtain rough surface-change information D; the feature maps E_C1^(i) and E_C2^(i) are then each spliced with D to generate the corresponding feature maps G_C1^(i) and G_C2^(i); next, the corresponding weight maps W_C1^(i) and W_C2^(i) are obtained from G_C1^(i) and G_C2^(i) through a 1×1 convolution operation and a sigmoid function; finally, in a cross-time manner, W_C2^(i) is multiplied as a weight onto E_C1^(i) to output the feature map T_C1^(i), and W_C1^(i) is multiplied as a weight onto E_C2^(i) to output the feature map T_C2^(i).
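A minimal PyTorch sketch of the gating computation described above. The channel counts, the use of a separate 1×1 convolution per phase, and the cross-time direction of the final multiplication are assumptions where the text is ambiguous:

```python
import torch
import torch.nn as nn

class TimeInteractionModule(nn.Module):
    """Cross-time gating over two coarse-image feature maps:
    f1 (reference date) and f2 (prediction date)."""
    def __init__(self, channels: int):
        super().__init__()
        # 1x1 conv + sigmoid turn each (features, difference) splice into a weight map
        self.gate1 = nn.Sequential(nn.Conv2d(2 * channels, channels, 1), nn.Sigmoid())
        self.gate2 = nn.Sequential(nn.Conv2d(2 * channels, channels, 1), nn.Sigmoid())

    def forward(self, f1, f2):
        d = f2 - f1                              # rough surface-change information D
        w1 = self.gate1(torch.cat([f1, d], 1))   # weight map from phase-1 features + D
        w2 = self.gate2(torch.cat([f2, d], 1))   # weight map from phase-2 features + D
        # cross-time gating: each phase is re-weighted by the other phase's map
        return f1 * w2, f2 * w1

tim = TimeInteractionModule(16)
a, b = tim(torch.randn(1, 16, 8, 8), torch.randn(1, 16, 8, 8))
print(a.shape, b.shape)  # torch.Size([1, 16, 8, 8]) torch.Size([1, 16, 8, 8])
```

Because the module only concatenates, convolves 1×1, and multiplies element-wise, the output feature maps keep the input shape, which is what lets TIM act as a plug-in between feature refinement stages.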
Preferably, in the discriminator, the high-resolution image generated by the generator and the real high-resolution image are each passed through a 4×4 convolutional layer and a LeakyReLU activation function, then through four convolution blocks, and finally through a 4×4 convolutional layer and a sigmoid activation function, outputting a value of 0 (fake) or 1 (real); each convolution block consists, in sequence, of a 4×4 convolutional layer, batch normalization, and a LeakyReLU activation function.
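The discriminator layout above can be sketched as follows. The strides, paddings, channel widths, LeakyReLU slope, and the 4-band input are assumptions not fixed by the text:

```python
import torch
import torch.nn as nn

def conv_block(c_in: int, c_out: int) -> nn.Sequential:
    # one intermediate block: 4x4 conv, batch normalization, LeakyReLU
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, kernel_size=4, stride=2, padding=1),
        nn.BatchNorm2d(c_out),
        nn.LeakyReLU(0.2),
    )

discriminator = nn.Sequential(
    nn.Conv2d(4, 64, kernel_size=4, stride=2, padding=1),   # first 4x4 conv
    nn.LeakyReLU(0.2),
    conv_block(64, 128),
    conv_block(128, 256),
    conv_block(256, 512),
    conv_block(512, 512),                                    # four blocks in total
    nn.Conv2d(512, 1, kernel_size=4, stride=1, padding=0),   # final 4x4 conv
    nn.Sigmoid(),  # score near 1 = judged real, near 0 = judged fake
)

score = discriminator(torch.randn(1, 4, 128, 128))
print(score.shape)  # torch.Size([1, 1, 1, 1])
```

With a 128×128 4-band input, the five stride-2 convolutions reduce the spatial size to 4×4, so the final 4×4 convolution collapses it to a single real/fake score.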
Preferably, the upsampling layer is a PixelShuffle layer.
Preferably, in the feature refinement module, feature extractors of the four feature refinement sub-modules are four blocks of ResNet-50 respectively.
Preferably, the high resolution image uses a Landsat remote sensing image, and the low resolution image uses a MODIS remote sensing image.
Preferably, the reference date is the date closest to the prediction date for which both the high-resolution image and the low-resolution image exist.
Compared with the prior art, the invention has the following beneficial effects:
the invention introduces a remote sensing image space-time self-adaptive network based on space-time fusion, and considers huge resolution difference and time characteristic change on the space-time fusion problem. In the aspect of processing the problem of huge resolution difference, adaIN is introduced to carry out texture migration, so that a low-resolution image is provided with high-resolution details. And on the basis of processing the time characteristic change difference, a Time Interaction Module (TIM) crossing a time gating mechanism is introduced, so that a characteristic diagram emphasizing the time information change is obtained. These treatments achieve better results.
Drawings
FIG. 1 is an overall block diagram of an countermeasure training framework in an embodiment of the invention;
FIG. 2 is a block diagram of a feature refinement module;
FIG. 3 is a block diagram of a TIM module;
FIG. 4 is a block diagram of a feature fusion module;
FIG. 5 is a flow chart of model training in an embodiment of the present invention;
FIG. 6 is a graph showing a portion of the test results in an embodiment of the present invention.
Detailed Description
In order that the above objects, features and advantages of the invention will be readily understood, a more particular description of the invention will be rendered by reference to the appended drawings. In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention. The present invention may be embodied in many other forms than described herein and similarly modified by those skilled in the art without departing from the spirit of the invention, whereby the invention is not limited to the specific embodiments disclosed below. The technical features of the embodiments of the invention can be combined correspondingly on the premise of no mutual conflict.
The space-time fusion problem aims to break through these technical limitations and acquire remote sensing images with both high temporal and high spatial resolution. Its inputs are a coarse image (i.e., a low-resolution image) and a fine image (i.e., a high-resolution image) of the reference date, together with a coarse image of the prediction date, from which a fine image of the prediction date is generated. There is a large resolution difference between the coarse and fine images, and a large temporal difference between images of different dates; these are the two key points that the space-time fusion problem must consider. Space-time fusion methods generally concatenate the features of the coarse and fine images to alleviate the huge resolution gap, but this cannot extract texture details well, and a great deal of local detail is lost. To handle temporal variation, the difference of coarse images at different dates is generally taken to capture the change of surface coverage over time, but the information obtained this way is shallow.
In the invention, a remote sensing image self-adaptive network model based on space-time fusion (Spatiotemporal Adaptive Network, STANet) is proposed for the key points of temporal feature change and resolution difference in the space-time fusion problem; it focuses on temporal-level enhancement of features and improves the model structure through feature enhancement and texture migration. The core of the invention is a space-time interaction module based on a cross-time gating mechanism, namely the TIM module, proposed within the STANet model, together with an AdaIN layer introduced to migrate textures. It should be noted that the TIM module can be combined, as an embedded module, with the feature refinement part of any space-time fusion model in a plug-and-play manner: only the two coarse-image feature maps need to be input during the model's feature refinement process, and the refined feature maps output by the TIM module are transmitted to the next stage. In the next stage, the two refined coarse-image feature maps and the previously extracted fine-image feature map are passed into an AdaIN layer, which adds the spatial features of the fine image onto the coarse image, thereby realizing space-time fusion based on a self-adaptive network. The key improvements of the invention include the following two aspects.
(1) Considering the large temporal feature changes, a Time Interaction Module (TIM) is designed in the model to learn temporal feature difference information. The TIM adopts a cross-time gating mechanism, finally producing feature maps that emphasize temporal information changes.
(2) Considering the huge resolution difference, AdaIN is introduced into the model to learn global spatial correlations, and the backbone of the generator network is constructed in a U shape. The feature maps of the low-resolution and high-resolution images are fed into AdaIN for texture migration, so that the low-resolution image acquires high-resolution details.
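AdaIN itself is a simple statistic-matching operation: it replaces the per-channel mean and standard deviation of one feature map with those of another. A minimal sketch of standard AdaIN (not necessarily the patent's exact layer) is:

```python
import torch

def adain(content: torch.Tensor, style: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    """Re-normalize `content` so each channel's mean/std match those of `style`."""
    c_mean = content.mean(dim=(2, 3), keepdim=True)
    c_std = content.std(dim=(2, 3), keepdim=True) + eps   # eps avoids division by zero
    s_mean = style.mean(dim=(2, 3), keepdim=True)
    s_std = style.std(dim=(2, 3), keepdim=True)
    return s_std * (content - c_mean) / c_std + s_mean

coarse = torch.randn(1, 16, 32, 32)             # e.g. refined coarse-image features
fine = 3.0 * torch.randn(1, 16, 32, 32) + 1.0   # e.g. fine-image (texture) features
out = adain(coarse, fine)
# out now carries the fine image's per-channel statistics on the coarse content
```

In the model described here, the style input would be the fine-image feature map and the content input the TIM-refined coarse-image feature map, which is what lets the coarse features take on the texture distribution of the fine image.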
In a preferred embodiment of the present invention, based on the above STANet model, a remote sensing image space-time fusion method based on a self-adaptive neural network is provided, with the following specific steps: for two remote sensing images of different resolutions to be fused, a high-resolution image and a low-resolution image of a reference date, together with a low-resolution image of the prediction date, are taken as the three images to be fused spatio-temporally; these are input into the remote sensing image self-adaptive network model with a U-shaped structure, and the STANet model generates the high-resolution image of the prediction date.
It should be noted that the specific image types of the high-resolution and low-resolution images are not limited; in the embodiment of the invention, the high-resolution image may be a Landsat remote sensing image and the low-resolution image a MODIS remote sensing image. In addition, the reference date is preferably the date closest to the prediction date for which both a high-resolution and a low-resolution image exist. In this way, the Landsat and MODIS image information of the reference date can supplement the details of the low-resolution MODIS image of the prediction date, generating a high-resolution remote sensing image and alleviating the problem that the time interval between Landsat images is too long.
The remote sensing image self-adaptive network model serves as the generator and is trained in advance together with the discriminator through an adversarial training framework; fig. 1 is an overall structure diagram of this framework. It comprises a generator and a discriminator, forming a generative adversarial network (GAN). GANs perform well in fields such as image generation, style transfer, and super-resolution reconstruction; in space-time fusion, GAN-based methods have also been applied and exhibit good performance. The generator of the adversarial training framework is the STANet model. After the generator produces the predicted high-resolution remote sensing image, the predicted image and the real high-resolution remote sensing image are passed into the discriminator for discrimination; the adversarial loss is computed and the network parameters are trained, forming one training iteration. Through adversarial training, the STANet model can generate high-resolution remote sensing images whose authenticity is difficult for the discriminator to judge.
The overall structure of the STANet model of the invention is described in detail below, and comprises two modules: the device comprises a Feature Refinement Module (FRM) and a Feature Fusion Module (FFM), wherein the output of each sub-module in the feature refinement module is transmitted to the sub-module of the feature fusion module at a symmetrical position, so that a U-shaped network structure is formed, each sub-module in the feature refinement module performs feature refinement and transmission through a Time Interaction Module (TIM), and an AdaIN layer is used for migrating the spatial textures of a high-resolution reference image; the feature fusion module is used for carrying out step-by-step fusion on the output of the time interaction module in the last feature refinement sub-module and the output of the AdaIN layer in each feature refinement sub-module, and finally mapping to obtain a high-resolution image of the predicted date.
With continued reference to fig. 1, in the embodiment of the present invention, the feature refinement module and the feature fusion module are each divided into four sub-modules (blocks), so that the feature refinement module is composed of four feature refinement sub-modules that are sequentially cascaded, the feature fusion module is composed of four feature fusion sub-modules that are sequentially cascaded, and the four feature refinement sub-modules and the four feature fusion sub-modules are symmetrically arranged in a one-to-one correspondence manner, and the output of each feature refinement sub-module of the feature refinement module is transferred to the feature fusion sub-module at a symmetrical position as input.
Specifically, in embodiments of the present invention, the feature refinement module may employ ResNet-50 as the backbone network to build the feature extractor, extracting multi-level features and forming the backbone of the generator network into a U shape. The backbone network is divided into four stages, and the result of each stage serves as a multi-stage feature; the four feature refinement sub-modules extract feature maps in a local-to-global manner, containing the spatio-temporal change information and ground details required to generate the fine image. After the stage features are obtained, the feature refinement module emphasizes the temporal feature variation between the coarse images via the Temporal Interaction Module (TIM). It then passes the features obtained by the time interaction module, together with the stage features of the fine image F1, into AdaIN to better capture the systematic and global features of the different images, preserving the overall texture distribution of the fine image. After obtaining the texture-emphasized features, the feature refinement module concatenates them with the previous features and enters the next cycle. The feature fusion module forms a U-shaped structure with the feature refinement module: the difference of the C1 and C2 features from the last stage is used as input, spliced with the AdaIN results for C1 and C2 of the corresponding earlier stage, followed by a PixelShuffle operation; in the last stage a mapping operation maps the number of channels to four, giving the output image. The specific implementation of the feature refinement module and the feature fusion module is described in detail below.
As shown in fig. 2, a specific data processing flow in the feature refinement sub-module is illustrated. The module inputs of the feature refinement module are the high-resolution image F1 of the reference date t1, the low-resolution image C1 of the reference date t1, and the low-resolution image C2 of the prediction date t2. For any i-th feature refinement sub-module, i = 1, 2, 3, 4, its inputs are the outputs C1^(i-1), C2^(i-1) and F1^(i-1) of the preceding sub-module in the cascade, where C1^(0), C2^(0) and F1^(0) denote the raw module inputs C1, C2 and F1 of the feature refinement module. In the i-th feature refinement sub-module, the inputs C1^(i-1), C2^(i-1) and F1^(i-1) first pass through a feature extractor to obtain the corresponding feature maps f_C1^(i), f_C2^(i) and f_F1^(i). In the embodiment of the invention, the feature refinement module uses ResNet-50 to build the feature extractor for multi-level features, so the feature extractors of the four feature refinement sub-modules are the four blocks of ResNet-50, corresponding to its four stages. After the feature maps f_C1^(i), f_C2^(i) and f_F1^(i) are extracted, the feature maps f_C1^(i) and f_C2^(i) are input into the Time Interaction Module (TIM), which generates the corresponding feature maps t_C1^(i) and t_C2^(i). Then t_C1^(i) and f_F1^(i) are taken as input to the first AdaIN layer, yielding the feature map a_C1^(i), and t_C2^(i) and f_F1^(i) are taken as input to the second AdaIN layer, yielding the feature map a_C2^(i). Finally, in the i-th feature refinement sub-module, the feature map a_C1^(i) is concatenated with f_C1^(i) to obtain one splicing result, and the feature map a_C2^(i) is concatenated with f_C2^(i) to obtain another splicing result. The concatenation of a_C1^(i) and f_C1^(i), the concatenation of a_C2^(i) and f_C2^(i), and the feature map f_F1^(i) are output as C1^(i), C2^(i) and F1^(i), the final outputs of the i-th feature refinement sub-module, and these three outputs are then fed into the next feature refinement sub-module.
In addition, the i-th feature refinement sub-module also outputs the feature maps a_C1^(i) and a_C2^(i) produced by its two AdaIN layers to the i-th feature fusion sub-module.
In the feature refinement sub-module, the feature maps t_C1^(i) and t_C2^(i) emphasize information about the surface changes between t1 and t2, while the texture and spectral distribution characteristics of f_F1^(i), after conversion by the AdaIN layers, can be adapted to the feature-space details of t_C1^(i) and t_C2^(i) at the current block, thereby better capturing global features across the different images.
It should be noted that the operations performed in the AdaIN layer belong to the prior art and are not described in detail here. The AdaIN layer captures systematic and global features across different images, and its formula is as follows:

AdaIN(x, y) = σ(y) · ((x − μ(x)) / σ(x)) + μ(y)

In the formula, μ(y) denotes the mean of the style features and σ(y) denotes their standard deviation. Given the features x of the content image and the features y of the style image, AdaIN migrates the style of y onto x, thereby achieving texture migration.
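The AdaIN formula above can be illustrated with a minimal NumPy sketch (this is an illustration of the standard AdaIN operation, not the patent's own implementation; the function name and epsilon constant are assumptions):

```python
import numpy as np

def adain(x, y, eps=1e-5):
    """Adaptive Instance Normalization: align the per-channel mean and
    standard deviation of content features x to those of style features y.
    x, y: arrays of shape (C, H, W); eps avoids division by zero."""
    mu_x = x.mean(axis=(1, 2), keepdims=True)
    sigma_x = x.std(axis=(1, 2), keepdims=True)
    mu_y = y.mean(axis=(1, 2), keepdims=True)
    sigma_y = y.std(axis=(1, 2), keepdims=True)
    # Normalize x per channel, then rescale to the style statistics of y.
    return sigma_y * (x - mu_x) / (sigma_x + eps) + mu_y
```

After the operation, each channel of the output carries the mean and standard deviation of the style features, which is exactly the texture-migration effect the patent relies on.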
In addition, the data processing flow of the TIM module in the feature refinement sub-module is shown in fig. 3. The TIM module employs a cross-time gating mechanism that uses weights to emphasize the change at each pixel, capturing the corresponding information in more detail. The inputs of the TIM module are the feature maps f_C1^(i) and f_C2^(i). First, f_C1^(i) is subtracted from f_C2^(i) to obtain coarse information D about the surface changes, and the feature maps f_C1^(i) and f_C2^(i) are each concatenated with D to generate the corresponding feature maps g_C1^(i) and g_C2^(i). Then g_C1^(i) and g_C2^(i) pass through a 1×1 convolution and a sigmoid function to obtain the corresponding weight maps w_C1^(i) and w_C2^(i), which indicate the degree of change of each part of the feature map. In theory, regions with more change receive higher weights, so the weight maps are multiplied with the original features to obtain feature maps that emphasize the changed information: w_C1^(i) is multiplied element-wise onto f_C1^(i) to output the feature map t_C1^(i), and w_C2^(i) is multiplied element-wise onto f_C2^(i) to output the feature map t_C2^(i). The two finally output feature maps thus emphasize where the information changes.
Conventional spatio-temporal fusion methods typically acquire spatio-temporal variation information based on differences between the feature map and the coarse image. However, such simple operations may not be effective to highlight changes over a period of time. The present invention therefore contemplates a Time Interaction Module (TIM) that relies on a cross-time gating mechanism that uses weights to emphasize the changes of each pixel and can capture corresponding information with more detail.
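The cross-time gating described above can be sketched in NumPy as follows (a minimal illustration under assumptions: the 1×1 convolution is written as a per-pixel channel mixing with given weight matrices and no bias; names such as `tim` are illustrative, not from the patent):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def tim(f_c1, f_c2, w1, w2):
    """Cross-time gating sketch.
    f_c1, f_c2: (C, H, W) feature maps from the two coarse images.
    w1, w2: (C, 2C) weights of the two 1x1 convolutions (bias omitted)."""
    d = f_c2 - f_c1                           # coarse surface-change cue D
    g1 = np.concatenate([f_c1, d], axis=0)    # (2C, H, W)
    g2 = np.concatenate([f_c2, d], axis=0)
    # A 1x1 convolution is a per-pixel linear map over the channel axis.
    m1 = sigmoid(np.einsum('oc,chw->ohw', w1, g1))  # weight map in (0, 1)
    m2 = sigmoid(np.einsum('oc,chw->ohw', w2, g2))
    # Gate the original features: strongly changing pixels keep more signal.
    return m1 * f_c1, m2 * f_c2
```

Because the gates lie in (0, 1), each output is a per-pixel attenuation of the input feature map, with the least attenuation where the change cue is strongest.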
As shown in fig. 4, the multi-level feature fusion module reconstructs a fine image from the refined feature maps in four steps. In step 1, a_C1^(4), a_C2^(4) and their difference are concatenated (concat); the concatenated feature map is upsampled by a PixelShuffle layer and input to the next layer. In each following step j, the output of the previous step is concatenated with a_C1^(5−j) and a_C2^(5−j). It should be noted that the intermediate feature maps obtained by concatenation in steps 2 and 3 are upsampled as in step 1, while the intermediate feature map obtained in step 4 is not upsampled but is mapped to the output fine image F2′ by a 3×3 convolution. The four steps are realized by the four feature fusion sub-modules of the feature fusion module. Specifically: first, the first feature fusion sub-module concatenates the input feature maps a_C1^(4) and a_C2^(4) with the feature-map difference a_C2^(4) − a_C1^(4), and the concatenation result is upsampled by an upsampling layer to obtain the intermediate feature map u^(1), which is input to the next layer. The second feature fusion sub-module then concatenates the input feature maps a_C1^(3) and a_C2^(3) with the intermediate feature map u^(1); the result is upsampled by an upsampling layer and input to the next layer as the intermediate feature map u^(2). The third feature fusion sub-module concatenates the input feature maps a_C1^(2) and a_C2^(2) with the intermediate feature map u^(2); the result is upsampled by an upsampling layer and input to the next layer as the intermediate feature map u^(3). Finally, the fourth feature fusion sub-module concatenates the input feature maps a_C1^(1) and a_C2^(1) with the intermediate feature map u^(3), and the concatenation result is mapped by a 3×3 convolution to the high-resolution image F2′ finally output by the feature fusion module.
In the embodiment of the invention, the upsampling layers are PixelShuffle layers. The PixelShuffle layer converts a low-resolution input into a high-resolution output through a sub-pixel operation. PixelShuffle is an upsampling operation whose implementation does not generate the high-resolution image directly by interpolation or similar means; instead, a convolution first produces a feature map with the square of the upsampling factor times as many channels, and the high-resolution image is then obtained by periodic shuffling of those channels, enlarging the spatial size of the features accordingly. Compared with interpolation or deconvolution, this method is more effective.
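The periodic-shuffling rearrangement can be sketched in NumPy (this follows the standard sub-pixel convention, e.g. a (C·r², H, W) tensor becomes (C, H·r, W·r); the function name is illustrative):

```python
import numpy as np

def pixel_shuffle(x, r):
    """Rearrange (C*r^2, H, W) -> (C, H*r, W*r), as in sub-pixel upsampling."""
    c2, h, w = x.shape
    assert c2 % (r * r) == 0, "channel count must be divisible by r^2"
    c = c2 // (r * r)
    x = x.reshape(c, r, r, h, w)      # split channels into r x r sub-pixel blocks
    x = x.transpose(0, 3, 1, 4, 2)    # interleave: (c, h, r, w, r)
    return x.reshape(c, h * r, w * r)
```

Note that no new values are created: the convolution before this step has already produced all the sub-pixel samples, and the shuffle only redistributes them spatially, which is why the operation is cheap compared with deconvolution.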
In the embodiment of the invention, the discriminator in the adversarial training framework is a convolutional neural network used to discriminate between the high-resolution image generated by the generator and the real high-resolution image, thereby realizing adversarial training. Specifically, the generated high-resolution image and the real high-resolution image are each fed into the discriminator, where they first pass through a 4×4 convolution layer and a LeakyReLU activation function, then through four convolution blocks, and finally through a 4×4 convolution layer and a sigmoid activation function to output a value of 0 or 1; each convolution block consists of a 4×4 convolution layer, batch normalization, and a LeakyReLU activation function in sequence. An output of 0 indicates a predicted image, and 1 indicates a real image. That is, the discriminator is expected to output an all-ones matrix when its input is a real high-resolution image and an all-zeros matrix when its input is a high-resolution image generated by the generator. It should be noted that the generated high-resolution image and the real high-resolution image are each input to the discriminator separately for discrimination, not simultaneously.
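The patent does not state the strides or padding of the discriminator's 4×4 convolutions; assuming the common choice of stride 2 with padding 1 for every layer, the spatial sizes through the six convolution layers (input conv, four conv blocks, output conv) can be traced as follows (illustrative only):

```python
def conv_out(size, kernel=4, stride=2, padding=1):
    """Output spatial size of one convolution layer."""
    return (size + 2 * padding - kernel) // stride + 1

def discriminator_sizes(size, n_layers=6):
    """Trace the spatial size through six 4x4 conv layers, assuming
    stride 2 and padding 1 everywhere (an assumption, not stated in
    the source text)."""
    sizes = [size]
    for _ in range(n_layers):
        size = conv_out(size)
        sizes.append(size)
    return sizes
```

Under this assumption, a 256×256 input would shrink by half at every layer, ending as a small patch of logits rather than a single scalar, consistent with the "matrix" of 0/1 values described above.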
The remote sensing image space-time fusion method based on the adaptive neural network is applied to a specific embodiment to show the technical effect achieved by the method.
Examples
The remote sensing image space-time fusion method based on the adaptive neural network in this embodiment is as described above and is not repeated here. The overall flow of the STANet model adversarial training process can be divided into three stages, namely data collection, model training and image generation, as shown in fig. 5.
1. Data preprocessing stage
The method in this example was evaluated on the CIA (Coleambally Irrigation Area) and LGC (Lower Gwydir Catchment) open-source datasets. These two areas represent phenological changes and land-cover changes, respectively. The CIA dataset contains 17 cloud-free MODIS-Landsat image pairs captured during 2001 and 2002; the LGC dataset contains 14 cloud-free image pairs captured during 2004 and 2005. The image sizes vary from 1280×1792 to 3072×2056, and only the first four channels (i.e., B, G, R and NIR) of all images are used. The Landsat remote sensing images serve as the high-resolution images and the MODIS remote sensing images as the low-resolution images. The spatial resolution of the MODIS images is 250-1000 m, with a dense revisit interval of 1 day, while the spatial resolution of the Landsat images is 30 m but the revisit period is 16 days. Therefore, the Landsat images missing at intermediate dates need to be complemented.
2. Model training
Step 1, construct the training dataset and batch it with a fixed batch size. Two temporally adjacent image pairs from the same dataset are combined into one group of data. The last four groups of data in each dataset are used as the validation set and the rest as the training set.
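The grouping and splitting in step 1 can be sketched with two small helpers (the function names and the list-of-dates representation are illustrative, not from the patent):

```python
def make_samples(dates):
    """Combine temporally adjacent (coarse, fine) pairs: each sample
    joins the image pair of a reference date with that of the next date."""
    return [(dates[i], dates[i + 1]) for i in range(len(dates) - 1)]

def split(samples, n_val=4):
    """Hold out the last n_val samples for validation; train on the rest."""
    return samples[:-n_val], samples[-n_val:]
```

For example, the 17 image pairs of the CIA dataset would yield 16 adjacent-pair samples, of which the last 4 form the validation set and the remaining 12 the training set.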
Step 2, train the GAN adversarial training framework shown in fig. 1 with the training samples of each batch. The specific structure of STANet as the generator is as described above and is not repeated; it adopts a convolutional encoder structure, so a pretrained ResNet-50 network can be used as the backbone and the features of each layer can be saved. During training, the generator and the discriminator jointly adjust the network parameters of the whole model through the adversarial signal from the discriminator, until all batches of the training dataset have participated in model training. After the specified number of iterations is reached, the model converges and training is complete.
3. Image generation
The images of the test set are directly used as input and passed through the trained generator, i.e., the spatio-temporal adaptive model STANet, finally generating the high-resolution image at the predicted date and thereby realizing spatio-temporal fusion.
In this embodiment, partial test results of STANet and other comparison methods are shown in fig. 6. It can be seen that the spatio-temporal adaptive model STANet generates the predicted image well. In terms of visual effect, EDCSTFN and GAN-STFM do not predict the phenological changes accurately enough and produce a large number of black areas, making high-quality generation difficult. Furthermore, compared with our STANet, MS-Fusion has deficiencies in processing spectral information, such as underestimating the predicted phenological change, which manifests as color differences. In the generation of spatial detail, SwinSTFM performs poorly on water areas.
The above embodiment is only a preferred embodiment of the present invention, but it is not intended to limit the present invention. Various changes and modifications may be made by one of ordinary skill in the pertinent art without departing from the spirit and scope of the present invention. Therefore, all the technical schemes obtained by adopting the equivalent substitution or equivalent transformation are within the protection scope of the invention.
Claims (8)
1. A remote sensing image space-time fusion method based on an adaptive neural network, characterized by comprising the following steps: for two remote sensing image sources of different resolutions to be fused, inputting a group consisting of the high-resolution image and the low-resolution image of a reference date and the low-resolution image of a prediction date into a remote sensing image adaptive network model with a U-shaped structure, the remote sensing image adaptive network model generating the high-resolution image of the prediction date;
the remote sensing image adaptive network model is trained in advance as the generator, together with a discriminator, through an adversarial training framework;
in the adversarial training framework, the remote sensing image adaptive network model serving as the generator comprises a symmetric Feature Refinement Module (FRM) and Feature Fusion Module (FFM); the output of each sub-module in the feature refinement module is transmitted to the sub-module of the feature fusion module at the symmetric position, forming a U-shaped network structure; each sub-module in the feature refinement module performs feature refinement and transmission through a Time Interaction Module (TIM) and uses AdaIN layers to migrate the spatial texture of the high-resolution reference image; the feature fusion module fuses, step by step, the output of the time interaction module in the last feature refinement sub-module and the outputs of the AdaIN layers in each feature refinement sub-module, finally mapping to the high-resolution image of the prediction date;
in the adversarial training framework, the discriminator consists of a convolutional neural network and is used to discriminate the authenticity of the high-resolution image generated by the generator against the real high-resolution image, thereby realizing adversarial training.
2. The remote sensing image space-time fusion method based on the adaptive neural network according to claim 1, wherein in the remote sensing image adaptive network model, the feature refinement module comprises four sequentially cascaded feature refinement sub-modules and the feature fusion module comprises four sequentially cascaded feature fusion sub-modules; the four feature refinement sub-modules and the four feature fusion sub-modules are arranged symmetrically in one-to-one correspondence, and the output of each feature refinement sub-module is transmitted as input to the feature fusion sub-module at the symmetric position;
the module of the feature refinement module inputs a high-resolution image F1 of a reference date t1, a low-resolution image C1 of the reference date t1 and a low-resolution image C2 of a predicted date t 2; for any ith feature refinement sub-module, its input is the module output of the front-end cascadeAnd->Wherein->And->Original module inputs C1, C2, and F1 representing feature refinement modules; in the ith feature refinement sub-module, the input +.>And->Firstly, extracting features by a feature extractor and respectively obtaining corresponding feature graphs> And->The feature map is then->And->Inputting into a Time Interaction Module (TIM) and generating a corresponding feature map +.>And->And then->And->As input to the first AdaIN layer to obtain a feature mapWill->And->As input to the second AdaIN layer to obtain a feature map +.>Finally, feature map->And->Is a feature map->And->Is a feature map->Respectively as the final output feature graphs of the ith feature refinement submoduleAnd->
in the feature fusion module, the first feature fusion sub-module concatenates the input feature maps a_C1^(4) and a_C2^(4) with the feature-map difference a_C2^(4) − a_C1^(4); the concatenation result is upsampled by an upsampling layer and input to the next layer as the intermediate feature map u^(1); the second feature fusion sub-module then concatenates the input feature maps a_C1^(3) and a_C2^(3) with the intermediate feature map u^(1), the result being upsampled by an upsampling layer and input to the next layer as the intermediate feature map u^(2); the third feature fusion sub-module concatenates the input feature maps a_C1^(2) and a_C2^(2) with the intermediate feature map u^(2), the result being upsampled by an upsampling layer and input to the next layer as the intermediate feature map u^(3); finally, the fourth feature fusion sub-module concatenates the input feature maps a_C1^(1) and a_C2^(1) with the intermediate feature map u^(3), and the concatenation result is mapped by a 3×3 convolution to the high-resolution image F2′ finally output by the feature fusion module.
3. The remote sensing image space-time fusion method based on the adaptive neural network according to claim 1, wherein in the Time Interaction Module (TIM), for the input feature maps f_C1^(i) and f_C2^(i), f_C1^(i) is first subtracted from f_C2^(i) to obtain coarse information D about the surface changes; the feature maps f_C1^(i) and f_C2^(i) are then each concatenated with D to generate the corresponding feature maps g_C1^(i) and g_C2^(i); g_C1^(i) and g_C2^(i) then pass through a 1×1 convolution and a sigmoid function to obtain the corresponding weight maps w_C1^(i) and w_C2^(i); finally, w_C1^(i) is multiplied element-wise onto f_C1^(i) to output the feature map t_C1^(i), and w_C2^(i) is multiplied element-wise onto f_C2^(i) to output the feature map t_C2^(i).
4. The remote sensing image space-time fusion method based on the adaptive neural network according to claim 1, wherein in the discriminator, the high-resolution image generated by the generator and the real high-resolution image are each passed through a 4×4 convolution layer and a LeakyReLU activation function, then through four convolution blocks, and finally through a 4×4 convolution layer and a sigmoid activation function to output a value of 0 or 1; each convolution block consists of a 4×4 convolution layer, batch normalization, and a LeakyReLU activation function in sequence.
5. The method for space-time fusion of remote sensing images based on an adaptive neural network according to claim 1, wherein the upsampling layer is a PixelShuffle layer.
6. The remote sensing image space-time fusion method based on the adaptive neural network according to claim 1, wherein in the feature refinement module, feature extractors of four feature refinement sub-modules are four blocks of ResNet-50 respectively.
7. The method for space-time fusion of remote sensing images based on an adaptive neural network according to claim 1, wherein the high-resolution image is a Landsat remote sensing image, and the low-resolution image is a MODIS remote sensing image.
8. The remote sensing image space-time fusion method based on the adaptive neural network according to claim 1, wherein the reference date is preferably the date closest to the prediction date for which both the high-resolution image and the low-resolution image are available.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310422272.XA CN116563103A (en) | 2023-04-19 | 2023-04-19 | Remote sensing image space-time fusion method based on self-adaptive neural network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116563103A true CN116563103A (en) | 2023-08-08 |
Family
ID=87490828
Cited By (2)

Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117036987A * | 2023-10-10 | 2023-11-10 | Wuhan University | Remote sensing image space-time fusion method and system based on wavelet domain cross pairing |
CN117036987B * | 2023-10-10 | 2023-12-08 | Wuhan University | Remote sensing image space-time fusion method and system based on wavelet domain cross pairing |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||