CN116934583A - Remote sensing image super-resolution algorithm based on depth feature fusion network - Google Patents

Remote sensing image super-resolution algorithm based on depth feature fusion network

Info

Publication number
CN116934583A
CN116934583A (application CN202210337041.4A)
Authority
CN
China
Prior art keywords
network
module
feature
remote sensing
fusion
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210337041.4A
Other languages
Chinese (zh)
Inventor
He Xiaohai (何小海)
Zhai Guowei (翟国伟)
Wang Zhengyong (王正勇)
Ren Chao (任超)
Liu Yixiao (刘屹霄)
Xiong Shuhua (熊淑华)
Chen Honggang (陈洪刚)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan University
Original Assignee
Sichuan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sichuan University filed Critical Sichuan University
Priority to CN202210337041.4A priority Critical patent/CN116934583A/en
Publication of CN116934583A publication Critical patent/CN116934583A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/40Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4053Scaling of whole images or parts thereof, e.g. expanding or contracting based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/40Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4046Scaling of whole images or parts thereof, e.g. expanding or contracting using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a remote sensing image super-resolution algorithm based on a depth feature fusion network. The implementation process is as follows: a two-way feature extraction and fusion network is constructed so that feature information at different depths is fully exploited for reconstruction; a channel multi-scale attention sub-module and a wide-activation residual sub-module are constructed and combined into a wide-activation multi-scale attention main module, which extracts the features of the two branches synchronously and uses multi-scale information to guide feature extraction without introducing extra parameters; finally, the two paths of features extracted in the feature extraction stage are reconstructed through a deconvolution layer and a pixel-shuffle module respectively, and the reconstructed features are fused through a fusion module. Compared with the prior art, the method keeps the parameter count low while achieving better subjective and objective results, and has wide applications in satellite detection, intelligent monitoring and autonomous driving.

Description

Remote sensing image super-resolution algorithm based on depth feature fusion network
Technical Field
The invention relates to an image quality improvement technology, in particular to a remote sensing image super-resolution algorithm based on a depth feature fusion network, and belongs to the field of image processing.
Background
In a broad sense, super-resolution techniques improve the spatial resolution of an image while recovering detailed features not present in the low-resolution (LR) image. However, single image super-resolution (SISR) is an inherently ill-posed problem, because many high-resolution (HR) pixels must be estimated from a limited number of known LR pixels. The problem is aggravated when the scale factor is large.
In the remote sensing field, low resolution is a common problem due to the limitations of imaging equipment. In addition, various degradations arise during acquisition and transmission, such as imaging view angle, motion blur, stripe noise and atmospheric haze, which further reduce the quality of remote sensing images. To alleviate this inherent contradiction in the remote sensing imaging process, upgrading the hardware is the most direct and effective approach, but the considerable extra cost is itself a problem to be solved. There is therefore an urgent need for a practical, inexpensive software-based way to suppress this problem. In recent years, image super-resolution algorithms based on deep learning have shown excellent performance. However, network performance generally grows with network depth; in particular, with residual learning, super-resolution reconstruction networks usually adopt deeper architectures in order to extract richer features and reconstruct higher-quality HR images, which inevitably increases the parameter count and leads to a heavy computational burden and difficult deployment. Moreover, compared with natural images, remote sensing images exhibit stronger internal self-similarity across scales. The design of the feature extraction network should therefore take this property fully into account while keeping the network structure as lightweight as possible.
Disclosure of Invention
The invention provides a remote sensing image super-resolution algorithm based on a depth feature fusion network, which extracts and fuses features on two branches, a deep network and a shallow network, and finally outputs a reconstructed image. The multi-scale information of the remote sensing image is exploited effectively while the algorithm remains lightweight.
The invention realizes the above purpose through the following technical scheme:
step one: constructing a two-way feature extraction and fusion network: specifically, a Deep Net (D-Net) and a Shallow Net (S-Net) are respectively constructed, and feature extraction and fusion are respectively carried out on the input low-resolution images;
step two: Feature preprocessing stage: specifically, on each branch the input low-resolution image is preprocessed by a convolution layer;
step three: Constructing the channel multi-scale attention sub-module (Multi-Scale Attention, MSA) and the wide-activation residual sub-module (Wide Activation Residual Block, WRB);
step four: Combining the MSA module with the WRB module to form the wide-activation multi-scale attention main module (W-M-A);
step five: Feature extraction stage: specifically, the preprocessing results obtained on the two paths are used as the input of the feature extraction module; the deep network D-Net passes through 16 W-M-A modules, extracting deep high-frequency features of the image while making effective use of multi-scale information, and the shallow network S-Net extracts shallow low-frequency features of the image under the guidance of multi-scale attention through a single W-M-A module;
step six: Reconstruction and fusion stage: specifically, the two paths of features extracted in the feature extraction stage are reconstructed through a deconvolution layer and a pixel-shuffle module respectively, and the reconstructed features are fused through a fusion module.
Drawings
Fig. 1 is a frame diagram of a remote sensing image super-resolution algorithm based on a depth feature fusion network.
FIG. 2 is a block diagram of a multi-scale attention module of the present invention.
Fig. 3 is a block diagram of a wide activation residual module according to the present invention.
FIG. 4 is a graph of performance versus parameter count on the WHU-RS19 dataset for the present invention and the nine comparison methods at a 4x super-resolution factor.
FIG. 5 is a subjective visual comparison of the invention and the nine comparison methods on the test image "FootballField_07" at a 3x super-resolution factor, where (a) is the original image and (b)-(k) are the super-resolution reconstruction results of the nine comparison methods and of the present invention, respectively.
Detailed Description
The invention is further described below with reference to the accompanying drawings:
Fig. 1 shows the overall network structure of the present invention, where PixelShuffle denotes the pixel-shuffle module, and Convolution and Deconvolution denote a convolution layer and a deconvolution layer, respectively. The overall structure consists of a shallow feature extraction network, a deep feature extraction network and a fusion layer, and the reconstruction process is divided into a feature extraction stage and a reconstruction stage. The W-M-A module is the wide-activation multi-scale attention basic module proposed by the present invention; Fig. 2 shows its multi-scale attention sub-module (MSA), and Fig. 3 shows its wide-activation residual sub-module (WRB).
In the two-way learning process, the invention designs the wide-activation multi-scale attention module W-M-A to learn the nonlinear relation between LR and HR images. The proposed W-M-A structure consists of two parts: a multi-scale attention sub-module (MSA) based on point-wise convolution, and a wide-activation residual sub-module (WRB). It should be noted that the MSA module and the WRB are not simply connected in parallel. As shown in Fig. 1, a 1x1 convolution layer precedes the MSA; this path receives not only the network input of the previous module but the network inputs of all preceding modules, whereas the WRB branch parallel to the MSA receives only the input from the immediately preceding layer. Finally, the two branches each carry a conventional skip connection before the final feature fusion that produces the output. In this way, feature reuse is achieved without introducing excessive computation. Moreover, owing to the wide-activation residual block, the mapping learned after introducing the residual is more sensitive to changes of the output: even small changes of the output features update the weight parameters of the network effectively, so the residual network works better.
It should be noted that the first layer of each branch in Fig. 1 is a 3x3 convolution layer. Convolution kernels with a large receptive field are unsuitable for super-resolution tasks: after downsampling, each pixel of the LR image corresponds to only a small area of the original image, and a large receptive field tends to introduce irrelevant information during training.
In the deep feature extraction network, the invention cascades 16 W-M-A modules, while the shallow network adopts only one W-M-A module. Up-sampling of the features is realized at the tail of each branch through the deconvolution and pixel-shuffle modules, and finally the two paths of features are fused to output the reconstructed high-resolution image.
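To make the data flow concrete, the following is a minimal PyTorch sketch of this two-branch layout as we read it. The class and argument names (DFFN, wma_block, n_feats, scale, n_colors) are illustrative assumptions, as is the assignment of the deconvolution tail to the deep path and the pixel-shuffle tail to the shallow path; the W-M-A block itself is sketched after the fusion formulas below.

```python
# Minimal sketch of the assumed two-branch deep feature fusion network.
# All names are illustrative; this is not the patented implementation.
import torch
import torch.nn as nn

class DFFN(nn.Module):
    def __init__(self, wma_block, n_feats=32, scale=4, n_colors=1):
        super().__init__()
        # Feature preprocessing: one 3x3 convolution layer per branch (step two).
        self.head_d = nn.Conv2d(n_colors, n_feats, 3, padding=1)
        self.head_s = nn.Conv2d(n_colors, n_feats, 3, padding=1)
        # D-Net: 16 W-M-A modules; block i sees the inputs of all i preceding
        # modules. S-Net: a single W-M-A module.
        self.d_net = nn.ModuleList(
            [wma_block(n_feats, n_prev=i + 1) for i in range(16)])
        self.s_net = wma_block(n_feats, n_prev=1)
        # Reconstruction tails: deconvolution on the deep path, pixel shuffle
        # on the shallow path (this assignment is our assumption).
        self.up_d = nn.Sequential(
            nn.ConvTranspose2d(n_feats, n_feats, scale, stride=scale),
            nn.Conv2d(n_feats, n_colors, 3, padding=1))
        self.up_s = nn.Sequential(
            nn.Conv2d(n_feats, n_colors * scale ** 2, 3, padding=1),
            nn.PixelShuffle(scale))

    def forward(self, lr):
        f_d = self.head_d(lr)
        prev = [f_d]                 # inputs of all preceding W-M-A modules
        for block in self.d_net:
            f_d = block(f_d, prev)
            prev.append(f_d)
        f_s = self.head_s(lr)
        f_s = self.s_net(f_s, [f_s])
        # Additive fusion of the two reconstructed paths (cf. claim 3).
        return self.up_d(f_d) + self.up_s(f_s)
```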
The network framework provided by the invention gathers all available features through the W-M-A modules. For a given feature, it is not appropriate to fuse the features of the current block directly, because the features of different blocks lie in different feature spaces. These features therefore need to be projected into a common space suitable for fusion, to prevent features from different spaces from unbalancing the attention weights. In W-M-A, a 1x1 convolution C_1x1 acts as this projection unit. The projected features P_i of the i-th W-M-A block are obtained by formula (3-3):

P_i = C_1x1([F_0, F_1, ..., F_(i-1)])   (3-3)

where [.] denotes the concatenation of the inputs of all preceding modules. When fusing with the features of the current layer, however, different channels have different importance. The invention therefore adds a multi-scale channel attention A_i to learn the channel weights at different scales, giving the new feature map

P'_i = A_i ⊗ P_i

A_i consists of an average pooling layer, a 1x1 convolution layer, a ReLU layer and a sigmoid activation function layer; ⊗ denotes channel-wise multiplication.

The output of the i-th W-M-A module is then expressed as

F_i = λ_1 · P'_i + λ_2 · X_i

where X_i represents the features of the current layer (the WRB branch) and λ_1, λ_2 are feature factors corresponding to the different features; these factors are learned automatically when the model is trained. An additive operation is used here because it better handles the case where some features of P'_i are zero; at the same time, addition keeps the parameter count down, since the channel number is not expanded. If the channels were concatenated directly instead, some invalid channels would appear and increase the network redundancy.
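Read together with Fig. 1, one W-M-A module could be sketched as follows. The two-scale attention (a pooled global branch plus a point-wise local branch), the reduction ratio, the expansion factor and the placement of the skip connection are our assumptions from the description above, not a verbatim implementation.

```python
# Illustrative sketch of one W-M-A module; names (WMA, pw_branch, lam_p,
# lam_w) and the exact attention wiring are assumptions, not the patent's.
import torch
import torch.nn as nn

def pw_branch(c, r=4):
    # Point-wise (1x1) convolution branch for channel attention;
    # the reduction ratio r is an assumed value.
    return nn.Sequential(
        nn.Conv2d(c, c // r, 1),
        nn.ReLU(inplace=True),
        nn.Conv2d(c // r, c, 1))

class WMA(nn.Module):
    def __init__(self, n_feats=32, n_prev=1, expand=4):
        super().__init__()
        # 1x1 projection of the inputs of all preceding modules into a
        # common space (formula (3-3)).
        self.project = nn.Conv2d(n_feats * n_prev, n_feats, 1)
        # Multi-scale channel attention A_i: a pooled (global) scale plus
        # a full-resolution (local) point-wise scale, gated by a sigmoid.
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.glob = pw_branch(n_feats)
        self.local = pw_branch(n_feats)
        # Wide-activation residual branch (WRB): expand channels before the
        # ReLU, then reduce back (32 -> 128 -> 32 with these defaults).
        self.wrb = nn.Sequential(
            nn.Conv2d(n_feats, n_feats * expand, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(n_feats * expand, n_feats, 3, padding=1))
        # Learnable feature factors lambda_1, lambda_2 of the additive fusion.
        self.lam_p = nn.Parameter(torch.ones(1))
        self.lam_w = nn.Parameter(torch.ones(1))

    def forward(self, x, prev):
        p = self.project(torch.cat(prev, dim=1))           # P_i
        a = torch.sigmoid(self.local(p) + self.glob(self.pool(p)))
        p = a * p                                          # P'_i = A_i ⊗ P_i
        return self.lam_p * p + self.lam_w * (self.wrb(x) + x)
```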
The invention uses the UC Merced dataset as the training set; it contains 21 classes of remote sensing images, each class comprising 100 images of 256x256 pixels (0.3 m per pixel). The WHU-RS19 dataset is used as the test set; it includes 19 scene classes, each with approximately 50 images of 600x600 pixels (0.5 m per pixel), for a total of 1005 images. Experiments are evaluated on the Y component in the converted YCbCr space, where Y is the luminance component and Cb and Cr are the blue and red chrominance components. The Cb and Cr channels are simply enlarged with bicubic interpolation, and the SR results of the invention are evaluated on the Y channel of the image, because human vision is more sensitive to the luminance channel.
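As a minimal sketch of this evaluation protocol, the Y channel can be extracted with the standard ITU-R BT.601 conversion and PSNR computed on it; the function names and the border crop are our assumptions, since the patent does not state them.

```python
# Y-channel PSNR evaluation sketch (illustrative names; BT.601 conversion).
import numpy as np

def rgb_to_y(img):
    """RGB array (H, W, 3) with values in [0, 255] -> Y channel (BT.601)."""
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    return 16.0 + (65.481 * r + 128.553 * g + 24.966 * b) / 255.0

def psnr_y(sr, hr, border=4):
    """PSNR on the Y channel; `border` pixels are cropped (a common
    convention; the patent does not state its crop)."""
    y_sr = rgb_to_y(sr)[border:-border, border:-border]
    y_hr = rgb_to_y(hr)[border:-border, border:-border]
    mse = np.mean((y_sr - y_hr) ** 2)
    return 10.0 * np.log10(255.0 ** 2 / mse)
```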
The numbers of input, internal and output channels of the wide-activation residual block in the wide-activation multi-scale attention module are set to 32, 128 and 32, respectively. All experiments use PyTorch as the training framework and an RTX 3080 GPU as the accelerator. During training, horizontal flipping and random rotation by 90, 180 and 270 degrees are used for data augmentation, and the gradients are optimized with the ADAM optimizer to drive the updating of the network.
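A hedged sketch of this training configuration follows; the learning rate and loss function are not stated in the patent, so the values below are placeholders, and DFFN and WMA refer to the illustrative sketches above.

```python
# Training-setup sketch under stated assumptions (lr and loss are guesses).
import random
import torch

def augment(lr_patch, hr_patch):
    # Random horizontal flip and rotation by 0/90/180/270 degrees.
    if random.random() < 0.5:
        lr_patch = torch.flip(lr_patch, [-1])
        hr_patch = torch.flip(hr_patch, [-1])
    k = random.randint(0, 3)
    return torch.rot90(lr_patch, k, [-2, -1]), torch.rot90(hr_patch, k, [-2, -1])

model = DFFN(WMA)                                           # sketches above
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)   # lr assumed
criterion = torch.nn.L1Loss()                               # loss not stated

def train_step(lr_batch, hr_batch):
    lr_batch, hr_batch = augment(lr_batch, hr_batch)
    optimizer.zero_grad()
    loss = criterion(model(lr_batch), hr_batch)
    loss.backward()
    optimizer.step()
    return loss.item()
```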
To verify the effectiveness of the method, nine typical image super-resolution algorithms are selected as comparison methods; their codes or models are provided by the original authors and are used with the default parameter settings. The nine comparison algorithms are:
method 1: the method proposed by Dong et al, reference "Image super-resolution using deep convolutional networks [ J ]. IEEE Transactions on Pattern Analysis and Machine Intelligence,2016,38 (2): 295-307 ]"
Method 2: the method proposed by Dong et al, reference "Accelerating the super-resolution convolutional neural network [ C ]. European Conference on Computer Vision,2016:391-407 ]"
Method 3: the procedure set forth in Lei et al, reference "Super-resolution for remote sensing images via local-global combined network [ J ]. IEEE Geoscience and Remote Sensing Letters,2017,14 (8): 1243-1247 ]"
Method 4: the method proposed by Kim et al, reference "Accurate image super-resolution using very deep convolutional networks [ C ]. IEEE Conference on Computer Vision and Pattern Recognition,2016:1646-1654 ]"
Method 5: the method proposed by Hui et al, reference "Fast and accurate single image super-resolution via information distillation network [ C ]// Proceedings of the IEEE conference on computer vision and pattern recognment.2018:723-731 ]"
Method 6: the method proposed by Yang et al, reference "Super-resolution for remote sensing images via dual-domain network learning [ J ]. Journal of Electronic Imaging,2019,28 (6): 063231 ]"
Method 7 and method 8: the method proposed by Wang et al, (method 7 is a published medium parameter model, method 8 is a published high parameter model) reference "Lightweight single-image super-resolution network with attentive auxiliary feature learning [ C ]// Proceedings of the Asian conference on computer vision.2020 ]"
Method 9: the method proposed by Zhang et al, reference "Remote sensing image super-resolution via mixed high-order attention network [ J ]. IEEE Transactions on Geoscience and Remote Sensing,2020,59 (6): 5183-5196 ]"
To present a fairer comparison, method 9 is reproduced with the same network architecture and parameter settings as in the original paper, while keeping the parameter count at a lower level (specific parameters: r=1, cg=3, cb=3).
Figure 4 shows the parameter-performance comparison of each method on the WHU-RS19 dataset at a 4x scale factor. As can be seen from the figure, method 1 achieves a certain performance with fewer than 100K parameters, while method 2 achieves higher performance than method 1 with an extremely small parameter count of 57K; methods 3 and 4 do not show a good balance between parameters and performance; methods 7 and 8 achieve low parameter counts and high performance by constructing a lightweight auxiliary-feature attention module; method 9 shows some performance degradation as its basic modules are reduced, and Zhang et al. also demonstrated the impact of their higher-order attention module on network performance through ablation experiments in the corresponding paper; method 6, by contrast, sacrifices too many parameters. The present method achieves the best balance between parameter count and performance, and outperforms the current state-of-the-art remote sensing image super-resolution algorithms at a comparable parameter count.
Table 1
Tables 1 and 2 show the PSNR and SSIM comparison of the reconstruction results of the different methods on the WHU-RS19 dataset, with the highest values marked in bold; it can be seen that the invention achieves the best performance at all scale factors.
Table 2

Claims (6)

1. The remote sensing image super-resolution algorithm based on the depth feature fusion network is characterized by comprising the following steps of:
step one: constructing a two-way feature extraction and fusion network: specifically, a Deep Net (D-Net) and a Shallow Net (S-Net) are respectively constructed, and feature extraction and fusion are respectively carried out on the input low-resolution images;
step two: Feature preprocessing stage: specifically, on each branch the input low-resolution image is preprocessed by a convolution layer;
step three: Constructing the channel multi-scale attention sub-module (Multi-Scale Attention, MSA) and the wide-activation residual sub-module (Wide Activation Residual Block, WRB);
step four: Combining the MSA module with the WRB module to form the wide-activation multi-scale attention main module (W-M-A);
step five: Feature extraction stage: specifically, the preprocessing results obtained on the two paths are used as the input of the feature extraction module; the deep network D-Net passes through 16 W-M-A modules, extracting deep high-frequency features of the image while making effective use of multi-scale information, and the shallow network S-Net extracts shallow low-frequency features of the image under the guidance of multi-scale attention through a single W-M-A module;
step six: Reconstruction and fusion stage: specifically, the two paths of features extracted in the feature extraction stage are reconstructed through a deconvolution layer and a pixel-shuffle module respectively, and the reconstructed features are fused through a fusion module.
2. The remote sensing image super-resolution algorithm based on the depth feature fusion network according to claim 1, wherein the network framework in the first step consists of two branches of different depth and function: D-Net exploits the network depth to effectively extract the high-frequency detail features F_d of the image, while the shallow network effectively extracts the low-frequency detail features F_s by means of a shallower convolution layer; meanwhile, both paths of features rely on multi-scale attention to obtain the feature f_ms combining different semantic information, and the outputs of the two branches after the feature extraction stage are respectively described as F_D(f_ms) and F_S(f_ms).
3. The remote sensing image super-resolution algorithm based on the depth feature fusion network according to claim 1, wherein the network framework in the first step consists of two stages, feature extraction and reconstruction, realized synchronously by the two branches, and the final reconstruction result is F_HR = F_D(f_ms) + F_S(f_ms).
4. The remote sensing image super-resolution algorithm based on the depth feature fusion network according to claim 1, wherein the MSA module in the third step realizes multi-scale feature extraction by means of point-wise convolution; compared with a multi-scale module based on ordinary convolution, point-wise convolution introduces no extra parameters, so the module is a lightweight network structure design.
5. The remote sensing image super-resolution algorithm based on the depth feature fusion network according to claim 1, wherein the MSA module and the WRB module in the fourth step are not directly connected in parallel: a 1x1 convolution layer precedes the MSA module; this convolution layer receives the network input of the previous module and the network inputs of all preceding modules, while the parallel WRB branch receives only the input from the immediately preceding layer; finally, the two branches each carry a conventional skip connection and then perform the final feature fusion to produce the output; in this way, feature reuse is achieved without introducing excessive computation; meanwhile, owing to the wide-activation residual block, the mapping after introducing the residual is more sensitive to changes of the output, and small changes of the output features can update the weight parameters of the network well, so the residual network works better.
6. The remote sensing image super-resolution algorithm based on the depth feature fusion network according to claim 1, wherein the feature extraction module in the fifth step is a lightweight network design; specifically, for the MSA sub-module in the W-M-A module, the adopted channel attention extracts the local feature L(X) through point-wise convolution:
L(X) = B(PWConv_2(δ(B(PWConv_1(Z′)))))   (1)
wherein PWConv_1 and PWConv_2 denote point-wise convolution kernels whose sizes are determined by the channel-dimension reduction ratio r, B denotes the BN layer, and δ denotes the ReLU activation function; L(X) has the same shape as the input feature and can preserve and highlight fine details in the underlying features; secondly, the WRB module in the W-M-A module expands the channel number through convolution layer C_1 before the input to the ReLU; after the features pass through the rectified linear unit, they enter convolution layer C_2, whose input channel count is the same as the number of feature channels flowing out of C_1, but whose output channel count is reduced to the number before expansion; this allows as much feature information as possible to pass through the ReLU without introducing extra parameters, guaranteeing the nonlinearity of the mapping while letting the feature information acquired by the shallow network enter the deep network effectively for deep feature extraction, thereby promoting the learning of deep semantic information; finally, the W-M-A module adopts an additive operation to fuse the features extracted by the two sub-modules, and the addition reduces the parameter count and the network redundancy.
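For concreteness, equation (1) admits the following PyTorch sketch; the channel reduction ratio r is left as a free parameter, and this is an illustration of the equation as written rather than the patented implementation.

```python
# Sketch of Eq. (1): L(X) = B(PWConv_2(ReLU(B(PWConv_1(X))))),
# with B = BatchNorm; r is an assumed reduction ratio.
import torch.nn as nn

def local_attention(c, r=4):
    return nn.Sequential(
        nn.Conv2d(c, c // r, 1),    # PWConv_1: reduce channels by ratio r
        nn.BatchNorm2d(c // r),     # B
        nn.ReLU(inplace=True),      # delta
        nn.Conv2d(c // r, c, 1),    # PWConv_2: restore the channel count
        nn.BatchNorm2d(c))          # B; output shape matches the input
```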
CN202210337041.4A 2022-04-01 2022-04-01 Remote sensing image super-resolution algorithm based on depth feature fusion network Pending CN116934583A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210337041.4A CN116934583A (en) 2022-04-01 2022-04-01 Remote sensing image super-resolution algorithm based on depth feature fusion network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210337041.4A CN116934583A (en) 2022-04-01 2022-04-01 Remote sensing image super-resolution algorithm based on depth feature fusion network

Publications (1)

Publication Number Publication Date
CN116934583A true CN116934583A (en) 2023-10-24

Family

ID=88379392

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210337041.4A Pending CN116934583A (en) 2022-04-01 2022-04-01 Remote sensing image super-resolution algorithm based on depth feature fusion network

Country Status (1)

Country Link
CN (1) CN116934583A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117456339A (en) * 2023-11-17 2024-01-26 武汉大学 Image quality evaluation method and system based on multi-level feature multiplexing
CN117456339B (en) * 2023-11-17 2024-05-17 武汉大学 Image quality evaluation method and system based on multi-level feature multiplexing

Similar Documents

Publication Publication Date Title
CN107123089B (en) Remote sensing image super-resolution reconstruction method and system based on depth convolution network
CN109671023B (en) Face image super-resolution secondary reconstruction method
Liang et al. Cameranet: A two-stage framework for effective camera isp learning
CN109064396A (en) A kind of single image super resolution ratio reconstruction method based on depth ingredient learning network
CN113284051B (en) Face super-resolution method based on frequency decomposition multi-attention machine system
CN112669214B (en) Fuzzy image super-resolution reconstruction method based on alternating direction multiplier algorithm
CN113129391B (en) Multi-exposure fusion method based on multi-exposure image feature distribution weight
CN111951164B (en) Image super-resolution reconstruction network structure and image reconstruction effect analysis method
CN113450290B (en) Low-illumination image enhancement method and system based on image inpainting technology
Zhou et al. Cross-scale residual network: A general framework for image super-resolution, denoising, and deblocking
US11887218B2 (en) Image optimization method, apparatus, device and storage medium
CN111951172A (en) Image optimization method, device, equipment and storage medium
CN113284061B (en) Underwater image enhancement method based on gradient network
CN116051428A (en) Deep learning-based combined denoising and superdivision low-illumination image enhancement method
CN115063318A (en) Adaptive frequency-resolved low-illumination image enhancement method and related equipment
Huang et al. Hybrid image enhancement with progressive laplacian enhancing unit
CN112927137A (en) Method, device and storage medium for acquiring blind super-resolution image
Zhang et al. Multi-branch and progressive network for low-light image enhancement
CN115511708A (en) Depth map super-resolution method and system based on uncertainty perception feature transmission
CN115131229A (en) Image noise reduction and filtering data processing method and device and computer equipment
CN113962878B (en) Low-visibility image defogging model method
CN116934583A (en) Remote sensing image super-resolution algorithm based on depth feature fusion network
CN113256496B (en) Lightweight progressive feature fusion image super-resolution system and method
CN115511733A (en) Image degradation modeling method, neural network training method and device
CN111325700B (en) Multi-dimensional fusion method and system based on color image

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination