CN115660955A - Super-resolution reconstruction model, method, equipment and storage medium for efficient multi-attention feature fusion - Google Patents

Super-resolution reconstruction model, method, equipment and storage medium for efficient multi-attention feature fusion

Info

Publication number
CN115660955A
Authority
CN
China
Prior art keywords
image
feature
convolution
pffb
reconstruction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211287811.5A
Other languages
Chinese (zh)
Inventor
贾晓芬
李方玗
赵佰亭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui University of Science and Technology
Original Assignee
Anhui University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Anhui University of Science and Technology filed Critical Anhui University of Science and Technology
Priority to CN202211287811.5A priority Critical patent/CN115660955A/en
Publication of CN115660955A publication Critical patent/CN115660955A/en
Pending legal-status Critical Current

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses a super-resolution reconstruction model, method, equipment and storage medium for efficient multi-attention feature fusion. The reconstruction model comprises a feature extraction module and a reconstruction module. The feature extraction module gradually extracts deep feature information of the image using a 3 × 3 convolution and 8 progressive feature fusion blocks (PFFB), and weights the extracted feature information with an internal efficient multi-attention block (EMAB) so that the network pays more attention to high-frequency information. The reconstruction module consists of a multi-scale receptive field block RFB_x, a 3 × 3 convolution and a sub-pixel convolution layer, where RFB_x further enhances the features extracted by the PFFBs with a multi-branch structure and fuses multi-scale feature information to improve the reconstruction performance of the model. Finally, the bicubic upsampling result of the low-resolution LR image is superposed with the upsampling result of the sub-pixel convolution layer to obtain the reconstructed image. The reconstructed image recovers more high-frequency information and has abundant texture details closer to the original image.

Description

Super-resolution reconstruction model, method, equipment and storage medium for efficient multi-attention feature fusion
Technical Field
The invention belongs to the technical field of image reconstruction, and relates to a high-efficiency multi-attention feature fusion super-resolution reconstruction model, method, equipment and storage medium.
Background
The Single Image Super-Resolution (SISR) technique restores a given Low-Resolution (LR) image to the corresponding High-Resolution (HR) image with a specific algorithm, aiming to overcome or compensate for problems such as low imaging quality and indistinct regions of interest caused by limitations of the image acquisition system or acquisition environment. Traditional image super-resolution reconstruction algorithms rely mainly on basic digital image processing techniques; they are computationally complex and cannot effectively recover the original information of the image.
With the wide application of deep learning in image super-resolution reconstruction and its steadily improving results, the field is receiving more and more attention from researchers. Dong et al. first applied convolutional neural networks to super-resolution: the proposed SRCNN generates a reconstructed image by pixel mapping with three convolutional layers, as detailed in "C. Dong, C. C. Loy, K. He, X. Tang, Image super-resolution using deep convolutional networks, in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2014, pp. 184-199". Dong et al. then proposed FSRCNN based on SRCNN, which uses smaller convolution layers internally and enlarges the size at the end of the network with a deconvolution layer, as detailed in "C. Dong, C. C. Loy, X. Tang, Accelerating the super-resolution convolutional neural network, Computer Vision-ECCV Workshops, 2016, pp. 391-407". The efficient sub-pixel convolutional neural network ESPCN proposed by Shi et al. implements the upsampling operation in the reconstruction module with a rearranging sub-pixel layer, as detailed in "W. Shi, J. Caballero, F. Huszar, et al., Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network, in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2016, pp. 1874-1883". These super-resolution reconstruction methods are lightweight, but their reconstruction quality falls short of expectations.
Researchers then began to improve model performance by increasing network depth. Kim et al. proposed the 20-layer deep network VDSR, which uses the idea of residual learning to accelerate the convergence of network training, as detailed in "J. Kim, J. K. Lee, K. M. Lee, Accurate Image Super-Resolution Using Very Deep Convolutional Networks, in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2016, pp. 1646-1654". Ahn et al. proposed CARN, an architecture implementing a cascading mechanism on a Residual Network (ResNet): its middle part is based on ResNet, and the cascading mechanism is used both globally and locally to better integrate the features of each layer of the network, as detailed in "N. Ahn, B. Kang, K. A. Sohn, Fast, Accurate, and Lightweight Super-Resolution with Cascading Residual Network, Computer Vision-ECCV Workshops, 2018, pp. 252-268". The PAN proposed by Zhao et al. introduces pixel attention for efficient super-resolution with few parameters, as detailed in "H. Zhao, X. Kong, J. He, et al., Efficient image super-resolution using pixel attention, Computer Vision-ECCV Workshops, 2020, pp. 56-72". The CFSRCNN proposed by Tian et al. learns long- and short-path features with a feature extraction module and fuses the learned features by propagating shallow-layer information of the network to the deep layers, as detailed in "C. Tian, Y. Xu, W. Zuo, et al., Coarse-to-fine CNN for image super-resolution, IEEE Transactions on Multimedia, vol. 23, 2021, pp. 1489-1502". These methods improve model performance by increasing the number of network layers, but the added parameters greatly increase the training difficulty, and the reconstruction performance of some models drops markedly once the parameters are reduced. It is thus difficult to achieve a good balance between parameter count and performance.
The inventor's research finds that existing deep-learning-based single image super-resolution reconstruction methods have poor reconstruction results, mainly for the following reasons: (1) Some CNN-based super-resolution reconstruction networks are lightweight, but their performance falls short of expectations. (2) More and more models increase the network depth in pursuit of reconstruction quality; although performance improves, the huge parameter count increases the time complexity of training, and the loss of feature information during deep-network transmission can also harm the reconstruction. How to make full use of the feature information of the image so that the limited features are better transmitted and reused, efficiently recover the edge contours and texture details of the image, and keep the model as lightweight as possible while ensuring performance, has become a problem that urgently needs to be solved.
Disclosure of Invention
In order to solve these problems, the invention provides an efficient multi-attention feature fusion image super-resolution reconstruction model that can improve image quality and observation effect, and that addresses the information loss and large parameter counts caused by overly deep networks in existing super-resolution reconstruction algorithms.
The invention has a second purpose of providing an efficient multi-attention feature fusion image super-resolution reconstruction method.
A third object of the present invention is to provide an electronic apparatus.
It is a fourth object of the present invention to provide a computer storage medium.
The technical scheme adopted by the invention is that the super-resolution reconstruction model for efficient multi-attention feature fusion comprises a feature extraction module and a reconstruction module;
the feature extraction module is divided into a shallow feature extraction module and a deep feature extraction module,
the shallow feature extraction module is a 3 × 3 convolution layer that performs initial feature extraction on the input low-resolution LR image in a low-dimensional space, effectively reducing the computational cost on the LR image;
the deep layer feature extraction module comprises 8 Progressive Feature Fusion Blocks (PFFB), the PFFB adopts a progressive fusion connection mode to gradually extract deep layer feature information of the image so as to enhance feature transfer, and meanwhile, the PFFB combines an internal efficient multi-attention block (EMAB) to weight the extracted feature information so as to enable the network to pay more attention to high-frequency information;
the reconstruction module consists of a multi-scale receptive field block RFB_x, a 3 × 3 convolution and a sub-pixel convolution layer; RFB_x further enhances the features extracted by the PFFB blocks with a multi-branch structure and fuses multi-scale feature information to improve the reconstruction performance of the model,
and then superposing the bicubic upsampling result of the low-resolution LR image and the upsampling result of the sub-pixel convolution layer to obtain a reconstructed image.
Further, the deep feature extraction module comprises 8 Progressive Feature Fusion Blocks (PFFB);
four efficient multi-attention blocks (EMAB) are adopted in the PFFB to progressively extract the deep information of the image layer by layer;
the PFFB realizes "information interaction" between the convolution-layer results inside the EMAB blocks through repeated channel shuffles (C shuffle), regrouping the output channels and then mixing information from different channels; this resolves the blocked information flow between convolution layers and fully fuses the channels without adding computation;
the PFFB applies C shuffle to each feature extracted by an EMAB block, then concatenates two adjacent C-shuffled features and applies C shuffle again to improve the generalization ability of the network; a 1 × 1 convolution removes redundant information, and the result is fused with the features from the next C shuffle operation. This operation is repeated across the EMAB blocks inside the PFFB, so local information is gradually collected and fused, feature transmission is enhanced, and the accuracy of the reconstructed image improves. Finally, residual learning superposes the input feature x_i with the fused features to obtain the output feature x_{i+1} of the i-th (i = 0, 1, …, 7) PFFB block, making maximal use of the LR image information to relieve the loss of features during transmission;
through its "progressive" feature fusion connections, the PFFB strengthens feature extraction and fuses the extracted multi-layer information, so each layer can make full use of all features learned by the previous layers and the limited features are better transmitted and reused.
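The "C shuffle" operation described above is a parameter-free channel regrouping. A minimal NumPy sketch (not the patent's implementation, which applies it to real feature maps inside the PFFB) shows how channels from different groups become interleaved:

```python
import numpy as np

def channel_shuffle(x: np.ndarray, groups: int) -> np.ndarray:
    """Regroup channels and interleave them across groups (the 'C shuffle'
    operation): reshape (N, C, H, W) -> (N, g, C//g, H, W), swap the two
    group axes, and flatten back. No parameters, no extra computation."""
    n, c, h, w = x.shape
    assert c % groups == 0, "channel count must be divisible by the group count"
    x = x.reshape(n, groups, c // groups, h, w)
    x = x.transpose(0, 2, 1, 3, 4)  # channel j of every group becomes adjacent
    return x.reshape(n, c, h, w)

# With 4 channels and 2 groups, channel order [0, 1, 2, 3] becomes [0, 2, 1, 3],
# so a following convolution sees a mix of both groups.
x = np.arange(4, dtype=np.float32).reshape(1, 4, 1, 1)
shuffled = channel_shuffle(x, groups=2)
```

Because the operation is a pure permutation, it mixes information between channel groups at zero computational cost, which is exactly the property the PFFB relies on.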
Further, four efficient multi-attention blocks (EMABs) are adopted in the PFFB to extract features layer by layer;
the EMAB makes full use of channel and spatial feature information to gradually denoise the shallow features of the image, so the network focuses on the high-frequency details of the image, which helps enhance the texture detail information of the reconstructed image;
the EMAB reduces the channel size with a 1 × 1 convolution layer after two 3 × 3 convolution kernels, enlarges the receptive field with a stride-2 convolution, and further reduces the spatial dimension of the network with a 2 × 2 max-pooling layer; a dilated (atrous) convolution layer then further aggregates receptive-field context, reducing memory while improving network performance; the obtained features are upsampled to restore the spatial dimension, and a 1 × 1 convolution restores the channel dimension;
the EMAB applies the FReLU activation function after the 3 convolutional layers to accelerate convergence and prevent gradient explosion;
the EMAB adopts an efficient channel attention block to avoid the problems caused by dimensionality reduction; the channel attention is generated by a fast one-dimensional convolution, and the size of the internal convolution kernel is determined adaptively through a nonlinear mapping of the channel dimension;
the one-dimensional convolution efficiently realizes local cross-channel interaction: by capturing information from each channel and its neighbors, it completes the communication between channels and thus learns effective channel attention.
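The efficient channel attention just described can be sketched framework-agnostically in NumPy. The adaptive kernel-size rule below (k = |log2(C)/γ + b/γ| rounded up to odd, with γ = 2, b = 1) follows the ECA paper and is an assumption here: the patent only states that the kernel size is chosen adaptively via a nonlinear mapping of the channel dimension.

```python
import numpy as np

def eca_kernel_size(channels: int, gamma: int = 2, b: int = 1) -> int:
    """Adaptively choose the 1-D conv kernel size from the channel count;
    more channels -> wider local cross-channel interaction window."""
    t = int(abs(np.log2(channels) / gamma + b / gamma))
    return t if t % 2 == 1 else t + 1

def eca_attention(x: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Efficient channel attention without dimensionality reduction:
    global average pooling -> 1-D conv across the channel axis -> sigmoid
    gate, one weight per channel applied back onto the feature map."""
    n, c, h, w = x.shape
    pooled = x.mean(axis=(2, 3))                 # (N, C) channel descriptors
    pad = len(kernel) // 2
    out = np.empty_like(pooled)
    for i in range(n):
        padded = np.pad(pooled[i], pad, mode="edge")
        out[i] = np.convolve(padded, kernel, mode="valid")
    weights = 1.0 / (1.0 + np.exp(-out))         # sigmoid gate
    return x * weights[:, :, None, None]

k = eca_kernel_size(64)                          # 3 for a 64-channel map
gated = eca_attention(np.ones((1, 4, 2, 2)), np.array([1.0]))
```

Note that, unlike squeeze-and-excitation attention, no bottleneck fully connected layers are used, so the channel dimension is never reduced.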
Further, the multi-scale receptive field block RFB_x in the reconstruction module is built from 1 × 1, 3 × 3, 1 × 3, and 3 × 1 convolution kernels;
RFB_x is located after the 8 sequentially connected PFFBs and is responsible for enhancing the extracted deep-level features, fusing multi-scale features, and reconstruction, preserving rich deep features and recovering image details;
specifically, the output feature x_8 of the 8th PFFB block is taken as the input of the RFB_x block; multi-branch convolution layers of different sizes perform multi-scale feature extraction, and dilated convolutions with different dilation rates are introduced. The larger the dilation rate, the farther the sampling points are from the center point and the larger the receptive field, so information can be captured over a larger area to generate better feature maps without increasing the parameter count;
finally, the outputs of the branches are concatenated to fuse different features at multiple scales, yielding the feature x_e extracted by RFB_x.
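The claim that a larger dilation rate enlarges the receptive field without adding parameters follows from the effective extent of a dilated kernel: the taps stay k per axis, but they are spread apart. A small sketch (the dilation rates 1, 3, 5 are illustrative; the patent does not list the exact rates used in RFB_x):

```python
def effective_kernel(k: int, dilation: int) -> int:
    """Effective spatial extent of a k x k convolution with a given dilation:
    the tap count stays k (parameter count unchanged) but the taps sit
    dilation - 1 pixels apart, covering k + (k - 1) * (dilation - 1) pixels."""
    return k + (k - 1) * (dilation - 1)

# A 3x3 kernel at dilations 1, 3, 5 covers 3, 7, and 11 pixels per axis
# while always keeping exactly 9 learnable weights.
spans = {d: effective_kernel(3, d) for d in (1, 3, 5)}
```

This is why stacking branches with different dilation rates lets RFB_x capture context at several scales for the same parameter budget.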
A super-resolution reconstruction method for efficient multi-attention feature fusion is carried out according to the following steps:
s1, inputting a low-resolution LR image into a high-efficiency multi-attention feature fusion super-resolution reconstruction model;
s2, a feature extraction module of the super-resolution reconstruction model extracts shallow features of the LR image, then the deep features are extracted through 8 Progressive Feature Fusion Blocks (PFFB), and the deep features are sent to the reconstruction module;
s3, the reconstruction module of the model uses RFB_x to enhance the extracted deep-level features, and multi-scale feature fusion yields the fused multi-dimensional feature x_e;
S4, the feature x_e output by RFB_x is passed through a 3 × 3 convolution and amplified by the sub-pixel convolution layer; the input LR image is bicubically upsampled, and the bicubic upsampling result of the LR image is superposed with the upsampling result of the sub-pixel convolution layer to obtain the reconstructed super-resolution image.
Further, the feature extraction module of S2 implements deep feature extraction as follows,
x_0 = f_IFE(I_LR) (1)
x_{i+1} = f_PFFB^i(x_i) (2)
wherein I_LR is the input LR image, f_IFE is a convolution operation of size 3 × 3, x_0 is the initial feature extracted from the image, f_PFFB^i is the mapping function of the i-th (i = 0, 1, …, 7) progressive feature fusion block PFFB, and x_{i+1} is the deep feature extracted by the i-th PFFB of the feature extraction module.
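The recurrence of equations (1)-(2) is a simple sequential pass through the 8 blocks. A toy NumPy sketch with stand-in callables (the lambdas are placeholders, not real PFFBs; the residual superposition that each PFFB performs internally is written out explicitly here):

```python
import numpy as np

def pffb_stack(x0: np.ndarray, blocks) -> np.ndarray:
    """Iterate x_{i+1} = f_PFFB^i(x_i) over the block list, where each
    block's output is its fused features superposed with its input
    (residual learning, as described for the PFFB)."""
    x = x0
    for f in blocks:
        x = f(x) + x  # residual: output = fused features + input feature
    return x

# Eight stand-in "blocks" that each contribute a constant feature map of 0.5;
# after 8 residual additions every entry of the output equals 4.0.
blocks = [lambda x: np.full_like(x, 0.5)] * 8
x0 = np.zeros((1, 3, 4, 4), dtype=np.float32)
x8 = pffb_stack(x0, blocks)
```

The sketch only illustrates the data flow: the input feature survives every stage through the residual path, which is how the model "makes maximal use of the LR image information".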
Further, the reconstruction module of S3 uses RFB_x to enhance the extracted deep-level features as follows,
x_e = f_RFB_x(x_8) (3)
wherein f_RFB_x is the function that uses RFB_x to enhance the deep feature x_8 extracted by the 8 PFFB blocks, and x_e is the enhancement result, i.e. the multi-dimensional feature output by RFB_x.
Further, S4 completes the reconstruction according to the following formula,
I_SR = f_P(x_e) + f_up(I_LR) (4)
wherein f_P performs the 3 × 3 convolution and sub-pixel convolution operations on the enhancement result x_e, f_up performs the bicubic upsampling operation on the input low-resolution LR image, and I_SR is the final super-resolution SR image.
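The sub-pixel (pixel shuffle) rearrangement and the global bicubic skip connection of equation (4) can be sketched in NumPy. The function names are illustrative, and both inputs to `reconstruct` are assumed to already be at the target resolution (the 3 × 3 convolution and the actual bicubic resampling are omitted):

```python
import numpy as np

def pixel_shuffle(x: np.ndarray, r: int) -> np.ndarray:
    """Sub-pixel rearrangement: (N, C*r^2, H, W) -> (N, C, H*r, W*r).
    Each group of r^2 channels becomes an r x r patch of the upscaled map."""
    n, c, h, w = x.shape
    assert c % (r * r) == 0
    x = x.reshape(n, c // (r * r), r, r, h, w)
    x = x.transpose(0, 1, 4, 2, 5, 3)  # (N, C', H, r, W, r)
    return x.reshape(n, c // (r * r), h * r, w * r)

def reconstruct(subpixel_branch: np.ndarray, lr_bicubic: np.ndarray) -> np.ndarray:
    """Equation (4): the SR image is the sub-pixel branch output plus the
    bicubic upsampling of the LR input (a global skip connection)."""
    return subpixel_branch + lr_bicubic

x = np.arange(16, dtype=np.float32).reshape(1, 4, 2, 2)  # 4 = 1 * 2^2 channels
up = pixel_shuffle(x, 2)                                  # -> (1, 1, 4, 4)
sr = reconstruct(up, np.zeros_like(up))
```

The skip connection means the network only has to learn the high-frequency residual on top of the cheap bicubic estimate, which typically speeds up convergence.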
An electronic device that realizes image reconstruction by adopting the method described above.
A computer storage medium having stored therein at least one program instruction which is loaded and executed by a processor to implement the image reconstruction method described above.
The invention has the beneficial effects that:
the invention provides an efficient multi-attention feature fusion image super-resolution reconstruction algorithm (EMAFFN) for improving image quality and observation effect and solving the problems of information loss and large parameter quantity caused by too deep network layers in the existing super-resolution reconstruction algorithm. The algorithm gradually extracts the feature information of the image through a Progressive Feature Fusion Block (PFFB), reduces the loss of the feature information in the process of deep network transmission, and meanwhile, weights the extracted feature in a self-adaptive manner by combining the branching effects of a high-efficiency multi-attention block (EMAB) in the PFFB on a channel and a space, so that the network pays more attention to high-frequency information, and finally, the extracted feature is enhanced by using a multi-scale receptive field block (RFB _ x) and the performance of a reconstruction module is improved by multi-scale fusion of the feature. The reconstructed image can recover more high-frequency information, and the texture details are rich and are closer to the original image.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the embodiments or the prior art descriptions will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
Fig. 1 is a schematic structural diagram of a reconstruction model according to an embodiment of the present invention.
Fig. 2 is a schematic structural diagram of a progressive feature fusion block PFFB in a reconstruction model according to an embodiment of the present invention.
Fig. 3 is a schematic structural diagram of an efficient multi-attention block EMAB in a reconstruction model according to an embodiment of the present invention.
Fig. 4 is a schematic structural diagram of a multi-scale field block RFB _ x in the reconstruction model according to the embodiment of the present invention.
FIG. 5 is a comparison graph of the reconstruction effect of the reconstruction method according to the embodiment of the present invention and other algorithms on the low resolution image with the magnification factor of 4.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Embodiment 1,
a high-efficiency multi-attention feature fusion super-resolution reconstruction model is structurally shown as figure 1 and comprises a feature extraction module and a reconstruction module;
wherein the feature extraction module is divided into a shallow feature extraction module and a deep feature extraction module,
the shallow layer feature extraction module is a 3-by-3 convolution layer, and performs initial feature extraction on the input low-resolution LR image in a low-dimensional space, so that the calculated amount of the LR image is effectively reduced;
the deep feature extraction module comprises 8 Progressive Feature Fusion Blocks (PFFB),
as shown in fig. 2, the PFFB gradually extracts deep-level feature information of the image in a progressive fusion connection manner to enhance feature transfer, and weights the extracted feature information with the efficient multi-attention blocks (EMAB) inside it so the network focuses more on high-frequency information. Inside the PFFB, four efficient multi-attention blocks (EMAB) progressively extract the deep information of the image layer by layer. The PFFB realizes "information interaction" between the convolution-layer results inside the EMAB blocks through repeated channel shuffles (C shuffle), regrouping the output channels and then mixing information from different channels; this resolves the blocked information flow between convolution layers and fully fuses the channels without adding computation. The PFFB applies C shuffle to each feature extracted by an EMAB block, then concatenates two adjacent C-shuffled features and applies C shuffle again to improve the generalization ability of the network; a 1 × 1 convolution removes redundant information, and the result is fused with the features from the next C shuffle operation. This operation is repeated across the EMAB blocks inside the PFFB, so local information is gradually collected and fused, feature transmission is enhanced, and the accuracy of the reconstructed image improves. Finally, residual learning superposes the input feature x_i with the fused features to obtain the output feature x_{i+1} of the i-th (i = 0, 1, …, 7) PFFB block, making maximal use of the LR image information to relieve the loss of features during transmission. Through its "progressive" feature fusion connections, the PFFB strengthens feature extraction and fuses the extracted multi-layer information, so each layer can make full use of all features learned by the previous layers and the limited features are better transmitted and reused.
As shown in FIG. 3, the EMAB makes full use of channel and spatial feature information to gradually denoise the shallow features of the image, so the network focuses on the high-frequency details of the image, helping to enhance the texture detail information of the reconstructed image. The EMAB reduces the channel size with a 1 × 1 convolution layer after two 3 × 3 convolution kernels, enlarges the receptive field with a stride-2 convolution, and further reduces the spatial dimension of the network with a 2 × 2 max-pooling layer; a dilated (atrous) convolution layer then further aggregates receptive-field context, reducing memory while improving network performance; the obtained features are upsampled to restore the spatial dimension, and a 1 × 1 convolution restores the channel dimension. The EMAB applies the FReLU activation function after the 3 convolutional layers to accelerate convergence and prevent gradient explosion. The EMAB adopts an efficient channel attention block to avoid the problems caused by dimensionality reduction; the channel attention is generated by a fast one-dimensional convolution, and the size of the internal convolution kernel is determined adaptively through a nonlinear mapping of the channel dimension. The one-dimensional convolution efficiently realizes local cross-channel interaction: by capturing information from each channel and its neighbors, it completes the communication between channels and thus learns effective channel attention.
The reconstruction module consists of a multi-scale receptive field block RFB_x, a 3 × 3 convolution and a sub-pixel convolution layer; RFB_x further enhances the features extracted from the PFFB blocks with a multi-branch structure and improves the reconstruction performance of the model by fusing multi-scale feature information.
As in fig. 4, the multi-scale receptive field block RFB_x in the reconstruction module is built from 1 × 1, 3 × 3, 1 × 3, and 3 × 1 convolution kernels. RFB_x is located after the 8 sequentially connected PFFBs and is responsible for enhancing the extracted deep-level features, fusing multi-scale features, and reconstruction, preserving rich deep features and recovering image details. Specifically, the output feature x_8 of the 8th PFFB block is taken as the input of the RFB_x block; multi-branch convolution layers of different sizes perform multi-scale feature extraction, and dilated convolutions with different dilation rates are introduced. The larger the dilation rate, the farther the sampling points are from the center point and the larger the receptive field, so information can be captured over a larger area to generate better feature maps without increasing the parameter count. Finally, the outputs of the branches are concatenated to fuse different features at multiple scales, yielding the feature x_e extracted by RFB_x.
Finally, as shown in FIG. 1, the feature x_e extracted by RFB_x is upsampled by the sub-pixel convolution layer, and the bicubic upsampling result of the low-resolution LR image is superposed with the upsampling result of the sub-pixel convolution layer to obtain the reconstructed image.
Embodiment 2,
a super-resolution reconstruction method for efficient multi-attention feature fusion is carried out according to the following steps:
s1, inputting a low-resolution LR image into a high-efficiency multi-attention feature fusion super-resolution reconstruction model;
s2, a feature extraction module of the super-resolution reconstruction model extracts shallow features of the LR image, then the deep features are extracted through 8 Progressive Feature Fusion Blocks (PFFB), and the deep features are sent to the reconstruction module;
s3, the reconstruction module uses RFB_x to enhance the extracted deep-level features, and multi-scale feature fusion yields the fused multi-dimensional feature x_e;
S4, the feature x_e output by RFB_x is passed through a 3 × 3 convolution and amplified by the sub-pixel convolution layer; the input LR image is bicubically upsampled, and the bicubic upsampling result of the LR image is superposed with the upsampling result of the sub-pixel convolution layer to obtain the reconstructed super-resolution image.
Further, the feature extraction module of S2 implements deep feature extraction as follows,
x_0 = f_IFE(I_LR) (1)
x_{i+1} = f_PFFB^i(x_i) (2)
wherein I_LR is the input LR image, f_IFE is a convolution operation of size 3 × 3, x_0 is the initial feature extracted from the image, f_PFFB^i is the mapping function of the i-th (i = 0, 1, …, 7) progressive feature fusion block PFFB, and x_{i+1} is the deep feature extracted by the i-th PFFB of the feature extraction module.
Further, the reconstruction module of S3 uses RFB_x to enhance the extracted deep-level features as follows,
x_e = f_RFB_x(x_8) (3)
wherein f_RFB_x is the function that uses RFB_x to enhance the deep feature x_8 extracted by the 8 PFFB blocks, and x_e is the enhancement result, i.e. the multi-dimensional feature output by RFB_x.
Further, S4 completes the reconstruction according to the following formula,
I_SR = f_P(x_e) + f_up(I_LR) (4)
wherein f_P performs the 3 × 3 convolution and sub-pixel convolution operations on the enhancement result x_e, f_up performs the bicubic upsampling operation on the input low-resolution LR image, and I_SR is the final super-resolution SR image.
In order to reduce reconstruction error, the parameters of the network are optimized with an L1 loss function. Given a training set of N LR-HR image pairs {(I_LR^k, I_HR^k)}_{k=1}^{N}, the optimization objective is:
L(θ) = (1/N) Σ_{k=1}^{N} ||H_SR(I_LR^k) − I_HR^k||_1 (5)
wherein k denotes the k-th LR-HR image pair of the training set, k ∈ [1, N] and k ∈ Z, N is 800, θ = {w_k, b_k} are the learnable parameters of the model, and H_SR denotes the model herein. The model parameters are optimized through continued training so that L(θ) reaches its minimum, making the reconstructed image as close to the real image as possible.
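A minimal NumPy sketch of the objective in equation (5): the absolute errors are summed within each pair and the result is averaged over the N pairs. Note this follows the formula as written; deep-learning frameworks often average over all elements instead, which differs only by a constant factor.

```python
import numpy as np

def l1_objective(sr_batch: np.ndarray, hr_batch: np.ndarray) -> float:
    """Equation (5): mean over the N pairs of the L1 norm between the
    reconstructed SR image H_SR(I_LR^k) and the ground-truth HR image
    I_HR^k. Batches have shape (N, C, H, W)."""
    n = sr_batch.shape[0]
    per_pair = np.abs(sr_batch - hr_batch).reshape(n, -1).sum(axis=1)
    return float(per_pair.mean())

# Two toy pairs: SR all zeros, HR all ones on 1x2x2 images -> each pair's
# L1 norm is 4, so the objective is 4.0.
loss = l1_objective(np.zeros((2, 1, 2, 2)), np.ones((2, 1, 2, 2)))
```

Compared with an L2 loss, the L1 objective penalizes outliers less aggressively, which is one reason it is popular for sharp super-resolution reconstructions.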
In order to verify the effectiveness of the efficient multi-attention feature fusion image super-resolution reconstruction method, four benchmark data sets widely used in super-resolution reconstruction were selected as test sets: Set5, Set14, BSD100, and Urban100. Keys' algorithm (R. Keys, Cubic convolution interpolation for digital image processing, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 29, December 1981, pp. 1153-1160.); Dong's algorithm (C. Dong, C. C. Loy, K. He, X. Tang, Image super-resolution using deep convolutional networks, in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2014, pp. 184-199.); Dong's algorithm (C. Dong, C. C. Loy, X. Tang, Accelerating the super-resolution convolutional neural network, Computer Vision-ECCV Workshops, 2016, pp. 391-407); Shi's algorithm (W. Shi, J. Caballero, F. Huszar, et al., Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network, in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2016, pp. 1874-1883); Kim's algorithm (J. Kim, J. K. Lee, K. M. Lee, Accurate Image Super-Resolution Using Very Deep Convolutional Networks, in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2016, pp. 1646-1654); Ahn's algorithm (N. Ahn, B. Kang, K. A. Sohn, Fast, Accurate, and Lightweight Super-Resolution with Cascading Residual Network, Computer Vision-ECCV Workshops, 2018, pp. 252-268); Zhao's algorithm (H. Zhao, X. Kong, J. He, et al., Efficient image super-resolution using pixel attention, Computer Vision-ECCV Workshops, 2020, pp. 56-72); and Tian's algorithm (C. Tian, Y. Xu, W. Zuo, et al., Coarse-to-fine CNN for image super-resolution, IEEE Transactions on Multimedia, vol. 23, 2021, pp. 1489-1502) were compared with the experimental results of the present invention in both subjective and objective aspects.
Fig. 5 shows a comparison of image reconstruction results on the test set Urban100 between the efficient multi-attention feature fusion image super-resolution reconstruction method provided by the embodiment of the present invention and other algorithms. The larger image on the left is the original high-resolution image, in which a region rich in texture detail is marked and enlarged by a rectangular box; the eight smaller images on the right are, in order from left to right and top to bottom, the original image, the reconstruction result of Dong's SRCNN, the reconstruction result of Dong's FSRCNN, the reconstruction result of Kim's VDSR, the reconstruction result of Ahn's CARN, the reconstruction result of Zhao's PAN, the reconstruction result of Tian's CFSRCNN, and the reconstruction result of the method of this embodiment. It can be observed that the image reconstructed by the method of this embodiment recovers the shape of the stripes almost exactly and has the clearest edge contours, whereas the images reconstructed by the other methods exhibit visual blur and fail to restore the true content of the image effectively. The reconstruction effect of the method of the embodiment of the present invention therefore has obvious advantages: the reconstructed image recovers more high-frequency information and is closer to the original image.
To avoid the bias of purely qualitative analysis, this embodiment uses two objective indexes, peak signal-to-noise ratio (PSNR) and structural similarity (SSIM), for objective quantitative analysis, comparing reconstruction results at magnification factors of 2, 3 and 4 against the Bicubic algorithm of Keys, the SRCNN algorithm of Dong, the FSRCNN algorithm of Dong, the ESPCN algorithm of Shi, the VDSR algorithm of Kim, the CARN algorithm of Ahn, the PAN algorithm of Zhao, and the CFSRCNN algorithm of Tian on the four test data sets Set5, Set14, BSD100 and Urban100, as shown in Table 1:
table 1 PSNR and SSIM values of different algorithms on a test set
(Table 1 is reproduced as an image in the original publication; the per-algorithm PSNR/SSIM values are not recoverable from the extracted text.)
For PSNR and SSIM, a higher value indicates that the result is more similar to the real image and that the image quality is higher. As can be seen from Table 1, the method of this embodiment achieves the optimal SSIM values on all four test sets at ×4 magnification. The comparison shows that, except for the PAN algorithm, whose PSNR and SSIM values on part of the data sets are slightly higher, the method of the embodiment of the present invention obtains the optimal values compared with the other methods. The PSNR of the method of this embodiment averages up to 37.93 dB, and its SSIM reaches up to 0.9609. The method provided by the embodiment of the present invention therefore significantly improves the peak signal-to-noise ratio and structural similarity of the reconstructed image, improves its visual quality, and yields richer detail features.
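Of the two objective indexes, PSNR is computed directly from the mean squared error as 10·log10(peak²/MSE). The sketch below is an illustrative NumPy implementation (the function name and toy data are ours, not from the patent; SSIM, which involves local luminance, contrast and structure statistics, is omitted for brevity):

```python
import numpy as np

def psnr(ref, test, peak=255.0):
    """Peak signal-to-noise ratio in dB: 10*log10(peak^2 / MSE),
    with peak = 255 for 8-bit images."""
    ref = np.asarray(ref, dtype=np.float64)
    test = np.asarray(test, dtype=np.float64)
    mse = np.mean((ref - test) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)

img = np.full((8, 8), 100.0)
noisy = img.copy()
noisy[0, 0] += 8.0                   # one pixel off by 8 -> MSE = 64/64 = 1
print(round(psnr(img, noisy), 2))    # 48.13
```

Published comparisons usually compute PSNR/SSIM on the Y (luminance) channel only; that convention is not shown here.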
The image reconstruction method of the embodiment of the present invention, if implemented in the form of a software functional module and sold or used as an independent product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, or the portions thereof that substantially contribute over the prior art, may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the image reconstruction method according to the embodiment of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disk.
The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (10)

1. A super-resolution reconstruction model for efficient multi-attention feature fusion is characterized by mainly comprising a feature extraction module and a reconstruction module;
the feature extraction module is divided into a shallow feature extraction module and a deep feature extraction module,
the shallow feature extraction module is a 3 × 3 convolutional layer that performs initial feature extraction on the input low-resolution LR image in a low-dimensional space, effectively reducing the amount of computation on the LR image;
the deep layer feature extraction module comprises 8 Progressive Feature Fusion Blocks (PFFB), the PFFB adopts a progressive fusion connection mode to gradually extract deep layer feature information of the image so as to enhance feature transfer, and meanwhile, the PFFB combines an internal efficient multi-attention block (EMAB) to weight the extracted feature information so as to enable the network to pay more attention to high-frequency information;
the reconstruction module consists of a multi-scale field block RFB _ x, a 3 × 3 convolution and a sub-pixel convolution layer, wherein the RFB _ x further enhances the features extracted from the PFFB block by using a multi-branch structure, and the reconstruction performance of the model is improved by fusing multi-scale feature information;
and finally, superposing the double-cubic up-sampling result of the low-resolution LR image and the up-sampling result of the sub-pixel convolution layer to obtain a reconstructed high-resolution image.
2. The super-resolution reconstruction model for efficient multi-attention feature fusion according to claim 1, wherein the deep feature extraction module comprises 8 Progressive Feature Fusion Blocks (PFFB);
four efficient multi-attention blocks (EMAB) are adopted in the PFFB to progressively extract the deep information of the image layer by layer;
the PFFB realizes 'information interaction' between the convolution-layer outputs inside the EMAB block through repeated random channel mixing ('C shuffle'), regrouping the output channels and then mixing the information of different channels, thereby solving the problem of obstructed information flow between convolutional layers and fully fusing the channels without increasing the amount of computation;
the PFFB applies C shuffle to each feature extracted by the EMAB blocks, then concatenates two adjacent C-shuffle-processed features and applies C shuffle again to improve the generalization ability of the network; a 1 × 1 convolution removes redundant information, and the resulting features are fused with the C-shuffled information; this operation is repeated between the EMAB blocks within the PFFB so that local information is progressively collected and fused, strengthening feature transfer and improving the accuracy of the reconstructed image; finally, residual learning superimposes the input feature x_i on the fused feature to obtain the output feature x_{i+1} of the i-th (i = 0, 1, …, 7) PFFB block, making maximal use of the LR image information to mitigate feature loss during transmission;
through this 'progressive' feature fusion connection, the PFFB strengthens feature extraction and fuses the extracted multi-layer information, allowing each layer to make full use of all features learned by the preceding layers, so that the limited features are better transferred and reused.
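The 'C shuffle' operation on which the PFFB relies can be sketched as a pure reshape-and-transpose, which is why it mixes channel groups at no extra computational cost. An illustrative NumPy sketch (the function name and group count are ours, not from the patent):

```python
import numpy as np

def channel_shuffle(x, groups):
    """'C shuffle': regroup channels so that information from different
    convolution groups is interleaved. x has shape (C, H, W); C must be
    divisible by `groups`. No arithmetic is performed, only reindexing."""
    c, h, w = x.shape
    x = x.reshape(groups, c // groups, h, w)  # split channels into groups
    x = x.transpose(1, 0, 2, 3)               # interleave the groups
    return x.reshape(c, h, w)

# 4 channels tagged 0..3, shuffled in 2 groups -> channel order 0, 2, 1, 3
x = np.arange(4, dtype=float).reshape(4, 1, 1)
print(channel_shuffle(x, 2).ravel().tolist())  # [0.0, 2.0, 1.0, 3.0]
```

For 4 channels in 2 groups the operation is its own inverse, which makes it easy to verify.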
3. The Progressive Feature Fusion Block (PFFB) of claim 2, wherein four efficient multi-attention blocks (EMAB) are employed in the PFFB to extract features layer by layer;
the EMAB makes full use of the characteristic information of the channel and the space to gradually denoise the shallow characteristic of the image, so that the network focuses on the high-frequency details in the concerned image, and the enhancement of the texture detail information of the reconstructed image is facilitated;
the EMAB reduces the channel size with a 1 × 1 convolutional layer following two 3 × 3 convolution kernels, enlarges the receptive field with a strided convolution of stride 2, and further reduces the spatial dimension of the network in combination with a 2 × 2 max-pooling layer; a dilated convolutional layer then further aggregates the context information of the receptive field, reducing memory consumption while improving network performance; the resulting features are upsampled to restore the spatial dimension, and a 1 × 1 convolution restores the channel dimension;
the EMAB applies the FReLU activation function after the 3 convolutional layers to accelerate convergence and prevent gradient explosion;
the EMAB adopts an efficient channel attention block to avoid the problems caused by dimensionality reduction: the channel attention is generated by a fast one-dimensional convolution, and the size of the internal convolution kernel is determined adaptively through a nonlinear mapping of the channel dimension;
the one-dimensional convolution efficiently realizes local cross-channel interaction, capturing local cross-channel information so as to learn effective channel attention.
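The efficient channel attention described in this claim follows the ECA pattern: global average pooling, a fast 1-D convolution across the pooled channel descriptor with an adaptively chosen kernel size, and a sigmoid gate applied per channel. A minimal NumPy sketch — the uniform weights stand in for the learned 1-D kernel, and the γ = 2, b = 1 constants are the common ECA defaults, assumed here rather than taken from the patent:

```python
import numpy as np

def eca_kernel_size(channels, gamma=2, b=1):
    """Adaptive 1-D kernel size via a nonlinear mapping of the channel
    dimension: k = |(log2(C) + b) / gamma|, rounded up to the nearest
    odd number."""
    t = int(abs((np.log2(channels) + b) / gamma))
    return t if t % 2 else t + 1

def eca_attention(x, k=None):
    """Efficient channel attention sketch: global average pooling, a 1-D
    convolution over the channel descriptor (no dimensionality reduction),
    then a per-channel sigmoid gate. x has shape (C, H, W)."""
    c = x.shape[0]
    k = k or eca_kernel_size(c)
    desc = x.mean(axis=(1, 2))                   # (C,) pooled descriptor
    w = np.ones(k) / k                           # stand-in for learned weights
    padded = np.pad(desc, k // 2, mode="edge")
    conv = np.convolve(padded, w, mode="valid")  # local cross-channel interaction
    gate = 1.0 / (1.0 + np.exp(-conv))           # sigmoid
    return x * gate[:, None, None]               # reweight each channel

print(eca_kernel_size(64))  # 3
```

Because the 1-D kernel only spans k neighbouring channels, the attention is learned from local cross-channel interaction rather than a fully connected bottleneck, which is what avoids dimensionality reduction.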
4. The high-efficiency multi-attention feature fusion super-resolution reconstruction model of claim 1, wherein the multi-scale field blocks RFB _ x in the reconstruction module are formed by combining 1 x 1, 3 x 3, 1 x 3 and 3 x 1 convolution kernels;
the RFB _ x is positioned behind the sequentially connected 8 PFFBs and is responsible for enhancing extracted deep-level features, multi-scale fusion features and reconstruction, the features with rich depth are reserved, and image details are recovered;
specifically, the output feature x_8 of the 8th PFFB block is taken as the input of the RFB_x block, and multi-branch convolutional layers of different sizes are used for multi-scale feature extraction while dilated convolutions with different dilation rates are introduced; the larger the dilation rate, the farther the sampling points lie from the centre point and the larger the receptive field, so information can be captured over a larger area to generate a more effective feature map without increasing the number of parameters;
finally, the outputs of the multiple branches are concatenated to fuse the different features at multiple scales, yielding the feature x_e extracted by RFB_x.
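The claim's statement that a larger dilation rate enlarges the receptive field without adding parameters follows from the effective extent of a dilated kernel, d·(k − 1) + 1: the weight count stays k² while the spatial span grows with d. A small sketch (the dilation rates shown are illustrative, not taken from the patent):

```python
def dilated_kernel_extent(kernel_size, dilation):
    """Effective spatial extent of a dilated (atrous) convolution kernel:
    a larger dilation rate pushes the sampling points farther from the
    centre, enlarging the receptive field with no extra parameters."""
    return dilation * (kernel_size - 1) + 1

# A 3x3 kernel at dilation rates 1, 3, 5: extents 3, 7, 11,
# yet each branch still carries only 9 weights per channel pair.
for d in (1, 3, 5):
    print(d, dilated_kernel_extent(3, d))
```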
5. A super-resolution reconstruction method for efficient multi-attention feature fusion is characterized by comprising the following steps:
s1, inputting a low-resolution LR image into a high-efficiency multi-attention feature fusion super-resolution reconstruction model;
s2, a feature extraction module of the super-resolution reconstruction model extracts shallow features of the LR image, then the deep features are extracted through 8 Progressive Feature Fusion Blocks (PFFB), and the deep features are sent to the reconstruction module;
s3, the reconstruction module uses RFB_x to enhance the extracted deep-level features and fuses multi-scale features to obtain the fused multi-dimensional feature x_e;
S4, the output feature x_e of RFB_x is subjected to a 3 × 3 convolution and amplified by the sub-pixel convolutional layer; the input LR image is bicubically upsampled, and the bicubic upsampling result of the LR image is superimposed on the upsampling result of the sub-pixel convolutional layer to obtain the reconstructed super-resolution image.
6. The efficient multi-attention feature fusion super-resolution reconstruction method according to claim 5, wherein the feature extraction module of S2 implements deep feature extraction as follows:

x_0 = f_IFE(I_LR)    (1)

x_{i+1} = f_PFFB^i(x_i)    (2)

where I_LR is the input LR image, f_IFE is a convolution operation of size 3 × 3, x_0 is the extracted initial feature of the image, f_PFFB^i is the mapping function of the i-th (i = 0, 1, …, 7) progressive feature fusion block PFFB, and x_{i+1} is the deep feature extracted by the i-th PFFB of the feature extraction module.
7. The efficient multi-attention feature fusion super-resolution reconstruction method according to claim 5, wherein the reconstruction module of S3 uses RFB_x to enhance the extracted deep features as follows:

x_e = f_RFB_x(x_8)    (3)

where f_RFB_x is the function that uses RFB_x to enhance the deep feature x_8 extracted by the 8 PFFB blocks, and x_e is the enhancement result, i.e. the multi-dimensional feature output by RFB_x.
8. The efficient multi-attention feature fusion super-resolution reconstruction method according to claim 5, wherein S4 completes the reconstruction according to the following formula:

I_SR = f_P(x_e) + f_up(I_LR)    (4)

where f_P performs the 3 × 3 convolution and sub-pixel convolution operations on the enhancement result x_e, f_up performs the bicubic upsampling operation on the input low-resolution LR image, and I_SR is the final super-resolution SR image.
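The sub-pixel convolution inside f_P amplifies the feature map by rearranging channels into space (the PixelShuffle operation). A NumPy sketch of that rearrangement step — the preceding 3 × 3 convolution that produces the C·r² channels is omitted, and the function name and toy data are ours:

```python
import numpy as np

def pixel_shuffle(x, r):
    """Sub-pixel convolution's rearrangement step (depth-to-space):
    (C*r^2, H, W) -> (C, H*r, W*r). Channel c*r^2 + r1*r + r2 supplies
    output pixel (h*r + r1, w*r + r2) of channel c."""
    c2, h, w = x.shape
    c = c2 // (r * r)
    x = x.reshape(c, r, r, h, w)
    x = x.transpose(0, 3, 1, 4, 2)   # -> (C, H, r, W, r)
    return x.reshape(c, h * r, w * r)

# 4 channels of a 1x1 map upscale to a single 2x2 map
x = np.arange(4, dtype=float).reshape(4, 1, 1)
print(pixel_shuffle(x, 2).tolist())  # [[[0.0, 1.0], [2.0, 3.0]]]
```

Because the rearrangement is a pure reindexing, the upsampling cost is carried entirely by the convolution that precedes it, which is what makes the sub-pixel layer efficient.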
9. An electronic device, characterized in that image reconstruction is achieved with a method according to any one of claims 5 to 8.
10. A computer storage medium, characterized in that at least one program instruction is stored in the storage medium, which at least one program instruction is loaded and executed by a processor to implement the image reconstruction method according to any one of claims 5 to 8.
CN202211287811.5A 2022-10-20 2022-10-20 Super-resolution reconstruction model, method, equipment and storage medium for efficient multi-attention feature fusion Pending CN115660955A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211287811.5A CN115660955A (en) 2022-10-20 2022-10-20 Super-resolution reconstruction model, method, equipment and storage medium for efficient multi-attention feature fusion


Publications (1)

Publication Number Publication Date
CN115660955A true CN115660955A (en) 2023-01-31

Family

ID=84989770

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211287811.5A Pending CN115660955A (en) 2022-10-20 2022-10-20 Super-resolution reconstruction model, method, equipment and storage medium for efficient multi-attention feature fusion

Country Status (1)

Country Link
CN (1) CN115660955A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116563147A (en) * 2023-05-04 2023-08-08 北京联合大学 Underwater image enhancement system and method
CN116563147B (en) * 2023-05-04 2024-03-26 北京联合大学 Underwater image enhancement system and method
CN117078516A (en) * 2023-08-11 2023-11-17 济宁安泰矿山设备制造有限公司 Mine image super-resolution reconstruction method based on residual mixed attention
CN117078516B (en) * 2023-08-11 2024-03-12 济宁安泰矿山设备制造有限公司 Mine image super-resolution reconstruction method based on residual mixed attention
CN117788477A (en) * 2024-02-27 2024-03-29 贵州健易测科技有限公司 Image reconstruction method and device for automatically quantifying tea leaf curl
CN117788477B (en) * 2024-02-27 2024-05-24 贵州健易测科技有限公司 Image reconstruction method and device for automatically quantifying tea leaf curl

Similar Documents

Publication Publication Date Title
Wang et al. Ultra-dense GAN for satellite imagery super-resolution
CN110136063B (en) Single image super-resolution reconstruction method based on condition generation countermeasure network
CN113362223B (en) Image super-resolution reconstruction method based on attention mechanism and two-channel network
CN112507997B (en) Face super-resolution system based on multi-scale convolution and receptive field feature fusion
CN115660955A (en) Super-resolution reconstruction model, method, equipment and storage medium for efficient multi-attention feature fusion
CN111754438B (en) Underwater image restoration model based on multi-branch gating fusion and restoration method thereof
CN111951164B (en) Image super-resolution reconstruction network structure and image reconstruction effect analysis method
Luo et al. Lattice network for lightweight image restoration
CN111626927B (en) Binocular image super-resolution method, system and device adopting parallax constraint
CN112419191B (en) Image motion blur removing method based on convolution neural network
CN112699844A (en) Image super-resolution method based on multi-scale residual error level dense connection network
CN115331104A (en) Crop planting information extraction method based on convolutional neural network
CN113222818A (en) Method for reconstructing super-resolution image by using lightweight multi-channel aggregation network
Chen et al. Image denoising via deep network based on edge enhancement
CN111461976A (en) Image super-resolution method based on efficient lightweight coordinate neural network
CN116485646A (en) Micro-attention-based light-weight image super-resolution reconstruction method and device
Gong et al. Learning deep resonant prior for hyperspectral image super-resolution
Wang et al. Image super-resolution via lightweight attention-directed feature aggregation network
Chen et al. Underwater-image super-resolution via range-dependency learning of multiscale features
CN116188272B (en) Two-stage depth network image super-resolution reconstruction method suitable for multiple fuzzy cores
CN116777745A (en) Image super-resolution reconstruction method based on sparse self-adaptive clustering
CN116563187A (en) Multispectral image fusion based on graph neural network
CN113191947B (en) Image super-resolution method and system
Li et al. Parallel-connected residual channel attention network for remote sensing image super-resolution
CN116246110A (en) Image classification method based on improved capsule network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination