CN115660955A - Super-resolution reconstruction model, method, equipment and storage medium for efficient multi-attention feature fusion - Google Patents

Super-resolution reconstruction model, method, equipment and storage medium for efficient multi-attention feature fusion

Info

Publication number
CN115660955A
Authority
CN
China
Prior art keywords
image
feature
convolution
pffb
reconstruction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211287811.5A
Other languages
Chinese (zh)
Inventor
贾晓芬
李方玗
赵佰亭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui University of Science and Technology
Original Assignee
Anhui University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Anhui University of Science and Technology filed Critical Anhui University of Science and Technology
Priority to CN202211287811.5A priority Critical patent/CN115660955A/en
Publication of CN115660955A publication Critical patent/CN115660955A/en
Pending legal-status Critical Current

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses a super-resolution reconstruction model, method, equipment and storage medium for efficient multi-attention feature fusion. The reconstruction model comprises a feature extraction module and a reconstruction module. The feature extraction module gradually extracts deep feature information of the image using a 3 × 3 convolution and 8 progressive feature fusion blocks (PFFB), and weights the extracted feature information with an internal efficient multi-attention block (EMAB) so that the network pays more attention to high-frequency information. The reconstruction module consists of a multi-scale receptive field block RFB_x, a 3 × 3 convolution and a sub-pixel convolution layer, where RFB_x further enhances the features extracted by the PFFBs with a multi-branch structure and fuses multi-scale feature information to improve the reconstruction performance of the model. Finally, the bicubic upsampling result of the low-resolution LR image is superposed with the upsampling result of the sub-pixel convolution layer to obtain the reconstructed image. The reconstructed image recovers more high-frequency information and has abundant texture details closer to the original image.

Description

Super-resolution reconstruction model, method, equipment and storage medium for efficient multi-attention feature fusion
Technical Field
The invention belongs to the technical field of image reconstruction, and relates to a high-efficiency multi-attention feature fusion super-resolution reconstruction model, method, equipment and storage medium.
Background
The Single Image Super-Resolution (SISR) technique restores a given Low-Resolution (LR) image to the corresponding High-Resolution (HR) image with a specific algorithm, aiming to overcome or compensate for problems such as low imaging quality and indistinct regions of interest caused by limitations of the image acquisition system or acquisition environment. Traditional image super-resolution reconstruction algorithms rely mainly on basic digital image processing techniques; they are computationally complex and cannot effectively recover the original information of the image.
With the wide application of deep learning in image super-resolution reconstruction and its steadily improving results, the field is receiving more and more attention from researchers. Dong et al. first applied convolutional neural networks to super-resolution: the proposed SRCNN generates a reconstructed image by pixel mapping with three convolutional layers, as detailed in "C. Dong, C. C. Loy, K. He, X. Tang, Image super-resolution using deep convolutional networks, in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2014, pp. 184-199". Dong et al. then proposed FSRCNN based on SRCNN, which uses smaller convolution layers internally and enlarges the size at the end of the network with a deconvolution layer, as detailed in "C. Dong, C. C. Loy, X. Tang, Accelerating the super-resolution convolutional neural network, Computer Vision-ECCV Workshops, 2016, pp. 391-407". The efficient sub-pixel convolutional neural network ESPCN proposed by Shi et al. implements the upsampling operation in the reconstruction module with a rearranging sub-pixel layer, as detailed in "W. Shi, J. Caballero, F. Huszar, et al., Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network, in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2016, pp. 1874-1883". These super-resolution reconstruction methods are lightweight, but their reconstruction quality falls short of expectations.
Researchers then began to improve model performance by increasing network depth. Kim et al. proposed the 20-layer deep network VDSR, which uses the idea of residual learning to accelerate the convergence of network training, as detailed in "J. Kim, J. K. Lee, K. M. Lee, Accurate Image Super-Resolution Using Very Deep Convolutional Networks, in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2016, pp. 1646-1654". Ahn et al. proposed CARN, an architecture implementing a cascading mechanism on a Residual Network (ResNet): its middle part is based on ResNet, and the cascading mechanism is used both globally and locally to better integrate the features of each layer of the network, as detailed in "N. Ahn, B. Kang, K. A. Sohn, Fast, Accurate, and Lightweight Super-Resolution with Cascading Residual Network, Computer Vision-ECCV Workshops, 2018, pp. 252-268". The PAN proposed by Zhao et al. introduces pixel attention for efficient super-resolution with few parameters, as detailed in "H. Zhao, X. Kong, J. He, et al., Efficient image super-resolution using pixel attention, Computer Vision-ECCV Workshops, 2020, pp. 56-72". The CFSRCNN proposed by Tian et al. learns long- and short-path features with a feature extraction module and fuses the learned features by propagating shallow-layer information of the network to the deep layers, as detailed in "C. Tian, Y. Xu, W. Zuo, et al., Coarse-to-fine CNN for image super-resolution, IEEE Transactions on Multimedia, vol. 23, 2021, pp. 1489-1502". These methods improve model performance by increasing the number of network layers, but the added parameters greatly increase the training difficulty, and the reconstruction performance of some models drops markedly once the parameters are reduced. It is thus difficult to achieve a good balance between parameter count and performance.
The inventor's research finds that existing deep-learning-based single image super-resolution reconstruction methods have poor reconstruction results, mainly for the following reasons: (1) Some CNN-based super-resolution reconstruction networks are lightweight, but their performance falls short of expectations. (2) More and more models increase the network depth in pursuit of reconstruction quality; although performance improves, the huge parameter count increases the time complexity of training, and the loss of feature information during deep-network transmission can also harm the reconstruction. How to make full use of the feature information of the image so that the limited features are better transmitted and reused, efficiently recover the edge contours and texture details of the image, and keep the model as lightweight as possible while ensuring performance, has become a problem that urgently needs to be solved.
Disclosure of Invention
In order to solve these problems, the invention provides an efficient multi-attention feature fusion image super-resolution reconstruction model that can improve image quality and observation effect, and that addresses the information loss and large parameter counts caused by overly deep networks in existing super-resolution reconstruction algorithms.
The invention has a second purpose of providing an efficient multi-attention feature fusion image super-resolution reconstruction method.
A third object of the present invention is to provide an electronic apparatus.
It is a fourth object of the present invention to provide a computer storage medium.
The technical scheme adopted by the invention is that the super-resolution reconstruction model for efficient multi-attention feature fusion comprises a feature extraction module and a reconstruction module;
the feature extraction module is divided into a shallow feature extraction module and a deep feature extraction module,
the shallow feature extraction module is a 3 × 3 convolution layer that performs initial feature extraction on the input low-resolution LR image in a low-dimensional space, effectively reducing the computational cost on the LR image;
the deep layer feature extraction module comprises 8 Progressive Feature Fusion Blocks (PFFB), the PFFB adopts a progressive fusion connection mode to gradually extract deep layer feature information of the image so as to enhance feature transfer, and meanwhile, the PFFB combines an internal efficient multi-attention block (EMAB) to weight the extracted feature information so as to enable the network to pay more attention to high-frequency information;
the reconstruction module consists of a multi-scale receptive field block RFB_x, a 3 × 3 convolution and a sub-pixel convolution layer; RFB_x further enhances the features extracted by the PFFB blocks with a multi-branch structure and fuses multi-scale feature information to improve the reconstruction performance of the model,
and then superposing the bicubic upsampling result of the low-resolution LR image and the upsampling result of the sub-pixel convolution layer to obtain a reconstructed image.
Further, the deep feature extraction module comprises 8 Progressive Feature Fusion Blocks (PFFB);
four efficient multi-attention blocks (EMAB) are adopted in the PFFB to progressively extract the deep information of the image layer by layer;
the PFFB realizes "information interaction" between the convolution-layer results inside the EMAB blocks through repeated channel shuffles (C shuffle), regrouping the output channels and then mixing information from different channels; this resolves the blocked information flow between convolution layers and fully fuses the channels without adding computation;
the PFFB applies C shuffle to each feature extracted by an EMAB block, then concatenates two adjacent C-shuffled features and applies C shuffle again to improve the generalization ability of the network; a 1 × 1 convolution removes redundant information, and the result is fused with the features from the next C shuffle operation. This operation is repeated across the EMAB blocks inside the PFFB, so local information is gradually collected and fused, feature transmission is enhanced, and the accuracy of the reconstructed image improves. Finally, residual learning superposes the input feature x_i with the fused features to obtain the output feature x_{i+1} of the i-th (i = 0, 1, …, 7) PFFB block, making maximal use of the LR image information to relieve the loss of features during transmission;
through its "progressive" feature fusion connections, the PFFB strengthens feature extraction and fuses the extracted multi-layer information, so each layer can make full use of all features learned by the previous layers and the limited features are better transmitted and reused.
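The "C shuffle" operation described above is a parameter-free channel regrouping. A minimal NumPy sketch (not the patent's implementation, which applies it to real feature maps inside the PFFB) shows how channels from different groups become interleaved:

```python
import numpy as np

def channel_shuffle(x: np.ndarray, groups: int) -> np.ndarray:
    """Regroup channels and interleave them across groups (the 'C shuffle'
    operation): reshape (N, C, H, W) -> (N, g, C//g, H, W), swap the two
    group axes, and flatten back. No parameters, no extra computation."""
    n, c, h, w = x.shape
    assert c % groups == 0, "channel count must be divisible by the group count"
    x = x.reshape(n, groups, c // groups, h, w)
    x = x.transpose(0, 2, 1, 3, 4)  # channel j of every group becomes adjacent
    return x.reshape(n, c, h, w)

# With 4 channels and 2 groups, channel order [0, 1, 2, 3] becomes [0, 2, 1, 3],
# so a following convolution sees a mix of both groups.
x = np.arange(4, dtype=np.float32).reshape(1, 4, 1, 1)
shuffled = channel_shuffle(x, groups=2)
```

Because the operation is a pure permutation, it mixes information between channel groups at zero computational cost, which is exactly the property the PFFB relies on.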
Further, four efficient multi-attention blocks (EMABs) are adopted in the PFFB to extract features layer by layer;
the EMAB makes full use of channel and spatial feature information to gradually denoise the shallow features of the image, so the network focuses on the high-frequency details of the image, which helps enhance the texture detail information of the reconstructed image;
the EMAB reduces the channel size with a 1 × 1 convolution layer after two 3 × 3 convolution kernels, enlarges the receptive field with a stride-2 convolution, and further reduces the spatial dimension of the network with a 2 × 2 max-pooling layer; a dilated (atrous) convolution layer then further aggregates receptive-field context, reducing memory while improving network performance; the obtained features are upsampled to restore the spatial dimension, and a 1 × 1 convolution restores the channel dimension;
the EMAB applies the FReLU activation function after the 3 convolutional layers to accelerate convergence and prevent gradient explosion;
the EMAB adopts an efficient channel attention block to avoid the problems caused by dimensionality reduction; the channel attention is generated by a fast one-dimensional convolution, and the size of the internal convolution kernel is determined adaptively through a nonlinear mapping of the channel dimension;
the one-dimensional convolution efficiently realizes local cross-channel interaction: by capturing information from each channel and its neighbors, it completes the communication between channels and thus learns effective channel attention.
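The efficient channel attention just described can be sketched framework-agnostically in NumPy. The adaptive kernel-size rule below (k = |log2(C)/γ + b/γ| rounded up to odd, with γ = 2, b = 1) follows the ECA paper and is an assumption here: the patent only states that the kernel size is chosen adaptively via a nonlinear mapping of the channel dimension.

```python
import numpy as np

def eca_kernel_size(channels: int, gamma: int = 2, b: int = 1) -> int:
    """Adaptively choose the 1-D conv kernel size from the channel count;
    more channels -> wider local cross-channel interaction window."""
    t = int(abs(np.log2(channels) / gamma + b / gamma))
    return t if t % 2 == 1 else t + 1

def eca_attention(x: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Efficient channel attention without dimensionality reduction:
    global average pooling -> 1-D conv across the channel axis -> sigmoid
    gate, one weight per channel applied back onto the feature map."""
    n, c, h, w = x.shape
    pooled = x.mean(axis=(2, 3))                 # (N, C) channel descriptors
    pad = len(kernel) // 2
    out = np.empty_like(pooled)
    for i in range(n):
        padded = np.pad(pooled[i], pad, mode="edge")
        out[i] = np.convolve(padded, kernel, mode="valid")
    weights = 1.0 / (1.0 + np.exp(-out))         # sigmoid gate
    return x * weights[:, :, None, None]

k = eca_kernel_size(64)                          # 3 for a 64-channel map
gated = eca_attention(np.ones((1, 4, 2, 2)), np.array([1.0]))
```

Note that, unlike squeeze-and-excitation attention, no bottleneck fully connected layers are used, so the channel dimension is never reduced.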
Further, the multi-scale receptive field block RFB_x in the reconstruction module is built from 1 × 1, 3 × 3, 1 × 3, and 3 × 1 convolution kernels;
RFB_x is located after the 8 sequentially connected PFFBs and is responsible for enhancing the extracted deep-level features, fusing multi-scale features, and reconstruction, preserving rich deep features and recovering image details;
specifically, the output feature x_8 of the 8th PFFB block is taken as the input of the RFB_x block; multi-branch convolution layers of different sizes perform multi-scale feature extraction, and dilated convolutions with different dilation rates are introduced. The larger the dilation rate, the farther the sampling points are from the center point and the larger the receptive field, so information can be captured over a larger area to generate better feature maps without increasing the parameter count;
finally, the outputs of the branches are concatenated to fuse different features at multiple scales, yielding the feature x_e extracted by RFB_x.
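The claim that a larger dilation rate enlarges the receptive field without adding parameters follows from the effective extent of a dilated kernel: the taps stay k per axis, but they are spread apart. A small sketch (the dilation rates 1, 3, 5 are illustrative; the patent does not list the exact rates used in RFB_x):

```python
def effective_kernel(k: int, dilation: int) -> int:
    """Effective spatial extent of a k x k convolution with a given dilation:
    the tap count stays k (parameter count unchanged) but the taps sit
    dilation - 1 pixels apart, covering k + (k - 1) * (dilation - 1) pixels."""
    return k + (k - 1) * (dilation - 1)

# A 3x3 kernel at dilations 1, 3, 5 covers 3, 7, and 11 pixels per axis
# while always keeping exactly 9 learnable weights.
spans = {d: effective_kernel(3, d) for d in (1, 3, 5)}
```

This is why stacking branches with different dilation rates lets RFB_x capture context at several scales for the same parameter budget.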
A super-resolution reconstruction method for efficient multi-attention feature fusion is carried out according to the following steps:
s1, inputting a low-resolution LR image into a high-efficiency multi-attention feature fusion super-resolution reconstruction model;
s2, a feature extraction module of the super-resolution reconstruction model extracts shallow features of the LR image, then the deep features are extracted through 8 Progressive Feature Fusion Blocks (PFFB), and the deep features are sent to the reconstruction module;
s3, the reconstruction module of the model uses RFB_x to enhance the extracted deep-level features, and multi-scale feature fusion yields the fused multi-dimensional feature x_e;
S4, the feature x_e output by RFB_x is passed through a 3 × 3 convolution and amplified by the sub-pixel convolution layer; the input LR image is bicubically upsampled, and the bicubic upsampling result of the LR image is superposed with the upsampling result of the sub-pixel convolution layer to obtain the reconstructed super-resolution image.
Further, the feature extraction module of S2 implements deep feature extraction as follows,
x_0 = f_IFE(I_LR) (1)
x_{i+1} = f_PFFB^i(x_i) (2)
wherein I_LR is the input LR image, f_IFE is a convolution operation of size 3 × 3, x_0 is the initial feature extracted from the image, f_PFFB^i is the mapping function of the i-th (i = 0, 1, …, 7) progressive feature fusion block PFFB, and x_{i+1} is the deep feature extracted by the i-th PFFB of the feature extraction module.
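The recurrence of equations (1)-(2) is a simple sequential pass through the 8 blocks. A toy NumPy sketch with stand-in callables (the lambdas are placeholders, not real PFFBs; the residual superposition that each PFFB performs internally is written out explicitly here):

```python
import numpy as np

def pffb_stack(x0: np.ndarray, blocks) -> np.ndarray:
    """Iterate x_{i+1} = f_PFFB^i(x_i) over the block list, where each
    block's output is its fused features superposed with its input
    (residual learning, as described for the PFFB)."""
    x = x0
    for f in blocks:
        x = f(x) + x  # residual: output = fused features + input feature
    return x

# Eight stand-in "blocks" that each contribute a constant feature map of 0.5;
# after 8 residual additions every entry of the output equals 4.0.
blocks = [lambda x: np.full_like(x, 0.5)] * 8
x0 = np.zeros((1, 3, 4, 4), dtype=np.float32)
x8 = pffb_stack(x0, blocks)
```

The sketch only illustrates the data flow: the input feature survives every stage through the residual path, which is how the model "makes maximal use of the LR image information".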
Further, the reconstruction module of S3 uses RFB_x to enhance the extracted deep-level features as follows,
x_e = f_RFB_x(x_8) (3)
wherein f_RFB_x is the function that uses RFB_x to enhance the deep feature x_8 extracted by the 8 PFFB blocks, and x_e is the enhancement result, i.e. the multi-dimensional feature output by RFB_x.
Further, S4 completes the reconstruction according to the following formula,
I_SR = f_P(x_e) + f_up(I_LR) (4)
wherein f_P performs the 3 × 3 convolution and sub-pixel convolution operations on the enhancement result x_e, f_up performs the bicubic upsampling operation on the input low-resolution LR image, and I_SR is the final super-resolution SR image.
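The sub-pixel (pixel shuffle) rearrangement and the global bicubic skip connection of equation (4) can be sketched in NumPy. The function names are illustrative, and both inputs to `reconstruct` are assumed to already be at the target resolution (the 3 × 3 convolution and the actual bicubic resampling are omitted):

```python
import numpy as np

def pixel_shuffle(x: np.ndarray, r: int) -> np.ndarray:
    """Sub-pixel rearrangement: (N, C*r^2, H, W) -> (N, C, H*r, W*r).
    Each group of r^2 channels becomes an r x r patch of the upscaled map."""
    n, c, h, w = x.shape
    assert c % (r * r) == 0
    x = x.reshape(n, c // (r * r), r, r, h, w)
    x = x.transpose(0, 1, 4, 2, 5, 3)  # (N, C', H, r, W, r)
    return x.reshape(n, c // (r * r), h * r, w * r)

def reconstruct(subpixel_branch: np.ndarray, lr_bicubic: np.ndarray) -> np.ndarray:
    """Equation (4): the SR image is the sub-pixel branch output plus the
    bicubic upsampling of the LR input (a global skip connection)."""
    return subpixel_branch + lr_bicubic

x = np.arange(16, dtype=np.float32).reshape(1, 4, 2, 2)  # 4 = 1 * 2^2 channels
up = pixel_shuffle(x, 2)                                  # -> (1, 1, 4, 4)
sr = reconstruct(up, np.zeros_like(up))
```

The skip connection means the network only has to learn the high-frequency residual on top of the cheap bicubic estimate, which typically speeds up convergence.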
An electronic device that realizes image reconstruction by adopting the method described above.
A computer storage medium having stored therein at least one program instruction which is loaded and executed by a processor to implement the image reconstruction method described above.
The invention has the beneficial effects that:
the invention provides an efficient multi-attention feature fusion image super-resolution reconstruction algorithm (EMAFFN) for improving image quality and observation effect and solving the problems of information loss and large parameter quantity caused by too deep network layers in the existing super-resolution reconstruction algorithm. The algorithm gradually extracts the feature information of the image through a Progressive Feature Fusion Block (PFFB), reduces the loss of the feature information in the process of deep network transmission, and meanwhile, weights the extracted feature in a self-adaptive manner by combining the branching effects of a high-efficiency multi-attention block (EMAB) in the PFFB on a channel and a space, so that the network pays more attention to high-frequency information, and finally, the extracted feature is enhanced by using a multi-scale receptive field block (RFB _ x) and the performance of a reconstruction module is improved by multi-scale fusion of the feature. The reconstructed image can recover more high-frequency information, and the texture details are rich and are closer to the original image.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the embodiments or the prior art descriptions will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
Fig. 1 is a schematic structural diagram of a reconstruction model according to an embodiment of the present invention.
Fig. 2 is a schematic structural diagram of a progressive feature fusion block PFFB in a reconstruction model according to an embodiment of the present invention.
Fig. 3 is a schematic structural diagram of an efficient multi-attention block EMAB in a reconstruction model according to an embodiment of the present invention.
Fig. 4 is a schematic structural diagram of a multi-scale field block RFB _ x in the reconstruction model according to the embodiment of the present invention.
FIG. 5 is a comparison graph of the reconstruction effect of the reconstruction method according to the embodiment of the present invention and other algorithms on the low resolution image with the magnification factor of 4.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Embodiment 1,
a high-efficiency multi-attention feature fusion super-resolution reconstruction model is structurally shown as figure 1 and comprises a feature extraction module and a reconstruction module;
wherein the feature extraction module is divided into a shallow feature extraction module and a deep feature extraction module,
the shallow layer feature extraction module is a 3-by-3 convolution layer, and performs initial feature extraction on the input low-resolution LR image in a low-dimensional space, so that the calculated amount of the LR image is effectively reduced;
the deep feature extraction module comprises 8 Progressive Feature Fusion Blocks (PFFB),
as shown in fig. 2, the PFFB gradually extracts deep-level feature information of the image in a progressive fusion connection manner to enhance feature transfer, and weights the extracted feature information with the efficient multi-attention blocks (EMAB) inside it so the network focuses more on high-frequency information. Inside the PFFB, four efficient multi-attention blocks (EMAB) progressively extract the deep information of the image layer by layer. The PFFB realizes "information interaction" between the convolution-layer results inside the EMAB blocks through repeated channel shuffles (C shuffle), regrouping the output channels and then mixing information from different channels; this resolves the blocked information flow between convolution layers and fully fuses the channels without adding computation. The PFFB applies C shuffle to each feature extracted by an EMAB block, then concatenates two adjacent C-shuffled features and applies C shuffle again to improve the generalization ability of the network; a 1 × 1 convolution removes redundant information, and the result is fused with the features from the next C shuffle operation. This operation is repeated across the EMAB blocks inside the PFFB, so local information is gradually collected and fused, feature transmission is enhanced, and the accuracy of the reconstructed image improves. Finally, residual learning superposes the input feature x_i with the fused features to obtain the output feature x_{i+1} of the i-th (i = 0, 1, …, 7) PFFB block, making maximal use of the LR image information to relieve the loss of features during transmission. Through its "progressive" feature fusion connections, the PFFB strengthens feature extraction and fuses the extracted multi-layer information, so each layer can make full use of all features learned by the previous layers and the limited features are better transmitted and reused.
As shown in FIG. 3, the EMAB makes full use of channel and spatial feature information to gradually denoise the shallow features of the image, so the network focuses on the high-frequency details of the image, helping to enhance the texture detail information of the reconstructed image. The EMAB reduces the channel size with a 1 × 1 convolution layer after two 3 × 3 convolution kernels, enlarges the receptive field with a stride-2 convolution, and further reduces the spatial dimension of the network with a 2 × 2 max-pooling layer; a dilated (atrous) convolution layer then further aggregates receptive-field context, reducing memory while improving network performance; the obtained features are upsampled to restore the spatial dimension, and a 1 × 1 convolution restores the channel dimension. The EMAB applies the FReLU activation function after the 3 convolutional layers to accelerate convergence and prevent gradient explosion. The EMAB adopts an efficient channel attention block to avoid the problems caused by dimensionality reduction; the channel attention is generated by a fast one-dimensional convolution, and the size of the internal convolution kernel is determined adaptively through a nonlinear mapping of the channel dimension. The one-dimensional convolution efficiently realizes local cross-channel interaction: by capturing information from each channel and its neighbors, it completes the communication between channels and thus learns effective channel attention.
The reconstruction module consists of a multi-scale receptive field block RFB_x, a 3 × 3 convolution and a sub-pixel convolution layer; RFB_x further enhances the features extracted from the PFFB blocks with a multi-branch structure and improves the reconstruction performance of the model by fusing multi-scale feature information.
As in fig. 4, the multi-scale receptive field block RFB_x in the reconstruction module is built from 1 × 1, 3 × 3, 1 × 3, and 3 × 1 convolution kernels. RFB_x is located after the 8 sequentially connected PFFBs and is responsible for enhancing the extracted deep-level features, fusing multi-scale features, and reconstruction, preserving rich deep features and recovering image details. Specifically, the output feature x_8 of the 8th PFFB block is taken as the input of the RFB_x block; multi-branch convolution layers of different sizes perform multi-scale feature extraction, and dilated convolutions with different dilation rates are introduced. The larger the dilation rate, the farther the sampling points are from the center point and the larger the receptive field, so information can be captured over a larger area to generate better feature maps without increasing the parameter count. Finally, the outputs of the branches are concatenated to fuse different features at multiple scales, yielding the feature x_e extracted by RFB_x.
Finally, as shown in FIG. 1, the feature x_e extracted by RFB_x is upsampled by the sub-pixel convolution layer, and the bicubic upsampling result of the low-resolution LR image is superposed with the upsampling result of the sub-pixel convolution layer to obtain the reconstructed image.
Embodiment 2,
a super-resolution reconstruction method for efficient multi-attention feature fusion is carried out according to the following steps:
s1, inputting a low-resolution LR image into a high-efficiency multi-attention feature fusion super-resolution reconstruction model;
s2, a feature extraction module of the super-resolution reconstruction model extracts shallow features of the LR image, then the deep features are extracted through 8 Progressive Feature Fusion Blocks (PFFB), and the deep features are sent to the reconstruction module;
s3, the reconstruction module uses RFB_x to enhance the extracted deep-level features, and multi-scale feature fusion yields the fused multi-dimensional feature x_e;
S4, the feature x_e output by RFB_x is passed through a 3 × 3 convolution and amplified by the sub-pixel convolution layer; the input LR image is bicubically upsampled, and the bicubic upsampling result of the LR image is superposed with the upsampling result of the sub-pixel convolution layer to obtain the reconstructed super-resolution image.
Further, the feature extraction module of S2 implements deep feature extraction as follows,
x_0 = f_IFE(I_LR) (1)
x_{i+1} = f_PFFB^i(x_i) (2)
wherein I_LR is the input LR image, f_IFE is a convolution operation of size 3 × 3, x_0 is the initial feature extracted from the image, f_PFFB^i is the mapping function of the i-th (i = 0, 1, …, 7) progressive feature fusion block PFFB, and x_{i+1} is the deep feature extracted by the i-th PFFB of the feature extraction module.
Further, the reconstruction module of S3 uses RFB_x to enhance the extracted deep-level features as follows,
x_e = f_RFB_x(x_8) (3)
wherein f_RFB_x is the function that uses RFB_x to enhance the deep feature x_8 extracted by the 8 PFFB blocks, and x_e is the enhancement result, i.e. the multi-dimensional feature output by RFB_x.
Further, S4 completes the reconstruction according to the following formula,
I_SR = f_P(x_e) + f_up(I_LR) (4)
wherein f_P performs the 3 × 3 convolution and sub-pixel convolution operations on the enhancement result x_e, f_up performs the bicubic upsampling operation on the input low-resolution LR image, and I_SR is the final super-resolution SR image.
In order to reduce reconstruction error, the parameters of the network are optimized with an L1 loss function. Given a training set of N LR-HR image pairs {(I_LR^k, I_HR^k)}_{k=1}^{N}, the optimization objective is:
L(θ) = (1/N) Σ_{k=1}^{N} ||H_SR(I_LR^k) − I_HR^k||_1 (5)
wherein k denotes the k-th LR-HR image pair of the training set, k ∈ [1, N] and k ∈ Z, N is 800, θ = {w_k, b_k} are the learnable parameters of the model, and H_SR denotes the model herein. The model parameters are optimized through continued training so that L(θ) reaches its minimum, making the reconstructed image as close to the real image as possible.
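A minimal NumPy sketch of the objective in equation (5): the absolute errors are summed within each pair and the result is averaged over the N pairs. Note this follows the formula as written; deep-learning frameworks often average over all elements instead, which differs only by a constant factor.

```python
import numpy as np

def l1_objective(sr_batch: np.ndarray, hr_batch: np.ndarray) -> float:
    """Equation (5): mean over the N pairs of the L1 norm between the
    reconstructed SR image H_SR(I_LR^k) and the ground-truth HR image
    I_HR^k. Batches have shape (N, C, H, W)."""
    n = sr_batch.shape[0]
    per_pair = np.abs(sr_batch - hr_batch).reshape(n, -1).sum(axis=1)
    return float(per_pair.mean())

# Two toy pairs: SR all zeros, HR all ones on 1x2x2 images -> each pair's
# L1 norm is 4, so the objective is 4.0.
loss = l1_objective(np.zeros((2, 1, 2, 2)), np.ones((2, 1, 2, 2)))
```

Compared with an L2 loss, the L1 objective penalizes outliers less aggressively, which is one reason it is popular for sharp super-resolution reconstructions.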
In order to verify the effectiveness of the efficient multi-attention feature fusion image super-resolution reconstruction method, four benchmark data sets widely used in super-resolution reconstruction were selected as test sets: Set5, Set14, BSD100, and Urban100. Keys' algorithm (R. Keys, Cubic convolution interpolation for digital image processing, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 29, December 1981, pp. 1153-1160.); Dong's algorithm (C. Dong, C. C. Loy, K. He, X. Tang, Image super-resolution using deep convolutional networks, in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2014, pp. 184-199.); Dong's algorithm (C. Dong, C. C. Loy, X. Tang, Accelerating the super-resolution convolutional neural network, Computer Vision-ECCV Workshops, 2016, pp. 391-407); Shi's algorithm (W. Shi, J. Caballero, F. Huszar, et al., Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network, in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2016, pp. 1874-1883); Kim's algorithm (J. Kim, J. K. Lee, K. M. Lee, Accurate Image Super-Resolution Using Very Deep Convolutional Networks, in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2016, pp. 1646-1654); Ahn's algorithm (N. Ahn, B. Kang, K. A. Sohn, Fast, Accurate, and Lightweight Super-Resolution with Cascading Residual Network, Computer Vision-ECCV Workshops, 2018, pp. 252-268); Zhao's algorithm (H. Zhao, X. Kong, J. He, et al., Efficient image super-resolution using pixel attention, Computer Vision-ECCV Workshops, 2020, pp. 56-72); and Tian's algorithm (C. Tian, Y. Xu, W. Zuo, et al., Coarse-to-fine CNN for image super-resolution, IEEE Transactions on Multimedia, vol. 23, 2021, pp. 1489-1502) were compared with the experimental results of the present invention in both subjective and objective aspects.
Fig. 5 shows a comparison of image reconstruction results on the test set Urban100 between the efficient multi-attention feature fusion image super-resolution reconstruction method provided by the embodiment of the present invention and other algorithms. The larger image on the left is the original high-resolution image, in which a region rich in texture detail is marked and enlarged by a rectangular box; the eight smaller images on the right are, in order from left to right and top to bottom, the original image, the reconstruction result of Dong's SRCNN, the reconstruction result of Dong's FSRCNN, the reconstruction result of Kim's VDSR, the reconstruction result of Ahn's CARN, the reconstruction result of Zhao's PAN, the reconstruction result of Tian's CFSRCNN, and the reconstruction result of the method of this embodiment. It can be observed that the image reconstructed by the method of this embodiment recovers the shape of the stripes almost exactly and has the clearest edge contours, whereas the images reconstructed by the other methods exhibit visual blur and fail to restore the true content of the image effectively. The reconstruction effect of the method of the embodiment of the present invention therefore has obvious advantages: the reconstructed image recovers more high-frequency information and is closer to the original image.
To avoid the bias of purely qualitative analysis, this embodiment uses two objective indexes, peak signal-to-noise ratio (PSNR) and structural similarity (SSIM), for objective quantitative analysis, comparing reconstruction results at magnification factors of 2, 3 and 4 against the Bicubic algorithm of Keys, the SRCNN algorithm of Dong, the FSRCNN algorithm of Dong, the ESPCN algorithm of Shi, the VDSR algorithm of Kim, the CARN algorithm of Ahn, the PAN algorithm of Zhao, and the CFSRCNN algorithm of Tian on the four test data sets Set5, Set14, BSD100 and Urban100, as shown in Table 1:
table 1 PSNR and SSIM values of different algorithms on a test set
(Table 1 is reproduced as an image in the original publication; the per-algorithm PSNR/SSIM values are not recoverable from the extracted text.)
For PSNR and SSIM, a higher value indicates that the result is more similar to the real image and that the image quality is higher. As can be seen from Table 1, the method of this embodiment achieves the optimal SSIM values on all four test sets at ×4 magnification. The comparison shows that, except for the PAN algorithm, whose PSNR and SSIM values on part of the data sets are slightly higher, the method of the embodiment of the present invention obtains the optimal values compared with the other methods. The PSNR of the method of this embodiment averages up to 37.93 dB, and its SSIM reaches up to 0.9609. The method provided by the embodiment of the present invention therefore significantly improves the peak signal-to-noise ratio and structural similarity of the reconstructed image, improves its visual quality, and yields richer detail features.
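Of the two objective indexes, PSNR is computed directly from the mean squared error as 10·log10(peak²/MSE). The sketch below is an illustrative NumPy implementation (the function name and toy data are ours, not from the patent; SSIM, which involves local luminance, contrast and structure statistics, is omitted for brevity):

```python
import numpy as np

def psnr(ref, test, peak=255.0):
    """Peak signal-to-noise ratio in dB: 10*log10(peak^2 / MSE),
    with peak = 255 for 8-bit images."""
    ref = np.asarray(ref, dtype=np.float64)
    test = np.asarray(test, dtype=np.float64)
    mse = np.mean((ref - test) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)

img = np.full((8, 8), 100.0)
noisy = img.copy()
noisy[0, 0] += 8.0                   # one pixel off by 8 -> MSE = 64/64 = 1
print(round(psnr(img, noisy), 2))    # 48.13
```

Published comparisons usually compute PSNR/SSIM on the Y (luminance) channel only; that convention is not shown here.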
The image reconstruction method of the embodiment of the present invention, if implemented in the form of a software functional module and sold or used as an independent product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, or the portions thereof that substantially contribute over the prior art, may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the image reconstruction method according to the embodiment of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disk.
The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (10)

1. A super-resolution reconstruction model for efficient multi-attention feature fusion is characterized by mainly comprising a feature extraction module and a reconstruction module;
the feature extraction module is divided into a shallow feature extraction module and a deep feature extraction module,
the shallow feature extraction module is a 3 × 3 convolutional layer that performs initial feature extraction on the input low-resolution LR image in a low-dimensional space, effectively reducing the amount of computation on the LR image;
the deep layer feature extraction module comprises 8 Progressive Feature Fusion Blocks (PFFB), the PFFB adopts a progressive fusion connection mode to gradually extract deep layer feature information of the image so as to enhance feature transfer, and meanwhile, the PFFB combines an internal efficient multi-attention block (EMAB) to weight the extracted feature information so as to enable the network to pay more attention to high-frequency information;
the reconstruction module consists of a multi-scale field block RFB _ x, a 3 × 3 convolution and a sub-pixel convolution layer, wherein the RFB _ x further enhances the features extracted from the PFFB block by using a multi-branch structure, and the reconstruction performance of the model is improved by fusing multi-scale feature information;
and finally, superposing the double-cubic up-sampling result of the low-resolution LR image and the up-sampling result of the sub-pixel convolution layer to obtain a reconstructed high-resolution image.
2. The super-resolution reconstruction model for efficient multi-attention feature fusion according to claim 1, wherein the deep feature extraction module comprises 8 Progressive Feature Fusion Blocks (PFFB);
four efficient multi-attention blocks (EMAB) are adopted in the PFFB to progressively extract the deep information of the image layer by layer;
the PFFB realizes 'information interaction' between the convolution-layer outputs inside the EMAB block through repeated random channel mixing ('C shuffle'), regrouping the output channels and then mixing the information of different channels, thereby solving the problem of obstructed information flow between convolutional layers and fully fusing the channels without increasing the amount of computation;
the PFFB applies C shuffle to each feature extracted by the EMAB blocks, then concatenates two adjacent C-shuffle-processed features and applies C shuffle again to improve the generalization ability of the network; a 1 × 1 convolution removes redundant information, and the resulting features are fused with the C-shuffled information; this operation is repeated between the EMAB blocks within the PFFB so that local information is progressively collected and fused, strengthening feature transfer and improving the accuracy of the reconstructed image; finally, residual learning superimposes the input feature x_i on the fused feature to obtain the output feature x_{i+1} of the i-th (i = 0, 1, …, 7) PFFB block, making maximal use of the LR image information to mitigate feature loss during transmission;
through this 'progressive' feature fusion connection, the PFFB strengthens feature extraction and fuses the extracted multi-layer information, allowing each layer to make full use of all features learned by the preceding layers, so that the limited features are better transferred and reused.
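The 'C shuffle' operation on which the PFFB relies can be sketched as a pure reshape-and-transpose, which is why it mixes channel groups at no extra computational cost. An illustrative NumPy sketch (the function name and group count are ours, not from the patent):

```python
import numpy as np

def channel_shuffle(x, groups):
    """'C shuffle': regroup channels so that information from different
    convolution groups is interleaved. x has shape (C, H, W); C must be
    divisible by `groups`. No arithmetic is performed, only reindexing."""
    c, h, w = x.shape
    x = x.reshape(groups, c // groups, h, w)  # split channels into groups
    x = x.transpose(1, 0, 2, 3)               # interleave the groups
    return x.reshape(c, h, w)

# 4 channels tagged 0..3, shuffled in 2 groups -> channel order 0, 2, 1, 3
x = np.arange(4, dtype=float).reshape(4, 1, 1)
print(channel_shuffle(x, 2).ravel().tolist())  # [0.0, 2.0, 1.0, 3.0]
```

For 4 channels in 2 groups the operation is its own inverse, which makes it easy to verify.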
3. The Progressive Feature Fusion Block (PFFB) of claim 2, wherein four efficient multi-attention blocks (EMAB) are employed in the PFFB to extract features layer by layer;
the EMAB makes full use of the characteristic information of the channel and the space to gradually denoise the shallow characteristic of the image, so that the network focuses on the high-frequency details in the concerned image, and the enhancement of the texture detail information of the reconstructed image is facilitated;
the EMAB reduces the channel size with a 1 × 1 convolutional layer following two 3 × 3 convolution kernels, enlarges the receptive field with a strided convolution of stride 2, and further reduces the spatial dimension of the network in combination with a 2 × 2 max-pooling layer; a dilated convolutional layer then further aggregates the context information of the receptive field, reducing memory consumption while improving network performance; the resulting features are upsampled to restore the spatial dimension, and a 1 × 1 convolution restores the channel dimension;
the EMAB applies the FReLU activation function after the 3 convolutional layers to accelerate convergence and prevent gradient explosion;
the EMAB adopts an efficient channel attention block to avoid the problems caused by dimensionality reduction: the channel attention is generated by a fast one-dimensional convolution, and the size of the internal convolution kernel is determined adaptively through a nonlinear mapping of the channel dimension;
the one-dimensional convolution efficiently realizes local cross-channel interaction, capturing local cross-channel information so as to learn effective channel attention.
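The efficient channel attention described in this claim follows the ECA pattern: global average pooling, a fast 1-D convolution across the pooled channel descriptor with an adaptively chosen kernel size, and a sigmoid gate applied per channel. A minimal NumPy sketch — the uniform weights stand in for the learned 1-D kernel, and the γ = 2, b = 1 constants are the common ECA defaults, assumed here rather than taken from the patent:

```python
import numpy as np

def eca_kernel_size(channels, gamma=2, b=1):
    """Adaptive 1-D kernel size via a nonlinear mapping of the channel
    dimension: k = |(log2(C) + b) / gamma|, rounded up to the nearest
    odd number."""
    t = int(abs((np.log2(channels) + b) / gamma))
    return t if t % 2 else t + 1

def eca_attention(x, k=None):
    """Efficient channel attention sketch: global average pooling, a 1-D
    convolution over the channel descriptor (no dimensionality reduction),
    then a per-channel sigmoid gate. x has shape (C, H, W)."""
    c = x.shape[0]
    k = k or eca_kernel_size(c)
    desc = x.mean(axis=(1, 2))                   # (C,) pooled descriptor
    w = np.ones(k) / k                           # stand-in for learned weights
    padded = np.pad(desc, k // 2, mode="edge")
    conv = np.convolve(padded, w, mode="valid")  # local cross-channel interaction
    gate = 1.0 / (1.0 + np.exp(-conv))           # sigmoid
    return x * gate[:, None, None]               # reweight each channel

print(eca_kernel_size(64))  # 3
```

Because the 1-D kernel only spans k neighbouring channels, the attention is learned from local cross-channel interaction rather than a fully connected bottleneck, which is what avoids dimensionality reduction.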
4. The high-efficiency multi-attention feature fusion super-resolution reconstruction model of claim 1, wherein the multi-scale field blocks RFB _ x in the reconstruction module are formed by combining 1 x 1, 3 x 3, 1 x 3 and 3 x 1 convolution kernels;
the RFB _ x is positioned behind the sequentially connected 8 PFFBs and is responsible for enhancing extracted deep-level features, multi-scale fusion features and reconstruction, the features with rich depth are reserved, and image details are recovered;
specifically, the output feature x_8 of the 8th PFFB block is taken as the input of the RFB_x block, and multi-branch convolutional layers of different sizes are used for multi-scale feature extraction while dilated convolutions with different dilation rates are introduced; the larger the dilation rate, the farther the sampling points lie from the centre point and the larger the receptive field, so information can be captured over a larger area to generate a more effective feature map without increasing the number of parameters;
finally, the outputs of the multiple branches are concatenated to fuse the different features at multiple scales, yielding the feature x_e extracted by RFB_x.
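The claim's statement that a larger dilation rate enlarges the receptive field without adding parameters follows from the effective extent of a dilated kernel, d·(k − 1) + 1: the weight count stays k² while the spatial span grows with d. A small sketch (the dilation rates shown are illustrative, not taken from the patent):

```python
def dilated_kernel_extent(kernel_size, dilation):
    """Effective spatial extent of a dilated (atrous) convolution kernel:
    a larger dilation rate pushes the sampling points farther from the
    centre, enlarging the receptive field with no extra parameters."""
    return dilation * (kernel_size - 1) + 1

# A 3x3 kernel at dilation rates 1, 3, 5: extents 3, 7, 11,
# yet each branch still carries only 9 weights per channel pair.
for d in (1, 3, 5):
    print(d, dilated_kernel_extent(3, d))
```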
5. A super-resolution reconstruction method for efficient multi-attention feature fusion is characterized by comprising the following steps:
s1, inputting a low-resolution LR image into a high-efficiency multi-attention feature fusion super-resolution reconstruction model;
s2, a feature extraction module of the super-resolution reconstruction model extracts shallow features of the LR image, then the deep features are extracted through 8 Progressive Feature Fusion Blocks (PFFB), and the deep features are sent to the reconstruction module;
s3, the reconstruction module uses RFB_x to enhance the extracted deep-level features and fuses multi-scale features to obtain the fused multi-dimensional feature x_e;
S4, the output feature x_e of RFB_x is subjected to a 3 × 3 convolution and amplified by the sub-pixel convolutional layer; the input LR image is bicubically upsampled, and the bicubic upsampling result of the LR image is superimposed on the upsampling result of the sub-pixel convolutional layer to obtain the reconstructed super-resolution image.
6. The efficient multi-attention feature fusion super-resolution reconstruction method according to claim 5, wherein the feature extraction module of S2 implements deep feature extraction as follows:

x_0 = f_IFE(I_LR)    (1)

x_{i+1} = f_PFFB^i(x_i)    (2)

where I_LR is the input LR image, f_IFE is a convolution operation of size 3 × 3, x_0 is the extracted initial feature of the image, f_PFFB^i is the mapping function of the i-th (i = 0, 1, …, 7) progressive feature fusion block PFFB, and x_{i+1} is the deep feature extracted by the i-th PFFB of the feature extraction module.
7. The efficient multi-attention feature fusion super-resolution reconstruction method according to claim 5, wherein the reconstruction module of S3 uses RFB_x to enhance the extracted deep features as follows:

x_e = f_RFB_x(x_8)    (3)

where f_RFB_x is the function that uses RFB_x to enhance the deep feature x_8 extracted by the 8 PFFB blocks, and x_e is the enhancement result, i.e. the multi-dimensional feature output by RFB_x.
8. The efficient multi-attention feature fusion super-resolution reconstruction method according to claim 5, wherein S4 completes the reconstruction according to the following formula:

I_SR = f_P(x_e) + f_up(I_LR)    (4)

where f_P performs the 3 × 3 convolution and sub-pixel convolution operations on the enhancement result x_e, f_up performs the bicubic upsampling operation on the input low-resolution LR image, and I_SR is the final super-resolution SR image.
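The sub-pixel convolution inside f_P amplifies the feature map by rearranging channels into space (the PixelShuffle operation). A NumPy sketch of that rearrangement step — the preceding 3 × 3 convolution that produces the C·r² channels is omitted, and the function name and toy data are ours:

```python
import numpy as np

def pixel_shuffle(x, r):
    """Sub-pixel convolution's rearrangement step (depth-to-space):
    (C*r^2, H, W) -> (C, H*r, W*r). Channel c*r^2 + r1*r + r2 supplies
    output pixel (h*r + r1, w*r + r2) of channel c."""
    c2, h, w = x.shape
    c = c2 // (r * r)
    x = x.reshape(c, r, r, h, w)
    x = x.transpose(0, 3, 1, 4, 2)   # -> (C, H, r, W, r)
    return x.reshape(c, h * r, w * r)

# 4 channels of a 1x1 map upscale to a single 2x2 map
x = np.arange(4, dtype=float).reshape(4, 1, 1)
print(pixel_shuffle(x, 2).tolist())  # [[[0.0, 1.0], [2.0, 3.0]]]
```

Because the rearrangement is a pure reindexing, the upsampling cost is carried entirely by the convolution that precedes it, which is what makes the sub-pixel layer efficient.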
9. An electronic device, characterized in that image reconstruction is achieved with a method according to any one of claims 5 to 8.
10. A computer storage medium, characterized in that at least one program instruction is stored in the storage medium, which at least one program instruction is loaded and executed by a processor to implement the image reconstruction method according to any one of claims 5 to 8.
CN202211287811.5A 2022-10-20 2022-10-20 Super-resolution reconstruction model, method, equipment and storage medium for efficient multi-attention feature fusion Pending CN115660955A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211287811.5A CN115660955A (en) 2022-10-20 2022-10-20 Super-resolution reconstruction model, method, equipment and storage medium for efficient multi-attention feature fusion


Publications (1)

Publication Number Publication Date
CN115660955A true CN115660955A (en) 2023-01-31

Family

ID=84989770

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211287811.5A Pending CN115660955A (en) 2022-10-20 2022-10-20 Super-resolution reconstruction model, method, equipment and storage medium for efficient multi-attention feature fusion

Country Status (1)

Country Link
CN (1) CN115660955A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116563147A (en) * 2023-05-04 2023-08-08 北京联合大学 Underwater image enhancement system and method
CN116563147B (en) * 2023-05-04 2024-03-26 北京联合大学 Underwater image enhancement system and method
CN117078516A (en) * 2023-08-11 2023-11-17 济宁安泰矿山设备制造有限公司 Mine image super-resolution reconstruction method based on residual mixed attention
CN117078516B (en) * 2023-08-11 2024-03-12 济宁安泰矿山设备制造有限公司 Mine image super-resolution reconstruction method based on residual mixed attention
CN117788477A (en) * 2024-02-27 2024-03-29 贵州健易测科技有限公司 Image reconstruction method and device for automatically quantifying tea leaf curl
CN117788477B (en) * 2024-02-27 2024-05-24 贵州健易测科技有限公司 Image reconstruction method and device for automatically quantifying tea leaf curl

Similar Documents

Publication Publication Date Title
Wang et al. Ultra-dense GAN for satellite imagery super-resolution
CN110136063B (en) Single image super-resolution reconstruction method based on condition generation countermeasure network
CN113362223B (en) Image super-resolution reconstruction method based on attention mechanism and two-channel network
CN112507997B (en) Face super-resolution system based on multi-scale convolution and receptive field feature fusion
CN115660955A (en) Super-resolution reconstruction model, method, equipment and storage medium for efficient multi-attention feature fusion
CN111754438B (en) Underwater image restoration model based on multi-branch gating fusion and restoration method thereof
CN111951164B (en) Image super-resolution reconstruction network structure and image reconstruction effect analysis method
Luo et al. Lattice network for lightweight image restoration
CN111626927B (en) Binocular image super-resolution method, system and device adopting parallax constraint
CN112419191B (en) Image motion blur removing method based on convolution neural network
CN112699844A (en) Image super-resolution method based on multi-scale residual error level dense connection network
CN115331104A (en) Crop planting information extraction method based on convolutional neural network
CN113222818A (en) Method for reconstructing super-resolution image by using lightweight multi-channel aggregation network
Chen et al. Image denoising via deep network based on edge enhancement
CN111461976A (en) Image super-resolution method based on efficient lightweight coordinate neural network
CN116485646A (en) Micro-attention-based light-weight image super-resolution reconstruction method and device
Gong et al. Learning deep resonant prior for hyperspectral image super-resolution
Wang et al. Image super-resolution via lightweight attention-directed feature aggregation network
Chen et al. Underwater-image super-resolution via range-dependency learning of multiscale features
CN116188272B (en) Two-stage depth network image super-resolution reconstruction method suitable for multiple fuzzy cores
CN116777745A (en) Image super-resolution reconstruction method based on sparse self-adaptive clustering
CN116563187A (en) Multispectral image fusion based on graph neural network
CN113191947B (en) Image super-resolution method and system
Li et al. Parallel-connected residual channel attention network for remote sensing image super-resolution
CN116246110A (en) Image classification method based on improved capsule network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination