CN111161150A

CN111161150A - Image super-resolution reconstruction method based on multi-scale attention cascade network

Info

Publication number: CN111161150A
Application number: CN201911392155.3A
Authority: CN
Inventors: 付利华; 李宗刚; 张博; 陈辉; 赵茹
Original assignee: Beijing University of Technology
Current assignee: Beijing University of Technology
Priority date: 2019-12-30
Filing date: 2019-12-30
Publication date: 2020-05-15
Anticipated expiration: 2039-12-30
Also published as: CN111161150B

Abstract

The invention provides an image super-resolution reconstruction method based on a multi-scale attention cascade network, which comprises the steps of extracting shallow features of a low-resolution image by using convolution operation; then, inputting the shallow layer characteristics into a characteristic extraction subnet to obtain cascade characteristics; further, the cascade characteristic passes through a convolution layer with convolution kernel of 1 to obtain an optimized characteristic; inputting the optimized features into an image depth learning up-sampling module to obtain a reconstructed image

At the same time, for low resolution images I_LRObtaining a reconstructed image by adopting a bicubic linear interpolation algorithm

Finally, the image is reconstructed

And

fusing to obtain the final high-resolution reconstructed image I_SR. The method is suitable for super-resolution reconstruction of the image, and the obtained reconstructed image has high definition, more real texture and good perception effect.

Description

Image super-resolution reconstruction method based on multi-scale attention cascade network

Technical Field

The invention belongs to the field of image restoration, relates to an image super-resolution reconstruction method, and particularly relates to an image super-resolution reconstruction method based on a multi-scale attention cascade network.

Background

Single image super-resolution reconstruction (SISR) has recently received a great deal of attention. In general, the purpose of SISR is to produce a visually High Resolution (HR) output from a Low Resolution (LR) input. However, the whole process is completely irreversible, since there are multiple solutions to the mapping between LR and HR. Therefore, a number of image super-resolution reconstruction (SR) methods have been proposed, ranging from earlier interpolation-based methods and model-based methods to more recently deep learning-based methods.

Interpolation-based methods are simple and fast, but cannot be applied more widely because of poor image quality. For more flexible SR methods, more advanced model-based methods and sparse matrix methods have been proposed by exploiting strong image priors, such as non-local similarities, and although such model-based methods can flexibly produce relatively high quality HR images, they still have some drawbacks: 1) such methods often involve a time-consuming optimization process; 2) when the image statistics are biased from the image, the reconstruction performance may degrade rapidly.

At present, Convolutional Neural Networks (CNNs) have been shown to provide significant performance in SISR problems. However, the existing SR model has the following problems: 1) the method has the characteristics of insufficient utilization: most approaches increase the depth of the network blindly to improve the performance of the network, but neglect to fully exploit the image feature characteristics of the LR. As the depth of the network increases, the information gradually disappears during transmission. How to fully exploit these features is crucial for the network to reconstruct high quality images. 2) SR image detail loss: using the interpolated LR image as input will increase the computational complexity while not facilitating the learning of the final image details. Therefore, recent methods focus more on enlarging LR images. However, the effect of the SR image cannot be improved only by the single network structure.

In order to solve the problems, the invention provides a novel image super-resolution reconstruction method based on deep learning.

Disclosure of Invention

The invention aims to solve the problems that: in the existing super-resolution reconstruction method based on deep learning, most methods increase the depth of the network blindly to improve the performance of the network, but neglect to fully utilize the characteristics of LR images; moreover, as the depth of the network increases, the characteristic information gradually disappears in the transmission process; the LR image amplified by interpolation is used as the input of the network, which increases the computational complexity and is not favorable for the network to learn the image details. A new super-resolution reconstruction method based on deep learning is required to be provided, so that the impression and robustness after image super-resolution reconstruction are improved.

In order to solve the above problems, the present invention provides an image super-resolution reconstruction method based on a multi-scale attention cascade network, which extracts the features of an LR image by using a U-shaped multi-scale attention block group and performs super-resolution reconstruction on the LR image by combining interpolation reconstruction and deep learning-based network reconstruction, and comprises the following steps:

1) low resolution image I_LRAs input to the multiscale attention cascade network, pair I_LRPerforming convolution operation to extract shallow feature F₀；

2) Shallow feature F₀Inputting a feature extraction subnet formed by n multi-scale attention blocks, cascading features output by each multi-scale attention block in the subnet to obtain cascading features F_c；

3) Will cascade feature F_cReducing the number of parameters by a convolution layer with convolution kernel of 1 to obtain optimized characteristic vector F_dOptimized feature vector F_dThe training of data and the feature extraction can be more effectively and intuitively carried out;

4) feature vector F to be optimized_dIn an input image deep learning up-sampling module, a reconstructed image is obtainedImage

5) To I_LRObtaining a reconstructed image using an interpolation algorithm

Will be provided with

And

fusing to obtain the final reconstructed image I_SR。

As a further preferable mode, the obtaining of the cascade characteristic in the step 2) specifically includes:

2.1) shallow feature F₀Inputting the feature extraction sub-network composed of n multi-scale attention blocks to respectively obtain n features F_i，i＝1,2,3,…,n。

For the ith multi-scale attention block, the input of the ith multi-scale attention block is the characteristic output F of the previous multi-scale attention block_i-1The output characteristic is F_i。

Each multi-scale attention block consists of a U-shaped structure module, a bottleneck layer structure module and a residual error module.

2.1.1) for the U-shaped structure module in the ith multi-scale attention block, the U-shaped structure module is formed by connecting non-local means, 3 × 3 convolution, 5 × 5 convolution, 7 × 7 convolution, attention mechanism, 5 × 5 convolution, 3 × 3 convolution and non-local means in series; furthermore, there is one Concat layer between the two 3 × 3 convolutions; there is one Concat layer between the two 5 x 5 convolution convolutions. The characteristic input of which is the output characteristic F of the previous multi-scale attention block_i-1After processing by the U-shaped structure module, the characteristic F is obtained_i,0；

For the feature input F of non-local mean, firstly, the input features F are respectively input into three convolution layers in parallel to obtain three features F_x,F_y,F_zThen, these three features are operated in through ConcatRow feature fusion and input of fused features into subsequent convolutional layers to obtain feature F_wAnd finally, the feature F_wAnd adding the input characteristic F point by point to obtain the output characteristic of the non-local mean value.

For the input feature F of the attention mechanism, first, the feature F is input to the global pooling layer extraction channel information descriptor M_avgThen, the channel information descriptor M is described_avgInputting the data into the subsequent two convolution layers for further processing to obtain M, and finally multiplying the M and the characteristic F channel by channel to obtain the output characteristic of the attention mechanism.

2.1.2) for the bottleneck layer structure module in the ith multi-scale attention block, the bottleneck layer structure module is composed of two bottleneck layers in series. The input of which is the output characteristic F of the U-shaped structure module in the multi-scale attention block_i,0Obtaining a characteristic F after the processing of the bottleneck layer structure module_i,2；

2.1.3) for the residual module in the ith multi-scale attention block, the residual module is the point-by-point addition of the output characteristics of the previous multi-scale attention block and the output characteristics of the bottleneck layer structure. The input of which is the output characteristic F of the previous multi-scale attention block_i-1And the output characteristics F of the bottleneck layer structure module in the multi-scale attention block_i,2Obtaining a characteristic F after processing by a residual error module_i；

2.2) feature F output for n multiscale attention blocks_iI-1, 2,3, …, n using Concat ligation, yields cascade characteristic F_c：

F_c＝Concat(F₁,F₂,...,F_n)

Where Concat (. cndot.) represents the operation of concatenating the features of the n multi-scale attention block outputs.

As a further preferable mode, the step 3) is specifically:

3.1) cascading feature F_cInputting the convolution layer with convolution kernel of 1, reducing the number of parameters and obtaining optimized characteristic F_d:

F_d＝Conv_1×1(F_c)

Wherein，Conv_1×1(. cndot.) represents a convolution operation with a convolution kernel of 1.

As a further preferred mode, the step 4) of obtaining the reconstructed image through the image deep learning upsampling module

The image deep learning up-sampling module consists of a convolution layer with convolution kernel of 3 and a sub-pixel convolution layer. Optimized feature F_dObtaining a reconstructed image through an image deep learning up-sampling module

The specific process comprises the following steps:

4.1) optimized feature F using a convolution layer pair with convolution kernel 3_dRearranging to obtain a characteristic F_e：

F_e＝Conv_3×3(F_d)

Wherein, Conv_3×3(. cndot.) represents a convolution operation with a convolution kernel of 3.

4.2) feature F after rearrangement_eInputting into a sub-pixel convolution layer, enlarging it to corresponding scale, and obtaining reconstructed image

Wherein H_Sp(. cndot.) denotes a sub-pixel convolution operation.

As a further preferred mode, the step 5) of obtaining the final reconstruction image I_SRThe method comprises the following specific steps:

5.1) for low resolution images I_LRObtaining a reconstructed image after interpolation by using a bicubic linear interpolation algorithm

5.2) reconstructing the image obtained by the image deep learning up-sampling module

And the reconstructed image after the interpolation of the bicubic linear interpolation algorithm

Fusing to obtain the final reconstructed image I_SR：

Although the super-resolution reconstruction method using the interpolation algorithm is high in reconstruction speed, redundant information is added, and the super-resolution reconstruction effect is poor; the image super-resolution reconstruction method based on the deep learning lacks a reasonable guide in the reconstruction process, so that partial detail information in the reconstructed image is lost. According to the method, the reconstruction result of the interpolation algorithm is used as the guidance of the reconstruction process of the image super-resolution reconstruction method based on the deep learning, so that the image super-resolution reconstruction effect can be improved, and the redundant information generated by the interpolation algorithm can be removed.

The invention provides an image super-resolution reconstruction method of a multi-scale attention cascade network, which comprises the steps of extracting shallow features of a low-resolution image by using convolution operation; then, inputting the shallow layer characteristics into a characteristic extraction subnet to obtain cascade characteristics; further, the cascade characteristic passes through a convolution layer with convolution kernel of 1 to obtain an optimized characteristic vector; inputting the optimized feature vector into an image deep learning up-sampling module to obtain a reconstructed image

At the same time, for low resolution images I_LRObtaining a reconstructed image using an interpolation algorithm

Finally, the image is reconstructed

And

fusing to obtain the final high-resolution reconstructed image I_SR. By applying the method, the problem of low detail definition of the reconstructed image of the existing image super-resolution reconstruction method is solved, and the impression is improved; the method also solves the problem that the existing image super-resolution reconstruction algorithm based on deep learning cannot fully extract the low-resolution image features. The method is suitable for super-resolution reconstruction of the image, and the reconstructed image obtained by using the method for super-resolution reconstruction of the image has high definition, more real texture and good perception effect.

Advantageous effects

Firstly, the invention adopts a group of multi-scale attention blocks to extract the characteristics of the low-resolution image, and can fully utilize the detail information of the low-resolution image; secondly, super-resolution reconstruction is realized by combining interpolation reconstruction and deep learning-based network reconstruction, and the reconstruction effect of the reconstructed image is improved.

Drawings

FIG. 1 is a flow chart of an image super-resolution reconstruction method based on a multi-scale attention cascade network according to the present invention;

FIG. 2 is a network structure diagram of the image super-resolution method based on the multi-scale attention cascade network of the present invention;

FIG. 3 is a block diagram of a multi-scale attention module designed by the present invention;

Detailed Description

Finally, the image is reconstructed

And

fusing to obtain the final high-resolution reconstructed image I_SR. The method is suitable for super-resolution reconstruction of images, and the super-resolution reconstruction is carried out by using the method, so that the obtained high-resolution images are high in definition, more real in texture and good in perception effect.

As shown in fig. 1, the present invention comprises the steps of:

1) low resolution image I_LRAs input to the multi-scale attention cascade network, a convolution operation is used to extract the low-resolution image I_LRMiddle extracted shallow feature F₀：

F₀＝H_sf(I_LR)

Wherein H_sf() Representing a convolution operation.

2) Shallow feature F₀Inputting a feature extraction subnet formed by a group of multi-scale attention blocks, cascading features output by each multi-scale attention block in the subnet by using Cancat operation to obtain cascading features F_c；

2.1.1) for the U-shaped structure module in the ith multi-scale attention block, the module is composed of non-local mean, 3 × 3 convolution, 5 × 5 convolution, 7 × 7 convolution, attention mechanism, 5 × 5 convolution, 3 × 3 convolution and non-local mean in series connection; furthermore, there is one Concat layer between the two 3 × 3 convolutions; there is one Concat layer between the two 5 x 5 convolution convolutions. The input of which is the output characteristic F of the previous multi-scale attention block_i-1The output of the U-shaped structural module is characteristic F_i,0；

Will be characterized by F_i-1Input to a U-shaped building block to obtain a feature F_i,0：

F_i,0＝H_u(F_i-1)

Wherein H_uRepresenting the extraction of features using a U-shaped structure module.

2.1.2) for the bottleneck layer structure module in the ith multi-scale attention block, the bottleneck layer structure module is composed of two bottleneck layers in series. The input of which is the output characteristic F of the U-shaped structure module in the multi-scale attention block_i,0The output of the bottleneck layer structure module is a characteristic F_i,2；

Each bottleneck layer structure module consists of two bottleneck layers. The feature input of the first bottleneck layer is the feature output F of the previous multi-scale attention block_i-1The characteristic output is F_i,1：

F_i,1＝H_b(F_i-1)

Wherein H_b(. cndot.) represents the first bottleneck layer operation.

Further extracting the characteristics F of the U-shaped structural module_i,0And the output F of the first bottleneck layer_i,1Inputting the data into a second bottleneck layer for detail fusion to obtain a characteristic F_i,2：

F_i,2＝H_c(F_i,0,F_i,1)

Wherein H_c(. cndot.) represents a second bottleneck layer operation.

2.1.3) for the residual module in the ith multi-scale attention block, the module is the output characteristic of the previous multi-scale attention block and the output characteristic of the bottleneck layer structureAdding point by point. The input of which is the output characteristic F of the previous multi-scale attention block_i-1And the output characteristics F of the bottleneck layer structure module in the multi-scale attention block_i,2The output of the residual block is a feature F_i；

Output characteristics F of a previous multi-scale attention block_i-1And feature F_i,2Inputting the residual error into a residual error module to obtain a characteristic F_i：

F_i＝F_i-1+F_i,2

F_c＝Concat(F₁,F₂,...,F_n)

F_d＝Conv_1×1(F_c)

Wherein, Conv_1×1(. cndot.) represents a convolution operation with a convolution kernel of 1.

4) Feature vector F to be optimized_dObtaining a reconstructed image in an input image deep learning up-sampling module

F_e＝Conv_3×3(F_d)

Wherein H_Sp(. cndot.) represents a subpixel convolution layer operation.

5) To I_LRObtaining a reconstructed image using an interpolation algorithm

Will be provided with

And

fusing to obtain the final reconstructed image I_SR。

Fusing to obtain the final reconstructed image I_SR：

Wherein, I_SRAnd obtaining a final image super-resolution reconstruction result.

The invention has wide application in the field of image restoration, such as putting large-size photo billboards, reducing image transmission pressure, enlarging thumbnails and the like. The present invention will now be described in detail with reference to the accompanying drawings.

2) Shallow feature F₀Inputting a feature extraction subnet formed by a group of multi-scale attention blocks, cascading features output by each multi-scale attention block in the subnet to obtain cascading features F_c；

3) Will cascade feature F_cReducing the number of parameters by a convolution layer with convolution kernel of 1 to obtain optimized characteristic vector F_d；

5) For low resolution image I_LRObtaining a reconstructed image by adopting a bicubic linear interpolation algorithm

Will be provided with

And

fusing to obtain the final reconstructed image I_SR。

The method is realized based on a PyTorch deep learning framework under NVIDIA GeForce GTX 1080Ti and Ubuntu 16.0464 bit operating systems.

The invention provides an image super-resolution reconstruction method based on a multi-scale attention cascade network. The method is suitable for super-resolution reconstruction of images, and the reconstructed images obtained by using the method for super-resolution reconstruction are high in definition, more real in texture and good in perception effect.

Claims

1. An image super-resolution reconstruction method based on a multi-scale attention cascade network is characterized by comprising the following steps:

step 1) low resolution image I_LRAs input to the multiscale attention cascade network, pair I_LRPerforming a convolution operation to extract shallow feature F₀；

Step 2) shallow feature F₀Inputting a feature extraction subnet formed by n multi-scale attention blocks, cascading features output by each multi-scale attention block in the subnet to obtain cascading features F_c；

Step 3) cascading characteristics F_cObtaining optimized characteristic vector F by convolution layer with convolution kernel of 1_dOptimized feature vector F_dThe data training and feature extraction can be more effectively and intuitively carried out, and the specific expression is as follows:

F_d＝Conv_1×1(F_c)

wherein, Conv_1×1(. -) represents a convolution operation with a convolution kernel of 1;

step 4) optimizing the feature vector F_dObtaining a reconstructed image in an input image deep learning up-sampling module

Step 5) for the low-resolution image I_LRObtaining a reconstructed image by adopting a bicubic linear interpolation algorithm

Will be provided with

And

fusing to obtain the final reconstructed image I_SRThe method comprises the following steps:

2. the image super-resolution reconstruction method based on the multi-scale attention cascade network as claimed in claim 1, wherein: the feature extraction subnet described in step 2 is composed of n multi-scale attention blocks, wherein the input of the ith multi-scale attention block is the feature output F of the previous multi-scale attention block_i-1The output characteristic is F_i；

Furthermore, each multi-scale attention block consists of a U-shaped structure module, a bottleneck layer structure module and a residual error module;

for the U-shaped structure module in the ith multi-scale attention block, the input of the U-shaped structure module is the output characteristic F of the previous multi-scale attention block_i-1After processing by the U-shaped structure module, the characteristic F is obtained_i,0；

For a bottleneck layer structure module in the ith multi-scale attention block, the bottleneck layer structure module is formed by connecting two bottleneck layers in series; output characteristic F of previous multi-scale attention block_i-1Inputting a first bottleneck layer, inputting the output of the first bottleneck layer into a second bottleneck layer, and simultaneously receiving the output characteristic F of the U-shaped structure module in the multi-scale attention block by the second bottleneck layer_i,0Obtaining the output characteristic F of the bottleneck layer structure module through the process_i,2；

For the residual module in the ith multi-scale attention block, specifically: output characteristics F of previous multi-scale attention block_i-1And output characteristics F of the bottleneck layer structure_i,2Adding point by point to obtain characteristic F_i。

3. The image super-resolution reconstruction method based on the multi-scale attention cascade network as claimed in claim 2, wherein:

the U-shaped structure module is formed by connecting non-local mean values, 3 multiplied by 3 convolution, 5 multiplied by 5 convolution, 7 multiplied by 7 convolution, attention mechanism, 5 multiplied by 5 convolution, 3 multiplied by 3 convolution and non-local mean values in series; wherein, a Concat layer is contained between two 3 × 3 convolutions; two 5 × 5 convolution convolutions contain one Concat layer in between; the input to the first 5 x 5 convolution is the sum of the output of the first non-local mean and the output of the first 3 x 3 convolution, the input to the 7 x 7 convolution is the sum of the output of the first 5 x 5 convolution and the output of the first non-local mean, the input to the second 5 x 5 convolution is the Concat of the output of the first 5 x 5 convolution and the attention mechanism output, and the input to the second 3 x 3 convolution is the Concat of the output of the first 3 x 3 convolution and the output of the second 5 x 5 convolution.

4. The image super-resolution reconstruction method based on the multi-scale attention cascade network as claimed in claim 2, wherein: the non-local mean value is formed by connecting three convolutions with convolution kernel 1 in parallel and a convolution for feature fusion in series;

for the feature input F of non-local mean, firstly, the input features F are respectively input into three convolution layers in parallel to obtain three features F_x,F_y,F_zThen, the three features are subjected to feature fusion through Concat operation, and the fused features are input into a subsequent convolution layer to obtain features F_wAnd finally, the feature F_wAnd adding the input characteristic F point by point to obtain the output characteristic of the non-local mean value.

5. The image super-resolution reconstruction method based on the multi-scale attention cascade network as claimed in claim 2, wherein: the attention mechanism is formed by sequentially connecting a global pooling layer and two convolutions;

for the input feature F of the attention mechanism, first, the feature F is input to the global pooling layer extraction channel information descriptor M_avgThen, the channel information descriptor M is described_avgInput into the subsequent two convolution layersAnd further processing to obtain M, and finally multiplying the M and the characteristic F channel by channel to obtain the output characteristic of the attention mechanism.

6. The image super-resolution reconstruction method based on the multi-scale attention cascade network as claimed in claim 2, wherein: the bottleneck layer is formed by connecting two convolution layers in series.

7. The image super-resolution reconstruction method based on the multi-scale attention cascade network as claimed in claim 1, wherein: the cascade operation described in step 2 is specifically represented as follows:

F_c＝Concat(F₁,F₂,...,F_n)

where Concat (. cndot.) represents an operation of concatenating features of n multi-scale attention block outputs, F_iI 1,2,3, n denotes the characteristics of the n multi-scale attention block outputs.

8. The method for reconstructing image super resolution based on multi-scale attention cascade network as claimed in claim 1, wherein the step 4) of obtaining the reconstructed image in the image deep learning up-sampling module

The method specifically comprises the following steps:

F_e＝Conv_3×3(F_d)

Wherein, Conv_3×3(. h) represents a convolution operation with a convolution kernel of 3;

Wherein H_Sp(. cndot.) denotes a sub-pixel convolution operation.