CN114757830A - Image super-resolution reconstruction method based on channel-diffusion double-branch network - Google Patents
- Publication number
- CN114757830A (application number CN202210488529.7A)
- Authority
- CN
- China
- Prior art keywords
- convolution
- layer
- diffusion
- channel
- convolution layer
- Prior art date
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
- G06T3/40—Scaling of whole images or parts thereof, e.g. expanding or contracting
- G06T3/4053—Scaling of whole images or parts thereof, e.g. expanding or contracting based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/11—Region-based segmentation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10024—Color image
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20112—Image segmentation details
- G06T2207/20132—Image cropping
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The invention discloses an image super-resolution reconstruction method based on a channel-diffusion double-branch network, which mainly solves the problems of insufficient texture detail and structural distortion in existing methods. The implementation steps are: construct a training sample set and a test sample set; construct a channel-diffusion double-branch network consisting of a first convolution layer, D channel-diffusion residual modules, a second convolution layer and an up-sampling module connected in sequence, where each channel-diffusion residual module comprises an adaptive convolution module together with a channel attention branch and a diffusion branch that are connected to it and arranged in parallel; the adaptive convolution module comprises several convolution layers and several nonlinear activation layers, the diffusion branch comprises a P-M diffusion layer, a convolution layer and a nonlinear activation layer, and the channel attention branch comprises a pooling layer, several nonlinear activation layers and several convolution layers; finally, perform iterative training on the channel-diffusion double-branch network. The method can obtain clear and accurate super-resolution reconstructed images.
Description
Technical Field
The invention belongs to the technical field of image processing, relates to an image reconstruction method, and particularly relates to an image super-resolution reconstruction method based on a channel-diffusion double-branch network, which can be used in the technical fields of pedestrian re-identification and the like.
Background
In the image acquisition process, limitations of the imaging equipment, shooting distance, lighting and other factors often make the captured picture too low in resolution and poor in quality. To obtain higher-resolution images, super-resolution reconstruction techniques are generally employed. Image super-resolution reconstruction is a technique for generating a high-resolution image from a low-resolution image. Fields with strict imaging-quality requirements, such as pedestrian re-identification, require images of higher resolution, free of structural distortion and edge-texture loss, so as to prevent recognition errors. At present, common super-resolution methods fall into three categories: interpolation-based, reconstruction-based and learning-based. Images restored by interpolation suffer from blurring, jagged edges and similar artifacts. Reconstruction-based methods are a considerable advance over interpolation, but their results are still unsatisfactory. The main idea of learning-based super-resolution algorithms is to learn the correspondence between low-resolution and high-resolution images and to use that correspondence to guide the super-resolution reconstruction.
In recent years deep learning has developed rapidly, and many researchers have combined it with image super-resolution reconstruction to good effect. For example, the patent document "Super-resolution reconstruction system and method for a single image" (application No. 202010218624.6, publication No. CN111402140A), filed by China Jiliang University, proposes a single-image super-resolution reconstruction method comprising: extracting features from the original low-resolution image through two convolution layers using an embedding network; reconstructing high-resolution residual features from the low-resolution features with two cascaded refinement blocks in a coarse-to-fine manner; sending the reconstructed high-resolution residual features into a reconstruction network to obtain a residual image through a deconvolution operation; and adding the up-sampled low-resolution image to the high-resolution residual image to obtain the final reconstructed high-resolution image. Although the resolution of the reconstructed image is improved, all information in the image is processed identically; important high-frequency information such as texture detail, structure and spatial position is not well recovered, which limits further improvement of the reconstruction performance.
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing an image super-resolution reconstruction method based on a channel-diffusion double-branch network, which strengthens the extraction of high-frequency information through the double-branch structure so as to obtain reconstructed images with more accurate structure and more complete edge detail.
In order to achieve the purpose, the technical scheme of the invention comprises the following steps:
(1) constructing a training sample set and a testing sample set:
(1a) acquiring N RGB images, and performing 1/4 downsampling on each RGB image to obtain N downsampled RGB images, wherein N is larger than or equal to 100;
(1b) Crop each of the N RGB images into image blocks of size L × L, obtaining H image blocks in total; at the same time, crop the downsampled RGB image corresponding to each RGB image into image blocks of size L/4 × L/4, obtaining H downsampled image blocks, and take each cropped image block as the label of the corresponding downsampled cropped image block. Select M downsampled image blocks and their corresponding labels to form the training sample set R, and form the remaining downsampled image blocks and their corresponding labels into the test sample set E, where L ≥ 192 and M ≥ H/2;
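The pairing of HR patches (labels) with their downsampled counterparts in steps (1a)-(1b) can be sketched as follows. This is a minimal illustration, not the patent's pipeline: the patent does not specify the downsampling filter, so average pooling stands in for the 1/4 downsampling.

```python
import numpy as np

def make_pairs(img, L=192, scale=4):
    """Crop non-overlapping LxL HR patches and pair each with an
    (L/scale)x(L/scale) LR patch. Average pooling is an illustrative
    substitute for the patent's unspecified 1/4 downsampling."""
    H, W, C = img.shape
    pairs = []
    for y in range(0, H - L + 1, L):
        for x in range(0, W - L + 1, L):
            hr = img[y:y+L, x:x+L]  # HR patch serves as the label
            lr = hr.reshape(L//scale, scale, L//scale, scale, C).mean(axis=(1, 3))
            pairs.append((lr, hr))
    return pairs

rng = np.random.default_rng(0)
img = rng.random((400, 400, 3))           # stand-in for one RGB image
pairs = make_pairs(img)
lr, hr = pairs[0]
print(len(pairs), lr.shape, hr.shape)     # 4 (48, 48, 3) (192, 192, 3)
```

In practice the H pairs from all N images would then be split into the training set R and test set E.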
(2) building a channel-diffusion double-branch network model O:
constructing a channel-diffusion double-branch network model O of a first convolution layer, D channel-diffusion residual modules, a second convolution layer and an up-sampling module which are connected in sequence; each channel-diffusion residual module comprises an adaptive convolution module and a channel attention branch and a diffusion branch which are connected with the adaptive convolution module and arranged in parallel; the adaptive convolution module comprises a plurality of convolution layers and a plurality of nonlinear activation layers; the diffusion branch comprises a P-M diffusion layer, a convolution layer and a nonlinear activation layer; the channel attention branch comprises a pooling layer, a plurality of nonlinear activation layers and a plurality of convolution layers;
(3) performing iterative training on a channel-diffusion double-branch network model O:
(3a) Initialize the iteration number s and the maximum iteration number S, where S ≥ 10000; let the channel-diffusion double-branch network model of the s-th iteration be O_s, whose weight and bias parameters are w_s and b_s respectively; set s = 1 and O_s = O;
(3b) Take the training sample set R as the input of the channel-diffusion double-branch network O. The first convolution layer performs feature extraction on each training sample; the D channel-diffusion residual modules perform feature mapping on the n_1 extracted feature maps to obtain n_1 nonlinear feature maps; the second convolution layer performs feature extraction on the n_1 nonlinear feature maps, and the result is added element-wise to the n_1 feature maps extracted by the first convolution layer; the up-sampling module performs up-sampling and dimension transformation on the summed features to obtain M super-resolution reconstructed images, where n_1 is the number of convolution kernels of the first convolution layer;
(3c) Compute the loss function using the L1 norm: calculate the loss value L_s of O_s from each reconstructed image and its corresponding training-sample label; compute via the chain rule the partial derivatives ∂L_s/∂w_s and ∂L_s/∂b_s of L_s with respect to the network weight parameters w_s and bias parameters b_s, and use them to update w_s and b_s, obtaining the channel-diffusion double-branch network model of this iteration;
(3d) Judge whether s ≥ S; if so, the trained channel-diffusion double-branch network model O is obtained; otherwise, set s = s + 1 and return to step (3b);
(4) acquiring an image super-resolution reconstruction result:
Perform forward propagation with the test sample set E as the input of the trained channel-diffusion double-branch network model O to obtain the reconstructed images corresponding to all test samples.
Compared with the prior art, the invention has the following advantages:
the channel-diffusion double-branch network model O constructed by the method comprises D channel-diffusion residual modules, each channel-diffusion residual module comprises an adaptive convolution module and a channel attention branch and a diffusion branch which are connected with the adaptive convolution module and are arranged in parallel, in the process of training the model and obtaining an image super-resolution reconstruction result, the adaptive convolution has a larger receptive field so as to learn the characteristics of an image more comprehensively, the channel attention branch can distribute different weights to different channel characteristics so as to strengthen the expression of important semantics, and the diffusion branch can distribute different weights to different space information so as to strengthen the recovery of important areas.
Drawings
FIG. 1 is a flow chart of an implementation of the present invention;
FIG. 2 is a schematic diagram of a channel-diffusion double branch network according to the present invention;
FIG. 3 is a schematic diagram of a channel-diffusion residual module according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of an adaptive convolution module according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a P-M diffusion layer according to an embodiment of the present invention.
Detailed Description
The invention is described in further detail below with reference to the figures and specific examples.
Referring to fig. 1, the present invention includes the steps of:
step 1: constructing a training sample set and a testing sample set:
(1a) acquiring N RGB images, and performing 1/4 downsampling on each RGB image to obtain N downsampled RGB images, wherein N is larger than or equal to 100;
(1b) Crop each of the N RGB images into image blocks of size L × L, obtaining H image blocks in total; at the same time, crop the downsampled RGB image corresponding to each RGB image into image blocks of size L/4 × L/4, obtaining H downsampled image blocks, and take each cropped image block as the label of the corresponding downsampled cropped image block. Select M downsampled image blocks and their corresponding labels to form the training sample set R, and form the remaining downsampled image blocks and their corresponding labels into the test sample set E, where L ≥ 192 and M ≥ H/2;
step 2: a channel-diffusion double-branch network O is built, and the structure of the channel-diffusion double-branch network O is shown in figure 2;
constructing a channel-diffusion double-branch network of a first convolution layer, D channel-diffusion residual modules, a second convolution layer and an up-sampling module which are connected in sequence; each channel-diffusion residual module comprises an adaptive convolution module and a channel attention branch and a diffusion branch which are connected with the adaptive convolution module and arranged in parallel, wherein D ≥ 10; in this embodiment, D = 10; the adaptive convolution module comprises a plurality of convolution layers and a plurality of nonlinear activation layers; the diffusion branch comprises a P-M diffusion layer, a convolution layer and a nonlinear activation layer; the channel attention branch comprises a pooling layer, a plurality of nonlinear activation layers and a plurality of convolution layers;
the structure of the channel-diffusion residual error module of the present embodiment is shown in fig. 3;
Let the input of the channel-diffusion residual module be M_{d-1}. First, the adaptive convolution generates a group of finer features, denoted W. W is then fed simultaneously into the channel attention branch and the diffusion branch, which generate two groups of attention-weighted feature maps; these are used in a weighting operation with W to recalibrate the features, and the calibrated features are added to W to obtain the output of the channel-diffusion residual module.
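The data flow of the channel-diffusion residual module can be sketched as below. This is an illustrative skeleton only: the three sub-modules are passed in as stand-in callables, and the exact way the two branch weightings are fused before the residual addition is an assumption (here they are summed), since the patent does not spell it out.

```python
import numpy as np

def channel_diffusion_residual(M_prev, adaptive_conv, ca_branch, da_branch):
    """Sketch of the module: W = adaptive_conv(M_prev); the two parallel
    branches each produce an attention map that re-weights W; the
    recalibrated features are added back to W."""
    W = adaptive_conv(M_prev)
    M_ca = ca_branch(W)                   # per-channel weights in (0, 1)
    M_da = da_branch(W)                   # per-pixel diffusion-attention weights
    recalibrated = W * M_ca + W * M_da    # assumed fusion of the two weightings
    return W + recalibrated               # residual connection

rng = np.random.default_rng(1)
x = rng.random((1, 8, 16, 16))            # (batch, channels, H, W)
out = channel_diffusion_residual(
    x,
    adaptive_conv=lambda t: t,            # identity stand-in for the conv module
    ca_branch=lambda t: 1 / (1 + np.exp(-t.mean(axis=(2, 3), keepdims=True))),
    da_branch=lambda t: 1 / (1 + np.exp(-t)),
)
print(out.shape)                          # (1, 8, 16, 16)
```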
The structure of the adaptive convolution module of the present embodiment is shown in fig. 4;
The adaptive convolution structure comprises two parallel branches. The first branch comprises a third convolution layer, a first nonlinear activation function, a weighted-image generation module and a fourth convolution layer arranged in parallel, followed by a fifth convolution layer and a second nonlinear activation function connected in sequence; the weighted-image generation module comprises a sixth and a seventh convolution layer arranged in parallel and a third nonlinear activation function connected to both. The second branch comprises an eighth convolution layer, a fourth nonlinear activation function, a ninth convolution layer and a fifth nonlinear activation function connected in sequence. In the weighted-image generation module, the feature map is down-sampled before the sixth convolution layer to a smaller size, so that the convolution attends to larger-scale spatial information and captures more complete contextual features, after which the result is up-sampled back to the size of the input feature map; the feature map is up-sampled before the seventh convolution layer to a larger size, so that the convolution attends to detail information and captures finer features, after which the result is down-sampled back to the size of the input feature map. The two images are added, the sum is passed through a Sigmoid function to compute weights, and these weights recalibrate the feature map output by the fourth convolution layer to obtain more comprehensive features.
The structure of the P-M diffusion layer of this embodiment is shown in FIG. 5;
Inspired by P-M (Perona-Malik) diffusion, a P-M diffusion layer is designed; the layer is an improved residual block. With input tensor W, the P-M diffusion equation is:

$$\frac{\partial W}{\partial t} = \mathrm{div}\big(g(|\nabla W|)\,\nabla W\big) \qquad (1)$$

where $g(|\nabla W|)$ is the diffusion coefficient, $t$ is the diffusion step and $k$ is a shape-defining constant; $g(|\nabla W|)$ can be expressed as:

$$g(|\nabla W|) = \frac{1}{1 + \left(|\nabla W|/k\right)^2} \qquad (2)$$
where the constant k is a threshold. It is easy to see that in flat or smooth regions, where the gradient is small, the diffusion coefficient $g(|\nabla W|)$ approaches 1, while in regions rich in texture or structural detail the coefficient approaches zero. Thus, with such spatially varying diffusion coefficients, the P-M diffusion mechanism can preserve image detail while removing noise.
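The two regimes of the diffusion coefficient can be checked numerically; this is a minimal sketch with an arbitrary choice of k = 1:

```python
import numpy as np

def g(grad_mag, k=1.0):
    """P-M diffusion coefficient g(|grad W|) = 1 / (1 + (|grad W| / k)^2)."""
    return 1.0 / (1.0 + (grad_mag / k) ** 2)

# Flat region (small gradient): coefficient near 1, strong smoothing.
# Edge / texture (large gradient): coefficient near 0, detail preserved.
print(round(g(0.01), 4), round(g(10.0), 4))   # 0.9999 0.0099
```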
There is the following identity:

$$W_{\alpha\alpha} + W_{\beta\beta} = W_{xx} + W_{yy} \qquad (4)$$

where $W_{\alpha\alpha}$ and $W_{\beta\beta}$ are the second-order partial derivatives of W along the image gradient direction and along the direction perpendicular to it (the direction of image features, i.e. edges), respectively, and $W_{xx}$ and $W_{yy}$ are the second-order partial derivatives of W along the coordinate axes x and y, respectively.
In addition, $W_{\beta\beta}$ can be expressed as:

$$W_{\beta\beta} = \frac{W_{xx}W_y^2 - 2\,W_xW_yW_{xy} + W_{yy}W_x^2}{W_x^2 + W_y^2} \qquad (5)$$

where $W_x$ and $W_y$ are the first-order partial derivatives of W in the x and y directions, and $W_{xy}$ is the partial derivative of $W_x$ in the y direction.
Using equation (4), the divergence in equation (1) can be decomposed into components along the gradient direction and the direction perpendicular to it, and substituting equation (2) gives:

$$\frac{\partial W}{\partial t} = \frac{1}{1 + \left(|\nabla W|/k\right)^2}\,W_{\beta\beta} + \frac{1 - \left(|\nabla W|/k\right)^2}{\left(1 + \left(|\nabla W|/k\right)^2\right)^2}\,W_{\alpha\alpha} \qquad (6)$$
the vicinity of the edge is where the gradient mode is large, and the gradient direction is not smoothed as much as possible in order to better retain the edge. Therefore, let WααThe coefficient being zero, i.e.If the diffusion step Δ t is set to 1, it can be obtained by substituting equation (6):
the above equation can be expressed as a residual learning block because the right side of the equation can be considered as aw. In each block, Δ W is added to the input tensor Wi。Wx、Wy、Wxy、Andthe residual error results are calculated from the twelfth convolution layer, the thirteenth convolution layer, the fourteenth convolution layer, the fifteenth convolution layer, and the sixteenth convolution layer, respectively, according to equation (7).
W is passed through the P-M diffusion layer to obtain $W + \Delta W$, which is then passed through a $1\times1$ convolution layer with weights $\Omega_{DA}$; a Sigmoid function $\sigma(\cdot)$ rescales the result into an attention map in $[0,1]$, as shown below:

$$M_{DA} = \sigma\big(\Omega_{DA}(W + \Delta W)\big) \qquad (8)$$

$$W_{DA} = f_{DA}(W, M_{DA}) = W \odot M_{DA} \qquad (9)$$

where $f_{DA}(\cdot)$ denotes element-wise multiplication of the input W by the obtained DA map, and $W_{DA}$ is the recalibration result of the DA module. Each $M_{DA}(i,j)$ corresponds to the DA weight at spatial location $(i,j)$ of W.
Step 3: perform iterative training on the channel-diffusion double-branch network O:
(3a) Initialize the iteration number s and the maximum iteration number S, where S ≥ 10000; let the channel-diffusion double-branch network model of the s-th iteration be O_s, whose learnable weight and bias parameters are w_s and b_s respectively; set s = 1 and O_s = O;
(3b) Take the training sample set R as the input of the channel-diffusion double-branch network O. The first convolution layer performs feature extraction on each training sample; the D channel-diffusion residual modules perform feature mapping on the n_1 extracted feature maps; the second convolution layer performs feature extraction on the n_1 nonlinear feature maps obtained by the mapping, and the result is added element-wise to the n_1 feature maps extracted by the first convolution layer. The up-sampling module comprises a PixelShuffle layer and a convolution layer: PixelShuffle up-samples the n_1 summed feature maps by a factor of 4, and the convolution layer then performs the dimension transformation that maps the feature channels to the image channels, yielding the super-resolution reconstructed image, where n_1 is the number of convolution kernels of the first and second convolution layers. The above processing is performed on all M images;
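The PixelShuffle operation used by the up-sampling module is a depth-to-space rearrangement; a minimal NumPy sketch of the standard operation (not the patent's specific layer) is:

```python
import numpy as np

def pixel_shuffle(x, r=4):
    """Depth-to-space: (C*r^2, H, W) -> (C, H*r, W*r), as in PixelShuffle."""
    Cr2, H, W = x.shape
    C = Cr2 // (r * r)
    x = x.reshape(C, r, r, H, W)
    x = x.transpose(0, 3, 1, 4, 2)   # -> (C, H, r, W, r)
    return x.reshape(C, H * r, W * r)

# 48 channels = 3 image channels * 4^2, so a 48x48 map becomes a 192x192 image.
x = np.arange(48 * 48 * 48, dtype=float).reshape(48, 48, 48)
y = pixel_shuffle(x)
print(y.shape)                       # (3, 192, 192)
```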
(3c) Compute the loss function using the L1 norm: calculate the loss value L_s of O_s from each reconstructed image and its corresponding training-sample label; compute via the chain rule the partial derivatives ∂L_s/∂w_s and ∂L_s/∂b_s of L_s with respect to the weight parameters w_s and bias parameters b_s of the network, and update w_s and b_s according to:

$$w_s' = w_s - l_r \frac{\partial L_s}{\partial w_s}, \qquad b_s' = b_s - l_r \frac{\partial L_s}{\partial b_s}$$

where $\hat{I}$ denotes the reconstructed image, I denotes the labels of the samples in the training sample set, $L_s = \|\hat{I} - I\|_1$ is the loss function, $w_s$ and $b_s$ are the weight and bias values of all learnable parameters of O_s, $w_s'$ and $b_s'$ are the updated parameters, and $l_r$ is the learning rate.
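The update rule can be illustrated on a toy scalar weight; in the network itself the gradients come from back-propagation through all layers, so this single-parameter example is only a sketch of the formula.

```python
import numpy as np

def l1_loss(pred, label):
    """Mean absolute error, the L1 norm loss used in step (3c)."""
    return np.abs(pred - label).mean()

# One gradient-descent step w' = w - l_r * dL/dw on a toy model pred = w * x.
w, lr = 2.0, 0.1
x, label = 3.0, 9.0
pred = w * x                              # 6.0
grad = np.sign(pred - label) * x          # dL1/dw for one sample: -3.0
w_new = w - lr * grad
print(round(w_new, 4), round(float(l1_loss(np.array([pred]), np.array([label]))), 4))
# 2.3 3.0  (the weight moves toward reducing the L1 loss)
```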
(3d) Judge whether s ≥ S; if so, the trained channel-diffusion double-branch network model O is obtained; otherwise, set s = s + 1 and return to step (3b);
Step 4: acquire the image reconstruction result:
Perform forward propagation with the test sample set E as the input of the trained channel-diffusion double-branch network model O to obtain the reconstructed images corresponding to all test samples.
The technical effects of the present invention can be further illustrated by the following simulation experiments.
1. Simulation conditions and contents:
the hardware platform of the simulation experiment is as follows: the processor is an Intel (R) Core i9-9900K CPU, the main frequency is 3.6GHz, the memory is 32GB, and the display card is NVIDIA GeForce RTX 2080 Ti. The software platform of the simulation experiment is as follows: ubuntu 16.04 operating system, python version 3.7, pitorch version 1.7.1.
The RGB image dataset used in the simulation experiment is DIV2K: the training set is the DIV2K training data, and the test set is the Set5 dataset.
The peak signal-to-noise ratio (PSNR) of the prior art on the Set5 test set is 37.78 dB, while that of the invention is 38.14 dB; the results are shown in Table 1. Compared with the prior art, the peak signal-to-noise ratio is significantly improved.
TABLE 1
Method | Prior Art | The invention
PSNR | 37.78 dB | 38.14 dB
SSIM | 0.9042 | 0.9612
Claims (3)
1. An image super-resolution reconstruction method based on a channel-diffusion double-branch network is characterized by comprising the following steps:
(1) constructing a training sample set and a testing sample set:
(1a) acquiring N RGB images, and performing 1/4 downsampling on each RGB image to obtain N downsampled RGB images, wherein N is larger than or equal to 100;
(1b) Crop each of the N RGB images into image blocks of size L × L, obtaining H image blocks in total; at the same time, crop the downsampled RGB image corresponding to each RGB image into image blocks of size L/4 × L/4, obtaining H downsampled image blocks, and take each cropped image block as the label of the corresponding downsampled cropped image block. Select M downsampled image blocks and their corresponding labels to form the training sample set R, and form the remaining downsampled image blocks and their corresponding labels into the test sample set E, where L ≥ 192 and M ≥ H/2;
(2) constructing a channel-diffusion double-branch network O:
constructing a channel-diffusion double-branch network O comprising a first convolution layer, D channel-diffusion residual modules, a second convolution layer and an up-sampling module which are sequentially connected; each channel-diffusion residual module comprises an adaptive convolution module and a channel attention branch and a diffusion branch which are connected with the adaptive convolution module and arranged in parallel; the adaptive convolution module comprises a plurality of convolution layers and a plurality of nonlinear activation layers; the diffusion branch comprises a P-M diffusion layer, a convolution layer and a nonlinear activation layer; the channel attention branch comprises a pooling layer, a plurality of nonlinear activation layers and a plurality of convolution layers, wherein D ≥ 10;
(3) performing iterative training on the channel-diffusion double-branch network O:
(3a) Initialize the iteration number s and the maximum iteration number S, where S ≥ 10000; let the channel-diffusion double-branch network model of the s-th iteration be O_s, whose weight and bias parameters are w_s and b_s respectively; set s = 1 and O_s = O;
(3b) Take the training sample set R as the input of the channel-diffusion double-branch network O. The first convolution layer performs feature extraction on each training sample; the D channel-diffusion residual modules perform feature mapping on the n_1 extracted feature maps; the second convolution layer performs feature extraction on the n_1 nonlinear feature maps obtained by the mapping, and the result is added element-wise to the n_1 feature maps extracted by the first convolution layer; the up-sampling module up-samples the n_1 summed feature maps and performs a dimension transformation to obtain the super-resolution reconstructed image, where n_1 is the number of convolution kernels of the first and second convolution layers;
(3c) Compute the loss function using the L1 norm: calculate the loss value L_s of O_s from each reconstructed image and its corresponding training-sample label; compute via the chain rule the partial derivatives ∂L_s/∂w_s and ∂L_s/∂b_s of L_s with respect to the network weight parameters w_s and bias parameters b_s, and use them to update w_s and b_s, obtaining the channel-diffusion double-branch network model of this iteration;
(3d) Judge whether s ≥ S; if so, the trained channel-diffusion double-branch network model O is obtained; otherwise, set s = s + 1 and return to step (3b);
(4) obtaining an image reconstruction result:
Perform forward propagation with the test sample set E as the input of the trained channel-diffusion double-branch network model O to obtain the reconstructed images corresponding to all test samples.
2. The image super-resolution reconstruction method based on a channel-diffusion double-branch network according to claim 1, characterized in that the channel-diffusion double-branch network O in step (2) is constructed as follows:
constructing a channel-diffusion double-branch network O of a first convolution layer, D channel-diffusion residual modules, a second convolution layer and an up-sampling module which are connected in sequence; the channel-diffusion residual block comprises an adaptive convolution module and a channel attention branch and a diffusion branch which are connected with the adaptive convolution module and are arranged in parallel; the adaptive convolution module comprises a plurality of convolution layers and a plurality of nonlinear activation layers; the diffusion branch comprises a P-M diffusion layer, a convolution layer and a nonlinear activation layer; the channel attention branch comprises a pooling layer, a plurality of nonlinear activation layers and a plurality of convolution layers;
the number n1 of convolution kernels of the first convolution layer and the second convolution layer is 64, and the convolution kernel size is 3 × 3;
the number D of channel-diffusion residual modules is 10. The adaptive convolution module contains 7 convolution layers and 5 nonlinear activation function layers, arranged in two parallel branches. The first branch comprises a third convolution layer, a first nonlinear activation function, a weighted-map generation module, a fourth convolution layer, a fifth convolution layer and a second nonlinear activation function connected in sequence; the weighted-map generation module comprises a sixth convolution layer and a seventh convolution layer arranged in parallel, followed by a third nonlinear activation function connected to both. The second branch comprises an eighth convolution layer, a fourth nonlinear activation function, a ninth convolution layer and a fifth nonlinear activation function connected in sequence. The specific parameters are as follows: the third and eighth convolution layers have a convolution kernel size of 1 × 1, and the fourth, fifth, sixth, seventh and ninth convolution layers have a convolution kernel size of 3 × 3; the first, second and fifth nonlinear activation functions are realized by the ReLU function, and the third nonlinear activation function is realized by the Sigmoid function;
the channel attention branch contains 2 convolution layers and 2 nonlinear activation layers; its specific structure is a pooling layer, a tenth convolution layer, a sixth nonlinear activation layer, an eleventh convolution layer and a seventh nonlinear activation layer cascaded in sequence. The specific parameters are as follows: the convolution kernels of the tenth and eleventh convolution layers are 1 × 1 in size, the pooling layer is set to maximum pooling, the sixth nonlinear activation layer is realized by the ReLU function, and the seventh nonlinear activation layer is realized by the Sigmoid function;
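The channel-attention computation just described (maximum pooling, two 1 × 1 convolutions with ReLU and Sigmoid, then channel-wise rescaling) can be illustrated with a minimal NumPy sketch. On a pooled per-channel descriptor a 1 × 1 convolution reduces to a matrix-vector product; the array shapes and weight arguments are assumptions for illustration, not the patent's parameters:

```python
import numpy as np

def channel_attention(x, w1, w2):
    """Channel-attention branch: pool -> 1x1 conv -> ReLU -> 1x1 conv -> Sigmoid.

    x  : feature maps, shape (C, H, W)
    w1 : weights of the tenth conv layer (1x1),    shape (C_mid, C)
    w2 : weights of the eleventh conv layer (1x1), shape (C, C_mid)
    """
    # maximum pooling over the spatial dimensions -> one value per channel
    desc = x.max(axis=(1, 2))                      # (C,)
    hidden = np.maximum(w1 @ desc, 0.0)            # ReLU (sixth activation layer)
    scale = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))   # Sigmoid (seventh activation layer)
    # rescale each channel of the input by its attention weight
    return x * scale[:, None, None]
```

With zero weights in the second layer, the Sigmoid outputs 0.5 for every channel, so the branch halves all feature values, which makes the rescaling behaviour easy to verify.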
the specific structure of the diffusion branch comprises a P-M diffusion layer, a convolution layer and a nonlinear activation layer connected in sequence, wherein the P-M diffusion layer comprises a twelfth, a thirteenth, a fourteenth, a fifteenth and a sixteenth convolution layer arranged in parallel. The specific parameters are as follows: the convolution kernel size of the convolution layer between the P-M diffusion layer and the nonlinear activation layer is 1 × 1, the convolution kernel size of the twelfth, thirteenth and fourteenth convolution layers is 3 × 3, the convolution kernel size of the fifteenth and sixteenth convolution layers is 5 × 5, and the nonlinear activation function is realized by the Sigmoid function;
the up-sampling module is implemented by PixelShuffle, and the magnification factor is 4.
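PixelShuffle with magnification factor r = 4 rearranges a feature tensor of shape (C·r², H, W) into (C, r·H, r·W), turning channels into spatial resolution. A NumPy sketch following the common definition of the operation (shapes here are illustrative, not patent-specific):

```python
import numpy as np

def pixel_shuffle(x, r=4):
    """Rearrange (C*r*r, H, W) feature maps into (C, H*r, W*r).

    Each r x r block of output pixels is filled from r*r input channels,
    which is how the up-sampling module trades channels for resolution.
    """
    c_rr, h, w = x.shape
    assert c_rr % (r * r) == 0, "channel count must be divisible by r^2"
    c = c_rr // (r * r)
    # split the channel axis into (C, r, r), then interleave with H and W
    x = x.reshape(c, r, r, h, w)
    x = x.transpose(0, 3, 1, 4, 2)       # (C, H, r, W, r)
    return x.reshape(c, h * r, w * r)
```

For a (16, 2, 2) input this yields a (1, 8, 8) output, and output pixel (0, 0, 1) comes from channel 1 at spatial position (0, 0), matching the usual PixelShuffle index convention.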
3. The image super-resolution reconstruction method based on the channel-diffusion double-branch network according to claim 1, wherein the L1-norm loss function Ls in step (3c) and the formulas for updating ws and bs are respectively:

Ls = ‖Î − I‖1

ws′ = ws − lr · ∂Ls/∂ws,  bs′ = bs − lr · ∂Ls/∂bs
where Î represents the reconstructed image, I represents the labels of the samples in the training sample set, ws and bs represent the weight and bias parameters among all learnable parameters of Os, ws′ and bs′ represent the updated learnable parameters, lr represents the learning rate, Ls is the loss function, and ∂ represents the partial-derivative operation.
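The loss and the update rule of claim 3 can be written out numerically. A minimal NumPy sketch, under the assumption that the L1 norm is the summed absolute error (the patent does not state a normalisation factor):

```python
import numpy as np

def l1_loss(recon, label):
    """L_s = || recon - label ||_1 : summed absolute error between images."""
    return np.abs(recon - label).sum()

def sgd_update(w, b, dL_dw, dL_db, lr):
    """Claim 3 update rule: w' = w - lr * dL/dw,  b' = b - lr * dL/db."""
    return w - lr * dL_dw, b - lr * dL_db
```

For example, reconstructions [1, 2] against labels [0, 4] give Ls = 1 + 2 = 3, and one update at lr = 0.1 with gradient 2 moves a weight of 1.0 to 0.8.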
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210488529.7A CN114757830B (en) | 2022-05-06 | 2022-05-06 | Image super-resolution reconstruction method based on channel-diffusion double-branch network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114757830A true CN114757830A (en) | 2022-07-15 |
CN114757830B CN114757830B (en) | 2023-09-08 |
Family
ID=82332334
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210488529.7A Active CN114757830B (en) | 2022-05-06 | 2022-05-06 | Image super-resolution reconstruction method based on channel-diffusion double-branch network |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112991181A (en) * | 2021-03-31 | 2021-06-18 | 武汉大学 | Image super-resolution reconstruction method based on reaction diffusion equation |
CN113177882A (en) * | 2021-04-29 | 2021-07-27 | 浙江大学 | Single-frame image super-resolution processing method based on diffusion model |
CN113222822A (en) * | 2021-06-02 | 2021-08-06 | 西安电子科技大学 | Hyperspectral image super-resolution reconstruction method based on multi-scale transformation |
US20210390338A1 (en) * | 2020-06-15 | 2021-12-16 | Dalian University Of Technology | Deep network lung texture recogniton method combined with multi-scale attention |
Non-Patent Citations (1)
Title |
---|
PENG Hongjing; HOU Wenxiu: "Image deblurring based on the anisotropic diffusion partial differential equation", Signal Processing, no. 05, pages 714 - 717 *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110119780B (en) | Hyper-spectral image super-resolution reconstruction method based on generation countermeasure network | |
CN108090871B (en) | Multi-contrast magnetic resonance image reconstruction method based on convolutional neural network | |
CN109035142B (en) | Satellite image super-resolution method combining countermeasure network with aerial image prior | |
Chen et al. | Multi-attention augmented network for single image super-resolution | |
CN110660038A (en) | Multispectral image and panchromatic image fusion method based on generation countermeasure network | |
CN109003229B (en) | Magnetic resonance super-resolution reconstruction method based on three-dimensional enhanced depth residual error network | |
CN108492249B (en) | Single-frame super-resolution reconstruction method based on small convolution recurrent neural network | |
CN111275192B (en) | Auxiliary training method for improving accuracy and robustness of neural network simultaneously | |
Chen et al. | Single image super-resolution using deep CNN with dense skip connections and inception-resnet | |
CN114494015B (en) | Image reconstruction method based on blind super-resolution network | |
CN115953303B (en) | Multi-scale image compressed sensing reconstruction method and system combining channel attention | |
CN111738954B (en) | Single-frame turbulence degradation image distortion removal method based on double-layer cavity U-Net model | |
CN111696038A (en) | Image super-resolution method, device, equipment and computer-readable storage medium | |
CN113689517A (en) | Image texture synthesis method and system of multi-scale channel attention network | |
CN112991483B (en) | Non-local low-rank constraint self-calibration parallel magnetic resonance imaging reconstruction method | |
CN114723608A (en) | Image super-resolution reconstruction method based on fluid particle network | |
CN114626984A (en) | Super-resolution reconstruction method for Chinese text image | |
CN110675318B (en) | Sparse representation image super-resolution reconstruction method based on main structure separation | |
CN113379647B (en) | Multi-feature image restoration method for optimizing PSF estimation | |
CN112581626B (en) | Complex curved surface measurement system based on non-parametric and multi-attention force mechanism | |
Wen et al. | The power of complementary regularizers: Image recovery via transform learning and low-rank modeling | |
CN113096015A (en) | Image super-resolution reconstruction method based on progressive sensing and ultra-lightweight network | |
CN113240581A (en) | Real world image super-resolution method for unknown fuzzy kernel | |
Cheng et al. | Adaptive feature denoising based deep convolutional network for single image super-resolution | |
CN114757830B (en) | Image super-resolution reconstruction method based on channel-diffusion double-branch network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||