CN111260585A

CN111260585A - Image recovery method based on similar convex set projection algorithm

Info

Publication number: CN111260585A
Application number: CN202010060321.6A
Authority: CN
Inventors: 武宇喆; 牛毅; 石光明
Original assignee: Xidian University
Current assignee: Xidian University
Priority date: 2020-01-19
Filing date: 2020-01-19
Publication date: 2020-06-09

Abstract

The invention discloses an image recovery method based on a convex set projection algorithm, which addresses the large amount of noise present in the feature maps of existing image recovery networks. The method comprises the following steps: 1) constructing a training sample set and a testing sample set; 2) constructing an explicit feature denoising module; 3) constructing an image recovery network model, PL-DSR, based on a convex set projection algorithm; 4) training the PL-DSR network model; 5) testing the PL-DSR network model; 6) generating a restored image. The invention constructs an explicit feature denoising module based on the traditional convex set projection algorithm to suppress noise in the feature maps, introduces this module into the classic EDSR network, and thereby constructs the image recovery network PL-DSR. The network effectively suppresses noise in the feature maps and extracts more clearly structured features, improving the recovery result and its perceived visual quality while reducing model complexity. It can be used for the post-production and processing of photographic, film and television works.

Description

Image recovery method based on similar convex set projection algorithm

Technical Field

The invention belongs to the technical field of digital image processing, and in particular relates to an image recovery method based on a convex set projection algorithm, which is used for recovering degraded images and can be applied to the post-production processing of photographic, film and television works.

Background

Image generation is an important branch of computer vision; it includes image restoration, image colorization, semantic segmentation, style transfer for images or videos, and the like. In computer vision, image restoration is an ill-posed inverse problem: given a low-resolution image, a degradation model of the image is established using prior knowledge of the degradation process, and the original high-resolution image is restored.

Conventional image restoration models mainly include: 1) interpolation-based algorithms (nearest neighbor, bilinear, bicubic, etc.); 2) example-learning-based methods (dictionary learning). Interpolation-based algorithms do not fully exploit the prior knowledge relating low-resolution and high-resolution image pairs, so the restored images are of poor quality. Dictionary learning exploits this prior knowledge well by designing a low-resolution dictionary and a high-resolution dictionary, but because its features are hand-crafted, the most effective information cannot be extracted, which limits the recovery result to some extent.

In recent years, with the rise of deep learning, many scholars apply deep neural networks to the field of image restoration, such as SRCNN, VDSR, ESPCN, SRResNet, EDSR, and the like. The neural network model has the capability of self-adaptively extracting the features, and the defect of manually extracting the features in the traditional dictionary learning is overcome, so that the image recovery quality is improved.

Due to space limitations, we introduce only two newer classical network models, SRResNet and EDSR:

Ledig et al., in the paper "Photo-realistic single image super-resolution using a generative adversarial network", were the first to use the residual block from ResNet for this task and proposed SRResNet. However, the residual block in ResNet was designed for high-level vision problems, and applying it directly to low-level vision problems such as image restoration has some adverse impact. EDSR was proposed by Lim et al. in the paper "Enhanced deep residual networks for single image super-resolution"; it removes the BN layer from the original residual block. This network reduces the amount of model computation and achieves better results than SRResNet. To further study what drives this performance improvement, we visualize the feature maps extracted by the two networks and find that the features extracted by EDSR contain less noise than those of SRResNet and richer structured high-frequency information, which indicates that the noise level in the features affects the recovery result. However, none of the existing image restoration networks takes into account the influence of the noise level in the features on the restoration result.

Disclosure of Invention

The invention aims to provide an image recovery network, PL-DSR, based on a convex set projection algorithm, to address the above shortcomings of the prior art. Compared with existing image recovery networks, PL-DSR extracts richer features and improves the image recovery result while reducing the number of network parameters.

The technical idea of the invention is as follows: first, an explicit feature denoising module is constructed based on the traditional convex set projection algorithm and used to suppress noise in the feature maps. Then, the module is introduced into the classic EDSR network to construct the image recovery network PL-DSR. The network effectively suppresses noise in the feature maps and extracts more clearly structured features, improving the recovery result and its perceived visual quality while reducing model complexity. The specific steps are as follows:

(1) constructing a training sample set and a testing sample set:

(1a) acquiring high-resolution (HR) images from the DIV2K public data set, cropping N image blocks of size S × S from the HR images, and obtaining the low-resolution (LR) image block corresponding to each image block through bicubic downsampling, where N ≥ 200000 and S = 19;

(1b) HR images in the Set5, Set14, BSD100, Urban100, and Manga109 public data sets are acquired, and bicubic downsampling is performed on the HR images to obtain LR images corresponding to the HR images.

(2) Constructing an explicit feature denoising module:

(2a) core idea of the feature denoising module:

the core idea of the feature denoising module is to generalize the traditional soft threshold algorithm (soft thresholding) through a network. From a traditional signal processing perspective, convolution is a "transform" process, and denoising features is essentially denoising in the transform domain. The soft threshold algorithm is a classical method for transform domain denoising. The soft threshold function G_st is expressed as:

G_st(x, τ) = sign(x) · max(|x| − τ, 0)

where x is the original coefficient and τ, the threshold, is a non-negative constant determined by σ_n and σ, which denote the standard deviation of the zero-mean Gaussian noise distribution and of the noise-free Laplacian-distributed transform coefficients, respectively.
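The soft threshold function described above can be illustrated with a minimal sketch in plain Python. The function name `soft_threshold` and the sample values are illustrative assumptions; the threshold is passed in directly because the expression for τ in terms of σ_n and σ is not reproduced in this text.

```python
def soft_threshold(x: float, tau: float) -> float:
    """Shrink x toward zero by tau; coefficients with |x| <= tau become 0."""
    if tau < 0:
        raise ValueError("tau must be non-negative")
    if x > tau:
        return x - tau
    if x < -tau:
        return x + tau
    return 0.0


# Illustrative transform-domain coefficients (assumed values, not from the source).
coeffs = [-3.0, -0.5, 0.0, 0.2, 2.5]
denoised = [soft_threshold(c, 1.0) for c in coeffs]
```

Small coefficients, which are more likely to be noise, are set to zero, while large coefficients are kept but reduced in magnitude by τ.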

(2b) Designing a soft threshold module:

the key point of the soft threshold module (ST block) is that the network adaptively learns the threshold corresponding to each feature map in order to complete the feature denoising task. The whole module consists of four parts, namely global pooling, an auto-encoder, threshold generation and feature denoising, wherein:

global pooling: comprises a global average pooling (GAP) layer, used to extract the average energy of each channel of the input feature map;

auto-encoder: comprises two simple fully-connected (FC) layers, used to extract the principal components of the mean vector obtained after global pooling and then generate the corresponding scaling factor vector;

threshold generation: the mean vector is multiplied element-wise by the scaling factor vector to obtain the corresponding soft thresholds;

feature denoising: the generated soft thresholds and the input feature map are passed through the soft threshold function G_st to complete feature denoising.

(2c) Designing an adaptive soft threshold module:

in fact, applying the ST block directly to EDSR has two drawbacks: 1) the ST block is not a lightweight module, and the extracted features may be redundant. To solve this problem, the number of channels in the first FC layer is reduced to 1/r_c of the original, where r_c denotes the reduction rate; 2) the ST block does not consider the spatial correlation of the feature map, so the learned threshold is not optimal. Therefore, an adaptive soft threshold module (AST block) is proposed to obtain a better threshold.
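To make the effect of the reduction rate concrete, the following sketch counts the weights of the two FC layers with and without channel reduction. It assumes bias-free FC layers (matching the form V = W₂ × ReLU(W₁ × U) used elsewhere in the text); the value `R_C = 16` is purely illustrative and is not taken from the source.

```python
def fc_param_count(channels: int, reduction: int = 1) -> int:
    """Weights in two bias-free FC layers mapping C -> C/r -> C."""
    if channels % reduction != 0:
        raise ValueError("channels must be divisible by reduction")
    hidden = channels // reduction
    return channels * hidden + hidden * channels


C = 128    # channel count used in the embodiment
R_C = 16   # ASSUMPTION: illustrative reduction rate, not stated in the source

full = fc_param_count(C)          # first FC layer keeps all C channels
reduced = fc_param_count(C, R_C)  # first FC layer narrowed to C / r_c channels
```

Under these assumptions the FC weight count falls by a factor of r_c, which is the sense in which the reduction makes the module lighter.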

(3) Constructing an image recovery network model PL-DSR based on a convex set projection algorithm:

(3a) construction of PL-DSR model:

the EDSR network model is reinterpreted from the traditional image processing perspective and is found to be very similar to the traditional convex set projection algorithm, except that the step of denoising in the transform domain is missing. Specifically: 1) the first convolutional layer, used for feature extraction, is regarded as performing a "transform" operation on the input image; 2) EDSR passes the extracted features through a series of residual blocks, each composed of conv-relu-conv, and the functions of these three layers are regarded respectively as "inverse transform, non-negativity constraint in the spatial domain, transform"; 3) cascading multiple residual blocks is regarded as repeated iteration of this process. The AST block is then applied to the EDSR network to obtain the PL-DSR network model, which explicitly denoises the feature maps, extracts more clearly structured features, and achieves a better recovery result than EDSR while reducing network complexity.

(3b) Defining a loss function of the image recovery network model;

the L₁-norm recovery loss function L is defined as:

L = (1/N) · Σ_{i=1..N} ‖Î_i − I_i‖₁

where N is the total number of training samples, and Î_i and I_i denote the restored image and the HR image of the i-th sample, respectively. The smaller the loss value, the better the recovery performance of the network.

(4) Training the PL-DSR model:

the LR-HR image block pairs in the training sample set are taken as the input of the network, and the PL-DSR model is trained for K iterations to obtain a trained model, where K ≥ 50000;

(5) Testing the PL-DSR model:

the test sample set is taken as the input of the trained PL-DSR model to obtain the corresponding restored images.

(6) Generating a restored image:

the LR image to be restored is taken as the input of the PL-DSR model to obtain the restored image.

Compared with the prior art, the invention has the following advantages:

1. The invention constructs an explicit feature denoising module based on the idea of the traditional convex set projection algorithm, that is, the feature denoising module generalizes the classical soft threshold algorithm through a network. The module is simple in structure and effectively suppresses noise in the features, so that the network extracts more clearly structured features; this overcomes the high noise level and blurred texture details found in the feature maps of existing networks.

2. The invention constructs the image recovery network PL-DSR on the basis of the EDSR network model by applying the explicit feature denoising module to the EDSR network; it also explains the relation between PL-DSR and the convex set projection algorithm, closely connecting deep learning with traditional techniques. With the explicit denoising module, PL-DSR uses fewer residual blocks and channels to obtain performance equivalent to EDSR. Compared with the prior art, the restored image retains more high-frequency information such as textures and edges.

Drawings

FIG. 1 is a flow chart of an implementation of the present invention;

FIG. 2 is a schematic diagram of a denoising module structure constructed according to the present invention;

FIG. 3 is a schematic diagram of an image restoration network structure based on a convex-like set projection algorithm constructed by the present invention;

FIG. 4 is a graph comparing the visualization of features of the present invention with those of the prior art;

FIG. 5 is a graph comparing the performance of the present invention with an EDSR reference network obtained at different residual blocks and channel numbers;

FIG. 6 is a graph comparing model complexity and recovery performance of the present invention with that of the prior art;

FIG. 7 is a graph comparing the image restoration results of the present invention and the prior art.

Detailed Description

The embodiments and effects of the present invention will be described in detail below with reference to the accompanying drawings.

Referring to fig. 1, the implementation steps of the invention are as follows:

step 1) constructing a training sample set and a testing sample set:

Step 2), constructing a displayed feature denoising module:

referring to fig. 2, the explicit feature denoising module comprises a soft threshold module (ST block) and an adaptive soft threshold module (AST block). The ST block comprises four parts: global pooling, an auto-encoder, threshold generation and feature denoising. The AST block adds a spatially adaptive mechanism to the ST block and learns a better threshold.

(2a)ST block：

Referring to fig. 2(a), the four sections are described in detail as follows:

assume X ∈ R^(W×H×C) is the input feature map, where W, H and C denote the width, height and number of channels, respectively. x_c ∈ R^(W×H) denotes the c-th channel of X, so that X = [x₁, …, x_c, …, x_C]. First, the absolute value of X is taken and passed through a global average pooling (GAP) layer to obtain a vector U of length C:

u_c = (1 / (W·H)) · Σ_{i=1..W} Σ_{j=1..H} |x_c(i, j)|

where u_c is the c-th component of U. Then, the principal component V ∈ R^C of U is extracted using two fully-connected (FC) layers:

V = W₂ × ReLU(W₁ × U)

where W₁ ∈ R^(C′×C) and W₂ ∈ R^(C×C′) denote the coefficients of the two FC layers, C′ is the number of channels of the first FC layer, ReLU is a nonlinear activation function, and ReLU(x) = max(0, x). Next, V is passed through a sigmoid layer to obtain the corresponding scaling factor vector Z ∈ R^C:

Z = sigmoid(V)

Then, the c-th components u_c and z_c of U and Z are multiplied to obtain the threshold τ_c of feature map x_c:

τ_c = u_c · z_c

so the threshold vector corresponding to the input feature map X is T = [τ₁, …, τ_c, …, τ_C]. Finally, the soft threshold function G_st is applied to obtain the output feature map Y = [y₁, …, y_c, …, y_C]:

y_c(i, j) = G_st(x_c(i, j), τ_c)
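The four parts of the ST block can be sketched end to end in plain Python, using nested lists in place of tensors to keep the example dependency-free. The function names, the tiny 2-channel input and the all-zero FC weights are assumptions chosen so that the result is easy to follow (zero weights give V = 0, Z = 0.5 and therefore τ_c = u_c / 2); a trained network would learn W₁ and W₂.

```python
import math


def soft_threshold(x, tau):
    """G_st(x, tau) = sign(x) * max(|x| - tau, 0)."""
    if x > tau:
        return x - tau
    if x < -tau:
        return x + tau
    return 0.0


def matvec(w, v):
    """Multiply matrix w (list of rows) by vector v."""
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


def st_block(x, w1, w2):
    """x: list of C channels, each a list of rows. Returns (y, tau)."""
    # Global average pooling of |x| per channel gives U.
    u = [
        sum(abs(p) for row in ch for p in row) / sum(len(row) for row in ch)
        for ch in x
    ]
    # Two FC layers: V = W2 * ReLU(W1 * U).
    hidden = [max(0.0, h) for h in matvec(w1, u)]
    v = matvec(w2, hidden)
    # Scaling factors Z = sigmoid(V), then per-channel thresholds tau_c = u_c * z_c.
    z = [1.0 / (1.0 + math.exp(-vc)) for vc in v]
    tau = [uc * zc for uc, zc in zip(u, z)]
    # Feature denoising with the same threshold at every position of a channel.
    y = [[[soft_threshold(p, t) for p in row] for row in ch] for ch, t in zip(x, tau)]
    return y, tau


# Illustrative input: C = 2 channels of size 2 x 2 (assumed values).
x = [[[4.0, -4.0], [0.5, -0.5]], [[1.0, 1.0], [1.0, 1.0]]]
w1 = [[0.0, 0.0]]    # ASSUMPTION: untrained zero weights, hidden width 1
w2 = [[0.0], [0.0]]
y, tau = st_block(x, w1, w2)
```

Because the sigmoid output lies in (0, 1), each threshold satisfies 0 < τ_c < u_c, so the threshold never exceeds the mean absolute value of its channel.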

(2b)AST block：

Referring to fig. 2(b), a spatially adaptive mechanism is adopted: a spatial correlation matrix Θ ∈ R^(W×H) is obtained through a simple 3 × 3 convolutional layer followed by sigmoid activation. The threshold τ_c(i, j) that the module assigns to the input feature map X at point (i, j, c) is:

τ_c(i，j)＝θ(i，j)·u_c·z_c
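The spatially adaptive threshold can be sketched as follows. For brevity the spatial map θ is supplied directly rather than computed by the 3 × 3 convolution and sigmoid; the function name `ast_denoise` and all numeric values are illustrative assumptions.

```python
def soft_threshold(x, tau):
    """G_st(x, tau) = sign(x) * max(|x| - tau, 0)."""
    if x > tau:
        return x - tau
    if x < -tau:
        return x + tau
    return 0.0


def ast_denoise(x, theta, u, z):
    """Apply y_c(i, j) = G_st(x_c(i, j), theta(i, j) * u_c * z_c).

    x: list of C channels (each a list of rows); theta: one map shared by
    all channels, with entries in [0, 1]; u, z: per-channel vectors.
    """
    out = []
    for ch, uc, zc in zip(x, u, z):
        out.append([
            [soft_threshold(p, t * uc * zc) for p, t in zip(row, trow)]
            for row, trow in zip(ch, theta)
        ])
    return out


# One channel of size 2 x 2; theta is assumed to be given (values illustrative).
x = [[[2.0, 2.0], [-2.0, -2.0]]]
theta = [[1.0, 0.5], [0.5, 0.0]]
y = ast_denoise(x, theta, u=[2.0], z=[0.5])
```

Where θ(i, j) is close to 0 the feature passes through almost unchanged, and where it is close to 1 the full channel threshold u_c · z_c applies, so the amount of shrinkage varies with position.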

step 3) constructing an image recovery network model PL-DSR based on a convex set projection algorithm:

(3a) construction of PL-DSR model:

referring to fig. 3, the PL-DSR network model includes a feature extraction section (first convolutional layer), a nonlinear mapping section (a plurality of residual blocks each having a composition of conv-relu-conv-AST block), and a recovery section (upsampling block and output convolutional layer). For each module, generalized explanation is given from the traditional image processing perspective, and comparison is made with a convex set projection algorithm, and the relation between the two is shown. Specifically, the method comprises the following steps:

1) the first convolutional layer, used for feature extraction, is regarded as performing a "transform" operation on the input image;

2) the functions of the layers in each residual block are regarded respectively as "inverse transform, non-negativity constraint in the spatial domain, transform, transform domain denoising";

3) cascading multiple residual blocks is regarded as repeated iteration of this process.
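The analogy in items 1) to 3) can be illustrated with a toy iteration in plain Python. This is a sketch under strong assumptions: a fixed two-point average/difference transform stands in for the learned convolutions, a constant threshold stands in for the learned one, and the residual skip connection is omitted. It shows only the order of operations, not the behavior of the trained network.

```python
def forward_transform(x):
    """Two-point average/difference transform (stand-in for a learned conv)."""
    a, b = x
    return [(a + b) / 2.0, (a - b) / 2.0]


def inverse_transform(c):
    """Exact inverse of forward_transform."""
    s, d = c
    return [s + d, s - d]


def soft_threshold(x, tau):
    if x > tau:
        return x - tau
    if x < -tau:
        return x + tau
    return 0.0


def pocs_like_step(coeffs, tau):
    x = inverse_transform(coeffs)               # inverse transform (conv)
    x = [max(v, 0.0) for v in x]                # non-negativity constraint (ReLU)
    c = forward_transform(x)                    # transform (conv)
    return [soft_threshold(v, tau) for v in c]  # transform-domain denoising (AST)


def pocs_like(coeffs, tau, iterations):
    """Cascade of identical steps, mirroring a stack of residual blocks."""
    for _ in range(iterations):
        coeffs = pocs_like_step(coeffs, tau)
    return coeffs


one_step = pocs_like_step([1.0, 3.0], 0.5)
two_steps = pocs_like([1.0, 3.0], 0.5, 2)
```

Each step alternates between enforcing a spatial-domain constraint and a transform-domain constraint, which is the pattern the text attributes to both the convex set projection algorithm and the PL-DSR residual block.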

The detailed structure of each part is respectively as follows:

feature extraction part: input layer → first convolutional layer

nonlinear mapping part: output of the first convolutional layer → first residual block → second residual block → … → thirty-second residual block (32 residual blocks connected in sequence)

recovery part: output of the thirty-second residual block → first upsampling layer → convolutional layer → second upsampling layer → convolutional layer

The parameter settings of each layer are the same: kernel size 3, stride 1, 128 channels, and ReLU as the activation function. The AST block parameters are set as follows: the reduction rate is r_c; the 3 × 3 convolutional layer has 128 input channels and 1 output channel.
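As a rough check on model size, the convolutional weights of the nonlinear mapping part can be counted from the settings above. The sketch assumes that every convolution has a bias term and leaves out the FC layers of the AST block, because the value of r_c is not available in this text; the totals are therefore an approximation, not a figure reported by the source.

```python
def conv_params(c_in: int, c_out: int, k: int = 3, bias: bool = True) -> int:
    """Number of parameters in a k x k convolution."""
    return c_in * c_out * k * k + (c_out if bias else 0)


CHANNELS = 128   # from the embodiment
BLOCKS = 32      # from the embodiment

per_conv = conv_params(CHANNELS, CHANNELS)   # one 3 x 3 conv, 128 -> 128
per_block_convs = 2 * per_conv               # conv-relu-conv
theta_conv = conv_params(CHANNELS, 1)        # AST spatial branch, 128 -> 1
trunk_total = BLOCKS * (per_block_convs + theta_conv)
```

The spatial branch of the AST block adds about 1.2 thousand parameters per residual block under these assumptions, which is small compared with the roughly 295 thousand parameters of the two main convolutions.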

(3b) Defining a loss function of the image recovery network model;

the L₁-norm image restoration loss function L is defined as:

L = (1/N) · Σ_{i=1..N} ‖Î_i − I_i‖₁

where N is the total number of training samples, and Î_i and I_i denote the restored image and the HR image of the i-th sample, respectively.
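A minimal sketch of this loss in plain Python follows, with images represented as lists of rows. The function name and sample values are illustrative assumptions.

```python
def l1_recovery_loss(restored, targets):
    """Mean over samples of the L1 distance between restored and HR images."""
    if len(restored) != len(targets) or not restored:
        raise ValueError("need the same, non-zero number of restored and HR images")
    total = 0.0
    for sr, hr in zip(restored, targets):
        total += sum(
            abs(a - b)
            for row_sr, row_hr in zip(sr, hr)
            for a, b in zip(row_sr, row_hr)
        )
    return total / len(restored)


# Two 1 x 2 "images" (illustrative values).
restored = [[[1.0, 2.0]], [[0.0, 0.0]]]
targets = [[[1.5, 2.0]], [[1.0, -1.0]]]
loss = l1_recovery_loss(restored, targets)
```

Deep learning frameworks commonly also divide by the number of pixels (for example, `torch.nn.L1Loss` with its default `reduction='mean'` averages over all elements); this only rescales the loss by a constant when all image blocks have the same size.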

Step 4) training the PL-DSR model:

the LR-HR image block pairs in the training sample set are taken as the input of the network, and PL-DSR is trained for K iterations to obtain a trained model, where K ≥ 50000. The specific implementation steps are as follows:

(4a) initializing the network parameters, denoting the iteration counter by D and the maximum number of iterations by K, where K ≥ 50000, and setting D = 1;

(4b) inputting the LR image blocks and the HR image blocks into the network together to obtain the restored image blocks SR output by the network;

(4c) calculating the loss between the SR image blocks and the HR image blocks according to the image restoration loss function L, so as to preserve the structural characteristics of the original HR image blocks;

(4d) updating the network parameters from the restoration loss value using gradient descent;

(4e) judging whether D is equal to K; if so, the trained PL-DSR network model is obtained; otherwise, setting D = D + 1 and returning to step (4b).
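The control flow of steps (4a) to (4e) can be sketched with a toy model in plain Python. Here the "network" is a single scalar gain `w` (an assumption made only to keep the example self-contained), the loss is the L₁ loss, and the update is plain subgradient descent with a fixed step; the source uses K ≥ 50000, whereas the toy run uses a much smaller K.

```python
def train(pairs, k, step=0.01):
    """Toy training loop: fit SR = w * LR to HR under an L1 loss."""
    w = 0.0   # (4a) initialise the parameters
    d = 1     # (4a) iteration counter D = 1
    while True:
        grad = 0.0
        for lr_val, hr_val in pairs:
            sr_val = w * lr_val                        # (4b) forward pass
            err = sr_val - hr_val                      # (4c) L1 loss term is |err|
            grad += ((err > 0) - (err < 0)) * lr_val   # subgradient of |err| w.r.t. w
        w -= step * grad / len(pairs)                  # (4d) gradient-descent update
        if d == k:                                     # (4e) stop when D == K
            return w
        d += 1                                         # otherwise D = D + 1, back to (4b)


# Illustrative LR/HR scalar pairs with true gain 2 (assumed data).
pairs = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
w_trained = train(pairs, k=500)
```

With a fixed step size, subgradient descent on an L₁ loss settles into a small band around the optimum rather than converging exactly, which is why practical training typically decays the learning rate.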

Step 5) testing the PL-DSR model:

Step 6) generating a recovery image:

The effects of the present invention can be further illustrated by the following simulations

1. Conditions of the experiment

The hardware test platform of this experiment is: Intel Core i7 CPU, clock frequency 3.60 GHz, 8 GB memory; the software simulation platform is: Ubuntu 16.04 64-bit operating system; programming language: Python; deep learning framework: PyTorch.

2. Analysis of experimental content and results

The experimental contents are as follows: under the above experimental conditions, the same low-resolution image was restored using the present invention and the methods of Ledig et al. and Lim et al., and five simulation experiments were performed.

Referring to fig. 4, the feature maps extracted by the three networks are visualized. Fig. 4(a) is the input LR image, fig. 4(b) is the feature map generated by SRResNet, fig. 4(c) is the feature map generated by EDSR, and fig. 4(d) is the feature map generated by the method of the present invention. The comparison shows that the feature denoising module of the method effectively suppresses the noise in the feature map, so that the network extracts features with clearer structure and texture.

To verify the impact of the soft threshold (ST) module and the adaptive soft threshold (AST) module on recovery performance, we constructed two networks, PL-DSR_ST and PL-DSR_AST, and compared their recovery performance with that of the EDSR reference network. The test set was the DIV2K validation set, and recovery quality was measured by the peak signal-to-noise ratio (PSNR). The results are shown in Table 1.

TABLE 1. Comparison of recovery PSNR (dB) for the three networks

It can be seen that after the ST block is added, the recovery performance of the network is superior to that of the EDSR reference network, which further verifies the effectiveness of the feature denoising module. In addition, the PL-DSR_AST network achieves the highest PSNR value, which indicates that introducing the spatially adaptive mechanism allows the network to learn a better threshold, thereby improving the recovery quality.
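For reference, the PSNR metric used in this comparison can be sketched in plain Python using its standard definition, PSNR = 10 · log10(MAX² / MSE). The exact evaluation protocol of the source (peak value, color channel, border cropping) is not given in this text, so `max_value` and the sample data are assumptions.

```python
import math


def psnr(restored, reference, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    sq_err = 0.0
    count = 0
    for row_r, row_h in zip(restored, reference):
        for a, b in zip(row_r, row_h):
            sq_err += (a - b) ** 2
            count += 1
    if count == 0:
        raise ValueError("images must not be empty")
    mse = sq_err / count
    if mse == 0:
        return math.inf   # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# Illustrative 1 x 2 images with intensities in [0, 1] (assumed values).
value = psnr([[0.1, 0.9]], [[0.0, 1.0]], max_value=1.0)
```

A higher PSNR means a smaller mean squared error relative to the peak intensity, so the network with the highest PSNR in Table 1 is the one whose output is closest to the HR reference in this sense.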

Referring to fig. 5, the performance of the method of the present invention is compared with that of the EDSR network model for different numbers of residual blocks and channels. Fig. 5(a) shows the performance for different numbers of residual blocks when the number of channels is fixed at 64, and fig. 5(b) shows the performance for different numbers of channels when the number of residual blocks is fixed at 32. In every tested configuration of residual blocks and channels, the method of the present invention outperforms the EDSR network model.

Referring to fig. 6, the model complexity and recovery performance of the method of the present invention are compared with those of the prior art. The method of the present invention achieves good recovery performance with low model complexity, and balances model complexity and recovery performance better than the prior art.

Referring to fig. 7, the image restoration results of the method of the present invention and the prior art are compared. The image restored by the method of the invention retains more texture features, has clearer edges and is closer to the original HR image.

In conclusion, the explicit feature denoising module is designed based on the traditional convex set projection algorithm and introduced into the classic EDSR network to construct the image recovery network PL-DSR. The network effectively suppresses noise in the feature maps and extracts more clearly structured features, improving the recovery result while reducing model complexity. It improves the perceived visual quality and can be used for the post-production processing of photographic, film and television works.

The above description is only one specific example of the present invention and does not constitute any limitation of the present invention. It will be apparent to persons skilled in the relevant art that various modifications and changes in form and detail can be made therein without departing from the principles and arrangements of the invention, but these modifications and changes are still within the scope of the invention as defined in the appended claims.

Claims

1. An image restoration method based on a convex set projection algorithm is characterized by comprising the following steps:

(1) constructing a training sample set and a testing sample set:

(2) Constructing an explicit feature denoising module:

(2a) core idea of the feature denoising module:

the core idea of the feature denoising module is to generalize the conventional soft threshold algorithm (soft thresholding) through a network. From a traditional signal processing perspective, convolution is a "transform" process, and denoising features is essentially denoising in the transform domain. The soft threshold algorithm is a classical method for transform domain denoising. The soft threshold function G_st is expressed as:

G_st(x, τ) = sign(x) · max(|x| − τ, 0)

where x is the original coefficient and τ is the non-negative threshold.

(2b) Designing a soft threshold module:

the key point of the soft threshold module is that the network adaptively learns the threshold corresponding to each feature map in order to complete the feature denoising task. The whole module consists of four parts, namely global pooling, an auto-encoder, threshold generation and feature denoising, wherein:

global pooling: comprises a global average pooling layer, used to extract the average energy of each channel of the input feature map;

auto-encoder: comprises two simple fully-connected layers, used to extract the principal components of the mean vector obtained after global pooling and then generate the corresponding scaling factor vector;

(2c) Designing an adaptive soft threshold module:

the above soft threshold module has two drawbacks: 1) the module is not lightweight, and the extracted features may be redundant. To solve this problem, the number of channels in the first fully-connected layer is reduced to 1/r_c of the original, where r_c denotes the reduction rate; 2) the soft threshold module does not take into account the spatial correlation of the feature map, so the learned threshold is not optimal. Therefore, an adaptive soft threshold module is proposed to obtain a better threshold.

(3a) construction of PL-DSR model:

the EDSR network model is reinterpreted from the traditional image processing perspective and is found to be very similar to the traditional convex set projection algorithm, except that the step of denoising in the transform domain is missing. Specifically: 1) the first convolutional layer, used for feature extraction, is regarded as performing a "transform" operation on the input image; 2) EDSR passes the extracted features through a series of residual blocks, each composed of conv-relu-conv, and the functions of these three layers are regarded respectively as "inverse transform, non-negativity constraint in the spatial domain, transform"; 3) cascading multiple residual blocks is regarded as repeated iteration of this process. The adaptive soft threshold module is then applied to the EDSR network to obtain the PL-DSR network model, which explicitly denoises the feature maps, extracts more clearly structured features, and achieves a better recovery result than EDSR while reducing network complexity.

(3b) Defining a loss function of the image recovery network model;

the L₁-norm recovery loss function L is defined as:

L = (1/N) · Σ_{i=1..N} ‖Î_i − I_i‖₁

where N is the total number of training samples, and Î_i and I_i denote the restored image and the HR image of the i-th sample, respectively. The smaller the loss value, the better the recovery performance of the network.

(4) Training the PL-DSR model:

(5) Testing the PL-DSR model:

(6) Generating a restored image:

2. The image restoration method based on the convex-like set projection algorithm according to claim 1, wherein the specific structures of the soft threshold module and the adaptive soft threshold module in step (2) are respectively:

(2a) a soft threshold module:

the construction of the soft threshold module comprises four parts of global pooling, automatic encoder, threshold generation and feature denoising, and the detailed description is as follows:

assume X ∈ R^(W×H×C) is the input feature map, where W, H and C denote the width, height and number of channels, respectively. x_c ∈ R^(W×H) denotes the c-th channel of X, so that X = [x₁, …, x_c, …, x_C]. First, the absolute value of X is taken and passed through a global pooling layer to obtain a vector U of length C:

u_c = (1 / (W·H)) · Σ_{i=1..W} Σ_{j=1..H} |x_c(i, j)|

where u_c is the c-th component of U. Then, the principal component V ∈ R^C of U is extracted using two fully-connected (FC) layers:

V＝W₂×ReLU(W₁×U)

Z＝sigmoid(V)

τ_c＝u_c·z_c

y_c(i，j)＝G_st(x_c(i，j)，τ_c)

(2b) An adaptive soft threshold module:

in view of the drawbacks of the soft threshold module, a spatially adaptive mechanism is adopted: a spatial correlation matrix Θ ∈ R^(W×H) is obtained through a simple 3 × 3 convolutional layer followed by sigmoid activation. The threshold τ_c(i, j) that the module assigns to the input feature map X at point (i, j, c) is:

τ_c(i，j)＝θ(i，j)·u_c·z_c。

3. the image restoration method based on the convex-like set projection algorithm as claimed in claim 1, wherein the PL-DSR network model in step (3a) has a specific structure:

the PL-DSR network model includes a feature extraction section (first convolutional layer), a nonlinear mapping section (a plurality of residual blocks each having a conv-relu-conv-AST block composition), and a recovery section (upsampling block and output convolutional layer). For each module, generalized explanation is given from the traditional image processing perspective, and comparison is made with a convex set projection algorithm, and the relation between the two is shown. Specifically, the method comprises the following steps:

The detailed structure of each part is respectively as follows:

feature extraction part: input layer → first convolutional layer

And (3) recovering part: the output of the thirty-second residual block → the first upsampled layer → the convolutional layer → the second upsampled layer → the convolutional layer.

4. The image restoration method based on the convex-like set projection algorithm as claimed in claim 1, wherein the training of the PL-DSR network model in step (4) is implemented by the steps of: