CN110570353B - Single-image super-resolution reconstruction method based on a densely connected generative adversarial network - Google Patents

Single-image super-resolution reconstruction method based on a densely connected generative adversarial network

Info

Publication number
CN110570353B
CN110570353B
Authority
CN
China
Prior art keywords
layer
image
network
resolution
convolution
Prior art date
Legal status
Active
Application number
CN201910797707.2A
Other languages
Chinese (zh)
Other versions
CN110570353A (en)
Inventor
李素梅
陈圣
Current Assignee
Tianjin University
Original Assignee
Tianjin University
Priority date
Filing date
Publication date
Application filed by Tianjin University
Priority to CN201910797707.2A
Publication of CN110570353A
Application granted
Publication of CN110570353B
Status: Active

Classifications

    • G06N3/045 Neural networks — Combinations of networks
    • G06N3/048 Neural networks — Activation functions
    • G06N3/08 Neural networks — Learning methods
    • G06T3/4053 Scaling of whole images or parts thereof based on super-resolution, i.e. the output image resolution being higher than the sensor resolution


Abstract

The invention belongs to the field of video and image processing and aims to further improve the reconstruction effect and precision of high-resolution images while promoting improvements to the structure and loss function of generative adversarial networks. The invention is mainly applied in image-processing settings.

Description

Single-image super-resolution reconstruction method based on a densely connected generative adversarial network
Technical Field
The method belongs to the field of video and image processing. It concerns the improvement of image super-resolution reconstruction algorithms, the fusion of deep-learning theory with image super-resolution reconstruction, dense residual convolutional neural networks, and the realization and application of generative adversarial networks in the field of high-resolution image reconstruction. In particular, it relates to a single-image super-resolution reconstruction method based on a densely connected generative adversarial network.
Background
Image super-resolution refers to the process of obtaining a corresponding high-resolution image from a single low-resolution degraded image or from multiple low-resolution image sequences. In many practical applications in the field of image processing, a high-resolution original image is often desired, because a high-resolution image means a higher pixel density and can provide richer high-frequency detail information, creating a good basis for post-processing of the image and for accurate extraction and use of image information. In reality, however, owing to the limitations of hardware imaging devices and illumination conditions and to interference from human or natural factors, different types of noise may be introduced during imaging, transmission, and storage; these factors directly affect image quality, so the desired high-resolution image is often difficult to obtain. How to improve the quality of acquired images and obtain high-resolution images that meet application requirements has therefore become a key research topic in image processing. At the same time, as a highly specialized practical technology, image super-resolution reconstruction has very broad application prospects in biomedicine [1], satellite remote sensing [2], medical imaging, public safety [3], national defense, and science and technology, and is receiving increasing attention. For example, adopting super-resolution reconstruction in high-definition digital television systems can further reduce signal transmission costs while guaranteeing picture clarity and quality. In military and satellite observation imagery, multiple frames of the same region can be obtained, and multi-frame super-resolution reconstruction can achieve observation beyond the system resolution, improving target observation accuracy. In medical imaging systems (CT, magnetic resonance imaging (MRI)), super-resolution technology can improve image quality and clearly present the details of lesion targets, assisting patient treatment. In public places such as banks, traffic intersections, and shopping malls, super-resolution reconstruction of key parts of surveillance images can capture more detail information and provide important clues for handling public-security incidents.
Image super-resolution reconstruction is an image-processing method of great practical value whose conceptual essence originates from research in optics, where super-resolution means restoring image information beyond the diffraction limit of the spectrum. Toraldo di Francia explicitly proposed the concept of super-resolution in the literature on radar research, while the principle of super-resolution restoration for images was first proposed by Harris and Goodman in a spectrum-extrapolation method that became known as the Harris-Goodman method. Because early image super-resolution research was mainly performed on single-frame images, the achievable effect was greatly limited; although many scholars proposed methods for image restoration, these methods achieved good simulation results only under certain assumptions, and their effect was not ideal in practical applications. In 1984, Tsai and Huang first proposed super-resolution reconstruction based on multi-frame or sequential low-resolution images, together with a reconstruction method based on frequency-domain approximation, after which research on multi-frame image super-resolution reconstruction made great progress. Through decades of research and exploration, a wide variety of specific reconstruction methods have emerged in the field. According to the number of original low-resolution images processed, image super-resolution reconstruction can be classified into methods based on a single frame and methods based on a multi-frame sequence. The former mainly uses the prior information of a single frame to recover the high-frequency information lost during acquisition. The latter uses not only the prior information of individual frames but also the complementary information among different frames, providing more complete and sufficient feature data for recovering high-frequency information, so its restoration effect is often noticeably better than that of the former. However, in most practical situations, acquiring multiple frames of the same scene is quite difficult, and multi-frame super-resolution ultimately rests on the processing of single frames, so single-frame super-resolution has always been a research hotspot in the field. According to the specific implementation, super-resolution reconstruction can be divided into frequency-domain methods and spatial-domain methods. Frequency-domain methods remove spectral aliasing in the frequency domain, thereby improving spatial resolution accordingly; popular frequency-domain methods include the energy continuous degradation method and anti-aliasing reconstruction. Frequency-domain methods have the advantages of simple theory, low computational complexity, and easy parallelization, but their theoretical premises are too idealized to apply effectively in most practical settings: they suit only simple degradation models and can incorporate limited spatial-domain prior knowledge.
Spatial-domain methods have a wider scope of application and a strong capacity to incorporate spatial prior constraints; they mainly include the iterative back-projection method, set-theoretic methods, statistical restoration methods, and so on. The peak signal-to-noise ratio (PSNR) and the structural similarity (SSIM) are the most critical evaluation indices in the field of image super-resolution reconstruction and serve as the key parameters for measuring and comparing the final reconstruction effect. The peak signal-to-noise ratio, measured in dB, accumulates the pixel-value deviation between the reconstructed high-resolution image and the original real high-resolution image, reflecting their overall deviation in pixel values. Structural similarity focuses on comparing the similarity of the reconstructed and original images in texture features, structural features, and the like; the measurement is a real number between 0 and 1, and the closer it is to 1, the better the reconstruction method restores image structure and texture and the better it maintains the structural similarity between the reconstructed image and the original high-resolution image. In addition, when subjectively evaluating reconstruction results, a common practice is to magnify key local regions of the image and compare how well different algorithms restore image details and high-frequency information.
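As a concrete reference for the PSNR definition above, the following is a minimal Python sketch that computes PSNR from the mean squared pixel error between two equally sized images; it is a generic illustration of the metric, not code from the patent (SSIM can be computed analogously, e.g. with skimage.metrics.structural_similarity).

```python
import numpy as np

def psnr(reference: np.ndarray, reconstructed: np.ndarray, peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two images of identical shape."""
    mse = np.mean((reference.astype(np.float64) - reconstructed.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images: deviation is zero
    return 10.0 * np.log10(peak ** 2 / mse)
```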
Currently, image super-resolution technology falls into three main research directions: interpolation-based methods [16,17], reconstruction-based methods [18,19,20], and learning-based methods [21,22,23]. Interpolation-based methods typically include bilinear interpolation, bicubic interpolation, and the like; this class of methods is simple to implement and relatively low in complexity, but its recovery of effective high-frequency information is relatively poor. Reconstruction-based methods mainly include projection onto convex sets, Bayesian analysis, iterative back-projection, maximum a posteriori estimation, regularization, and hybrid methods. Learning-based methods mainly include example-based methods, neighbor embedding, support vector regression (SVR), and sparse representation.
Because learning-based image reconstruction can often capture higher-level information of the image, it greatly benefits the recovery of high-frequency information, so more satisfactory reconstruction results are easier to obtain. In particular, with the wave of artificial intelligence in recent years, deep-learning theory has increasingly been applied to classical image processing and has continuously produced results superior to traditional algorithms. Researchers have therefore intensively studied the application of deep learning to image super-resolution and obtained a large body of results. At the 2014 European Conference on Computer Vision (ECCV), Dong Chao et al. of the Chinese University of Hong Kong first proposed applying a convolutional neural network (CNN) to image super-resolution reconstruction; by constructing a simple three-layer CNN, SRCNN [4], they realized end-to-end mapping from low-resolution to high-resolution images, with feature extraction, the nonlinear low-to-high-resolution mapping, and the construction of the final output all completed by the network, whose model parameters were learned from a large dataset. The experimental results of SRCNN [4] were clearly superior to traditional super-resolution algorithms, proving that deep learning has good application prospects in the super-resolution field and pointing out a new direction for image super-resolution research. After SRCNN [4], to address its shallow depth, weak feature-extraction and mapping capability, insufficient receptive field, and slow convergence, Kim et al. in Korea proposed a very deep super-resolution reconstruction network (VDSR [5]) with 20 convolutional layers, which greatly enlarges the receptive field of the network and strengthens its learning ability; a global residual structure is introduced so that the network learns a residual image, greatly reducing learning difficulty and accelerating convergence. Meanwhile, to control the number of network parameters, Kim et al. tried a recursive structure in DRCN [6] (deeply-recursive convolutional network), using a widely shared recursive block to increase network depth without introducing new parameters to be learned and improving the reconstruction effect. To realize scale expansion of the low-resolution image inside the network and effectively reduce computation, Dong Chao et al. first introduced a deconvolution layer in the improved FSRCNN network, achieving image upscaling through the network's own learned parameters.
As network depth increases, the drop in learning efficiency caused by network degradation seriously affects learning; He Kaiming et al. proposed the local residual structure in ResNet [7], where shortcut connections inside local residual blocks effectively avoid degradation and further improve the training speed and learning effect of the network. Building on a thorough analysis of ResNet [7], DRCN [6], VDSR [5], and other networks, Tai et al. fused the advantages of local and global residuals with recursive structures and proposed the deep recursive residual network (DRRN [8]), markedly improving the reconstruction effect. SRCNN, DRCN, and DRRN all require preprocessing outside the network and cannot achieve end-to-end reconstruction from low-resolution to high-resolution images, which reduces efficiency. To achieve end-to-end image reconstruction, Wenzhe Shi et al. [9] introduced the sub-pixel convolution layer (sub-pixel convolutional layer), incorporating upsampling into the network and greatly improving model efficiency. In 2017, Lai and Huang et al. [10] used a deep Laplacian pyramid for fast and accurate image super-resolution; the paper combines the traditional Laplacian pyramid with a convolutional neural network, the network progressively magnifies the low-resolution image, and parameter sharing across pyramid levels through recursion effectively improves accuracy while reducing computation. With extensive deep-learning research, CNN-based image reconstruction has greatly improved in both accuracy and speed. However, reconstruction results remain poor in regions such as repeated textures, boundaries, and corners, and cannot satisfy the subjective vision of the human eye. SRGAN [11] appeared in 2016; its authors adopted an architecture based on a generative adversarial network and introduced a perceptual loss function. Its quantitative evaluation values were not very high, but subjectively the high-resolution images generated by SRGAN appear more realistic. The appearance of SRGAN sparked a wave of research on generative adversarial networks in image super-resolution. SRPGAN was proposed by Bingzhe Wu et al. [12] in 2017, constructing a more stable perceptual loss based on the discriminator network and using a Charbonnier loss function as the model's content loss; SRPGAN greatly improves the SSIM of the reconstruction results. Wang [13] improved SRGAN by using Residual-in-Residual Dense Blocks (RRDB) with the BN layers removed as the generator network; whereas SRGAN's perceptual loss was based on feature maps after the activation function, in this model the authors computed the perceptual loss using feature maps before activation. The improved model performs better in brightness and repetitive-texture regions.
Disclosure of Invention
To overcome the defects of the prior art, the invention aims to use a generative adversarial network with a dense residual structure — one that can effectively recover visually comfortable results and fully exploit the tight connections between residuals — to learn the high-frequency characteristics of an image quickly and accurately, thereby further improving the reconstruction effect and precision of high-resolution images, while also promoting improvements to the structure and loss function of generative adversarial networks and advancing their application and development in the field of image super-resolution reconstruction. The technical scheme adopted by the invention is a single-image super-resolution reconstruction method based on a densely connected generative adversarial network, comprising a generator network and an adversarial network: the generator network adopts the basic framework of the residual dense network RDN and uses 5 dense connection block DCB blocks as basic modules, while the adversarial network adopts the discriminator framework of the deep convolutional generative adversarial network DCGAN. A low-resolution image is fed into the generator network for processing, the resulting output is sent to the adversarial network for judgment, and the judgment result is fed back to the generator network through the loss function; this cycle repeats until the adversarial network judges the output qualified and the generator network can produce clear images, after which the trained generator network is used to complete super-resolution reconstruction of low-resolution images.
Training set preparation and data preprocessing are needed:
First, the original high-resolution color image is downsampled to obtain the corresponding low-resolution image, which serves as the input; downsampling the high-resolution image simulates the low-resolution images acquired under real conditions. The high-resolution image is downsampled using the bicubic interpolation formula:

I_lr = W(x) * I_hr

where I_lr is the downsampled low-resolution image, I_hr is the high-resolution image, and W(x) is the bicubic interpolation weight matrix, computed from the distance x between corresponding pixel points of I_lr and I_hr:

[equation image: bicubic interpolation weight function W(x)]

The downsampled low-resolution image I_lr and the high-resolution image are then normalized to obtain the normalized image matrices I_lrb = I_lr / 255 and I_hrb = I_hr / 255, and the low-resolution images and the corresponding high-resolution images are randomly cropped into patches. Finally, the low-resolution patches are used as input to the cascaded residual network, the high-resolution patches are used as the labels of the network, and the prepared training set is used to complete the training of the neural network.
The first two layers of the generator network's basic framework are shallow feature extraction layers with kernel size and number (3, 64). The middle is a feature extraction stage composed of 5 DCB modules, whose outputs are all fed into a concatenation (concat) layer followed by a bottleneck layer with kernel size and number (1, 64). The output of the bottleneck layer and the output of the first layer are then taken as a residual. Finally there is an upsampling layer whose kernel size, stride, and number are (6, 2, 2, 3).
Each DCB block comprises four convolution layers Conv1, 2, 3, 4 and a bottleneck layer Conv5; after each convolution layer there is a concatenation operation that realizes the dense connections within the residual block, and the bottleneck layer at the end of the DCB is a local feature fusion layer used to fuse the large number of feature maps;
the convolution kernel size of the four convolution layers in the DCB is set to 3×3, and the kernel size of the last bottleneck layer is set to 1×1, assuming that the input and output of the D-th DCB block are D, respectively d-1 And D d ,D c Denoted as 4 th concat [29]]Layer output, then:
D c =f cat4 f cr4 (f cat3 f cr3 (f cat2 f cr2 (f cat1 f cr1 (D d-1 )))) (1)
wherein f cri Representing the convolution of the ith (i=1, 2,3, 4) convolution layer with the ReLU layer, reLU activation operation, f cati Concat [29] representing the ith (1, 2,3, 4) convolution layer]Cascading operation, usingf bo Representing convolution operations in the bottleneck layer, the output of the DCB is expressed as:
D d =f bo (D c ) (2)
the bottleneck layer in DCB is a local feature fusion operation for adaptively fusing D d-1 The characteristics of the model and the output of all the convolution layers in the current model.
The deep convolutional generative adversarial network DCGAN (Deep Convolutional Generative Adversarial Networks) replaces the upsampling layer with strided convolutions, a normalization layer normalizes the output of each feature layer, and the activation function is adjusted in the discriminator to prevent gradient sparsity. The DCGAN-based adversarial network consists of one convolution block, 6 CBL blocks, and Dense connections; the CBL blocks use LeakyReLU as the activation function δ, the fully connected layer Dense1024 with 1024 outputs and the fully connected layer Dense1 with 1 output are realized by convolution layers, and the output value is finally obtained through a sigmoid function. The convolution kernels in the network are 3×3 with padding 1.
Loss function

The new loss function l is formed by the weighted combination of three parts:

l = λ_1 · l_image + λ_2 · l_VGG + λ_3 · l_D

The first part, l_image, is a pixel-wise L1-norm loss function:

l_image = (1/n) Σ_{i=1}^{n} ||G_i(x) − X_i||_1

where G_i(x) denotes the resolution-enhanced image produced by the generator from the i-th input low-resolution image, X_i is the corresponding original image, and n is the number of images. The second part is the content loss function l_VGG [23] of the convolutional neural network VGG16 [20]:

l_VGG = (1/N) Σ_{j=1}^{N} ||φ_{k,j}(G_i(x)) − φ_{k,j}(X_i)||_2

The result G_i(x) obtained by training the model and the original sharp image X_i are each fed into a pre-trained VGG16 [20] network, and the Euclidean distance between the feature maps produced by the k-th convolution layer is computed; φ_{k,j} denotes the j-th feature map output by the k-th convolution layer of VGG16 [20], and N is the total number of feature maps output by that layer. The content loss function ensures that the content of the two images is similar. The adversarial loss l_D is:

l_D = (1/n) Σ_{i=1}^{n} (1 − D(G_i(x)))
The invention has the characteristics and beneficial effects that:
the loss function of SRGAN is improved, using the L1 norm for generating the loss and the L2 norm for perceiving the loss. The intensive residual structure is provided as a generating network, so that the intensive residual structure not only can fully extract high-frequency abstract features of pictures, but also can reserve low-level features, and the result meets the visual requirement better. As shown in fig. 2, the test results on the reference data set indicate that the model achieves more excellent results in both objective index and subjective visual effect compared with the SRGAN.
Description of the drawings:
Fig. 1: the image super-resolution reconstruction model based on a generative adversarial network.
Fig. 2: comparison of 4× reconstruction results under different loss functions.
Fig. 3: comparison of our reconstruction results at 4× magnification with LapSRN, VDSR, and SRGAN. Colored boxes highlight sub-regions containing rich detail; these sub-regions are enlarged below the images to show more detail. From the sub-region images, our method has a strong ability to recover high-frequency details and sharp edges.
Fig. 4: structure of the dense connection block (DCB).
Fig. 5: the specific structure of the CBL block.
Detailed Description
Compared with SRGAN, the generator network of the invention employs Residual-in-Residual Dense Blocks to extract high-level features. Compared with SRPGAN, the content loss function employs a feature-based 1-norm. Compared with ESRGAN, the generator network uses a global feature fusion layer before upsampling, and the activation function of the RRDB module uses ReLU. Experimental results show that the generated pictures have a better visual effect.
As a classical topology in artificial neural networks, the convolutional neural network is very widely applied in pattern recognition and in the analysis and processing of image and speech information. In the field of image super-resolution reconstruction, Dong Chao et al. first proposed the SRCNN [4] network; after convolutional neural networks (CNNs) were successfully applied to the recovery and reconstruction of high-resolution images, many improved CNNs were proposed in succession, with clear improvements in the key reconstruction evaluation indices. However, reconstruction results remain poor in regions such as repeated textures, boundaries, and corners, and cannot satisfy the subjective vision of the human eye. SRGAN [11] appeared in 2016; its authors adopted an architecture based on a generative adversarial network and introduced a perceptual loss function. Its quantitative evaluation values were not very high, but subjectively the high-resolution images generated by SRGAN appear more realistic.
The super-resolution generative adversarial network (SRGAN) is a pioneering work capable of generating realistic textures during single-image super-resolution. However, because its loss function employs a pixel-wise L2 norm, the hallucinated details are often accompanied by unpleasant artifacts. To this end we propose a generative adversarial network based on dense connections, as shown in fig. 1.
The model combines the basic framework of the residual dense network (RDN) [14] with the deep convolutional generative adversarial network (DCGAN) [28]. The generator network borrows the basic framework of RDN [14] and uses 5 dense connection blocks (DCBs) as basic modules, while the adversarial network borrows the DCGAN discriminator [28] framework. The specific implementation is described below. The input and output of the model are color images.
The dense-connection-based image super-resolution reconstruction method with a generative adversarial network mainly involves the following. The generator network of the model is based on a densely connected residual structure and uses the tight connections between residuals to learn the high-frequency characteristics of the input image quickly and accurately. Our adversarial network is modeled on the adversarial network of DCGAN [28]. The generation loss of the generator network is changed to the L1 norm, whose cost function yields real texture characteristics that conform to the subjective characteristics of the human eye; the VGG-based perceptual loss remains based on the L2 norm, and the combination of the two loss functions ensures that the reconstruction result is very close to the target image in low-level pixel values, in high-level abstract features, and overall. The loss function of the adversarial network removes the original logarithm, ensuring that the generator attains the same distribution as the original data. The game between the generator network and the adversarial network greatly improves the final reconstruction effect. The following implementation sections introduce the workflow of the network, present the detailed structure of the generator network, and compare and analyze the final reconstruction effect.
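The generator/discriminator game described above can be summarized in the minimal PyTorch sketch below. The names generator, discriminator, and total_g_loss are placeholders for the modules and weighted loss defined later; the exact discriminator objective is an assumption, since the description only states that the logarithm of the conventional GAN loss is removed.

```python
import torch

def train_step(generator, discriminator, g_opt, d_opt, lr_imgs, hr_imgs, total_g_loss):
    """One round of the adversarial game (simplified sketch)."""
    # Discriminator step: push D(real) toward 1 and D(generated) toward 0.
    d_opt.zero_grad()
    fake = generator(lr_imgs).detach()                 # no gradient into G here
    d_loss = (1 - discriminator(hr_imgs)).mean() + discriminator(fake).mean()
    d_loss.backward()
    d_opt.step()

    # Generator step: weighted pixel + perceptual + adversarial loss.
    g_opt.zero_grad()
    sr = generator(lr_imgs)
    g_loss = total_g_loss(sr, hr_imgs, discriminator)
    g_loss.backward()
    g_opt.step()
    return d_loss.item(), g_loss.item()
```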
Training samples: the public database VOC2012 [24] is used here for network training. The dataset is a benchmark for classification, recognition, and detection of visual objects, and the photo set comprises 20 categories. Its image quality is good and its annotation complete, making it very suitable for testing the performance of an algorithm. From this dataset, 16,700 images were chosen for training the network and 100 images for the validation set. The experiment realizes 4× upsampling: 88×88 sharp color patches are randomly cropped and downsampled by bicubic interpolation, and the resulting 22×22 low-resolution patches are processed as network input.
Test samples: the test sets are Set5 [25], Set14 [26], and BSD100 [27]. The model directly processes three-channel (RGB) input images. The results show that at sampling factors of 2, 4, and 8 the model reconstructs results that satisfy subjective human vision, and its objective evaluation indices are far superior to previous GAN networks, so the method has great practical application value.
The following describes the method in detail with reference to the technical scheme:
after the model construction is completed, a proper optimization algorithm is needed to be selected to minimize the loss function to obtain optimal parameters, and the model isThe adaptive time of day estimation method (Adam: adaptive Moment Estimation) is used to update the weights and bias of the model, and the Adam algorithm is different from the conventional random gradient descent SGD. The random gradient descent keeps a single learning rate updating all weights, and the learning rate does not change during training. While Adam designs independent adaptive learning rates for different parameters by computing first and second moment estimates of the gradient. The algorithm parameters include: step size ε (default 0.001), exponential decay rate ρ of moment estimation 1 And ρ 2 (defaults to 0.9 and 0.999), a small constant delta (defaults to 10) for numerical stabilization -8 ). Our implementation is based on Pytorch. We trained 3 models, scaling factors of 2,4,8, respectively.
The method comprises the following specific steps:
1 Training set preparation and data preprocessing
First, we downsample the original high-resolution color image to obtain the corresponding low-resolution image, which serves as the input and simulates the low-resolution images acquired under real conditions. The high-resolution image is downsampled using the bicubic interpolation formula:

I_lr = W(x) * I_hr

where I_lr is the downsampled low-resolution image, I_hr is the high-resolution image, and W(x) is the bicubic interpolation weight matrix, which can be computed from the distance x between corresponding pixel points of I_lr and I_hr:

[equation image: bicubic interpolation weight function W(x)]

Since the image data is to be fed into the neural network for training, the downsampled low-resolution image I_lr and the high-resolution image must be normalized, giving the normalized image matrices I_lrb = I_lr / 255 and I_hrb = I_hr / 255. The low-resolution images and the corresponding high-resolution images are then randomly cropped: in our embodiment the crop size of all low-resolution images is set to 22×22, and the corresponding high-resolution images are cropped into blocks whose size follows the magnification, e.g. 44×44 for 2× reconstruction and 88×88 for 4× reconstruction. Finally, the low-resolution patches are used as input to the cascaded residual network, the high-resolution patches are used as the labels of the network, and the prepared training set is used to complete the training of the neural network.
2 Structural analysis and training process of the generator network and the dense connection block (DCB)
The basic framework of the generator network is the same as RDN [14]: the first two layers are shallow feature extraction layers with kernel size and number (3, 64); the middle is a feature extraction stage composed of 5 DCB modules, whose outputs are all fed into a concat [29] layer followed by a bottleneck layer with kernel size and number (1, 64); the output of the bottleneck layer and the output of the first layer are then taken as a residual; finally there is an upsampling layer whose kernel size, stride, and number are (6, 2, 2, 3).
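Read literally, this framework corresponds to the PyTorch skeleton below (shown for 2× upsampling). DCB refers to the block sketched in the next subsection; interpreting the upsampling numbers (6, 2, 2, 3) as kernel 6, stride 2, padding 2, and 3 output channels is our assumption, since the description lists the numbers without naming the operator.

```python
import torch
from torch import nn

class Generator(nn.Module):
    """Shallow feature extraction -> 5 DCBs -> concat + 1x1 bottleneck ->
    global residual with the first layer -> transposed-conv upsampling."""
    def __init__(self, num_dcb: int = 5, channels: int = 64):
        super().__init__()
        self.shallow1 = nn.Conv2d(3, channels, kernel_size=3, padding=1)
        self.shallow2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.dcbs = nn.ModuleList(DCB(channels) for _ in range(num_dcb))
        self.bottleneck = nn.Conv2d(num_dcb * channels, channels, kernel_size=1)
        self.upsample = nn.ConvTranspose2d(channels, 3, kernel_size=6, stride=2, padding=2)

    def forward(self, x):
        s1 = self.shallow1(x)
        feat = self.shallow2(s1)
        outputs = []
        for dcb in self.dcbs:
            feat = dcb(feat)
            outputs.append(feat)                      # every DCB output goes to concat
        fused = self.bottleneck(torch.cat(outputs, dim=1))
        return self.upsample(fused + s1)              # residual with the first layer
```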
Let us now explain the details of the DCB block, as shown in fig. 4. Each DCB block contains four convolution layers (Conv1, 2, 3, 4) and one bottleneck layer (Conv5). After each convolution layer there is a concatenation operation to realize dense connections within the residual block, meaning that the output feature maps of all previous convolution layers are concatenated and fused. The bottleneck layer at the end of the DCB is a local feature fusion layer used to fuse the large number of feature maps.
For the layer settings in the DCB, we set the convolution kernel size of all four convolution layers to 3×3 and the kernel size of the final bottleneck layer to 1×1. Let the input and output of the d-th DCB block be D_{d-1} and D_d respectively; the relationship between them can then be represented as follows. First, D_c denotes the output of the 4th concat [29] layer:

D_c = f_cat4 f_cr4 (f_cat3 f_cr3 (f_cat2 f_cr2 (f_cat1 f_cr1 (D_{d-1}))))   (1)

where f_cri denotes the convolution of the i-th (i = 1, 2, 3, 4) convolution layer followed by the ReLU activation, and f_cati denotes the concat [29] concatenation operation of the i-th convolution layer. Next, using f_bo to denote the convolution operation in the bottleneck layer, the output of the DCB can be expressed as:

D_d = f_bo(D_c)   (2)

In effect, the bottleneck layer in the DCB is a local feature fusion operation that adaptively fuses the features of D_{d-1} with the outputs of all the convolution layers in the current block. Through feature fusion, not only are feature maps of different levels fused, but setting the growth rate of the bottleneck layer to 64 also effectively reduces computational complexity. We use a 1×1 convolution layer to control the information output.
3 Structural analysis and training process of the adversarial network
Compared with traditional network structures, a GAN can obtain clearer samples. Since its advent, the GAN has been widely studied, producing a large number of excellent networks. The adversarial network here is based on the influential DCGAN [28].
Compared with the original GAN, DCGAN [28] replaces almost all fully connected layers with convolution layers; the whole network has no pooling layers, uses strided convolutions in place of upsampling layers, and normalizes the output of each feature layer with a normalization layer, which accelerates network convergence and improves training stability, while the activation function in the discriminator is adjusted to prevent gradient sparsity. Although DCGAN [28] has a good architecture, it cannot balance the training of the generator and discriminator well, and training remains unstable. The DCGAN [28]-based adversarial network in our model consists of one convolution block, 6 CBL blocks, and Dense connections. The structure of the CBL block is shown in fig. 5; it uses LeakyReLU as the activation function δ. The expression of LeakyReLU is identical to that of PReLU, except that α is no longer a learnable coefficient but a fixed small constant of 0.2. Dense1024 and Dense1 in fig. 2 are realized by convolution layers, and the output value is finally obtained through a sigmoid function. The convolution kernels in the network are 3×3 with padding 1.
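A sketch of this discriminator in PyTorch. The CBL block is read as convolution → batch normalization → LeakyReLU(0.2); the channel widths and strides below are assumptions, since the description fixes only the kernel size (3×3, padding 1), the block count, and the convolutional Dense1024/Dense1 head.

```python
import torch
from torch import nn

def cbl(in_ch: int, out_ch: int, stride: int) -> nn.Sequential:
    """CBL block: 3x3 convolution -> batch norm -> LeakyReLU with alpha = 0.2."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=stride, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.LeakyReLU(0.2, inplace=True))

class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, padding=1),       # leading conv block
            nn.LeakyReLU(0.2, inplace=True),
            cbl(64, 64, 2), cbl(64, 128, 1), cbl(128, 128, 2),
            cbl(128, 256, 1), cbl(256, 256, 2), cbl(256, 512, 1))  # 6 CBL blocks
        self.head = nn.Sequential(
            nn.Conv2d(512, 1024, kernel_size=1),              # "Dense1024" as a conv
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(1024, 1, kernel_size=1),                # "Dense1" as a conv
            nn.AdaptiveAvgPool2d(1),
            nn.Sigmoid())

    def forward(self, x):
        return self.head(self.features(x)).flatten(1)         # one probability per image
```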
4 Loss function
The loss function measures the difference between the data distribution obtained by the model and the real data distribution; most models in the image reconstruction field adopt the mean squared error as the loss function. Results reconstructed with such pixel-based functions score higher on objective evaluation indices but tend to lose high-frequency information and appear over-smoothed. This is because the human eye's sensitivity to errors is not absolute, and its perception is affected by many factors; for example, the eye is more sensitive to brightness and pays less attention to other details. We therefore improve the loss function: the new loss function l is formed by the weighted combination of three parts:

l = λ_1 · l_image + λ_2 · l_VGG + λ_3 · l_D
The first part, l_image, is a pixel-wise L1-norm loss function:

l_image = (1/n) Σ_{i=1}^{n} ||G_i(x) − X_i||_1

where G_i(x) denotes the resolution-enhanced image produced by the generator from the i-th input low-resolution image, X_i is the corresponding original image, and n is the number of images. The second part is the content loss function l_VGG [23] based on VGG16 [20]:

l_VGG = (1/N) Σ_{j=1}^{N} ||φ_{k,j}(G_i(x)) − φ_{k,j}(X_i)||_2
The result G_i(x) obtained by training the model and the original sharp image X_i are each fed into a pre-trained VGG16 [20] network, and the Euclidean distance between the feature maps produced by the k-th convolution layer is computed; φ_{k,j} denotes the j-th feature map output by the k-th convolution layer of VGG16 [20], and N is the total number of feature maps output by that layer. The content loss function ensures that the content of the two images is similar. The adversarial loss l_D is:

l_D = (1/n) Σ_{i=1}^{n} (1 − D(G_i(x)))

where D(·) denotes the output of the discriminator.
Compared with the conventional GAN, the loss function here takes no logarithm; the adversarial loss ensures that the generator attains the same distribution as the original data.
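Under the definitions above, the generator's total loss can be sketched in PyTorch as follows. The weights w_img, w_vgg, w_adv and the choice of VGG16 layer are placeholders (the description does not state their values), the log-free adversarial term follows the assumption made earlier, and a recent torchvision with the weights API is assumed.

```python
import torch
from torch import nn
from torchvision.models import vgg16, VGG16_Weights

class GeneratorLoss(nn.Module):
    """l = w_img * l_image (L1) + w_vgg * l_VGG (feature-space distance) + w_adv * l_D."""
    def __init__(self, w_img: float = 1.0, w_vgg: float = 0.006, w_adv: float = 0.001,
                 vgg_layers: int = 16):
        super().__init__()
        self.w_img, self.w_vgg, self.w_adv = w_img, w_vgg, w_adv
        features = vgg16(weights=VGG16_Weights.IMAGENET1K_V1).features[:vgg_layers].eval()
        for p in features.parameters():
            p.requires_grad = False                 # VGG16 is fixed, used only as phi
        self.vgg = features

    def forward(self, sr, hr, discriminator):
        l_image = (sr - hr).abs().mean()                      # pixel-wise L1
        # Mean squared feature difference, i.e. Euclidean distance in feature space
        # (ImageNet mean/std normalization omitted for brevity).
        l_vgg = ((self.vgg(sr) - self.vgg(hr)) ** 2).mean()
        l_adv = (1 - discriminator(sr)).mean()                # no logarithm
        return self.w_img * l_image + self.w_vgg * l_vgg + self.w_adv * l_adv
```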
5 Evaluation of the reconstruction effect
Subjective and objective quality evaluations were carried out on the obtained results. For subjective image quality, 15 raters scored the reconstruction results obtained by the different algorithms on Set5, Set14, and BSD100, using the mean opinion score method (MOS: Mean Opinion Score); each rater subjectively assigns each method's result a score between 1 and 5, where 5 denotes a clear picture of good quality and 1 denotes a very blurred picture that severely impedes viewing. For objective evaluation on Set5, Set14, and BSD100 we use the peak signal-to-noise ratio (PSNR) and the structural similarity index (SSIM) as criteria; PSNR measures the differences between images mainly through the differences between corresponding pixels. Table 2 shows all comparison results under 2×, 4×, and 8× sampling factors, and Table 1 shows the MOS index at 4× magnification.
Table 1. MOS values of the reconstruction results of different algorithms at 4× magnification
[table image]
Table 2. Average PSNR/SSIM values obtained by the algorithms on the three test sets at 2×, 4×, and 8× reconstruction
[table image]
From the objective quality assessment in Table 2 it can be seen that the PSNR and SSIM of our restored pictures fall somewhat short of CNN-based networks but exceed SRGAN among GAN-based networks. From Table 1 it can be seen that our subjective quality assessment (MOS) exceeds the previous frameworks.
The conventional loss function for the generator network is the mean squared error (MSE); results reconstructed with such pixel-based functions score higher on objective evaluation indices but lose high-frequency information and appear over-smoothed. We use a 1-norm-based loss function as the content loss function. Table 3 shows partial results of the subjective and objective quality assessment of the different loss functions.
Table 3. Subjective and objective evaluation indices of different loss functions at 4× magnification
[table image]
From fig. 2 it can be seen that L1 achieves better perceptual quality than the MSE-reconstructed results; in the locally enlarged results, (b), (d), (f), (h), (j), and (l) obtain more texture details, so results subjectively closer to the original image can be generated. The experimental results again demonstrate that there is a certain gap between subjective and objective evaluation indices.
We also conducted a series of experiments to demonstrate the effectiveness of our proposed SISR framework and loss function. In fig. 3 we compare with some advanced methods; we list only the 4× comparison with LapSRN [10], VDSR [5], and SRGAN [11]. To better show the effectiveness of our method, we selected small regions of the picture that are not easily recoverable and enlarged them. From fig. 3 we can see that the pictures we reconstruct are clearer in some texture details; for example, the color at the beak and the texture of the beak are clearer than those recovered by LapSRN and VDSR. Of course, we also compared other CNN-based approaches, such as SRCNN [4]. In contrast, our approach generates richer texture details than other advanced approaches.
In order to generate results subjectively closer to the original image, an image super-resolution model based on a generative adversarial network is built. Test results on public standard datasets show that the method obtains more realistic results for general image super-resolution, with texture, color, and other details of the generated images better matching human viewing habits, and it obtains the highest MOS value compared with traditional CNN-based super-resolution methods. The continual game between the discriminator and the generator in the GAN makes the details of the generated images increasingly rich and closer to real images, but there is no guarantee that these details correspond to the detail parts of the real image, and noise generated by the network itself may be mixed in; the PSNR value is therefore not high, so these algorithms are not recommended for the field of medical images, although they have great application value in most fields of image reconstruction. Next, we will conduct intensive research on the adversarial training mechanism of GANs, hoping to obtain a model with better subjective and objective performance.
References
[1] W. Shi, J. Caballero, C. Ledig, X. Zhuang, W. Bai, K. Bhatia, A. Marvao, T. Dawes, D. O'Regan, and D. Rueckert. Cardiac image super-resolution with global correspondence using multi-atlas patchmatch. In MICCAI, 2013.
[2] M. W. Thornton, P. M. Atkinson, and D. A. Holland. Sub-pixel mapping of rural land cover objects from fine spatial resolution satellite sensor imagery using super-resolution pixel-swapping. International Journal of Remote Sensing, 27(3):473–491, 2006.
[3] W. Zou and P. C. Yuen. Very low resolution face recognition problem. IEEE Transactions on Image Processing, 21(1):327–340, 2012.
[4] J. Kim, J. K. Lee, and K. M. Lee. Accurate image super-resolution using very deep convolutional networks. In CVPR, 2016.
[5] C. Dong, C. C. Loy, K. He, and X. Tang. Image super-resolution using deep convolutional networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(2):295–307, Feb. 2016.
[6] J. Kim, J. K. Lee, and K. M. Lee. Deeply-recursive convolutional network for image super-resolution. In CVPR, pp. 1637–1645, June 2016.
[7] J. Kim, J. K. Lee, and K. M. Lee. Accurate image super-resolution using very deep convolutional networks. In CVPR, pp. 1646–1654, June 2016.
[8] Y. Tai, J. Yang, and X. Liu. Image super-resolution via deep recursive residual network. In CVPR, pp. 2790–2798, July 2017.
[9] W. Shi, J. Caballero, F. Huszár, et al. Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network. 2016.
[10] W. S. Lai, J. B. Huang, N. Ahuja, et al. Deep Laplacian pyramid networks for fast and accurate super-resolution. In CVPR, pp. 5835–5843, July 2017.
[11] C. Ledig, L. Theis, F. Huszár, et al. Photo-realistic single image super-resolution using a generative adversarial network. In CVPR, pp. 105–114, July 2017.
[12] B. Wu, H. Duan, Z. Liu, et al. SRPGAN: Perceptual generative adversarial network for single image super resolution. arXiv:1712.05927v2 [cs.CV], pp. 1–9, Dec. 2017.
[13] X. Wang, K. Yu, S. Wu, et al. ESRGAN: Enhanced super-resolution generative adversarial networks. arXiv:1809.00219v2 [cs.CV], pp. 1–23, Sep. 2018.
[14] Y. Zhang, Y. Tian, Y. Kong, et al. Residual dense network for image super-resolution. In CVPR, pp. 2472–2481, June 2018.
[15] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556v6 [cs.CV], pp. 1–14, Apr. 2015.
[16] H. Chang, D.-Y. Yeung, and Y. Xiong. Super-resolution through neighbor embedding. In CVPR, 2004.
[17] M. Bevilacqua, A. Roumy, C. Guillemot, and M.-L. A. Morel. Low-complexity single-image super-resolution based on nonnegative neighbor embedding. In BMVC, 2012.
[18] D. Glasner, S. Bagon, and M. Irani. Super-resolution from a single image. In ICCV, 2009.
[19] J. Yang, J. Wright, T. Huang, and Y. Ma. Image super-resolution via sparse representation. IEEE Transactions on Image Processing, 19(11):2861–2873, 2010.
[20] R. Zeyde, M. Elad, and M. Protter. On single image scale-up using sparse-representations. In Curves and Surfaces, pages 711–730. Springer, 2012.
[21] E. Perez-Pellitero, J. Salvador, J. Ruiz-Hidalgo, and B. Rosenhahn. PSyCo: Manifold span reduction for super resolution. In CVPR, 2016.
[22] S. Schulter, C. Leistner, and H. Bischof. Fast and accurate image upscaling with super-resolution forests. In CVPR, 2015.
[23] R. Timofte, V. De Smet, and L. Van Gool. A+: Adjusted anchored neighborhood regression for fast super-resolution. In ACCV, 2014.
[24] http://cvlab.postech.ac.kr/~mooyeol/pascal_voc_2012/
[25] M. Bevilacqua, A. Roumy, C. Guillemot, and M. Morel. Low-complexity single-image super-resolution based on nonnegative neighbor embedding. British Machine Vision Conference, 2012.
[26] R. Zeyde, M. Elad, and M. Protter. On single image scale-up using sparse-representations. International Conference on Curves and Surfaces, Springer, 2010: 711–730.
[27] D. Martin, C. Fowlkes, D. Tal, and J. Malik. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In ICCV, 2001.
[28] A. Radford, L. Metz, and S. Chintala. Unsupervised representation learning with deep convolutional generative adversarial networks. Computer Science, 2015.
[29] O. Ronneberger, P. Fischer, and T. Brox. U-Net: Convolutional networks for biomedical image segmentation. 2015.

Claims (3)

1. A single-image super-resolution reconstruction method based on a densely connected generative adversarial network, characterized by comprising a generator network and an adversarial network, wherein the generator network adopts the basic framework of the residual dense network RDN and uses 5 dense connection block DCB blocks as basic modules, and the adversarial network adopts the discriminator framework of the deep convolutional generative adversarial network DCGAN; a low-resolution image is fed as input into the generator network for processing, the resulting output is sent to the adversarial network for judgment, and the judgment result is fed back to the generator network through a loss function; this cycle repeats until the adversarial network judges the output qualified and the generator network can generate a clear image, after which the trained generator network is used to complete super-resolution reconstruction of the low-resolution image; the first two layers of the generator network's basic framework are shallow feature extraction layers with kernel size and number (3, 64); the middle is a feature extraction stage composed of 5 DCB modules, whose outputs are all fed into a concatenation (concat) layer followed by a bottleneck layer with kernel size and number (1, 64); the output of the bottleneck layer and the output of the first layer are then taken as a residual; finally there is an upsampling layer whose kernel size, stride, and number are (6, 2, 2, 3);
the method comprises the steps that a long step convolution is utilized to replace an up-sampling layer in a deep generation countermeasure network DCGAN, a normalization layer normalizes output of a feature layer, an activation function is adjusted in a discriminator to prevent gradient sparseness, the countermeasure network based on the DCGAN is composed of a convolution block, 6 CBL blocks and a Dense connection, a full-connection layer Dense1024 with 1024 output as 1024 and a full-connection layer Dense1 with 1 output are realized by the convolution layer by adopting a LeakyReLU as an activation function delta in the CBL blocks, an output value is finally obtained through a sigmoid function, the size of convolution kernels in the network is 3 multiplied by 3, and the filling is 1;
the loss function l is formed by the weighted combination of three parts:

l = λ_1 · l_image + λ_2 · l_VGG + λ_3 · l_D
the first part is l i mage Is an L1 norm loss function based on pixel points,
Figure QLYQS_3
where G_i(x) denotes the resolution-enhanced image produced by the generator from the i-th input low-resolution image, X_i is the corresponding original image, and n is the number of images; the content loss function l_VGG of the convolutional neural network VGG16 is:

l_VGG = (1/N) Σ_{j=1}^{N} ||φ_{k,j}(G_i(x)) − φ_{k,j}(X_i)||_2
the result G_i(x) obtained by training the model and the original sharp image X_i are each fed into a pre-trained VGG16 network, and the Euclidean distance between the feature maps produced by the k-th convolution layer is computed, where φ_{k,j} denotes the j-th feature map output by the k-th convolution layer of VGG16 and N is the total number of feature maps output by that layer; the content loss function ensures that the content of the two images is similar; the adversarial loss l_D is:

l_D = (1/n) Σ_{i=1}^{n} (1 − D(G_i(x)))

where D(·) denotes the output of the discriminator.
2. The single-image super-resolution reconstruction method based on a densely connected generative adversarial network according to claim 1, characterized in that a training set must be prepared and the data preprocessed:
first, the original high-resolution color image is downsampled to obtain the corresponding low-resolution image, which serves as the input and simulates the low-resolution images acquired under real conditions; the high-resolution image is downsampled using the bicubic interpolation formula:

I_lr = W(x) * I_hr

where I_lr is the downsampled low-resolution image, I_hr is the high-resolution image, and W(x) is the bicubic interpolation weight matrix, computed from the distance x between corresponding pixel points of I_lr and I_hr:

[equation image: bicubic interpolation weight function W(x)]
the downsampled low-resolution image I_lr and the high-resolution image are then normalized to obtain the normalized image matrices I_lrb = I_lr / 255 and I_hrb = I_hr / 255, and the low-resolution images and the corresponding high-resolution images are randomly cropped into patches; finally, the low-resolution patches are used as input to the cascaded residual network, the high-resolution patches are used as the labels of the network, and the prepared training set is used to complete the training of the neural network.
3. The single-image super-resolution reconstruction method based on a densely connected generative adversarial network according to claim 1, characterized in that each DCB block comprises four convolution layers Conv1, 2, 3, 4 and a bottleneck layer Conv5; after each convolution layer there is a concatenation operation that realizes the dense connections within the residual block, and the bottleneck layer at the end of the DCB is a local feature fusion layer used to fuse the large number of feature maps;
the convolution kernel size of the four convolutional layers in the DCB is set to 3×3 and the kernel size of the final bottleneck layer is set to 1×1; assuming that the input and output of the d-th DCB block are $D_{d-1}$ and $D_d$ respectively, and that $D_c$ denotes the output of the 4th concat layer, then:
$$D_c = f_{cat4}f_{cr4}(f_{cat3}f_{cr3}(f_{cat2}f_{cr2}(f_{cat1}f_{cr1}(D_{d-1})))) \qquad (1)$$
wherein $f_{cri}$ denotes the convolution of the i-th convolutional layer followed by the ReLU activation operation, $i = 1, 2, 3, 4$, and $f_{cati}$ denotes the concat concatenation operation after the i-th convolutional layer; with $f_{bo}$ denoting the convolution operation in the bottleneck layer, the output of the DCB is expressed as:
$$D_d = f_{bo}(D_c) \qquad (2)$$
the bottleneck layer in the DCB is a local feature fusion operation that adaptively fuses the features of $D_{d-1}$ with the outputs of all convolutional layers in the current block.
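A minimal PyTorch sketch of one DCB block under this structure follows; the channel count and per-layer growth are assumptions, since the claim fixes only the four 3×3 Conv+ReLU layers, the concatenation after each, and the 1×1 bottleneck.

```python
import torch
import torch.nn as nn

class DCB(nn.Module):
    """Dense connection block: four 3x3 Conv+ReLU layers (f_cr), a concat after
    each one (f_cat), and a 1x1 bottleneck (f_bo) for local feature fusion."""
    def __init__(self, channels=64, growth=32):
        super().__init__()
        self.convs = nn.ModuleList()
        in_ch = channels
        for _ in range(4):
            self.convs.append(nn.Sequential(
                nn.Conv2d(in_ch, growth, kernel_size=3, padding=1),
                nn.ReLU(inplace=True),
            ))
            in_ch += growth  # each concat grows the channel count
        # bottleneck Conv5: fuses D_{d-1} and all intermediate outputs, eq. (2)
        self.bottleneck = nn.Conv2d(in_ch, channels, kernel_size=1)

    def forward(self, x):
        feats = x                                           # D_{d-1}
        for conv in self.convs:
            feats = torch.cat([feats, conv(feats)], dim=1)  # f_cat(f_cr(.)), eq. (1)
        return self.bottleneck(feats)                       # D_d = f_bo(D_c)
```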
CN201910797707.2A 2019-08-27 2019-08-27 Super-resolution reconstruction method for generating single image of countermeasure network by dense connection Active CN110570353B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910797707.2A CN110570353B (en) 2019-08-27 2019-08-27 Super-resolution reconstruction method for generating single image of countermeasure network by dense connection

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910797707.2A CN110570353B (en) 2019-08-27 2019-08-27 Super-resolution reconstruction method for generating single image of countermeasure network by dense connection

Publications (2)

Publication Number Publication Date
CN110570353A CN110570353A (en) 2019-12-13
CN110570353B true CN110570353B (en) 2023-05-12

Family

ID=68776299

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910797707.2A Active CN110570353B (en) 2019-08-27 2019-08-27 Super-resolution reconstruction method for generating single image of countermeasure network by dense connection

Country Status (1)

Country Link
CN (1) CN110570353B (en)

Families Citing this family (42)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111127587B (en) * 2019-12-16 2023-06-23 杭州电子科技大学 Reference-free image quality map generation method based on countermeasure generation network
CN111080727B (en) * 2019-12-17 2023-03-21 华中科技大学鄂州工业技术研究院 Color image reconstruction method and device and image classification method and device
CN111241958B (en) * 2020-01-06 2022-07-22 电子科技大学 Video image identification method based on residual error-capsule network
CN111192221B (en) * 2020-01-07 2024-04-16 中南大学 Aluminum electrolysis fire hole image repairing method based on deep convolution generation countermeasure network
CN111239739A (en) * 2020-01-10 2020-06-05 上海眼控科技股份有限公司 Weather radar echo map prediction method and device, computer equipment and storage medium
CN111080531B (en) * 2020-01-10 2024-02-23 北京农业信息技术研究中心 Super-resolution reconstruction method, system and device for underwater fish image
CN111311488B (en) * 2020-01-15 2022-03-29 广西师范大学 Efficient super-resolution reconstruction method based on deep learning
CN111489291A (en) * 2020-03-04 2020-08-04 浙江工业大学 Medical image super-resolution reconstruction method based on network cascade
CN111383200B (en) * 2020-03-30 2023-05-23 西安理工大学 CFA image demosaicing method based on generated antagonistic neural network
CN111353940B (en) * 2020-03-31 2021-04-02 成都信息工程大学 Image super-resolution reconstruction method based on deep learning iterative up-down sampling
CN111507902B (en) * 2020-04-15 2023-09-26 京东城市(北京)数字科技有限公司 High-resolution image acquisition method and device
CN111583115B (en) * 2020-04-30 2023-09-05 西安交通大学 Single image super-resolution reconstruction method and system based on depth attention network
CN111833246B (en) * 2020-06-02 2022-07-08 天津大学 Single-frame image super-resolution method based on attention cascade network
CN111681192B (en) * 2020-06-09 2022-08-02 天津大学 Bit depth enhancement method for generating countermeasure network based on residual image condition
CN113935928B (en) * 2020-07-13 2023-04-11 四川大学 Rock core image super-resolution reconstruction based on Raw format
CN111784583A (en) * 2020-07-13 2020-10-16 东北石油大学 Cyclic random super-resolution generation countermeasure network for precipitation graph
CN112381722A (en) * 2020-07-23 2021-02-19 杭州喔影网络科技有限公司 Single-image hyper-segmentation and perception image enhancement joint task learning method
EP3943969A1 (en) * 2020-07-24 2022-01-26 Aptiv Technologies Limited Methods and systems for predicting a trajectory of an object
CN111932456B (en) * 2020-07-31 2023-05-16 浙江师范大学 Single image super-resolution reconstruction method based on generation countermeasure network
CN112001847A (en) * 2020-08-28 2020-11-27 徐州工程学院 Method for generating high-quality image by relatively generating antagonistic super-resolution reconstruction model
CN111986092B (en) * 2020-09-07 2023-05-05 山东交通学院 Dual-network-based image super-resolution reconstruction method and system
CN112054979B (en) * 2020-09-14 2022-02-25 四川大学 Radio automatic modulation identification method based on fuzzy dense convolution network
CN112330572B (en) * 2020-11-30 2023-01-06 天津科技大学 Generation type antagonistic neural network based on intensive network and distorted image restoration method
CN112560596B (en) * 2020-12-01 2023-09-19 中国航天科工集团第二研究院 Radar interference category identification method and system
CN112700425B (en) * 2021-01-07 2024-04-26 云南电网有限责任公司电力科学研究院 Determination method for X-ray image quality of power equipment
CN112767247A (en) * 2021-01-13 2021-05-07 京东方科技集团股份有限公司 Image super-resolution reconstruction method, model distillation method, device and storage medium
CN112950480A (en) * 2021-04-15 2021-06-11 辽宁工程技术大学 Super-resolution reconstruction method integrating multiple receptive fields and dense residual attention
CN113160057B (en) * 2021-04-27 2023-09-05 沈阳工业大学 RPGAN image super-resolution reconstruction method based on generation countermeasure network
CN113484906B (en) * 2021-06-29 2023-11-03 中北大学 High-resolution energy field reconstruction method based on low-frequency energy spectrum data driving
CN113487481B (en) * 2021-07-02 2022-04-12 河北工业大学 Circular video super-resolution method based on information construction and multi-density residual block
CN113691863B (en) * 2021-07-05 2023-06-20 浙江工业大学 Lightweight method for extracting video key frames
CN113538241B (en) * 2021-07-19 2024-06-14 宜宾电子科技大学研究院 Super-resolution image generation method for scene text recognition
CN113361662B (en) * 2021-07-22 2023-08-29 全图通位置网络有限公司 Urban rail transit remote sensing image data processing system and method
CN113762349B (en) * 2021-08-11 2024-03-29 同济大学 Marine organism-oriented lightweight aliasing dense network classification method and system
CN113674154B (en) * 2021-08-23 2023-10-27 北京印刷学院 Single image super-resolution reconstruction method and system based on generation countermeasure network
CN113744238B (en) * 2021-09-01 2023-08-01 南京工业大学 Method for establishing bullet trace database
CN114331882B (en) * 2021-12-21 2023-03-28 南京航空航天大学 Method for removing thin cloud of generated confrontation network remote sensing image fused with multispectral features
CN114549328B (en) * 2022-04-24 2022-07-22 西南财经大学 JPG image super-resolution restoration method, computer readable storage medium and terminal
CN115810016B (en) * 2023-02-13 2023-04-28 四川大学 Automatic identification method, system, storage medium and terminal for CXR (Lung infection) image
CN117132468B (en) * 2023-07-11 2024-05-24 汕头大学 Curvelet coefficient prediction-based super-resolution reconstruction method for precise measurement image
CN116912890B (en) * 2023-09-14 2023-11-24 国网江苏省电力有限公司常州供电分公司 Method and device for detecting birds in transformer substation
CN117455774B (en) * 2023-11-17 2024-05-14 武汉大学 Image reconstruction method and system based on differential output

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109087243A (en) * 2018-06-29 2018-12-25 中山大学 Video super-resolution generation method based on a deep convolutional generative adversarial network
CN109949222A (en) * 2019-01-30 2019-06-28 北京交通大学 Image super-resolution reconstruction method based on grapheme
CN109978762A (en) * 2019-02-27 2019-07-05 南京信息工程大学 Super-resolution reconstruction method based on a conditional generative adversarial network

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11024009B2 (en) * 2016-09-15 2021-06-01 Twitter, Inc. Super resolution using a generative adversarial network

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109087243A (en) * 2018-06-29 2018-12-25 中山大学 Video super-resolution generation method based on a deep convolutional generative adversarial network
CN109949222A (en) * 2019-01-30 2019-06-28 北京交通大学 Image super-resolution reconstruction method based on grapheme
CN109978762A (en) * 2019-02-27 2019-07-05 南京信息工程大学 Super-resolution reconstruction method based on a conditional generative adversarial network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Residual Dense Network for Image Super-Resolution; Yulun Zhang et al.; 《2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition》; 20181216; pp. 2474-2475 *
Design and Implementation of an Image Super-Resolution Algorithm Based on Generative Adversarial Networks; Wang Wang; 《China Master's Theses Full-text Database, Information Science and Technology (Monthly)》; 20190815; pp. 32-42 *

Also Published As

Publication number Publication date
CN110570353A (en) 2019-12-13

Similar Documents

Publication Publication Date Title
CN110570353B (en) Super-resolution reconstruction method for generating single image of countermeasure network by dense connection
CN108830796B (en) Hyperspectral image super-resolution reconstruction method based on spectral-spatial combination and gradient domain loss
CN112507997B (en) Face super-resolution system based on multi-scale convolution and receptive field feature fusion
CN111369440B (en) Model training and image super-resolution processing method, device, terminal and storage medium
CN110276721A (en) Image super-resolution rebuilding method based on cascade residual error convolutional neural networks
Sun et al. Lightweight image super-resolution via weighted multi-scale residual network
CN112001847A (en) Method for generating high-quality image by relatively generating antagonistic super-resolution reconstruction model
CN111598778B (en) Super-resolution reconstruction method for insulator image
CN112819910B (en) Hyperspectral image reconstruction method based on double-ghost attention machine mechanism network
WO2021159767A1 (en) Medical image processing method, image processing method, and device
Ma et al. PathSRGAN: multi-supervised super-resolution for cytopathological images using generative adversarial network
CN110232653A (en) The quick light-duty intensive residual error network of super-resolution rebuilding
Yang et al. Image super-resolution based on deep neural network of multiple attention mechanism
CN115147271A (en) Multi-view information attention interaction network for light field super-resolution
CN116486074A (en) Medical image segmentation method based on local and global context information coding
Zheng et al. T-net: Deep stacked scale-iteration network for image dehazing
Yang et al. A survey of super-resolution based on deep learning
Krishnan et al. SwiftSRGAN-Rethinking super-resolution for efficient and real-time inference
Wu et al. A novel perceptual loss function for single image super-resolution
Tang et al. Structure-embedded ghosting artifact suppression network for high dynamic range image reconstruction
CN117575915A (en) Image super-resolution reconstruction method, terminal equipment and storage medium
Lei et al. HFF-SRGAN: super-resolution generative adversarial network based on high-frequency feature fusion
Deng et al. Selective kernel and motion-emphasized loss based attention-guided network for HDR imaging of dynamic scenes
Marnissi et al. GAN-based vision Transformer for high-quality thermal image enhancement
Yang et al. Deep networks for image super-resolution using hierarchical features

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant