CN115375537A - Nonlinear perception multi-scale super-resolution image generation system and method

Nonlinear perception multi-scale super-resolution image generation system and method

Info

Publication number
CN115375537A
Authority
CN
China
Prior art keywords
residual
nonlinear
network
resolution
output
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210748206.7A
Other languages
Chinese (zh)
Inventor
杨爱萍
魏子浩
王子麒
周雅然
李磊磊
王金斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University
Original Assignee
Tianjin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University filed Critical Tianjin University
Priority to CN202210748206.7A priority Critical patent/CN115375537A/en
Publication of CN115375537A publication Critical patent/CN115375537A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 3/00 Geometric image transformations in the plane of the image
    • G06T 3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T 3/4053 Scaling of whole images or parts thereof, e.g. expanding or contracting, based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/52 Scale-space analysis, e.g. wavelet analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20084 Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a nonlinear perception multi-scale super-resolution image generation system and method built around a nonlinear perception multi-scale super-resolution convolutional neural network, which further comprises a shallow feature extraction sub-network, a multiple-cascade residual nested group sub-network and a reconstruction sub-network. The multiple-cascade residual nested group sub-network connects several residual nested groups through a global multiple-cascade mechanism; each residual block contains a nonlinear perception module, the multi-scale feature map selected by the module is obtained as the total output of the convolutional layers, and a reconstruction sub-network consisting of a sub-pixel convolution layer and a reconstruction convolution layer produces a super-resolution image of the specified size. Compared with the prior art, the method recovers the structural information of the image well, is superior at large magnification factors on all data sets, better recovers the texture detail information of the image, and yields reconstructed images of higher overall quality.

Description

Nonlinear perception multi-scale super-resolution image generation system and method
Technical Field
The invention belongs to the field of computer image processing, and particularly relates to a super-resolution generation method for an image or video.
Background
Natural images contain multiple types of features, and different feature types correspond to different scales. For example, smooth regions and strong edges correspond to larger scales, while textured regions correspond to smaller scales.
Super-resolution (SR) reconstructs a corresponding high-resolution (HR) image from an observed low-resolution (LR) image, and has important application value in fields such as surveillance equipment, satellite imagery and medical imaging.
Existing convolutional-neural-network-based super-resolution methods usually adopt only a larger or a smaller receptive field and extract features at a single scale; they cannot acquire multi-scale features and struggle to reconstruct the high-frequency information of the image. Most existing multi-scale super-resolution networks fuse multi-scale feature maps linearly, by direct stacking or addition, so the extracted feature information is highly redundant; this fails to truly simulate the powerful adaptive visual-information integration capability of neurons and severely limits the network's representation capability and super-resolution reconstruction performance.
Disclosure of Invention
Aiming at the problem that most existing end-to-end super-resolution networks consider only a single scale, or fuse multi-scale features in a linear manner, so that the detailed texture of the restored image is easily lost, the invention provides a nonlinear perception multi-scale super-resolution image generation system and method.
The invention is realized by the following technical scheme:
a nonlinear perception multi-scale super-resolution image generation system comprises a nonlinear perception multi-scale super-resolution convolution neural network, wherein the nonlinear perception multi-scale super-resolution convolution neural network further comprises a shallow layer feature extraction sub-network, a multiple cascade residual error nesting sub-network and a reconstruction sub-network; wherein:
inputting a training data set as the nonlinear perception multi-scale hyper-resolution convolutional neural network, wherein the shallow layer feature extraction sub-network consists of a convolutional layer and is used for completing the rough extraction of input features to obtain a feature map; the multiple cascade residual error nested group sub-network is connected with a plurality of residual error nested groups through a global multiple cascade mechanism, and extracts the deep nonlinear multi-scale features of the image by taking the feature map as input and learning further nonlinear mapping; the multiple cascade residual embedded group is connected with a plurality of residual embedded groups through a global multiple cascade mechanism, each residual embedded group comprises a plurality of residual blocks and a convolution layer, and a residual structure is formed by adopting jump connection; each residual block is provided with a nonlinear sensing module, convolution kernels with different sizes are utilized to obtain the characteristics of the image in different scales, weights are given to the characteristic graphs of the branches of the different convolution kernels to obtain total output, and the multi-scale characteristic graph selected by the nonlinear sensing module is obtained;
the reconstruction subnetwork is composed of a sub-pixel convolution layer and a reconstruction convolution layer, and obtains a super-resolution image with a specified size.
A nonlinear sensing multi-scale super-resolution image generation method specifically comprises the following steps:
step 1, acquiring a total training set and a test set;
step 2, constructing a nonlinear sensing multi-scale hyper-resolution convolution neural network to realize image super-resolution processing, wherein the specific process is as follows:
In the nonlinear perception multi-scale super-resolution convolutional neural network, the training data set is taken as input. A shallow feature extraction sub-network consisting of a conv3×3 layer performs a coarse extraction of input features. The resulting feature map is input into the multiple-cascade residual nested group sub-network to learn a further nonlinear mapping and extract deep nonlinear multi-scale features of the image. This sub-network connects several residual nested groups through a global multiple-cascade mechanism; each residual nested group comprises several residual blocks and a conv1×1 layer, with skip connections forming a residual structure. Each residual block contains a nonlinear perception module, which obtains image features at different scales using convolution kernels of different sizes and weights the feature maps of the different-kernel branches to produce the total output, yielding the multi-scale feature map selected by the module;
A super-resolution image of the specified size is then obtained through a reconstruction sub-network consisting of a sub-pixel convolution layer and a reconstruction convolution layer.
Compared with the prior art, the invention can achieve the following beneficial technical effects:
1) The method achieves competitive results on all data sets at all scales; compared with recent advanced super-resolution networks, it shows a large improvement in the objective indices (peak signal-to-noise ratio (PSNR) and structural similarity (SSIM)) at a parameter count comparable to models of similar performance;
2) The nonlinear perception learning method adopted by the invention recovers the structural information of the image well, and is superior at large magnification factors on all data sets;
3) The method better recovers the texture detail information of the image, and the reconstructed image has higher overall quality.
Drawings
FIG. 1 is a schematic diagram of the architecture of the nonlinear perception multi-scale super-resolution image generation system of the present invention;
FIG. 2 is a schematic overall flow chart of the nonlinear perception multi-scale super-resolution image generation method of the present invention;
FIG. 3 is a schematic diagram of the nonlinear perception multi-scale super-resolution convolutional neural network (NPMNet);
FIG. 4 is a schematic diagram of a multiple-cascade residual nested group;
FIG. 5 is a schematic diagram of the nonlinear perception module;
FIG. 6 is a schematic diagram comparing detail recovery on the "Comic" image;
FIG. 7 is a schematic diagram comparing super-resolution results on the "ground" image;
FIG. 8 is a schematic diagram comparing super-resolution results on the "building" image.
Detailed Description
The technical solution of the present invention is described in detail below with reference to the accompanying drawings and specific embodiments.
The relevant basic principle of the invention is briefly described as follows:
The LR image $I_x$ is modeled as the output of a degradation process:

$$I_x = \mathcal{D}(I_y; \delta)$$

where $\mathcal{D}$ is the degradation mapping function, $I_y$ is the corresponding HR image, and $\delta$ denotes the parameters of the degradation process (e.g., the scaling factor or noise). In general the degradation process (i.e., $\mathcal{D}$ and $\delta$) is unknown and only the LR image is available; this setting is also known as blind SR, in which the task is to recover from the LR image $I_x$ an HR approximation $\hat{I}_y$ of the ground-truth HR image $I_y$:

$$\hat{I}_y = \mathcal{F}(I_x; \theta)$$

where $\mathcal{F}$ is the super-resolution model and $\theta$ denotes its parameters.
As shown in FIG. 1, the nonlinear perception multi-scale super-resolution image generation system of the present invention comprises a nonlinear perception multi-scale super-resolution convolutional neural network, which further comprises a shallow feature extraction sub-network 100, a multiple-cascade residual nested group sub-network 200, and a reconstruction sub-network 300; wherein:
A training data set is input into the network. The shallow feature extraction sub-network 100 consists of a conv3×3 layer and performs a coarse extraction of input features to obtain a feature map. The multiple-cascade residual nested group sub-network 200 connects several residual nested groups through a global multiple-cascade mechanism and, taking the feature map as input, extracts deep nonlinear multi-scale features of the image by learning a further nonlinear mapping. The reconstruction sub-network 300, consisting of a sub-pixel convolution layer and a reconstruction convolution layer, produces the super-resolution image of the specified size.
Within the multiple-cascade residual nested group sub-network, each residual nested group comprises several residual blocks and a conv1×1 layer, with skip connections forming a residual structure; the output of the conv1×1 layer of each residual block undergoes a feature-map stacking operation and is connected to the input of the next residual block.
Each residual block contains a nonlinear perception module, which obtains image features at different scales using convolution kernels of different sizes, rescales the multi-scale features discriminatively through a cross-channel second-order attention mechanism to complete the nonlinear fusion of multi-scale information, and finally yields the multi-scale feature map selected by the module as the total output of the convolutional layers.
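The following is a minimal PyTorch sketch of this three-sub-network layout. The channel width, number of groups and scale factor are illustrative assumptions, and the group internals are plain-convolution stand-ins; the residual nested group and the nonlinear perception module are sketched separately later in this description.

```python
import torch
import torch.nn as nn

class NPMNetSketch(nn.Module):
    def __init__(self, channels=64, n_groups=4, scale=4):
        super().__init__()
        # Shallow feature extraction sub-network: a single 3x3 convolution.
        self.shallow = nn.Conv2d(3, channels, 3, padding=1)
        # Stand-ins for the residual nested groups (LRNG), cascaded globally.
        self.groups = nn.ModuleList(
            nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1),
                          nn.ReLU(inplace=True))
            for _ in range(n_groups))
        # 1x1 bottlenecks fuse the stacked (concatenated) cascade features.
        self.bottlenecks = nn.ModuleList(
            nn.Conv2d(2 * channels, channels, 1) for _ in range(n_groups))
        # Reconstruction sub-network: sub-pixel convolution + 3x3 conv.
        self.upsample = nn.Sequential(
            nn.Conv2d(channels, channels * scale ** 2, 3, padding=1),
            nn.PixelShuffle(scale))
        self.reconstruct = nn.Conv2d(channels, 3, 3, padding=1)

    def forward(self, x):
        f0 = self.shallow(x)
        feat = f0
        for group, bottleneck in zip(self.groups, self.bottlenecks):
            out = group(feat)
            # Global multiple cascade: stack feature maps, fuse with conv1x1.
            feat = bottleneck(torch.cat([feat, out], dim=1))
        return self.reconstruct(self.upsample(feat + f0))
```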
FIG. 2 is a schematic flow chart of the nonlinear perception multi-scale super-resolution image generation method of the present invention. The method specifically comprises the following steps:
Step 1, acquire the training set and the test set: for training, the DIV2K data set is adopted, a recently proposed data set of 800 high-quality training pictures that lets the network learn rich features; for testing, the four benchmark data sets Set5, Set14, BSD100 and Urban100 are used to evaluate the super-resolution reconstruction performance of the network. Low-resolution data sets are produced by downsampling the high-resolution images to the specified size with bicubic interpolation, as sketched below;
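A minimal sketch of this low-resolution data-set preparation, assuming PNG images on disk and using Pillow's bicubic resampling (the directory layout is a placeholder):

```python
from pathlib import Path
from PIL import Image

def make_lr_dataset(hr_dir, lr_dir, scale=4):
    """Bicubically downsample every HR image in hr_dir by the scale factor."""
    Path(lr_dir).mkdir(parents=True, exist_ok=True)
    for hr_path in Path(hr_dir).glob("*.png"):
        hr = Image.open(hr_path)
        # Crop so the HR size is an exact multiple of the scale factor.
        w, h = hr.size
        hr = hr.crop((0, 0, w - w % scale, h - h % scale))
        lr = hr.resize((hr.width // scale, hr.height // scale), Image.BICUBIC)
        lr.save(Path(lr_dir) / hr_path.name)
```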
Step 2, construct the nonlinear perception multi-scale super-resolution convolutional neural network (NPMNet) and perform super-resolution image generation based on it:
FIG. 3 is a schematic diagram of the nonlinear perception multi-scale super-resolution convolutional neural network. FIG. 4 is a schematic diagram of the multiple-cascade residual nested group.
The goal of the super-resolution network is to build a more effective residual learning mechanism so as to recover more useful information.
The NPMNet of the present invention consists of three sub-networks. First, the training data set is taken as input and a shallow feature extraction sub-network consisting of a conv3×3 layer performs a coarse extraction of input features. The resulting feature map is then input into the multiple-cascade residual nested group (MC-RNG) sub-network to learn a further nonlinear mapping and extract deep nonlinear multi-scale features of the image. In the figures, circles marked with the letter C denote feature-map stacking (concatenation) operations.
The multiple-cascade residual nested group sub-network uses a global multiple-cascade mechanism to connect several local residual nested groups (LRNG), each of which adopts a local residual nesting mechanism. Each residual nested group comprises several residual blocks (residual block 1, ..., residual block d, ..., residual block D) and a conv1×1 layer serving as a bottleneck, with skip connections forming a residual structure; the output of the conv1×1 layer of each residual block undergoes a feature-map stacking operation and is connected to the input of the next residual block. Each residual block contains the residual structure of the nonlinear perception module (NPMM).
The outputs of the residual nested groups (LRNG) in the multiple-cascade structure comprise the output of the first LRNG and the output of the n-th LRNG, respectively expressed as:

$$F_1 = H_{LRNG,1}(F_0)$$

$$F_n = H_{LRNG,n}(F_{n-1})$$

where $F_1$ denotes the output of the first LRNG, whose input is the output $F_0$ of the shallow feature extraction network; $F_n$ denotes the output of the n-th LRNG, whose input is the output of the preceding 1×1 convolutional layer; $H_{LRNG,n}(\cdot)$ denotes the mapping realized by the n-th LRNG, composed mainly of the residual nesting mechanism; and $N$ denotes the total number of LRNGs.

The output of the d-th residual block in the n-th LRNG is expressed as:

$$F_{n,d} = H_{RB,n,d}(F_{n,d-1})$$

where $F_{n,d-1}$ is the input of the d-th residual block in the n-th LRNG, $F_{n,d}$ is its output, $H_{RB,n,d}(\cdot)$ denotes the residual-block operation, and $D$ denotes the total number of residual blocks in each LRNG.

The LRNG adopts a skip connection to obtain its output from the residual blocks, expressed as:

$$F_n = F_{n-1} + W_n \, F_{n,D}$$

where $W_n$ denotes the parameters of the last convolutional layer of the n-th LRNG; the skip connection lets the backbone network focus on the more useful feature information.
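Under the formulas above, one residual nested group can be sketched as follows in PyTorch. The residual blocks are stubbed with plain convolutions in place of the NPMM, and placing a 1×1 bottleneck after each stacking step is an assumption based on the description of the feature-map stacking between blocks:

```python
import torch
import torch.nn as nn

class LRNGSketch(nn.Module):
    """One residual nested group: D residual blocks whose outputs are
    stacked with their inputs and fused by 1x1 bottlenecks, with a
    group-level skip connection F_n = F_{n-1} + W_n * F_{n,D}."""
    def __init__(self, channels=64, n_blocks=4):
        super().__init__()
        # Stand-ins for the NPMM-based residual blocks.
        self.blocks = nn.ModuleList(
            nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1),
                          nn.ReLU(inplace=True))
            for _ in range(n_blocks))
        # 1x1 bottlenecks fusing the stacked feature maps between blocks.
        self.bottlenecks = nn.ModuleList(
            nn.Conv2d(2 * channels, channels, 1) for _ in range(n_blocks))
        # Last convolutional layer of the group (W_n in the formula).
        self.last = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        feat = x
        for block, bottleneck in zip(self.blocks, self.bottlenecks):
            out = block(feat)
            # Feature-map stacking: concatenate, then fuse back to C channels.
            feat = bottleneck(torch.cat([feat, out], dim=1))
        # Skip connection forms the residual structure of the group.
        return x + self.last(feat)
```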
Unlike the dense layer-by-layer feature connections of a densely connected convolutional neural network (DenseNet), the global multiple-cascade mechanism reduces the computational burden of the network by reusing the integrated hierarchical features of the residual blocks.
FIG. 5 is a schematic diagram of the nonlinear perception module. The module first obtains image features at different scales using convolution kernels of different sizes, then rescales the multi-scale features more discriminatively through a cross-channel second-order attention mechanism, completing the nonlinear fusion of multi-scale information.
Taking a nonlinear perception module (NPMM) branch pair with two convolution kernels of different sizes as an example, assume the input $X$ has $C$ channels. Different convolution operations, Conv3×3 and Conv5×5, are first performed on the two branches, producing two groups of $H \times W$ feature maps with $C$ channels, $\tilde{U}$ and $\hat{U}$, which carry feature information of the image at two different scales. The two branch feature maps are combined by element-wise addition to obtain a channel feature map $M$, and global covariance pooling (GCP) is then used to generate the global second-order statistics $S$ of each channel feature map.

A convolutional layer with 1×1 kernels learns the feature vector $Z$ encoding the mutual mapping between the two scale features, with dimensionality reduction improving efficiency. The mapping vector $Z$ then passes through fully connected layers to produce two groups of feature vectors for the different scales and receptive fields, and a softmax over the two sub-vectors in each dimension yields the weight vectors $a$ and $b$. Weighting the feature maps of the different-kernel branches gives the c-th output feature map $O_c$, expressed as:

$$O_c = a_c \tilde{U}_c + b_c \hat{U}_c$$

where $a_c$ is the c-th element of weight vector $a$ and $b_c$ is the c-th element of weight vector $b$.

Weighting the feature maps of all the different-kernel branches yields the total output $O = [O_1, O_2, \cdots, O_C]$, the multi-scale feature map selected by the nonlinear perception module.
and finally, obtaining a high-resolution image with a specified size through a reconstruction sub-network consisting of the sub-pixel convolution layer and the reconstruction convolution layer.
The nonlinear perception module (NPMM) is utilized to realize the self-adaptive extraction of the image multi-scale information and the process of cross-channel learning of the multi-scale information by utilizing a second-order attention mechanism, and the effect of enhancing the key multi-scale characteristics is achieved.
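The following is a minimal PyTorch sketch of the two-branch selection mechanism just described. For brevity, the global covariance pooling (GCP) is replaced by global average pooling, i.e. first-order statistics, so this is a simplified stand-in rather than the patent's exact module; the channel width and reduction ratio are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NPMMSketch(nn.Module):
    """Two-branch nonlinear perception module. Global covariance pooling is
    simplified here to global average pooling (an assumption for brevity)."""
    def __init__(self, channels=64, reduction=4):
        super().__init__()
        self.branch3 = nn.Conv2d(channels, channels, 3, padding=1)  # small scale
        self.branch5 = nn.Conv2d(channels, channels, 5, padding=2)  # large scale
        self.reduce = nn.Conv2d(channels, channels // reduction, 1)  # -> Z
        self.fc_a = nn.Linear(channels // reduction, channels)
        self.fc_b = nn.Linear(channels // reduction, channels)

    def forward(self, x):
        u1, u2 = self.branch3(x), self.branch5(x)      # two-scale features
        m = u1 + u2                                    # element-wise fusion M
        s = m.mean(dim=(2, 3), keepdim=True)           # pooled statistics S
        z = F.relu(self.reduce(s)).flatten(1)          # mapping vector Z
        # Softmax across the two branches per channel gives weights a and b.
        a, b = torch.softmax(torch.stack([self.fc_a(z), self.fc_b(z)]), dim=0)
        a, b = a[:, :, None, None], b[:, :, None, None]
        return a * u1 + b * u2                         # O_c = a_c*U~_c + b_c*U^_c
```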
In step 2, the network for image super-resolution processing is trained under the guidance of an $L_1$ loss function; the $L_1$ (or MSE) loss makes the model focus on the average pixel-level error, which favors higher PSNR and SSIM indices. A minimal training step is sketched below.
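A minimal sketch of one training step with the $L_1$ loss; model, lr_batch and hr_batch are assumed to exist (e.g. an NPMNet-like network and batches from a DIV2K loader), and the optimizer and learning rate are illustrative choices:

```python
import torch.nn.functional as F
import torch.optim as optim

# model is assumed to exist; see the network sketch earlier in this description.
optimizer = optim.Adam(model.parameters(), lr=1e-4)  # illustrative settings

def train_step(lr_batch, hr_batch):
    optimizer.zero_grad()
    sr = model(lr_batch)               # reconstructed SR image
    loss = F.l1_loss(sr, hr_batch)     # pixel-wise mean absolute error (L1)
    loss.backward()
    optimizer.step()
    return loss.item()
```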
The beneficial effects of the invention are verified as follows:
1) Objective performance comparison: the objective evaluation indices comprise peak signal-to-noise ratio (PSNR), structural similarity (SSIM), and a comparison of parameter counts against models of similar performance. The average PSNR and SSIM of each method were computed on the four benchmark data sets Set5, Set14, BSD100 and Urban100; the results are shown in Tables 1 and 2. Table 1 gives the objective evaluation indices; Table 2 compares parameter counts at scale ×4.
TABLE 1 (average PSNR/SSIM of each method on Set5, Set14, BSD100 and Urban100; reproduced as an image in the original document)

TABLE 2 (parameter-count comparison at scale ×4; reproduced as an image in the original document)
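For reference, the PSNR values reported above follow the standard definition; a minimal sketch for image tensors scaled to [0, 1] (SSIM is more involved and is typically taken from a library):

```python
import torch

def psnr(sr: torch.Tensor, hr: torch.Tensor, max_val: float = 1.0) -> torch.Tensor:
    """Peak signal-to-noise ratio between reconstruction and reference."""
    mse = torch.mean((sr - hr) ** 2)
    return 10 * torch.log10(max_val ** 2 / mse)
```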
The comparison shows that, with fewer than 2 million parameters, NPMNet achieves competitive results on all data sets at all scales, and its objective indices improve substantially on recent advanced super-resolution networks. On the SSIM index in particular, NPMNet advances markedly over existing methods, even very deep models, which shows that the nonlinear perception learning method recovers the structural information of the image well. In addition, NPMNet achieves the highest indices at ×4 magnification on all data sets, indicating its superiority at higher magnification factors.
To compare the reconstruction effect of each super-resolution algorithm more intuitively, images magnified ×3 that are difficult to restore were selected from Set14, BSD100 and Urban100 for subjective visual comparison. FIG. 6 compares detail recovery on the "Comic" image: the other networks are blurry in the high-frequency details of the finger and nail regions, while the texture recovered by NPMNet is clearer and more complete than that of most compared algorithms.
As shown in FIG. 7, on the "ground" image the texture recovered by NPMNet is much clearer than that of most compared algorithms. As shown in FIG. 8, on the "building" image NPMNet recovers more of the transverse stripes, specifically two more than the other models. In summary, the proposed NPMNet better recovers the texture detail information of the image, and the reconstructed image has higher overall quality.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the embodiments of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention should be subject to the protection scope of the claims.

Claims (7)

1. A nonlinear perception multi-scale super-resolution image generation system, characterized in that the system comprises a nonlinear perception multi-scale super-resolution convolutional neural network, which further comprises a shallow feature extraction sub-network, a multiple-cascade residual nested group sub-network and a reconstruction sub-network; wherein:
a training data set is input into the nonlinear perception multi-scale super-resolution convolutional neural network; the shallow feature extraction sub-network consists of a convolutional layer and performs a coarse extraction of input features to obtain a feature map; the multiple-cascade residual nested group sub-network connects several residual nested groups through a global multiple-cascade mechanism and, taking the feature map as input, extracts deep nonlinear multi-scale features of the image by learning a further nonlinear mapping; each residual nested group comprises several residual blocks and a convolutional layer, with skip connections forming a residual structure; each residual block contains a nonlinear perception module, which obtains image features at different scales using convolution kernels of different sizes and weights the feature maps of the different-kernel branches to produce the total output, yielding the multi-scale feature map selected by the nonlinear perception module;
the reconstruction sub-network consists of a sub-pixel convolution layer and a reconstruction convolution layer, and produces a super-resolution image of the specified size.
2. The nonlinear perception multi-scale super-resolution image generation system of claim 1, wherein the multiple-cascade residual nested group sub-network connects several residual nested groups through a global multiple-cascade mechanism, each residual nested group comprising several residual blocks and a conv1×1 layer, with skip connections forming a residual structure; the output of the conv1×1 layer of each residual block undergoes a feature-map stacking operation and is connected to the input of the next residual block.
3. The nonlinear perception multi-scale super-resolution image generation system of claim 1, wherein the output of the conv1×1 layer of each residual block undergoes a feature-map stacking operation and is connected to the input of the next residual block.
4. A nonlinear perception multi-scale super-resolution image generation method, characterized by specifically comprising the following steps:
step 1, acquiring a training set and a test set;
step 2, constructing a nonlinear perception multi-scale super-resolution convolutional neural network to perform image super-resolution processing, the specific process being as follows:
in the nonlinear perception multi-scale super-resolution convolutional neural network, the training data set is taken as input; a shallow feature extraction sub-network consisting of a conv3×3 layer performs a coarse extraction of input features; the resulting feature map is input into the multiple-cascade residual nested group sub-network to learn a further nonlinear mapping and extract deep nonlinear multi-scale features of the image; this sub-network connects several residual nested groups through a global multiple-cascade mechanism, each residual nested group comprising several residual blocks and a conv1×1 layer, with skip connections forming a residual structure; each residual block contains a nonlinear perception module, which obtains image features at different scales using convolution kernels of different sizes and weights the feature maps of the different-kernel branches to produce the total output, yielding the multi-scale feature map selected by the nonlinear perception module;
and a super-resolution image of the specified size is obtained through a reconstruction sub-network consisting of a sub-pixel convolution layer and a reconstruction convolution layer.
5. The method of claim 1, wherein the output of the conv1×1 layer of each residual block undergoes a feature-map stacking operation and is connected to the input of the next residual block.
6. The nonlinear perception multi-scale super-resolution image generation method according to claim 2 or 3, characterized in that:
the outputs of the residual nested groups comprise the output of the first residual nested group and the output of the n-th residual nested group, respectively expressed as:

$$F_1 = H_{LRNG,1}(F_0)$$

$$F_n = H_{LRNG,n}(F_{n-1})$$

where $F_1$ denotes the output of the first residual nested group, whose input is the output $F_0$ of the shallow feature extraction network; $F_n$ denotes the output of the n-th residual nested group, whose input is the output of the preceding 1×1 convolutional layer; $H_{LRNG,n}(\cdot)$ denotes the mapping realized by the n-th residual nested group, composed mainly of the residual nesting mechanism; and $N$ denotes the total number of residual nested groups;
the output of the d-th residual block in the n-th residual nested group is expressed as:

$$F_{n,d} = H_{RB,n,d}(F_{n,d-1})$$

where $F_{n,d-1}$ is the input of the d-th residual block in the n-th residual nested group, $F_{n,d}$ is its output, $H_{RB,n,d}(\cdot)$ denotes the residual-block operation, and $D$ denotes the total number of residual blocks in each residual nested group;
the residual nested group adopts a skip connection to obtain its output from the residual blocks, expressed as:

$$F_n = F_{n-1} + W_n \, F_{n,D}$$

where $W_n$ denotes the parameters of the last convolutional layer of the n-th residual nested group.
7. The nonlinear perception multi-scale super-resolution image generation method according to claim 3, characterized in that the input of the nonlinear perception module is $X$ with $C$ channels; different convolution operations are first performed on the branches with convolution kernels of different sizes, obtaining, one group per branch, multi-size feature maps with $C$ channels that carry feature information of the image at different scales; the branch feature maps are combined by element-wise addition to obtain channel feature maps, and global covariance pooling is then used to generate the global second-order statistics of each channel feature map;
a convolutional layer with 1×1 kernels learns the mapping feature vector $Z$ between the two scale features; the mapping vector $Z$ then passes through fully connected layers to produce two groups of feature vectors for the different scales and receptive fields, and a softmax over the branch feature vectors in each dimension yields the weight vectors $a$ and $b$; weighting the feature maps of the different-kernel branches gives the c-th output feature map $O_c$, expressed as:

$$O_c = a_c \tilde{U}_c + b_c \hat{U}_c$$

where $a_c$ is the c-th element of weight vector $a$ and $b_c$ is the c-th element of weight vector $b$;
the multi-scale feature map selected by the nonlinear perception module is obtained as the total output $O = [O_1, O_2, \cdots, O_C]$ of the convolutional layer.
CN202210748206.7A 2022-06-29 2022-06-29 Nonlinear perception multi-scale super-resolution image generation system and method Pending CN115375537A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210748206.7A CN115375537A (en) Nonlinear perception multi-scale super-resolution image generation system and method


Publications (1)

Publication Number Publication Date
CN115375537A true CN115375537A (en) 2022-11-22

Family

ID=84062132

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210748206.7A Pending CN115375537A (en) Nonlinear perception multi-scale super-resolution image generation system and method

Country Status (1)

Country Link
CN (1) CN115375537A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117291802A (en) * 2023-09-27 2023-12-26 华北电力大学(保定) Image super-resolution reconstruction method and system based on composite network structure



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination