CN112150400A - Image enhancement method and device and electronic equipment - Google Patents


Publication number
CN112150400A
Authority
CN
China
Prior art keywords
image
target
feature map
training
restoration
Prior art date
Legal status
Granted
Application number
CN202011081277.3A
Other languages
Chinese (zh)
Other versions
CN112150400B (en)
Inventor
段一平
陶晓明
高润东
韩超诣
Current Assignee
Tsinghua University
Original Assignee
Tsinghua University
Priority date
Filing date
Publication date
Application filed by Tsinghua University filed Critical Tsinghua University
Priority to CN202011081277.3A
Publication of CN112150400A
Application granted
Publication of CN112150400B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 5/00 Image enhancement or restoration
    • G06T 5/50 Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 5/00 Image enhancement or restoration
    • G06T 5/90 Dynamic range modification of images or parts thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Processing (AREA)

Abstract

The invention provides an image enhancement method, an image enhancement apparatus and an electronic device, relating to the technical field of image processing. The method comprises: acquiring an image to be processed; down-sampling the image to be processed to obtain a first image; processing the first image with a bottom-layer target image enhancement model to obtain a first restored image and a feature map of at least one size of the first image; up-sampling the first restored image to obtain a second restored image, and splicing (i.e., concatenating) the second restored image with the image to be processed to obtain a fused image; and processing the fused image and the feature map of at least one size of the first image with a top-layer target image enhancement model to obtain a target restored image. By using the bottom-layer and top-layer target image enhancement models, the method captures multi-scale, multi-level features of the image to be processed, deepens the neural network and improves its expressive power, effectively alleviating the technical problem of poor subjective perceptual quality in prior-art image enhancement methods.

Description

Image enhancement method and device and electronic equipment
Technical Field
The present invention relates to the field of image processing technology, and in particular to an image enhancement method, an image enhancement apparatus, an electronic device, and a computer-readable medium having processor-executable non-volatile program code.
Background
With the development of Internet technology, mobile multimedia data, mainly images and video, is growing rapidly. When images are transmitted over limited bandwidth, image compression must be applied to significantly reduce the coding bit rate. However, commonly used image compression methods often introduce artifacts such as blocking and ringing that degrade picture quality, which can seriously degrade the quality of the user experience; in addition, these artifacts reduce the accuracy of classification and recognition tasks. Related research indicates that compression quality enhancement can improve classification and recognition performance. There is therefore an urgent need for quality enhancement methods for compressed images.
In the prior art, the U-shaped network has become almost the only network infrastructure in the field of image deblurring, and a pixel-wise two-norm loss function is generally adopted. However, research has shown that the pixel-wise two-norm loss function cannot effectively describe the subjective perceptual quality of an image, a phenomenon known as the "perception gap": an image with a higher PSNR (Peak Signal-to-Noise Ratio) index, i.e., a lower pixel-wise two-norm loss value, does not necessarily conform better to human subjective perception.
In summary, the image enhancement method in the prior art has a technical problem of poor subjective perception quality.
Disclosure of Invention
The invention aims to provide an image enhancement method, an image enhancement apparatus and an electronic device, so as to alleviate the technical problem of poor subjective perceptual quality in prior-art image enhancement methods.
In a first aspect, an embodiment of the present invention provides an image enhancement method, including: acquiring an image to be processed; down-sampling the image to be processed to obtain a first image; processing the first image with a bottom-layer target image enhancement model to obtain a first restored image and a feature map of at least one size of the first image; up-sampling the first restored image to obtain a second restored image, and splicing the second restored image with the image to be processed to obtain a fused image, wherein the spatial size of the second restored image is the same as that of the image to be processed; and processing the fused image and the feature map of at least one size of the first image with a top-layer target image enhancement model to obtain a target restored image.
In an optional embodiment, the bottom-layer target image enhancement model comprises, connected in sequence, a first encoder and a first decoder; the first encoder comprises at least one coding convolution block, the first decoder comprises at least one decoding convolution block, and the numbers of coding and decoding convolution blocks in the bottom-layer target image enhancement model are the same. The top-layer target image enhancement model comprises, connected in sequence, a second encoder and a second decoder; the second encoder comprises at least one coding convolution block, the second decoder comprises at least one decoding convolution block, and the numbers of coding and decoding convolution blocks in the top-layer target image enhancement model are the same. A first feature map in the bottom-layer target image enhancement model is up-sampled and then spliced with a second feature map in the top-layer target image enhancement model, and the spliced feature map is input to the second decoder of the top-layer target image enhancement model, wherein the first feature map is the feature map output by the coding convolution block connected to the first decoder, the second feature map is the feature map output by the coding convolution block connected to the second decoder, and the feature map obtained by up-sampling the first feature map has the same spatial size as the second feature map.
In an optional embodiment, in the bottom-layer target image enhancement model, the feature map input to a first target decoding convolution block is obtained by splicing the first target feature map output by the previous decoding convolution block with a second target feature map produced by the first encoder, where the first target decoding convolution block is any decoding convolution block in the first decoder except the one connected to the first encoder, and the second target feature map is the encoder feature map whose spatial size is the same as that of the first target feature map.
In an optional embodiment, in the top-layer target image enhancement model, the feature map input to a second target decoding convolution block is obtained by splicing the third target feature map output by the previous decoding convolution block with a fourth target feature map produced by the second encoder, where the second target decoding convolution block is any decoding convolution block in the second decoder except the one connected to the second encoder, and the fourth target feature map is the encoder feature map whose spatial size is the same as that of the third target feature map.
In an alternative embodiment, the coding convolution block comprises, connected in sequence: a first convolution layer, a first activation function layer, a first residual convolution block and a second residual convolution block; the decoding convolution block comprises, connected in sequence: a third residual convolution block, a fourth residual convolution block, a deconvolution layer and a second activation function layer; and the residual convolution block comprises, connected in sequence: a second convolution layer, a third activation function layer and a third convolution layer.
In an alternative embodiment, the feature map of at least one size of the first image comprises the feature map output by the coding convolution block connected to the first decoder. Processing the fused image and the feature map of at least one size of the first image with the top-layer target image enhancement model to obtain the target restored image comprises: processing the fused image with the second encoder to obtain a fused-image feature map; splicing the fused-image feature map with the feature map output by the coding convolution block connected to the first decoder to obtain a spliced feature map; and restoring the spliced feature map with the second decoder to obtain the target restored image.
In an alternative embodiment, the method further comprises: acquiring a training image set comprising a plurality of training image pairs, each pair consisting of a training blurred image and its corresponding training sharp image; down-sampling a target training blurred image to obtain a first training image, where the target training blurred image is any training blurred image in the training image set; processing the first training image with a bottom-layer initial image enhancement model to obtain a first training restored image and a feature map of at least one size of the first training image; up-sampling the first training restored image to obtain a second training restored image, and splicing the second training restored image with the target training blurred image to obtain a training fused image, where the spatial size of the second training restored image is the same as that of the target training blurred image; processing the training fused image and the feature map of at least one size of the first training image with a top-layer initial image enhancement model to obtain a target training restored image; calculating the value of a target loss function based on the target training restored image and the training sharp image corresponding to the target training blurred image, where the target loss function is a feature-domain two-norm loss function; and adjusting the model parameters of the bottom-layer and top-layer initial image enhancement models based on that value to obtain the bottom-layer and top-layer target image enhancement models.
In a second aspect, an embodiment of the present invention provides an image enhancement apparatus, including: a first acquisition module for acquiring an image to be processed; a first down-sampling module for down-sampling the image to be processed to obtain a first image; a first processing module for processing the first image with a bottom-layer target image enhancement model to obtain a first restored image and a feature map of at least one size of the first image; a first up-sampling module for up-sampling the first restored image to obtain a second restored image and splicing the second restored image with the image to be processed to obtain a fused image, wherein the spatial size of the second restored image is the same as that of the image to be processed; and a second processing module for processing the fused image and the feature map of at least one size of the first image with a top-layer target image enhancement model to obtain a target restored image.
In a third aspect, an embodiment of the present invention provides an electronic device, including a memory and a processor, where the memory stores a computer program operable on the processor, and the processor executes the computer program to implement the steps of the method in any one of the foregoing embodiments.
In a fourth aspect, an embodiment of the present invention provides a computer-readable medium having non-volatile program code executable by a processor, the program code causing the processor to perform the method of any one of the foregoing embodiments.
The image enhancement method provided by the invention comprises: acquiring an image to be processed; down-sampling the image to be processed to obtain a first image; processing the first image with the bottom-layer target image enhancement model to obtain a first restored image and a feature map of at least one size of the first image; up-sampling the first restored image to obtain a second restored image, and splicing the second restored image with the image to be processed to obtain a fused image, wherein the spatial size of the second restored image is the same as that of the image to be processed; and processing the fused image and the feature map of at least one size of the first image with the top-layer target image enhancement model to obtain a target restored image.
In the image enhancement method provided by the embodiment of the present invention, when the image to be processed is restored, the bottom-layer and top-layer target image enhancement models capture multi-scale, multi-level features of the image to be processed. Using the two models deepens the neural network and improves its expressive power, which improves the image quality enhancement effect and effectively alleviates the technical problem of poor subjective perceptual quality in prior-art image enhancement methods.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
Fig. 1 is a flowchart of an image enhancement method according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a bottom-layer target image enhancement model and a top-layer target image enhancement model according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of a residual convolutional block, an encoded convolutional block and a decoded convolutional block according to an embodiment of the present invention;
FIG. 4 is an original input blurred image provided by an embodiment of the present invention;
FIG. 5 is a Whyte restored image provided by the embodiment of the present invention;
fig. 6 is a Sun restored image according to an embodiment of the present invention;
FIG. 7 is a Nah restored image according to an embodiment of the present invention;
FIG. 8 is a Tao restored image according to an embodiment of the present invention;
FIG. 9 is a restored image obtained by the method of the present invention according to an embodiment of the present invention;
FIG. 10 is a functional block diagram of an image enhancement apparatus according to an embodiment of the present invention;
fig. 11 is a schematic view of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Some embodiments of the invention are described in detail below with reference to the accompanying drawings. The embodiments described below and the features of the embodiments can be combined with each other without conflict.
When images are transmitted over limited bandwidth, image compression must be applied to significantly reduce the coding bit rate, yet commonly used image compression methods usually introduce artifacts such as blocking and ringing that degrade image quality and the user experience; quality enhancement methods for compressed images are therefore urgently needed. In the prior art, the U-shaped network is a common basic network framework in the field of image deblurring, but because it adopts a pixel-wise two-norm loss function, it cannot effectively describe the subjective perceptual quality of an image, so the processed image does not necessarily conform better to human subjective perception. Embodiments of the present invention provide an image enhancement method to alleviate the above technical problems.
Example one
Fig. 1 is a flowchart of an image enhancement method according to an embodiment of the present invention, and as shown in fig. 1, the method specifically includes the following steps:
Step S102: acquire an image to be processed.
Step S104: down-sample the image to be processed to obtain a first image.
Human vision depends on the concept of scale: observing different objects and obtaining different information require different scales, and an image contains different features at different scales. At high resolution an image shows more texture detail, but regions and objects are farther apart at the pixel level, so a neural network can only model their interaction at deeper layers; conversely, at low resolution the overall structure of the image is more compact, regions and objects are closer at the pixel level, and the network can model their interaction at earlier layers. Therefore, at large and small resolutions the neural network captures information of the image at different levels; that is, it can extract features of different levels and increase the accuracy of image feature description.
In the embodiment of the present invention, after the image to be processed is obtained, it is first down-sampled to obtain the first image; for example, if the image to be processed is an H × W blurred image, down-sampling may yield an (H/2) × (W/2) image.
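As a minimal sketch of this down-sampling step (assuming PyTorch and bilinear resampling, neither of which the patent specifies):

```python
import torch
import torch.nn.functional as F

# Hypothetical example: an H x W = 720 x 1280 image to be processed is
# down-sampled by a factor of 2 to obtain the first image.
to_process = torch.randn(1, 3, 720, 1280)
first_image = F.interpolate(to_process, scale_factor=0.5,
                            mode="bilinear", align_corners=False)
print(first_image.shape)  # torch.Size([1, 3, 360, 640]), i.e. (H/2) x (W/2)
```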
Step S106: process the first image with the bottom-layer target image enhancement model to obtain the first restored image and a feature map of at least one size of the first image.
After down-sampling, the first image has a more compact overall structure than the image to be processed. Deblurring the first image with the bottom-layer target image enhancement model yields a first restored image whose spatial size is the same as that of the first image and which is subjectively sharper than the first image. The bottom-layer target image enhancement model is obtained by training a bottom-layer initial image enhancement model and is used to restore blurred images. During restoration, features are extracted from the model input (the first image), and the image is restored from the extracted features to produce the output (the first restored image); the feature extraction step yields a feature map of at least one size of the first image. The structure of the bottom-layer target image enhancement model is described in detail below.
Step S108: up-sample the first restored image to obtain a second restored image, and splice the second restored image with the image to be processed to obtain a fused image.
Generally, when image enhancement is performed with deep learning, features are often extracted only from the image to be processed. In the embodiment of the present invention, however, the small-scale image (the first image obtained by down-sampling) is first restored once to obtain the first restored image; the first restored image is then up-sampled to obtain a second restored image whose spatial size is the same as that of the image to be processed; and the second restored image is spliced with the image to be processed to obtain a fused image. The data available to the top-layer target image enhancement model described below is thus more comprehensive; equivalently, the enhancement performed by the top-layer model supplements the restoration result of the bottom-layer model rather than directly restoring the blurred image to be processed.
Step S110: process the fused image and the feature map of at least one size of the first image with the top-layer target image enhancement model to obtain a target restored image.
After receiving the fused image, the top-layer target image enhancement model deblurs it in combination with the feature map of at least one size of the first image output by the bottom-layer target image enhancement model, thereby obtaining the target restored image. The top-layer target image enhancement model is obtained by training a top-layer initial image enhancement model and is used to restore blurred images. During restoration, features are extracted from the model input (the fused image), the extracted feature map is combined with the feature map of at least one size of the first image, and the image is restored to produce the output (the target restored image); the feature extraction step yields a feature map of at least one size of the fused image. The structure of the top-layer target image enhancement model is described in detail below.
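Putting steps S102 to S110 together, the inference path can be sketched as follows; the function and model interfaces are hypothetical stand-ins, PyTorch is assumed, and "splicing" is taken to mean channel-wise concatenation:

```python
import torch
import torch.nn.functional as F

def enhance(image, bottom_model, top_model):
    """Inference sketch for steps S102-S110 (hypothetical model interfaces).

    `bottom_model` is assumed to map an image to (restored_image,
    encoder_feature_maps); `top_model` is assumed to map (fused_image,
    bottom_feature_maps) to the target restored image.
    """
    # Step S104: down-sample the image to be processed to obtain the first image.
    first_image = F.interpolate(image, scale_factor=0.5,
                                mode="bilinear", align_corners=False)
    # Step S106: the bottom-layer model returns the first restored image and
    # feature maps of at least one size of the first image.
    first_restored, bottom_feats = bottom_model(first_image)
    # Step S108: up-sample to the spatial size of the image to be processed and
    # splice (channel-wise concatenation assumed) to form the fused image.
    second_restored = F.interpolate(first_restored, size=image.shape[-2:],
                                    mode="bilinear", align_corners=False)
    fused = torch.cat([second_restored, image], dim=1)
    # Step S110: the top-layer model combines the fused image with the
    # bottom-layer feature maps to produce the target restored image.
    return top_model(fused, bottom_feats)
```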
The image enhancement method provided by the invention comprises: acquiring an image to be processed; down-sampling the image to be processed to obtain a first image; processing the first image with the bottom-layer target image enhancement model to obtain a first restored image and a feature map of at least one size of the first image; up-sampling the first restored image to obtain a second restored image, and splicing the second restored image with the image to be processed to obtain a fused image, wherein the spatial size of the second restored image is the same as that of the image to be processed; and processing the fused image and the feature map of at least one size of the first image with the top-layer target image enhancement model to obtain a target restored image.
In the image enhancement method provided by the embodiment of the present invention, when the image to be processed is restored, the bottom-layer and top-layer target image enhancement models capture multi-scale, multi-level features of the image to be processed. Using the two models deepens the neural network and improves its expressive power, which improves the image quality enhancement effect and effectively alleviates the technical problem of poor subjective perceptual quality in prior-art image enhancement methods.
The process of image enhancement (deblurring) of an image to be processed by using the image enhancement method provided by the embodiment of the present invention is briefly described above, and a detailed description is given below of a specific structure of a network model used for image enhancement.
In an alternative embodiment, the bottom-layer target image enhancement model comprises, connected in sequence, a first encoder and a first decoder; the first encoder comprises at least one coding convolution block, the first decoder comprises at least one decoding convolution block, and the numbers of coding and decoding convolution blocks in the bottom-layer target image enhancement model are the same.
The top-layer target image enhancement model comprises, connected in sequence, a second encoder and a second decoder; the second encoder comprises at least one coding convolution block, the second decoder comprises at least one decoding convolution block, and the numbers of coding and decoding convolution blocks in the top-layer target image enhancement model are the same.
A first feature map in the bottom-layer target image enhancement model is up-sampled and then spliced with a second feature map in the top-layer target image enhancement model, and the spliced feature map is input to the second decoder of the top-layer target image enhancement model, wherein the first feature map is the feature map output by the coding convolution block connected to the first decoder, the second feature map is the feature map output by the coding convolution block connected to the second decoder, and the feature map obtained by up-sampling the first feature map has the same spatial size as the second feature map.
In the embodiment of the present invention, the encoder-decoder structure may be regarded as the basic backbone network: it receives a blurred image as input and outputs a sharp, deblurred image with the same spatial size as the input. The encoder is responsible for extracting features of the input image for processing, and the decoder is responsible for restoring the sharp image from the extracted features.
Specifically, the bottom-layer and top-layer target image enhancement models both adopt the encoder-decoder structure, and in each model the encoder and decoder are mirror-symmetric, that is, the number of coding convolution blocks contained in an encoder equals the number of decoding convolution blocks contained in the corresponding decoder. Using more convolution blocks in a target image enhancement model deepens the network, and network models of different depths perform differently. In the embodiment shown in Fig. 2, the second encoder of the top-layer target image enhancement model comprises 4 coding convolution blocks and the second decoder comprises 4 decoding convolution blocks.
As the above description shows, the bottom-layer/top-layer target image enhancement model is a fully convolutional network with no fully connected layers, so there is no restriction on the size of the input image: a network architecture under the same parameter settings can process images of different resolutions, and network structure parameters need not be determined for a specific size. The image enhancement method provided by the embodiment of the present invention therefore has good universality.
In the embodiment of the invention, the bottom-layer and top-layer target image enhancement models are used together to deblur the image to be processed into the target restored image; when the top-layer model deblurs the fused image, it must incorporate the feature map of at least one size of the first image extracted by the bottom-layer model. Fig. 2 shows the structure of the two models. As shown in Fig. 2, the first feature map output by the last coding convolution block of the bottom-layer model (the coding convolution block connected to the first decoder) is not only input to the first decoder; it is also up-sampled and then spliced with the second feature map output by the last coding convolution block of the top-layer model (the coding convolution block connected to the second decoder), and the spliced feature map is input to the second decoder of the top-layer model. That is, the small-resolution intermediate feature map is up-sampled to align spatially with the large-resolution image, spliced with a feature map of the large-resolution image, and the new spliced feature map is fed into the decoder of the large-resolution network, thereby associating the bottom-layer and top-layer target image enhancement models.
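A minimal sketch of this cross-scale association, again assuming PyTorch, bilinear up-sampling and channel-wise concatenation:

```python
import torch
import torch.nn.functional as F

def cross_scale_splice(bottom_feat: torch.Tensor,
                       top_feat: torch.Tensor) -> torch.Tensor:
    """`bottom_feat` is the first feature map (last coding convolution block of
    the bottom-layer model); `top_feat` is the second feature map (last coding
    convolution block of the top-layer model)."""
    # Up-sample the small-resolution feature map so its spatial size
    # matches that of the large-resolution feature map.
    upsampled = F.interpolate(bottom_feat, size=top_feat.shape[-2:],
                              mode="bilinear", align_corners=False)
    # The spliced feature map is then fed to the second decoder.
    return torch.cat([upsampled, top_feat], dim=1)
```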
In an optional embodiment, in the bottom-layer target image enhancement model, the feature map input to a first target decoding convolution block is obtained by splicing the first target feature map output by the previous decoding convolution block with a second target feature map produced by the first encoder, where the first target decoding convolution block is any decoding convolution block in the first decoder except the one connected to the first encoder, and the second target feature map is the encoder feature map whose spatial size is the same as that of the first target feature map.
In the top-layer target image enhancement model, the feature map input to a second target decoding convolution block is obtained by splicing the third target feature map output by the previous decoding convolution block with a fourth target feature map produced by the second encoder, where the second target decoding convolution block is any decoding convolution block in the second decoder except the one connected to the second encoder, and the fourth target feature map is the encoder feature map whose spatial size is the same as that of the third target feature map.
In the bottom-layer/top-layer target image enhancement model, the encoder first reduces the size of the feature map through progressive down-sampling in a multilayer convolutional neural network while increasing the number of feature channels, extracting from the original input image the feature maps needed for the deblurring task. These features are then input to the decoder, whose multilayer convolutional network progressively up-samples to increase the feature map size while decreasing the number of feature channels, decoding the processed feature maps to generate a deblurred sharp image of the same size as the input.
The embodiment of the invention also introduces the idea of skip connections: feature maps of the same spatial size at the encoder end and the decoder end of the bottom-layer/top-layer target image enhancement model are spliced together as the input of the corresponding decoding convolution block at the decoder end. The decoder can thus fully exploit information of different levels (low-level features in the encoder, which relative to the overall structure is the bottom part of the network, and high-level features in the decoder, which is the top part) to complete feature fusion, helping the neural network reconstruct a sharper image, effectively alleviating gradient vanishing, improving the training stability of the deep model, and accelerating the convergence of the deep neural network model. Without skip connections, the greater depth would cause gradient vanishing or degradation and reduce network performance.
Taking Fig. 2 as an example, the spatial size of the feature map output by coding convolution block 3 in the bottom-layer target image enhancement model is the same as that of the feature map output by decoding convolution block 1; the input to decoding convolution block 2 is therefore the splice of the feature maps output by decoding convolution block 1 and coding convolution block 3. The inputs of the remaining decoding convolution blocks follow by analogy and are not repeated here. Because the spatial size of the feature map output by coding convolution block 4 is unique within the bottom-layer model, decoding convolution block 1 receives only the feature map output by coding convolution block 4.
In an alternative embodiment, as shown in Fig. 3, the coding convolution block comprises, connected in sequence: a first convolution layer, a first activation function layer, a first residual convolution block and a second residual convolution block; the decoding convolution block comprises, connected in sequence: a third residual convolution block, a fourth residual convolution block, a deconvolution layer and a second activation function layer; and the residual convolution block comprises, connected in sequence: a second convolution layer, a third activation function layer and a third convolution layer. In Fig. 3, ResBlock denotes a residual convolution block, conv denotes a convolution layer, and deconv denotes a deconvolution layer.
In the embodiment of the present invention, the coding convolution block first uses the first convolution layer to transform the number of feature channels or the feature map size, then passes through the first activation function layer (optionally a Rectified Linear Unit, ReLU), and then completes further feature extraction with the first and second residual convolution blocks. The decoding convolution block first performs feature extraction with the third and fourth residual convolution blocks, then completes the transformation of the number of feature channels and of the feature map size through the deconvolution layer, and finally passes through the second activation function layer. Each residual convolution block above is a structure in which one activation function layer sits between two convolution layers; preferably the activation function is ReLU, since the linear rectification function alleviates the vanishing-gradient problem well.
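The three blocks of Fig. 3 can be sketched as follows; the 3 × 3 kernels, the stride-2 resampling and the identity shortcut inside the residual block are illustrative assumptions rather than parameters fixed by the patent:

```python
import torch.nn as nn

class ResBlock(nn.Module):
    """Residual convolution block: conv -> ReLU -> conv (identity skip assumed)."""
    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),  # second conv layer
            nn.ReLU(inplace=True),                        # third activation layer
            nn.Conv2d(channels, channels, 3, padding=1),  # third conv layer
        )

    def forward(self, x):
        return x + self.body(x)

class EncodeBlock(nn.Module):
    """Coding convolution block: conv (channel/size change) -> ReLU -> 2 ResBlocks."""
    def __init__(self, in_ch: int, out_ch: int, stride: int = 2):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1),  # first conv
            nn.ReLU(inplace=True),                                  # first activation
            ResBlock(out_ch),                                       # first ResBlock
            ResBlock(out_ch),                                       # second ResBlock
        )

    def forward(self, x):
        return self.block(x)

class DecodeBlock(nn.Module):
    """Decoding convolution block: 2 ResBlocks -> deconv -> ReLU."""
    def __init__(self, in_ch: int, out_ch: int, stride: int = 2):
        super().__init__()
        self.block = nn.Sequential(
            ResBlock(in_ch),                                        # third ResBlock
            ResBlock(in_ch),                                        # fourth ResBlock
            nn.ConvTranspose2d(in_ch, out_ch, 4, stride=stride,
                               padding=1),                          # deconv layer
            nn.ReLU(inplace=True),                                  # second activation
        )

    def forward(self, x):
        return self.block(x)
```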
The network structures of the bottom-layer and top-layer target image enhancement models in the embodiment of the present invention are described in detail above; the following describes the process of obtaining the target restored image.
In an alternative embodiment, the feature map of at least one size of the first image comprises: the feature map output by the coding convolution block connected to the first decoder.
In step S110 above, processing the fused image and the feature map of at least one size of the first image with the top-layer target image enhancement model to obtain the target restored image specifically includes the following steps:
Step S1101: process the fused image with the second encoder to obtain a fused-image feature map.
Step S1102: splice the fused-image feature map with the feature map output by the coding convolution block connected to the first decoder to obtain a spliced feature map.
Step S1103: restore the spliced feature map with the second decoder to obtain the target restored image.
Optionally, after the image to be processed and the second restored image are spliced to obtain the fused image, the second encoder of the top-layer target image enhancement model performs feature extraction on the fused image to obtain the fused-image feature map (the feature map output by its last coding convolution block); the fused-image feature map is then spliced with the feature map output by the coding convolution block connected to the first decoder to obtain the spliced feature map; and finally the second decoder restores the spliced feature map, yielding the target restored image.
The preceding paragraph describes how the target restored image is obtained when neither the bottom-layer nor the top-layer target image enhancement model contains skip connections. If skip connections are included, then after the spliced feature map is obtained, the deblurring performed by the second decoder not only restores the spliced feature map but also incorporates the feature maps output by each coding convolution block in the second encoder to obtain the final target restored image.
For ease of description, the neural network model used in the image enhancement method provided by the embodiment of the present invention can be regarded as introducing a multi-scale structure over two basic backbone networks; following the same design idea, a user may build a multi-scale network structure over more than two basic backbone networks.
In the above, the process of obtaining the target restored image by performing the image enhancement processing on the image to be processed by using the bottom layer target image enhancement model and the top layer target image enhancement model is described in detail, and how to train the bottom layer initial image enhancement model and the top layer initial image enhancement model is described below.
In an alternative embodiment, the method of the present invention further comprises the steps of:
step S201, a training image set is obtained, wherein the training image set comprises a plurality of training image pairs, and the training image pairs are image pairs formed by training fuzzy images and training clear images corresponding to the training image pairs.
Step S202, down-sampling is carried out on the target training blurred image to obtain a first training image, wherein the target training blurred image is any training blurred image in the training image set.
Step S203, the first training image is processed by using the underlying initial image enhancement model, so as to obtain a feature map of at least one size of the first training restored image and the first training image.
Step S204, the first training restoration image is up-sampled to obtain a second training restoration image, and the second training restoration image and the target training fuzzy image are spliced to obtain a training fusion image, wherein the spatial size of the second training restoration image is the same as that of the target training fuzzy image.
Step S205, processing the feature map of at least one size of the training fusion image and the first training image by using the top-layer initial image enhancement model to obtain a target training restoration image.
In the above steps S201 to S205, the process of turning the target training blurred image into the target training restored image is the same as the image enhancement process that turns the image to be processed into the target restored image, and is not repeated here.
Step S206: calculate the value of a target loss function based on the target training restored image and the training sharp image corresponding to the target training blurred image, where the target loss function is a feature-domain two-norm loss function.
Step S207: adjust the model parameters of the bottom-layer and top-layer initial image enhancement models based on that value to obtain the bottom-layer and top-layer target image enhancement models.
After the target training restored image is obtained, the performance of the bottom-layer and top-layer initial image enhancement models can be evaluated by using a parameter-frozen VGG16 network pre-trained on the ImageNet dataset to extract features from the target training restored image and the training sharp image, computing the value of the target loss function, and back-propagating that value to train the model parameters of both initial models; training ends when the model parameters converge, yielding the bottom-layer and top-layer target image enhancement models. The feature extraction network may also be VGG19; the embodiment of the present invention does not specifically limit it.
The target loss function used in the embodiment of the present invention is a feature-domain two-norm loss function: the element-wise Euclidean distance is computed between the feature representation of the target training sharp image and that of the target training restored image. From the above description, the target loss function can be expressed as:
$$ L(S, O) = \sum_{k=1}^{C} a_k \sum_{i=1}^{m} \sum_{j=1}^{n} \left\| f_k(S)_{(i,j)} - f_k(O)_{(i,j)} \right\|_2^2 $$
where m denotes the width of the image, n denotes the height of the image, C denotes the number of feature extraction functions used in computing the target loss function, f_k denotes the k-th feature extraction function, a_k denotes the weighting coefficient of the k-th feature extraction function, S denotes the target training sharp image, O denotes the target training restored image, and (i, j) indexes the element in row i and column j.
If the VGG16 network is used to extract features from the target training restored image and the target training sharp image, it is known that the lower convolution layers of VGG16 tend to capture low-level image semantics such as edges, while the middle convolution layers tend to capture semantics such as texture and color. Feature loss terms computed on feature maps extracted from different layers therefore complement one another, giving the network a good deblurring effect on both edges and textures; accordingly, the maximum number of feature extraction functions usable in computing the target loss function equals the number of convolution layers in VGG16.
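A sketch of such a feature-domain loss follows; the particular VGG16 layer indices and the equal weights a_k are illustrative assumptions, with torchvision's ImageNet-pretrained VGG16 standing in for the parameter-frozen feature extractor (normalization of inputs to VGG's expected statistics is omitted for brevity):

```python
import torch
import torch.nn as nn
from torchvision import models

class FeatureDomainLoss(nn.Module):
    """Feature-domain two-norm loss on frozen, ImageNet-pretrained VGG16 features."""
    def __init__(self, layer_ids=(3, 8, 15), weights=(1.0, 1.0, 1.0)):
        super().__init__()
        features = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).features
        self.slices = nn.ModuleList()
        prev = 0
        for idx in layer_ids:  # split VGG16 into consecutive pieces f_1 .. f_C
            self.slices.append(nn.Sequential(*[features[i] for i in range(prev, idx + 1)]))
            prev = idx + 1
        for p in self.parameters():  # the VGG16 parameters stay fixed
            p.requires_grad = False
        self.weights = weights

    def forward(self, restored: torch.Tensor, sharp: torch.Tensor) -> torch.Tensor:
        loss = restored.new_zeros(())
        x, y = restored, sharp
        for a_k, f_k in zip(self.weights, self.slices):
            x, y = f_k(x), f_k(y)  # cumulative application yields f_k(O), f_k(S)
            loss = loss + a_k * torch.mean((x - y) ** 2)  # element-wise squared L2
        return loss
```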
In summary, unlike the pixel-wise two-norm loss calculation of the prior art, the image enhancement method provided by the embodiment of the present invention computes the loss at the feature level, which better reflects subjective perceptual quality. In addition, the method introduces a multi-scale, multi-level feature combination strategy that captures multi-scale, multi-level features of the image to be processed, deepens the neural network and improves its expressive power and the image quality enhancement effect, thereby effectively alleviating the technical problem of poor subjective perceptual quality in prior-art image enhancement methods.
The inventor also verified the performance of the image enhancement method provided by the invention on the public GoPro dataset, which comprises 2103 training image pairs and 1111 test image pairs. The neural network was optimized in the training stage with the adaptive momentum estimation algorithm (Adam), which is widely used in deep learning training. The model was trained for 3000 epochs in total, with the initial learning rate set to 0.0001 and then reduced to 0.3 times the current learning rate every 1000 epochs; the experimental results show that 3000 epochs let the model converge sufficiently. In each iteration, two blurred images are sampled and 256 × 256 image regions are randomly cropped as a batch (at test time the input is the original size, i.e. 720 × 1280). Because the original input is 720 × 1280, using that full resolution as training input would exceed the available GPU memory, and the experiments show that a 256 × 256 cropped image block contains enough information for the neural network to learn the mapping from blurred to sharp images. The input image is normalized to the range [0, 1] by dividing by 255 and then shifted to [-0.5, 0.5] by subtracting 0.5. All trainable parameters of the model are initialized with the Xavier method. All experiments were performed on one NVIDIA Titan X; a complete training run takes about 7 to 8 days.
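The reported setup can be summarized in a training-loop sketch; the data loader, the two models, and the criterion (e.g. the feature-domain loss above) are assumed to exist, the `enhance` function is the earlier sketch, and PyTorch's StepLR stands in for the described schedule:

```python
import torch
from torch import nn, optim

def train(bottom_model: nn.Module, top_model: nn.Module, loader, criterion):
    """Training sketch; `loader` is assumed to yield (blurred, sharp) pairs of
    256 x 256 crops in batches of two, as reported in the experiments."""
    def init_weights(m):
        # Xavier initialization of all trainable conv parameters, as reported.
        if isinstance(m, (nn.Conv2d, nn.ConvTranspose2d)):
            nn.init.xavier_uniform_(m.weight)
    bottom_model.apply(init_weights)
    top_model.apply(init_weights)

    params = list(bottom_model.parameters()) + list(top_model.parameters())
    optimizer = optim.Adam(params, lr=1e-4)  # initial learning rate 0.0001
    # Reduce to 0.3x the current learning rate every 1000 training epochs.
    scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=1000, gamma=0.3)

    for epoch in range(3000):                 # 3000 epochs in total
        for blurred, sharp in loader:
            blurred = blurred / 255.0 - 0.5   # normalize to [-0.5, 0.5]
            sharp = sharp / 255.0 - 0.5
            optimizer.zero_grad()
            restored = enhance(blurred, bottom_model, top_model)  # sketch above
            loss = criterion(restored, sharp)  # feature-domain two-norm loss
            loss.backward()                    # back-propagate the loss value
            optimizer.step()
        scheduler.step()
```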
The algorithms proposed by Whyte, Sun, Nah and Tao serve as comparison algorithms. The automated evaluation indexes are shown in Table 1 below, and the visual comparison is shown in Figs. 4 to 9: Fig. 4 is the original input blurred image provided by the embodiment of the present invention, Fig. 5 the Whyte restored image, Fig. 6 the Sun restored image, Fig. 7 the Nah restored image, Fig. 8 the Tao restored image, and Fig. 9 the restored image obtained by the method of the present invention.
TABLE 1 Algorithm comparison
[Table 1 appears only as an image in the original publication; it compares the automated evaluation indexes (including PSNR) and per-image processing time of the Whyte, Sun, Nah and Tao algorithms with those of the method of the present invention.]
As Table 1 shows, the method of the present invention is comparable to the comparison algorithms on the automated evaluation indexes (on PSNR it is even better). In addition, the method of the present invention has the advantage of taking less time to process a picture (0.26 s vs. 1.6 s), which makes it more practical in scenes that are highly sensitive to latency (such as the processing of video streams).
In terms of visual effect, the first row of Fig. 4 contains three images randomly sampled from the test set. They contain complex blur patterns, such as severe camera shake (the second image) and object motion (the vehicle driving in the middle of the third image). First, the blur is severe: the license plate number in the first image and the text in the second image can hardly be recognized. Second, the blur patterns are not isolated from one another but unevenly distributed over a single image: the third image exhibits both camera shake and object-motion blur (the road in the image is static yet blurred, and the moving vehicle is blurred as well).
Observing the output of the Whyte algorithm, one finds that it can hardly recover a sharp image: on none of the three pictures does it restore good definition, and the visual effect is poor. For example, the first picture shows black stripes that seriously affect perceived quality, and the second shows color distortion and false color at object edges; these defects make the output quality even inferior to the input blurred image. Even on the second image, where camera shake dominates (traditional algorithms can typically model only camera translation and rotation), the Whyte algorithm fails to produce sufficiently sharp, high-quality images, which illustrates the weakness of traditional algorithms on real blur data.
The Sun algorithm cannot restore an effective sharp image and hardly removes any blur. Because the Sun model is trained on an artificially synthesized dataset, and synthetic data is too simple compared with real blurred data to represent the data distribution of the real world, the Sun algorithm does not handle real-world blurred images well.
The Nah results still show a considerable degree of blurring compared with the method of the present invention. The experiments found that the multi-scale loss function used in the Nah algorithm causes the model to overfit: scaling the sharp image (the ground truth) introduces new distortion that alters the data distribution, so the model overfits to the low-resolution outputs and reaches, early in training, a state in which performance no longer improves.
The model of Tao, by contrast, produces a relatively sharp restored image. The results of the method of the present invention are similar to Tao's, but their overall appearance is sharper: for example, the text in the blue square area of the second row is clearer than in Tao's result, and the moving vehicle on the road in the third-column image is visually clearer. Moreover, as described above, the method of the present invention takes less time to process an image, which suits it better to actual production scenarios: in real production and life, image deblurring is often applied where fast reaction is needed, such as monitoring equipment or video-stream processing, where processing speed matters as much as image quality, and the method of the present invention reduces the delay by more than 1 s compared with the model of Tao.
Example two
An embodiment of the present invention further provides an image enhancement apparatus, which is mainly used for executing the image enhancement method provided in the first embodiment, and the image enhancement apparatus provided in the embodiment of the present invention is specifically described below.
Fig. 10 is a functional block diagram of an image enhancement apparatus according to an embodiment of the present invention, and as shown in fig. 10, the apparatus mainly includes: a first acquisition module 10, a first down-sampling module 20, a first processing module 30, a first up-sampling module 40, a second processing module 50, wherein:
the first acquiring module 10 is used for acquiring an image to be processed.
The first downsampling module 20 is configured to downsample the image to be processed to obtain a first image.
The first processing module 30 is configured to process the first image by using the underlying target image enhancement model to obtain a first restored image and a feature map of at least one size of the first image.
The first up-sampling module 40 is configured to up-sample the first restored image to obtain a second restored image, and to splice the second restored image with the image to be processed to obtain a fused image, where the spatial size of the second restored image is the same as that of the image to be processed.
The second processing module 50 is configured to process the fused image and the feature map of at least one size of the first image with the top-layer target image enhancement model to obtain a target restored image.
The image enhancement apparatus provided by the invention thus comprises: a first acquisition module 10 for acquiring an image to be processed; a first down-sampling module 20 for down-sampling the image to be processed to obtain a first image; a first processing module 30 for processing the first image with the bottom-layer target image enhancement model to obtain a first restored image and a feature map of at least one size of the first image; a first up-sampling module 40 for up-sampling the first restored image to obtain a second restored image and splicing the second restored image with the image to be processed to obtain a fused image, where the spatial size of the second restored image is the same as that of the image to be processed; and a second processing module 50 for processing the fused image and the feature map of at least one size of the first image with the top-layer target image enhancement model to obtain a target restored image.
According to the image enhancement apparatus provided by the embodiment of the invention, when the image to be processed is restored, the bottom layer and top layer target image enhancement models capture the multi-scale, multi-level features of the image to be processed; using the two models also extends the depth of the neural network and improves its expressive capability, thereby improving the image quality enhancement effect and effectively alleviating the technical problem of poor subjective perception quality in prior-art image enhancement methods.
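For concreteness, the end-to-end inference flow described above can be sketched in code. The following is a minimal PyTorch sketch under stated assumptions, not the patented implementation itself: bottom_model and top_model are hypothetical stand-ins for the bottom layer and top layer target image enhancement models, their call signatures are assumed, and bilinear resampling with a factor of 2 is an assumed choice that the description does not fix.

import torch
import torch.nn.functional as F

def enhance(image, bottom_model, top_model):
    # image: (N, 3, H, W) tensor, the image to be processed.
    # Down-sample the image to be processed to obtain the first image.
    first_image = F.interpolate(image, scale_factor=0.5, mode="bilinear",
                                align_corners=False)
    # The bottom model yields a first restoration image together with a
    # feature map of at least one size of the first image (assumed API).
    first_restored, bottom_feats = bottom_model(first_image)
    # Up-sample the first restoration image to the spatial size of the
    # input, then splice (channel-concatenate) the two into a fused image.
    second_restored = F.interpolate(first_restored, size=image.shape[-2:],
                                    mode="bilinear", align_corners=False)
    fused = torch.cat([second_restored, image], dim=1)  # (N, 6, H, W)
    # The top model processes the fused image and the bottom feature map
    # to obtain the target restoration image (assumed API).
    return top_model(fused, bottom_feats)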
Optionally, the bottom layer target image enhancement model includes, connected in sequence, a first encoder and a first decoder, where the first encoder includes at least one encoding convolution block, the first decoder includes at least one decoding convolution block, and the number of encoding convolution blocks and the number of decoding convolution blocks in the bottom layer target image enhancement model are the same.
The top layer target image enhancement model includes, connected in sequence, a second encoder and a second decoder, where the second encoder includes at least one encoding convolution block, the second decoder includes at least one decoding convolution block, and the number of encoding convolution blocks and the number of decoding convolution blocks in the top layer target image enhancement model are the same.
A first feature map in the bottom layer target image enhancement model is up-sampled and then spliced with a second feature map in the top layer target image enhancement model, and the spliced feature map is input into the second decoder of the top layer target image enhancement model, where the first feature map denotes the feature map output by the encoding convolution block connected to the first decoder, the second feature map denotes the feature map output by the encoding convolution block connected to the second decoder, and the feature map obtained by up-sampling the first feature map has the same spatial size as the second feature map.
Optionally, in the bottom layer target image enhancement model, the feature map input to a first target decoding convolution block is obtained by splicing the first target feature map output by the decoding convolution block preceding the first target decoding convolution block with a second target feature map obtained by the first encoder, where the first target decoding convolution block is any decoding convolution block in the first decoder other than the decoding convolution block connected to the first encoder, and the second target feature map is a feature map with the same spatial size as the first target feature map.
Optionally, in the top layer target image enhancement model, the feature map input to a second target decoding convolution block is obtained by splicing the third target feature map output by the decoding convolution block preceding the second target decoding convolution block with a fourth target feature map obtained by the second encoder, where the second target decoding convolution block is any decoding convolution block in the second decoder other than the decoding convolution block connected to the second encoder, and the fourth target feature map is a feature map with the same spatial size as the third target feature map.
Optionally, the encoding convolution block includes, connected in sequence: a first convolution layer, a first activation function layer, a first residual convolution block, and a second residual convolution block. The decoding convolution block includes, connected in sequence: a third residual convolution block, a fourth residual convolution block, a deconvolution layer, and a second activation function layer. The residual convolution block includes, connected in sequence: a second convolution layer, a third activation function layer, and a third convolution layer.
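To illustrate the layer ordering just listed, the following PyTorch sketch builds the three block types. Kernel sizes, channel widths, the stride-2 (de)convolutions, the ReLU activations, and the skip addition inside the residual convolution block are all illustrative assumptions; only the ordering of the layers comes from the description above.

import torch.nn as nn

class ResBlock(nn.Module):
    # Residual convolution block: second convolution layer, third
    # activation function layer, third convolution layer; the skip
    # addition is an assumption implied by the word "residual".
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1),
        )

    def forward(self, x):
        return x + self.body(x)

class EncodeBlock(nn.Module):
    # Encoding convolution block: convolution, activation, then two
    # residual convolution blocks; stride 2 halves the spatial size.
    def __init__(self, in_ch, out_ch, stride=2):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1),
            nn.ReLU(inplace=True),
            ResBlock(out_ch),
            ResBlock(out_ch),
        )

    def forward(self, x):
        return self.block(x)

class DecodeBlock(nn.Module):
    # Decoding convolution block: two residual convolution blocks, a
    # deconvolution that doubles the spatial size, then an activation.
    def __init__(self, in_ch, out_ch, stride=2):
        super().__init__()
        self.block = nn.Sequential(
            ResBlock(in_ch),
            ResBlock(in_ch),
            nn.ConvTranspose2d(in_ch, out_ch, 4, stride=stride, padding=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)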
Optionally, the feature map of at least one size of the first image includes: the feature map output by the encoding convolution block connected to the first decoder. The second processing module 50 is specifically configured to:
process the fused image by using the second encoder to obtain a feature map of the fused image;
splice the feature map of the fused image with the feature map output by the encoding convolution block connected to the first decoder to obtain a spliced feature map;
and restore the spliced feature map by using the second decoder to obtain the target restoration image, as sketched below.
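The three steps above can be sketched as a single forward pass. A single-scale top model is assumed for brevity (one encoder stage, one decoder stage), and bottom_feat is assumed to have been up-sampled beforehand so that its spatial size matches the feature map of the fused image, as described earlier.

import torch
import torch.nn as nn

class TopEnhancer(nn.Module):
    # Hypothetical top layer model: second encoder, splice with the
    # bottom layer feature map, second decoder.
    def __init__(self, encoder, decoder):
        super().__init__()
        self.encoder = encoder  # the second encoder
        self.decoder = decoder  # the second decoder

    def forward(self, fused_image, bottom_feat):
        # Step 1: process the fused image with the second encoder.
        top_feat = self.encoder(fused_image)
        # Step 2: splice it with the feature map output by the encoding
        # convolution block connected to the first decoder.
        spliced = torch.cat([top_feat, bottom_feat], dim=1)
        # Step 3: restore the spliced feature map with the second decoder.
        return self.decoder(spliced)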
Optionally, the apparatus further comprises:
and the second acquisition module is used for acquiring a training image set, wherein the training image set comprises a plurality of training image pairs, and the training image pairs are image pairs consisting of training fuzzy images and training clear images corresponding to the training fuzzy images.
And the second downsampling module is used for downsampling the target training blurred image to obtain the first training image, wherein the target training blurred image is any training blurred image in the training image set.
And the third processing module is used for processing the first training image by using the bottom layer initial image enhancement model to obtain the first training restored image and the feature map of at least one size of the first training image.
And the second up-sampling module is used for up-sampling the first training restoration image to obtain a second training restoration image, and splicing the second training restoration image and the target training fuzzy image to obtain a training fusion image, wherein the spatial size of the second training restoration image is the same as that of the target training fuzzy image.
And the fourth processing module is used for processing the feature maps of at least one size of the training fusion image and the first training image by using the top-layer initial image enhancement model to obtain a target training restoration image.
And the calculation module is used for calculating a function value of a target loss function based on the target training clear image corresponding to the target training restored image and the target training blurred image, wherein the target loss function is a characteristic domain two-norm loss function.
And the adjusting module is used for adjusting the model parameters of the bottom layer initial image enhancement model and the top layer initial image enhancement model based on the function values to obtain a bottom layer target image enhancement model and a top layer target image enhancement model.
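The training flow implemented by these modules can be sketched as one optimization step. The description specifies a feature-domain two-norm loss but does not name the feature extractor here, so a frozen VGG16 slice is an assumption, as are the optimizer and the enhance pipeline sketched earlier; VGG input normalization is omitted for brevity.

import torch
import torch.nn.functional as F
from torchvision.models import vgg16

# Frozen feature extractor (assumed choice of feature domain).
feat_net = vgg16(weights="DEFAULT").features[:16].eval()
for p in feat_net.parameters():
    p.requires_grad_(False)

def feature_l2_loss(restored, clear):
    # Two-norm (mean squared) distance between the feature maps of the
    # target training restoration image and the training clear image.
    return F.mse_loss(feat_net(restored), feat_net(clear))

def train_step(blurred, clear, bottom_model, top_model, optimizer):
    # One step: restore the blurred image, score it in the feature
    # domain, and adjust the parameters of both models.
    restored = enhance(blurred, bottom_model, top_model)
    loss = feature_l2_loss(restored, clear)
    optimizer.zero_grad()
    loss.backward()   # gradients flow into both the bottom and top models
    optimizer.step()
    return loss.item()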
Example three
Referring to fig. 11, an embodiment of the present invention provides an electronic device, including: a processor 60, a memory 61, a bus 62 and a communication interface 63, wherein the processor 60, the communication interface 63 and the memory 61 are connected through the bus 62; the processor 60 is arranged to execute executable modules, such as computer programs, stored in the memory 61.
The memory 61 may include a high-speed Random Access Memory (RAM) and may also include non-volatile memory, for example at least one disk storage. The communication connection between this system network element and at least one other network element is implemented through at least one communication interface 63 (which may be wired or wireless), and may use the Internet, a wide area network, a local area network, a metropolitan area network, or the like.
The bus 62 may be an ISA bus, a PCI bus, an EISA bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one double-headed arrow is shown in Fig. 11, but this does not mean that there is only one bus or one type of bus.
The memory 61 is configured to store a program, and the processor 60 executes the program after receiving an execution instruction. The method disclosed in any of the foregoing embodiments of the present invention may be applied to the processor 60 or implemented by the processor 60.
The processor 60 may be an integrated circuit chip with signal processing capability. In implementation, the steps of the above method may be completed by integrated logic circuits of hardware in the processor 60 or by instructions in the form of software. The processor 60 may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), or another programmable logic device, discrete gate or transistor logic device, or discrete hardware component. The methods, steps, and logic blocks disclosed in the embodiments of the present invention may be implemented or executed by such a processor. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The steps of the method disclosed in connection with the embodiments of the present invention may be directly embodied as being executed by a hardware decoding processor, or executed by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium well known in the art, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or registers. The storage medium is located in the memory 61, and the processor 60 reads the information in the memory 61 and completes the steps of the above method in combination with its hardware.
The computer program product of the image enhancement method, apparatus, and electronic device provided by the embodiments of the present invention includes a computer-readable storage medium storing non-volatile program code executable by a processor; the instructions included in the program code may be used to execute the method described in the foregoing method embodiments, and for the specific implementation, reference may be made to the method embodiments, which are not repeated here.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically alone, or two or more units may be integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as a stand-alone product, they may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
In the description of the present invention, it should be noted that the terms "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner", "outer", etc. indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings or the orientations or positional relationships that the products of the present invention are conventionally placed in use, and are only used for convenience in describing the present invention and simplifying the description, but do not indicate or imply that the devices or elements referred to must have a specific orientation, be constructed and operated in a specific orientation, and thus, should not be construed as limiting the present invention. Furthermore, the terms "first," "second," "third," and the like are used solely to distinguish one from another and are not to be construed as indicating or implying relative importance.
Furthermore, the terms "horizontal", "vertical", "suspended", and the like do not require that the components be absolutely horizontal, vertical, or suspended; they may be slightly inclined. For example, "horizontal" merely means that the direction is more nearly horizontal than "vertical"; it does not mean the structure must be perfectly horizontal.
In the description of the present invention, it should also be noted that, unless otherwise explicitly specified or limited, the terms "disposed", "mounted", and "connected" are to be understood broadly; for example, a connection may be a fixed connection, a detachable connection, or an integral connection; it may be a mechanical connection or an electrical connection; it may be a direct connection, an indirect connection through an intermediate medium, or an internal communication between two elements. The specific meanings of the above terms in the present invention can be understood by those skilled in the art on a case-by-case basis.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. An image enhancement method, comprising:
acquiring an image to be processed;
down-sampling the image to be processed to obtain a first image;
processing the first image by using a bottom layer target image enhancement model to obtain a first restoration image and a feature map of at least one size of the first image;
up-sampling the first restoration image to obtain a second restoration image, and splicing the second restoration image and the image to be processed to obtain a fused image, wherein the spatial size of the second restoration image is the same as that of the image to be processed;
and processing the fused image and the feature map of at least one size of the first image by using a top layer target image enhancement model to obtain a target restoration image.
2. The method of claim 1,
the bottom layer target image enhancement model comprises the following components connected in sequence: a first encoder and a first decoder, the first encoder comprising at least one encoding convolution block, the first decoder comprising at least one decoding convolution block, the number of encoding convolution blocks and the number of decoding convolution blocks in the bottom layer target image enhancement model being the same;
the top layer target image enhancement model comprises the following components connected in sequence: a second encoder and a second decoder, the second encoder comprising at least one encoding convolution block, the second decoder comprising at least one decoding convolution block, the number of encoding convolution blocks and the number of decoding convolution blocks in the top layer target image enhancement model being the same;
and a first feature map in the bottom layer target image enhancement model is up-sampled and then spliced with a second feature map in the top layer target image enhancement model, and the spliced feature map is input into the second decoder of the top layer target image enhancement model, wherein the first feature map represents the feature map output by the encoding convolution block connected to the first decoder, the second feature map represents the feature map output by the encoding convolution block connected to the second decoder, and the feature map obtained by up-sampling the first feature map has the same spatial size as the second feature map.
3. The method of claim 2,
in the bottom layer target image enhancement model, the feature map input to a first target decoding convolution block is obtained by splicing the first target feature map output by the decoding convolution block preceding the first target decoding convolution block with a second target feature map obtained by the first encoder, wherein the first target decoding convolution block is any decoding convolution block in the first decoder other than the decoding convolution block connected to the first encoder, and the second target feature map is a feature map with the same spatial size as the first target feature map.
4. The method of claim 2,
in the top layer target image enhancement model, the feature map input to a second target decoding convolution block is obtained by splicing the third target feature map output by the decoding convolution block preceding the second target decoding convolution block with a fourth target feature map obtained by the second encoder, wherein the second target decoding convolution block is any decoding convolution block in the second decoder other than the decoding convolution block connected to the second encoder, and the fourth target feature map is a feature map with the same spatial size as the third target feature map.
5. The method of claim 2,
the encoding convolution block comprises the following components connected in sequence: a first convolution layer, a first activation function layer, a first residual convolution block, and a second residual convolution block;
the decoding convolution block comprises the following components connected in sequence: a third residual convolution block, a fourth residual convolution block, a deconvolution layer, and a second activation function layer;
the residual convolution block comprises the following components connected in sequence: a second convolution layer, a third activation function layer, and a third convolution layer.
6. The method of claim 2, wherein the feature map of at least one size of the first image comprises: the feature map output by the encoding convolution block connected to the first decoder;
processing the fused image and the feature map of at least one size of the first image by using the top layer target image enhancement model to obtain a target restoration image comprises:
processing the fused image by using the second encoder to obtain a feature map of the fused image;
splicing the feature map of the fused image with the feature map output by the encoding convolution block connected to the first decoder to obtain a spliced feature map;
and restoring the spliced feature map by using the second decoder to obtain the target restoration image.
7. The method of claim 6, further comprising:
acquiring a training image set, wherein the training image set comprises a plurality of training image pairs, and each training image pair consists of a training blurred image and the training clear image corresponding to the training blurred image;
down-sampling a target training blurred image to obtain a first training image, wherein the target training blurred image is any training blurred image in the training image set;
processing the first training image by using a bottom layer initial image enhancement model to obtain a first training restoration image and a feature map of at least one size of the first training image;
up-sampling the first training restoration image to obtain a second training restoration image, and splicing the second training restoration image and the target training blurred image to obtain a training fusion image, wherein the spatial size of the second training restoration image is the same as that of the target training blurred image;
processing the training fusion image and the feature map of at least one size of the first training image by using a top layer initial image enhancement model to obtain a target training restoration image;
calculating a function value of a target loss function based on the target training restoration image and the target training clear image corresponding to the target training blurred image, wherein the target loss function is a feature-domain two-norm loss function;
and adjusting model parameters of the bottom layer initial image enhancement model and the top layer initial image enhancement model based on the function value to obtain the bottom layer target image enhancement model and the top layer target image enhancement model.
8. An image enhancement apparatus, comprising:
the first acquisition module is used for acquiring an image to be processed;
the first down-sampling module is used for down-sampling the image to be processed to obtain a first image;
the first processing module is used for processing the first image by utilizing a bottom layer target image enhancement model to obtain a first restoration image and a feature map of at least one size of the first image;
the first up-sampling module is used for up-sampling the first restoration image to obtain a second restoration image, and splicing the second restoration image and the image to be processed to obtain a fused image, wherein the spatial size of the second restoration image is the same as that of the image to be processed;
and the second processing module is used for processing the fused image and the feature map of at least one size of the first image by using the top layer target image enhancement model to obtain a target restoration image.
9. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the method of any of claims 1 to 7 when executing the computer program.
10. A computer-readable medium having non-volatile program code executable by a processor, the program code causing the processor to perform the method of any of claims 1 to 7.
CN202011081277.3A 2020-10-10 2020-10-10 Image enhancement method and device and electronic equipment Active CN112150400B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011081277.3A CN112150400B (en) 2020-10-10 2020-10-10 Image enhancement method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011081277.3A CN112150400B (en) 2020-10-10 2020-10-10 Image enhancement method and device and electronic equipment

Publications (2)

Publication Number Publication Date
CN112150400A true CN112150400A (en) 2020-12-29
CN112150400B CN112150400B (en) 2024-03-12

Family

ID=73951377

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011081277.3A Active CN112150400B (en) 2020-10-10 2020-10-10 Image enhancement method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN112150400B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113240605A (en) * 2021-05-21 2021-08-10 南开大学 Image enhancement method for forward and backward bidirectional learning based on symmetric neural network
CN113793264A (en) * 2021-09-07 2021-12-14 北京航星永志科技有限公司 Archive image processing method and system based on convolution model and electronic equipment
CN114119428A (en) * 2022-01-29 2022-03-01 深圳比特微电子科技有限公司 Image deblurring method and device
WO2023060921A1 (en) * 2021-10-14 2023-04-20 荣耀终端有限公司 Image processing method and electronic device

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111050174A (en) * 2019-12-27 2020-04-21 清华大学 Image compression method, device and system
US20200134772A1 (en) * 2018-10-31 2020-04-30 Kabushiki Kaisha Toshiba Computer vision system and method
CN111476721A (en) * 2020-03-10 2020-07-31 重庆邮电大学 Wasserstein distance-based image rapid enhancement method

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200134772A1 (en) * 2018-10-31 2020-04-30 Kabushiki Kaisha Toshiba Computer vision system and method
CN111050174A (en) * 2019-12-27 2020-04-21 清华大学 Image compression method, device and system
CN111476721A (en) * 2020-03-10 2020-07-31 重庆邮电大学 Wasserstein distance-based image rapid enhancement method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
YIFENG LIU et al.: "The Onset of Parietal Alpha- and Beta- Band Oscillations Caused by an Initial Video Delay", 2019 IEEE GLOBAL CONFERENCE ON SIGNAL AND INFORMATION PROCESSING (GLOBALSIP), pages 1 - 5 *
段一平 et al.: "Application of MATLAB in Image Processing" (MATLAB在图像处理中的应用), 科技信息 (Science & Technology Information), pages 213 - 214 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113240605A (en) * 2021-05-21 2021-08-10 南开大学 Image enhancement method for forward and backward bidirectional learning based on symmetric neural network
CN113793264A (en) * 2021-09-07 2021-12-14 北京航星永志科技有限公司 Archive image processing method and system based on convolution model and electronic equipment
CN113793264B (en) * 2021-09-07 2022-11-15 北京航星永志科技有限公司 Archive image processing method and system based on convolution model and electronic equipment
WO2023060921A1 (en) * 2021-10-14 2023-04-20 荣耀终端有限公司 Image processing method and electronic device
CN114119428A (en) * 2022-01-29 2022-03-01 深圳比特微电子科技有限公司 Image deblurring method and device

Also Published As

Publication number Publication date
CN112150400B (en) 2024-03-12

Similar Documents

Publication Publication Date Title
CN112150400A (en) Image enhancement method and device and electronic equipment
US8223837B2 (en) Learning-based image compression
CN110348487B (en) Hyperspectral image compression method and device based on deep learning
JP2020508010A (en) Image processing and video compression method
CN108596841B (en) Method for realizing image super-resolution and deblurring in parallel
US20190294931A1 (en) Systems and Methods for Generative Ensemble Networks
CN112102212B (en) Video restoration method, device, equipment and storage medium
CN110717868B (en) Video high dynamic range inverse tone mapping model construction and mapping method and device
CN111800630A (en) Method and system for reconstructing video super-resolution and electronic equipment
CN111429357B (en) Training data determining method, video processing method, device, equipment and medium
WO2016127271A1 (en) An apparatus and a method for reducing compression artifacts of a lossy-compressed image
CN112235569B (en) Quick video classification method, system and device based on H264 compressed domain
CN116681584A (en) Multistage diffusion image super-resolution algorithm
CN114584805A (en) Video transmission method, server, terminal and video transmission system
CN115943422A (en) Video processing method, device, equipment, decoder, system and storage medium
CN115552905A (en) Global skip connection based CNN filter for image and video coding
CN114979672A (en) Video encoding method, decoding method, electronic device, and storage medium
CN113379858A (en) Image compression method and device based on deep learning
CN113724136A (en) Video restoration method, device and medium
CN117173021A (en) Video processing method and device
CN114173137A (en) Video coding method and device and electronic equipment
CN115345801B (en) Image compression and filter removal method and system based on image denoising idea
EP3685283A1 (en) Methods for encoding and decoding an image
CN110322405B (en) Video demosaicing method and related device based on self-encoder
Kumar et al. A novel method for image compression using spectrum

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant