CN113160198A - Image quality enhancement method based on channel attention mechanism - Google Patents


Info

Publication number
CN113160198A
CN113160198A
Authority
CN
China
Prior art keywords
image
quality
channels
convolution
low
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202110474102.7A
Other languages
Chinese (zh)
Inventor
颜成钢 (Yan Chenggang)
肇恒润 (Zhao Hengrun)
孙垚棋 (Sun Yaoqi)
张继勇 (Zhang Jiyong)
李宗鹏 (Li Zongpeng)
张勇东 (Zhang Yongdong)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Dianzi University
Original Assignee
Hangzhou Dianzi University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Dianzi University filed Critical Hangzhou Dianzi University
Priority to CN202110474102.7A priority Critical patent/CN113160198A/en
Publication of CN113160198A publication Critical patent/CN113160198A/en
Withdrawn legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/90Dynamic range modification of images or parts thereof
    • G06T5/92Dynamic range modification of images or parts thereof based on global image properties
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10004Still image; Photographic image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]


Abstract

The invention discloses an image quality enhancement method based on a channel attention mechanism. High-quality original images are first degraded into corresponding low-quality images to form low-quality/high-quality image pairs; an image enhancement network model is then built and trained on the resulting low-quality images; finally, a low-quality image is input into the trained model to obtain a high-quality image. The method uses a neural network with a residual network module and a channel attention module as the enhancement model; through the cooperation of the residual network and the channel attention module, it produces high-quality images with richer detail from low-quality inputs.

Description

Image quality enhancement method based on channel attention mechanism
Technical Field
The invention relates to the field of digital image processing and computer vision, in particular to an image quality enhancement method based on a channel attention mechanism.
Background
Compared with text, images convey information more vividly, intuitively and artistically, and are an important medium through which people share and exchange information. Image enhancement techniques turn low-quality, detail-deficient images into high-quality, detail-rich ones.
Currently, image enhancement techniques fall into three categories: spatial-domain, frequency-domain and learning-based methods. Spatial-domain methods operate directly on the image pixels. Frequency-domain methods modify the transform coefficients of the image in some transform domain and then apply the inverse transform to obtain the enhanced image. With the rapid development of deep learning, learning-based image enhancement algorithms have become the focus of research in recent years. These methods train a model on a large number of high-quality images, introduce the prior knowledge the model has learned when restoring low-quality images, and train a neural network to learn the correspondence between low-quality images and their high-quality counterparts, thereby recovering richer detail and achieving a satisfactory enhancement effect. Deep-learning-based methods include LL-NET, MBLLEN, LightenNet and others. These models have a certain image enhancement capability, but because of their limited network depth the amount of detail they can extract is limited, so the generated images are not sharp enough and lack detail.
Disclosure of Invention
Based on these problems in the prior art, the invention aims to provide an image quality enhancement method based on a channel attention mechanism, which overcomes the insufficient sharpness and lack of detail of existing image enhancement methods.
The technical scheme adopted by the invention to solve these problems is as follows:
an image quality enhancement method based on a channel attention mechanism comprises the following steps:
step one, processing the high-quality original image into a corresponding low-quality image to obtain a low-quality-high-quality image contrast group.
And step two, building an image enhancement network model.
And step three, training an image enhancement network model through the low-quality image obtained in the step one.
And step four, inputting the low-quality image into the trained image enhancement network model to obtain a high-quality image.
As can be seen from the technical scheme above, the image enhancement method provided by the embodiments of the invention has the following beneficial effect:
A neural network with a residual network module and a channel attention module is used as the image enhancement model; working together, the residual network and the channel attention module produce high-quality images with richer detail from low-quality inputs.
Drawings
FIG. 1 is a flow chart of an embodiment of the present invention;
FIG. 2 is a schematic diagram of a structure of an image enhancement network model according to an embodiment of the present invention;
FIG. 3 is a diagram illustrating an original low quality image according to an embodiment of the present invention;
FIG. 4 is the image output after the original image is enhanced by the image enhancement method provided by the embodiment of the invention.
Detailed Description
The technical solutions in the embodiments of the present invention are clearly and completely described below with reference to the specific contents of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention. Details which are not described in detail in the embodiments of the invention belong to the prior art which is known to the person skilled in the art.
As shown in fig. 1, an embodiment of the present invention provides an image enhancement method for enhancing a low-quality image by using a neural network model having a residual network module and a channel attention module, including:
step one, processing a high-quality original image into a corresponding low-quality image to obtain a low-quality-high-quality image comparison group, wherein the specific method comprises the following steps:
reducing the quality of the image by JPEG compressing (QF <30) or changing the exposure (increasing or decreasing the exposure within +/-3 EV) to obtain a low-quality-high-quality image picture group; the problem that in practice, low-quality pictures with only single controllable degradation factors cannot be collected at the same time, and a large amount of data cannot be provided for deep learning can be solved.
In order to enlarge the data set and give the network better generalization, the same random flip and rotation operations are applied to each paired low-quality/high-quality group; to speed up training, the pairs are cropped to 128 × 128 images at randomly selected positions.
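The paired augmentation above can be sketched as follows; the key point is that the randomly chosen flip, rotation and 128 × 128 crop are applied identically to both images of a pair (all names here are illustrative, not from the patent):

```python
import numpy as np

def augment_pair(low, high, rng):
    """Apply the SAME random flip, rotation and 128x128 crop to both
    images of a low/high pair (HxWx3 arrays). Illustrative sketch."""
    if rng.random() < 0.5:                        # random horizontal flip
        low, high = low[:, ::-1], high[:, ::-1]
    k = int(rng.integers(0, 4))                   # rotate by k * 90 degrees
    low, high = np.rot90(low, k), np.rot90(high, k)
    h, w = low.shape[:2]
    y = int(rng.integers(0, h - 128 + 1))         # shared random crop origin
    x = int(rng.integers(0, w - 128 + 1))
    return low[y:y + 128, x:x + 128], high[y:y + 128, x:x + 128]

rng = np.random.default_rng(0)
base = rng.random((160, 160, 3))                  # stand-in image
low_crop, high_crop = augment_pair(base, base, np.random.default_rng(1))
```

Because both images receive the same operations, identical inputs stay identical after augmentation, which keeps the pair pixel-aligned.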
step two, building an image enhancement network model;
the image enhancement network model comprises an image enhancement network model and is composed of a down-sampling part, a residual error network part, a channel attention module and an up-sampling part. The down-sampling part comprises two stages of Pixel unscuffle and two-dimensional convolution with convolution kernel size of 3 x 3, the residual network part comprises a residual block part consisting of N residual blocks with the same structure and two-dimensional convolution with convolution kernel size of 3 x 3, and the up-sampling part comprises two stages of Pixel Shuffle and two-dimensional convolution with convolution kernel size of 3 x 3.
The input image is a low-quality image processed in the first step.
The structure of the image enhancement network model is shown in fig. 2, in which:
a Down-sampling part (Down Scale) performs four-time Down-sampling processing on an image by adopting a two-stage Pixel unschuffle method, inputs the image with RGB three channels, firstly reduces the length and width of the image into 1/2 of the input image by using a Pixel unschuffle method with a scaling multiple of 2, the number of channels is changed into 12, then converts the image from 12 channels into 64 channels by using two-dimensional convolution with a convolution kernel size of 3 x 3, then reduces the length and width of the image into 1/4 of the input image by using a Pixel unschuffle method with a scaling multiple of 2, the number of channels is changed into 256, and finally converts the image from 256 channels into 64 channels by using two-dimensional convolution with a convolution kernel size of 3 x 3.
The residual block part consists of N residual blocks (Residual Block) with the same structure; each residual block is composed of a 3 × 3 two-dimensional convolution, a linear rectification unit (ReLU) and another 3 × 3 convolution, and the number N is 16. Building a deep network from many residual blocks lets shallow information propagate to the deep network layers, avoids problems such as vanishing gradients and network degradation, and greatly strengthens the network's feature extraction and mapping capability. The residual block used here contains only a 3 × 3 convolution, a linear rectification unit and a 3 × 3 convolution; compared with conventional ResNet, the batch normalization layers are removed, a modification that markedly improves performance and effectively reduces GPU memory usage.
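A minimal sketch of this batch-norm-free residual block (conv 3 × 3 → ReLU → conv 3 × 3, plus an identity skip) is given below; the naive convolution routine and the weight shapes are illustrative assumptions, not the patent's implementation:

```python
import numpy as np

def conv3x3(x, weight):
    """Naive 'same'-padded 3x3 convolution; x: (C_in, H, W), weight: (C_out, C_in, 3, 3)."""
    c_in, h, w = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((weight.shape[0], h, w))
    for i in range(3):                      # accumulate the 9 filter taps
        for j in range(3):
            out += np.tensordot(weight[:, :, i, j], xp[:, i:i + h, j:j + w], axes=1)
    return out

def residual_block(x, w1, w2):
    """conv3x3 -> ReLU -> conv3x3, plus the identity skip; no batch normalization."""
    return x + conv3x3(np.maximum(conv3x3(x, w1), 0.0), w2)

rng = np.random.default_rng(0)
feat = rng.random((4, 6, 6))                # small 4-channel feature map
out = residual_block(feat,
                     rng.standard_normal((4, 4, 3, 3)) * 0.1,
                     rng.standard_normal((4, 4, 3, 3)) * 0.1)
```

With zero weights the block reduces to the identity, which is exactly what lets shallow information pass unchanged to deeper layers.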
Conv3 × 3 is a two-dimensional convolution with a convolution kernel size of 3 × 3.
The channel attention module (SeBlock) is composed of a 3 × 3 two-dimensional convolution with stride 2, a global pooling layer, a fully connected layer, a linear rectification unit, another fully connected layer and a Sigmoid activation function. SeBlock adopts a feature-recalibration strategy that explicitly models the interdependencies between feature channels. It first applies a Squeeze operation to the convolved feature map to obtain channel-level global features, then applies an Excitation operation to those global features to learn the relationships among channels and obtain per-channel weights, and finally multiplies the weights with the original feature map to produce the final features. This attention mechanism lets the model focus on the most informative channel features while suppressing unimportant ones, so that the features are better utilized.
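The Squeeze/Excitation/recalibration steps can be sketched as follows; the initial stride-2 convolution is omitted and the fully connected weights are random placeholders, so this is only an illustration of the channel gating, not the patent's exact module:

```python
import numpy as np

def se_block(x, w1, w2):
    """Squeeze-and-Excitation gating on a (C, H, W) feature map.

    w1: (C//r, C) and w2: (C, C//r) are the two fully connected layers
    (reduction ratio r); all weights here are illustrative placeholders."""
    s = x.mean(axis=(1, 2))                  # Squeeze: global average pooling -> (C,)
    z = np.maximum(w1 @ s, 0.0)              # FC + linear rectification unit
    gate = 1.0 / (1.0 + np.exp(-(w2 @ z)))   # FC + Sigmoid -> per-channel weights in (0, 1)
    return x * gate[:, None, None]           # recalibrate: reweight each channel

c, r = 64, 16
rng = np.random.default_rng(0)
feat = rng.random((c, 8, 8))
out = se_block(feat,
               rng.standard_normal((c // r, c)),
               rng.standard_normal((c, c // r)))
```

Since each gate value lies in (0, 1), a channel is never amplified, only kept or suppressed in proportion to its learned importance.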
The up-sampling part (Up Scale) up-samples the image by a factor of four using a two-stage Pixel Shuffle method. First, a 3 × 3 two-dimensional convolution changes the number of channels of the feature map from 64 to 256, and a Pixel Shuffle with scaling factor 2 doubles the length and width, leaving 64 channels. A second 3 × 3 two-dimensional convolution again changes the number of channels from 64 to 256, and a second Pixel Shuffle with scaling factor 2 brings the length and width to four times the input feature map, again leaving 64 channels. Finally, a 3 × 3 two-dimensional convolution converts the 64-channel feature map into the network-enhanced RGB three-channel high-quality output image.
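Pixel Shuffle is the depth-to-space inverse of the Pixel Unshuffle used in the down-sampling part; the generic sketch below (not the patent's code) shows one stage rearranging 256 channels into 64 while doubling the spatial size, and verifies the inverse relationship:

```python
import numpy as np

def pixel_unshuffle(x, r):
    """Space-to-depth: (C, H, W) -> (C*r*r, H//r, W//r)."""
    c, h, w = x.shape
    return (x.reshape(c, h // r, r, w // r, r)
             .transpose(0, 2, 4, 1, 3)
             .reshape(c * r * r, h // r, w // r))

def pixel_shuffle(x, r):
    """Depth-to-space: (C*r*r, H, W) -> (C, H*r, W*r), the inverse of Pixel Unshuffle."""
    c2, h, w = x.shape
    c = c2 // (r * r)
    return (x.reshape(c, r, r, h, w)
             .transpose(0, 3, 1, 4, 2)
             .reshape(c, h * r, w * r))

feat = np.arange(256 * 4 * 4, dtype=np.float64).reshape(256, 4, 4)
up = pixel_shuffle(feat, 2)    # one stage: 256 channels -> 64, 4x4 -> 8x8
```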
Step three, training an image enhancement network model;
the training uses Adam optimizer as optimizer and charbonier Loss as Loss function. In a graph enhancement task, direct corresponding relations between low-quality images and high-quality images are not in one-to-one correspondence and are completely determined, the information content in the low-quality images is far less than that of the high-quality images, one low-quality image can have multiple corresponding high-quality images, the images generated by training of L1 and L2 norms commonly used in deep learning cannot well capture comprehensive characteristics of potential high-quality images, and the enhanced images are often too smooth, so that the Charbonnier Loss with stronger noise resistance is used as a Loss function, a network can have higher convergence speed, and the generated images have sharper details.
During training, the initial learning rate is set to 1e-4 and the learning rate is halved every n epochs. The higher learning rate in the early stage lets the network converge quickly, while the lower learning rate in the later stage allows the model to be fine-tuned for a better result. The number of epochs n is determined by the size of the training data set and the training effect, and training ends when the learning rate falls below 1e-6.
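The halving schedule and the 1e-6 stopping rule can be sketched directly; the value n = 10 below is only an assumed example, since the patent leaves n to be tuned:

```python
def learning_rate(epoch, n, lr0=1e-4):
    """Initial rate lr0, halved once every n epochs."""
    return lr0 * 0.5 ** (epoch // n)

# Assumed example with n = 10: run until the rate drops below the
# 1e-6 stopping threshold described in the text.
schedule = []
epoch = 0
while learning_rate(epoch, 10) >= 1e-6:
    schedule.append(learning_rate(epoch, 10))
    epoch += 1
```

With n = 10 the rate reaches 1e-4 · 0.5^6 ≈ 1.56e-6 during epochs 60–69; at epoch 70 it would fall to ≈ 7.8e-7, below the threshold, so training stops.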
Step four, inputting the low-quality image into the image enhancement network model to obtain a high-quality image; the specific processing flow is as follows:
the input low quality images are input into a trained image enhancement network model, as shown in fig. 3. The image is changed into a feature map of 64 channels through Down sampling (Down Scale), and then feature extraction and feature mapping are performed through N Residual blocks (Residual blocks), wherein the larger the number N of the Residual blocks is, the deeper the network depth is, the stronger the feature extraction and mapping capability is, but more computing resources are consumed at the same time. And after convolution of 3 × 3, the feature graph output by the Residual Block is added with the 64-channel feature graph before entering the Residual network part, the output feature graph is input into the SeBlock, and different channels of the feature graph are distributed with different weights by applying an attention mechanism. Then, 4 times of upsampling operation is performed to restore the feature map to the size of the input image, and finally, a convolution of 3 × 3 is performed to convert the 64-channel feature map into a 3-channel RGB image for final output, as shown in fig. 4.
Further, the number N of the residual blocks is preferably 16.
The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (6)

1. An image quality enhancement method based on a channel attention mechanism is characterized by comprising the following steps:
processing a high-quality original image into a corresponding low-quality image to obtain a low-quality-high-quality image contrast group;
step two, building an image enhancement network model;
step three, training an image enhancement network model through the low-quality image obtained in the step one;
and step four, inputting the low-quality image into the trained image enhancement network model to obtain a high-quality image.
2. The image quality enhancement method based on the channel attention mechanism as claimed in claim 1, wherein the step one is as follows:
reducing the quality of the image by JPEG compression (QF < 30) or by changing the exposure (increasing or decreasing it within ±3 EV) to obtain low-quality/high-quality image pairs;
and carrying out the same random inversion and rotation operation on the paired low-quality-high-quality image contrast groups, and carrying out cutting operation of randomly selecting cutting positions on the low-quality-high-quality image contrast groups to cut the low-quality-high-quality image contrast groups into 128-by-128 images in order to increase the training speed.
3. The image quality enhancement method based on the channel attention mechanism, characterized in that the specific method of step two is as follows:
the image enhancement network model is composed of a down-sampling part, a residual network part, a channel attention module and an up-sampling part; the down-sampling part comprises two stages of Pixel Unshuffle and two-dimensional convolutions with 3 × 3 kernels, the residual network part comprises N identically structured residual blocks and a two-dimensional convolution with a 3 × 3 kernel, and the up-sampling part comprises two stages of Pixel Shuffle and two-dimensional convolutions with 3 × 3 kernels;
the input image is a low-quality image processed in the first step;
the structure of the image enhancement network model is as follows:
the down-sampling part (Down Scale) down-samples the image by a factor of four using a two-stage Pixel Unshuffle method: the input is an RGB three-channel image; first a Pixel Unshuffle with scaling factor 2 halves the length and width of the image, giving 12 channels, and a 3 × 3 two-dimensional convolution converts the image from 12 channels to 64; then a second Pixel Unshuffle with scaling factor 2 reduces the length and width to 1/4 of the input image, giving 256 channels, and finally a 3 × 3 two-dimensional convolution converts the image from 256 channels to 64;
the residual block part consists of N residual blocks (Residual Block) with the same structure; each residual block is composed of a 3 × 3 two-dimensional convolution, a linear rectification unit and another 3 × 3 convolution, and the number N is 16;
conv3 × 3 is a two-dimensional convolution with a convolution kernel size of 3 × 3;
the channel attention module (SeBlock) is composed of a 3 × 3 two-dimensional convolution with stride 2, a global pooling layer, a fully connected layer, a linear rectification unit, another fully connected layer and a Sigmoid activation function; SeBlock first applies a Squeeze operation to the feature map obtained by convolution to obtain channel-level global features, then applies an Excitation operation to the global features to learn the relationships among channels and obtain the weights of different channels, and finally multiplies the weights with the original feature map to obtain the final features; this attention mechanism makes the model focus on the most informative channel features while suppressing unimportant ones, so that the features are better utilized;
the up-sampling part (Up Scale) up-samples the image by a factor of four using a two-stage Pixel Shuffle method: first a 3 × 3 two-dimensional convolution changes the number of channels of the feature map from 64 to 256, then a Pixel Shuffle with scaling factor 2 doubles the length and width of the image, leaving 64 channels; a second 3 × 3 two-dimensional convolution again changes the number of channels from 64 to 256, and a second Pixel Shuffle with scaling factor 2 brings the length and width to four times the input feature map, again leaving 64 channels; finally, a 3 × 3 two-dimensional convolution converts the 64-channel feature map into the network-enhanced RGB three-channel high-quality image.
4. The image quality enhancement method based on the channel attention mechanism, characterized in that the specific method of step three is as follows:
training by adopting the Adam optimizer as the optimizer and Charbonnier Loss as the loss function;
during training, the initial learning rate is set to 1e-4 and the learning rate is halved every n epochs; the higher learning rate in the early stage lets the network converge quickly, while the lower learning rate in the later stage allows the model to be fine-tuned for a better result; the number of epochs n is determined by the size of the training data set and the training effect, and training ends when the learning rate falls below 1e-6.
5. The image quality enhancement method based on the channel attention mechanism as claimed in claim 4, wherein the specific processing flow of the step four is as follows:
inputting the low-quality image into the trained image enhancement network model; down-sampling (Down Scale) turns the image into a 64-channel feature map, which then passes through N residual blocks (Residual Block) for feature extraction and feature mapping, where a larger number N of residual blocks gives a deeper network with stronger feature extraction and mapping capability but consumes more computing resources; after a 3 × 3 convolution, the feature map output by the residual blocks is added to the 64-channel feature map from before the residual network part, the result is input into the SeBlock, and the attention mechanism assigns different weights to different channels of the feature map; a 4× up-sampling operation then restores the feature map to the size of the input image, and finally a 3 × 3 convolution converts the 64-channel feature map into the 3-channel RGB image for final output.
6. The method of claim 5, wherein the number N of the residual blocks is preferably 16.
CN202110474102.7A 2021-04-29 2021-04-29 Image quality enhancement method based on channel attention mechanism Withdrawn CN113160198A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110474102.7A CN113160198A (en) 2021-04-29 2021-04-29 Image quality enhancement method based on channel attention mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110474102.7A CN113160198A (en) 2021-04-29 2021-04-29 Image quality enhancement method based on channel attention mechanism

Publications (1)

Publication Number Publication Date
CN113160198A true CN113160198A (en) 2021-07-23

Family

ID=76872400

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110474102.7A Withdrawn CN113160198A (en) 2021-04-29 2021-04-29 Image quality enhancement method based on channel attention mechanism

Country Status (1)

Country Link
CN (1) CN113160198A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117893413A (en) * 2024-03-15 2024-04-16 博创联动科技股份有限公司 Vehicle-mounted terminal man-machine interaction method based on image enhancement



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20210723