CN116485646A - Micro-attention-based light-weight image super-resolution reconstruction method and device - Google Patents


Info

Publication number
CN116485646A
Authority
CN
China
Prior art keywords
resolution
attention
image
image super
super
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310400906.1A
Other languages
Chinese (zh)
Inventor
张乐飞 (Zhang Lefei)
毕修平 (Bi Xiuping)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University WHU
Original Assignee
Wuhan University WHU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University WHU filed Critical Wuhan University WHU
Priority to CN202310400906.1A
Publication of CN116485646A

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/40Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4053Scaling of whole images or parts thereof, e.g. expanding or contracting based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0499Feedforward networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/60Rotation of whole images or parts thereof
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The invention discloses a micro-attention-based lightweight image super-resolution reconstruction method and device, relating to the field of computer vision. The reconstruction method comprises the following steps: constructing a hybrid-attention Transformer module based on mobile group convolution, variable multi-head blueprint self-attention and a blueprint feed-forward neural network; collecting a preset number of high/low-resolution image pairs, cropping the collected images to a preset pixel size and applying rotation augmentation to obtain training images; constructing a lightweight image super-resolution network based on the micro-attention mechanism and the hybrid-attention Transformer module; training the lightweight image super-resolution network on the training images, using an Adam optimizer to compute gradients and update its parameters; and, with the trained lightweight network, extracting shallow and deep features from the image to be processed and completing super-resolution reconstruction through standard convolution and pixel shuffle. The complexity of the model is effectively reduced and image processing efficiency is improved.

Description

Micro-attention-based light-weight image super-resolution reconstruction method and device
Technical Field
The invention relates to the field of computer vision, in particular to a micro-attention-based lightweight image super-resolution reconstruction method and device.
Background
Image super-resolution reconstruction aims at reconstructing a high-resolution image with texture details and good visual quality from a low-resolution image degraded for various reasons. Image super-resolution is an ill-posed problem: a given low-resolution input may correspond to many different high-resolution images, so the reconstruction defines a one-to-many mapping. Super-resolution is an important task in computer vision and image processing, widely applied in object detection, smart mobile devices, large smart-screen displays, medical imaging, target identification in security surveillance, remote sensing, and other fields.
In recent years, with the rise of deep learning in computer vision, results far surpassing conventional methods have been achieved across visual tasks. Naturally, applying deep learning to image super-resolution is a current research hotspot, and a large number of deep-learning-based super-resolution algorithms have emerged, far outperforming traditional reconstruction methods. In particular, when the Transformer (self-attention) model was applied to image processing, researchers found that it possesses feature-extraction capability far beyond that of convolutional neural networks. Transformers have likewise achieved strong results on image super-resolution reconstruction; however, most existing methods have large parameter counts and complex network structures, yielding bulky models with low inference speed. These drawbacks make them difficult to deploy on computing-resource-limited devices such as mobile and edge devices. To address these problems, a number of lightweight deep-learning network models have been proposed.
The IMDN network (Lightweight Image Super-Resolution with Information Multi-Distillation Network) is a classical algorithm of this kind, but lightweight super-resolution algorithms represented by this model use standard convolution kernels, which introduces redundant parameters; in addition, the standard convolution has a small receptive field and cannot capture long-range information dependencies; and, to reduce computation, many lightweight networks perform poorly at reconstructing detail. In short, these methods do not balance model capacity, runtime and super-resolution quality well. There is therefore a need for a solution that makes the model lighter in every respect while keeping its super-resolution results strong.
Disclosure of Invention
Aiming at the defects in the prior art, the invention provides a micro-attention-based lightweight image super-resolution reconstruction method and device. Using a recursive design, deep-feature extraction is decomposed into 4 iterated modules, which effectively reduces model complexity and improves information-processing efficiency. Compared with existing lightweight super-resolution models, the invention obtains excellent image super-resolution reconstruction results while reducing model capacity and computation, thereby improving inference speed.
In order to achieve the above purpose, the invention adopts the following technical scheme:
constructing a hybrid-attention Transformer module based on mobile group convolution, variable multi-head blueprint self-attention and a blueprint feed-forward neural network;
collecting a preset number of high/low-resolution image pairs, cropping the collected images to a preset pixel size, and applying rotation processing to the cropped images to serve as training images;
constructing a lightweight image super-resolution network based on the micro-attention mechanism and the hybrid-attention Transformer module;
training the lightweight image super-resolution network on the training images, using an Adam optimizer to compute gradients and update its parameters;
extracting shallow and deep features from the image to be processed with the trained lightweight network, and completing image super-resolution reconstruction through standard convolution and pixel shuffle.
Based on the technical scheme, the mobile group convolution is constructed as follows:
initialize a 3×3 convolution weight filled with 0, divide the channels into 5 parts, and denote the size of each part by g;
reset the weight at position (0, 0) of channels [0, g] to 1, to attend to upper-left information; reset the weight at position (0, 2) of channels [g, 2g] to 1, to attend to upper-right information; reset the weight at position (2, 0) of channels [2g, 3g] to 1, to attend to lower-left information; reset the weight at position (2, 2) of channels [3g, 4g] to 1, to attend to lower-right information; reset the weight at position (1, 1) of channels [4g, 5g] to 1, to attend to global information; then perform a group convolution with stride 1.
On the basis of the technical scheme, the variable multi-head blueprint self-attention sets and controls the number of heads, extracts Q, K and V, multiplies Q by K to obtain an attention matrix, and multiplies the attention matrix by V to strengthen key information;
where Q is the query matrix, K is the key matrix, and V is the value matrix.
Based on the technical scheme, the blueprint feedforward neural network is constructed based on blueprint convolution and GELU activation functions.
On the basis of the technical scheme, collecting the preset number of high/low-resolution image pairs comprises the following specific steps:
downloading the high-resolution images of the DIV2K and Flickr2K datasets provided in the image-restoration field, together with the corresponding low-resolution images at different downsampling factors.
Based on the technical scheme, training the lightweight image super-resolution network on the training images and using an Adam optimizer to compute gradients and update its parameters comprises the following specific steps:
using the Adam optimizer to compute gradients and update the parameters of the micro-attention-based lightweight network, iterating for a preset number of steps during which the loss function L1 is used for evaluation, the learning rate is set to a preset value, and cosine learning-rate decay is applied;
wherein the loss function L1 is:

L1 = (1/n) * Σ_{i=1}^{n} |HR_i − SR_i|

where HR is the high-resolution image, SR is the image super-resolution reconstruction result, and n is the number of pixels of the reconstruction result.
Based on the technical scheme, the lightweight image super-resolution network comprises two 3×3 standard convolutions and 4 Transformer module groups;
each Transformer module group includes 2 hybrid-attention Transformer modules.
Based on the technical scheme, using the trained lightweight image super-resolution network to extract shallow and deep features from the image to be processed and completing super-resolution reconstruction through standard convolution and pixel shuffle comprises the following specific steps:
inputting the image to be processed into the lightweight network and extracting shallow features with the 3×3 standard convolution;
feeding the result into the current Transformer module group, where the first mobile group convolution expands the channels;
passing the features from the first mobile group convolution through a ReLU activation and through the next mobile group convolution to contract the channels back;
strengthening channel attention via contrast-aware channel attention, adding the important channel information to the features from the first mobile group convolution to obtain the feature information, and feeding it into the blueprint feed-forward neural network;
feeding the feature information from the blueprint feed-forward neural network into the blueprint multi-head self-attention and extracting deep features;
feeding the extracted deep features as input to the next Transformer module group, cycling in order until the fourth Transformer module group outputs its deep features;
concatenating the deep features output by the fourth Transformer module group with the extracted shallow features, and completing super-resolution reconstruction through a 3×3 standard convolution and a pixel-shuffle operation.
On the basis of the technical scheme, the numbers of heads of the blueprint multi-head self-attention are set to 1, 2, 4 and 8, respectively.
The invention also provides a micro-attention-based lightweight image super-resolution reconstruction device, comprising:
a construction module for constructing the hybrid-attention Transformer module based on mobile group convolution, variable multi-head blueprint self-attention and a blueprint feed-forward neural network, and for constructing the lightweight image super-resolution network based on the micro-attention mechanism and the hybrid-attention Transformer module;
a collection module for collecting a preset number of high/low-resolution image pairs, cropping the collected images to a preset pixel size, and applying rotation processing to the cropped images to serve as training images;
a training module for training the lightweight image super-resolution network on the training images, computing gradients with an Adam optimizer and updating the network parameters;
an execution module for extracting shallow and deep features from the image to be processed with the network trained by the training module, and completing super-resolution reconstruction through standard convolution and pixel shuffle.
Compared with the prior art, the invention has the following advantages:
(1) The invention proposes a lightweight convolution, the mobile group convolution, which operates per channel with each channel attending to one direction, requires only the parameters of one 3×3 depthwise convolution and one 1×1 convolution, and still attends effectively to surrounding information;
(2) The invention proposes a micro blueprint multi-head self-attention and a blueprint feed-forward neural network, which form the core of the invention, acquire rich detail information with few parameters, and improve the feature-expression capability of the network;
(3) The invention builds, from the above components, a hybrid-attention Transformer module that strengthens attention to important information at higher computation speed and extracts deep features;
(4) Using this module, the invention builds a micro-attention-based lightweight image super-resolution reconstruction network that divides super-resolution reconstruction into 3 stages: shallow feature extraction, deep feature extraction and image reconstruction. Deep-feature extraction is decomposed, via a recursive design, into 4 iterated modules, which effectively reduces model complexity and improves information-processing efficiency;
(5) Compared with existing lightweight super-resolution models, the proposed network obtains good image super-resolution reconstruction results while reducing model capacity and computation, thereby improving inference speed.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic flow chart of a light-weight image super-resolution reconstruction method based on micro-attention in an embodiment of the invention;
FIG. 2 is a schematic workflow diagram of the lightweight image super-resolution network according to an embodiment of the invention;
FIG. 3 is a schematic workflow diagram of a Transformer module group according to an embodiment of the present invention.
Detailed Description
For the purposes of making the objects, technical solutions and advantages of the embodiments of the present application more clear, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments.
Referring to fig. 1, an embodiment of the present invention provides a light-weight image super-resolution reconstruction method based on micro-attention, including the following steps:
S1: constructing a hybrid-attention Transformer module based on mobile group convolution, variable multi-head blueprint self-attention and a blueprint feed-forward neural network;
S2: collecting a preset number of high/low-resolution image pairs, cropping the collected images to a preset pixel size, and applying rotation processing to the cropped images to serve as training images;
S3: constructing a lightweight image super-resolution network based on the micro-attention mechanism and the hybrid-attention Transformer module;
S4: training the lightweight image super-resolution network on the training images, using an Adam optimizer to compute gradients and update its parameters;
S5: extracting shallow and deep features from the image to be processed with the trained lightweight network, and completing image super-resolution reconstruction through standard convolution and pixel shuffle.
The method is realized by a Transformer neural network based on a micro-attention mechanism. Specifically, a hybrid-attention Transformer module is first constructed based on mobile group convolution, variable multi-head blueprint self-attention and a blueprint feed-forward neural network; this module captures contextual relations between elements with few parameters to obtain richer detail information. Then a large number of high/low-resolution image pairs are collected and the collected images are cropped to a preset pixel size, 192×192 pixels in this embodiment; the cropped images are then randomly rotated by 90°, 180° or 270° and horizontally flipped for data augmentation, and the augmented images serve as training images.
Next, a lightweight image super-resolution network is constructed based on the micro-attention mechanism and the hybrid-attention Transformer module; it reduces the parameter count, floating-point computation and inference time of the model while maintaining the performance of the reference model. To improve the accuracy of its super-resolution reconstruction, the network is trained with the training images obtained above, using an Adam optimizer (Adaptive Moment Estimation) to compute gradients and update the network parameters. The trained lightweight network then extracts shallow and deep features from the image to be processed, and image super-resolution reconstruction is completed through standard convolution and pixel shuffle.
Further, the mobile group convolution is constructed as follows:
initialize a 3×3 convolution weight filled with 0, divide the channels into 5 parts, and denote the size of each part by g;
reset the weight at position (0, 0) of channels [0, g] to 1, to attend to upper-left information; reset the weight at position (0, 2) of channels [g, 2g] to 1, to attend to upper-right information; reset the weight at position (2, 0) of channels [2g, 3g] to 1, to attend to lower-left information; reset the weight at position (2, 2) of channels [3g, 4g] to 1, to attend to lower-right information; reset the weight at position (1, 1) of channels [4g, 5g] to 1, to attend to global information; then perform a group convolution with stride 1.
The embodiment of the invention thus designs a mobile group convolution: a 3×3 convolution weight is first initialized to 0 and the channels are divided into 5 parts of size g; one kernel position per part is reset to 1 so that each part attends to the upper-left, upper-right, lower-left, lower-right or centre of its neighbourhood, and a group convolution with stride 1 is then performed.
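The weight construction above can be sketched in NumPy as follows. This is an illustrative sketch, not the patent's implementation: the function name is invented, and the lower-left kernel position is taken as (2, 0), symmetric with the other corner positions listed.

```python
import numpy as np

def make_mobile_group_conv_weights(channels):
    """Build fixed 3x3 per-channel kernels: channels are split into 5
    equal groups of size g, and each group gets a single 1 at one kernel
    position (four corners and the centre), so each group attends to one
    direction of its neighbourhood."""
    assert channels % 5 == 0, "channel count must split into 5 groups"
    g = channels // 5
    w = np.zeros((channels, 3, 3), dtype=np.float32)
    # upper-left, upper-right, lower-left, lower-right, centre
    positions = [(0, 0), (0, 2), (2, 0), (2, 2), (1, 1)]
    for i, (r, c) in enumerate(positions):
        w[i * g:(i + 1) * g, r, c] = 1.0
    return w
```

Applying these frozen kernels as a depthwise/group convolution with stride 1 shifts each channel group toward one direction, which is what lets a following 1×1 convolution mix directional information cheaply.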
Further, the variable multi-head blueprint self-attention sets and controls the number of heads, extracts Q, K and V, multiplies Q by K to obtain an attention matrix, and multiplies the attention matrix by V to strengthen key information; where Q is the query matrix, K is the key matrix, and V is the value matrix.
The embodiment of the invention also introduces a convolution lighter than the mobile group convolution, the blueprint convolution, which needs less than 1/2 of the parameters of a standard convolution. It consists of a 1×1 pointwise convolution and a 3×3 channel-by-channel (depthwise) convolution: the input first passes through the 1×1 pointwise convolution, and the resulting feature map, after the 3×3 depthwise convolution, is taken as the output of the blueprint convolution. In addition, the invention designs a mobile convolution that operates per channel, each channel attending to one direction, at the cost of only one 3×3 depthwise convolution and one 1×1 convolution's parameters. On this basis, the invention also designs a micro blueprint multi-head self-attention to reduce the computational cost of attention.
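The blueprint convolution described above (1×1 pointwise followed by 3×3 depthwise) can be sketched in NumPy as below; the function name, argument layout and zero padding are illustrative assumptions, not taken from the patent.

```python
import numpy as np

def blueprint_conv(x, pw, dw):
    """Blueprint convolution sketch: a 1x1 pointwise convolution (channel
    mixing) followed by a 3x3 depthwise convolution (per-channel spatial
    filtering). x: (C, H, W); pw: (C_out, C); dw: (C_out, 3, 3).
    Zero padding keeps the spatial size H x W."""
    _, h, w = x.shape
    # 1x1 pointwise: mix channels at every pixel -> (C_out, H, W)
    y = np.tensordot(pw, x, axes=([1], [0]))
    # 3x3 depthwise with zero padding, one kernel per output channel
    yp = np.pad(y, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros_like(y)
    for i in range(3):
        for j in range(3):
            out += dw[:, i, j][:, None, None] * yp[:, i:i + h, j:j + w]
    return out
```

Parameter count is C_out·C (pointwise) plus 9·C_out (depthwise), versus 9·C_out·C for a standard 3×3 convolution, which is where the claimed savings come from.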
The blueprint multi-head self-attention allows the number of heads to be set, limiting their count to prevent excessive computation. Blueprint convolutions extract the query matrix (Q), the key matrix (K) and the value matrix (V); Q is multiplied by K to obtain an attention matrix, which is multiplied by V to strengthen key information. This reduces the required parameters while letting the module aggregate local and non-local pixel interactions and focus efficiently on key parts of the image. On this basis, the invention also designs a blueprint feed-forward neural network based on blueprint convolution and the GELU (Gaussian Error Linear Unit) activation function; the blueprint feed-forward neural network additionally introduces contrast-aware channel attention to strengthen the extraction of important information.
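The Q·K·V computation above is standard multi-head self-attention; a minimal NumPy sketch follows. As an assumption to keep the sketch self-contained, Q, K and V are taken as identity projections of the input, whereas the patent extracts them with blueprint convolutions.

```python
import numpy as np

def multi_head_self_attention(x, heads):
    """Minimal multi-head self-attention sketch. x: (n, d) token
    features; heads must divide d. Per head: scores = Q K^T / sqrt(d_h),
    row-softmax gives the attention matrix, which weights V."""
    n, d = x.shape
    assert d % heads == 0
    dh = d // heads
    out = np.empty_like(x, dtype=np.float64)
    for h in range(heads):
        q = k = v = x[:, h * dh:(h + 1) * dh]        # identity projections
        scores = q @ k.T / np.sqrt(dh)               # (n, n) attention logits
        scores = np.exp(scores - scores.max(axis=1, keepdims=True))
        attn = scores / scores.sum(axis=1, keepdims=True)
        out[:, h * dh:(h + 1) * dh] = attn @ v       # weight the values
    return out
```

Since each output row is a convex combination of value rows, the output always stays within the per-column range of the input, which is a quick sanity check for the implementation.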
Further, collecting the preset number of high/low-resolution image pairs comprises the following specific steps:
downloading the high-resolution images of the DIV2K and Flickr2K datasets provided in the image-restoration field, together with the corresponding low-resolution images at different downsampling factors.
In collecting the large number of high/low-resolution image pairs, the main approach is to download the DIV2K dataset, provided on the network by the most influential global competition in the image-restoration field (the "New Trends in Image Restoration and Enhancement" challenge), together with another dataset common in the field, Flickr2K; they contain 800 and 2650 training images respectively, and the low-resolution images are provided at downsampling factors of 2, 3 and 4.
The High-Resolution images (HR) and the corresponding Low-Resolution images (LR) at different downsampling factors in the DIV2K and Flickr2K datasets are downloaded from the network. The collected high-resolution images can be denoted {y_i}, i = 1, …, m, where y_i is the i-th high-resolution image, H_i and W_i are its height and width respectively, and m is the number of high-resolution images. The downloaded low-resolution images can be denoted {x_i^s}, where x_i^s is the i-th low-resolution image at downsampling factor s; its height equals the height H_i of the corresponding high-resolution image divided by the downsampling factor s, its width likewise equals W_i divided by s, and m is the number of low-resolution images at factor s.
Further, training the lightweight image super-resolution network on the training images and using an Adam optimizer to compute gradients and update its parameters comprises the following specific steps:
using the Adam optimizer to compute gradients and update the parameters of the micro-attention-based lightweight network, iterating for a preset number of steps during which the loss function L1 is used for evaluation, the learning rate is set to a preset value, and cosine learning-rate decay is applied;
wherein the loss function L1 is:

L1 = (1/n) * Σ_{i=1}^{n} |HR_i − SR_i|

where HR is the high-resolution image, SR is the image super-resolution reconstruction result, and n is the number of pixels of the reconstruction result.
In the embodiment of the invention, when the training images are used to train the lightweight image super-resolution network, an Adam optimizer computes gradients and updates the parameters of the micro-attention-based network. Training runs for 5×10^5 iterations in total, evaluated with the loss function L1; the learning rate is initially set to 5×10^−4 and decays following a cosine schedule.
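The cosine learning-rate decay used in the embodiment can be sketched as the standard cosine annealing formula; the function name and the choice of a zero floor (`lr_min`) are assumptions for illustration.

```python
import math

def cosine_lr(step, total_steps, lr0=5e-4, lr_min=0.0):
    """Cosine learning-rate decay: starts at lr0 (5e-4 in the embodiment)
    and decays to lr_min over total_steps (5e5 iterations in the
    embodiment) following lr_min + 0.5*(lr0-lr_min)*(1+cos(pi*t))."""
    t = min(step, total_steps) / total_steps
    return lr_min + 0.5 * (lr0 - lr_min) * (1.0 + math.cos(math.pi * t))
```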
Further, the lightweight image super-resolution network comprises two 3×3 standard convolutions and 4 Transformer module groups; each Transformer module group includes 2 hybrid-attention Transformer modules.
The lightweight image super-resolution network in the embodiment of the invention comprises two 3×3 standard convolutions and 4 Transformer module groups. Each Transformer module group consists of 2 hybrid-attention Transformer modules (Hybrid Attention Transformer Block, HATB), a 3×3 standard convolution and a residual connection. Each hybrid-attention Transformer module is composed of a mobile channel-attention module, built from two mobile group convolutions and a blueprint feed-forward neural network, and a blueprint self-attention module, built from the blueprint multi-head self-attention mechanism and a blueprint feed-forward neural network.
Further, the training-completed lightweight image super-resolution network is used for extracting shallow features and deep features of an image to be processed, and completing image super-resolution reconstruction through common convolution and pixel recombination, and the method comprises the following specific steps of:
inputting the image to be processed into the lightweight image super-resolution network, and extracting shallow features with an ordinary 3×3 convolution;
taking the resulting features as input to the current Transformer module group, where the first shifted group convolution expands the channels;
passing the features output by the first shifted group convolution through a ReLU activation function and shrinking the channels back with the next shifted group convolution;
strengthening channel attention through contrast-aware channel attention, adding the important channel information to the features output by the first shifted group convolution to obtain the feature information, and inputting the feature information to the blueprint feedforward neural network;
inputting the feature information output by the blueprint feedforward neural network into the blueprint multi-head self-attention, and extracting deep features;
feeding the extracted deep features as input to the next Transformer module group, and cycling in this order until the fourth Transformer module group outputs the deep features;
concatenating the deep features output by the fourth Transformer module group with the extracted shallow features, and completing image super-resolution reconstruction through an ordinary 3×3 convolution and the pixel reassembly operation.
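The fixed weights of the shifted group convolution used in these steps (their construction is spelled out in claim 2 below) can be sketched in NumPy. The function name is ours, and we assume the lower-left group places its weight at position (2, 0), since (0, 0) would duplicate the upper-left group:

```python
import numpy as np

def shifted_group_kernel(channels: int) -> np.ndarray:
    """Build the fixed 3x3 shifted-group-convolution weights: the channels
    are split into 5 parts of size g, and each part gets a single weight
    of 1 at the upper-left (0,0), upper-right (0,2), lower-left (2,0),
    lower-right (2,2) or centre (1,1) position of its 3x3 kernel, so a
    stride-1 group convolution with these weights shifts each group of
    feature maps towards one corner (or leaves it in place)."""
    assert channels % 5 == 0
    g = channels // 5
    w = np.zeros((channels, 3, 3))
    positions = [(0, 0), (0, 2), (2, 0), (2, 2), (1, 1)]
    for part, (row, col) in enumerate(positions):
        w[part * g:(part + 1) * g, row, col] = 1.0  # one nonzero weight per channel
    return w
```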
In the embodiment of the invention, image super-resolution reconstruction is completed by the lightweight image super-resolution network and is mainly divided into three parts: shallow feature extraction, deep feature extraction, and image reconstruction.
Referring to fig. 2, when image super-resolution reconstruction is completed by the lightweight image super-resolution network, shallow features are first extracted with an ordinary 3×3 convolution; deep features are then extracted by four Transformer module groups applied in sequence; finally, the extracted deep and shallow features are concatenated, and the image super-resolution reconstruction result is obtained through an ordinary 3×3 convolution and a pixel reassembly operation. In fig. 2, Conv-3×3 is an ordinary 3×3 convolution, RATG is a Transformer module group, and Pixel-shuffle is the pixel reassembly operation.
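The Pixel-shuffle (pixel reassembly) step in fig. 2 can be sketched as a depth-to-space rearrangement; this NumPy version assumes the channel ordering used by the common PyTorch `pixel_shuffle` operation, which may differ from the patent's exact implementation:

```python
import numpy as np

def pixel_shuffle(x: np.ndarray, r: int) -> np.ndarray:
    """Pixel reassembly (depth-to-space): rearrange a (C*r*r, H, W)
    feature map into a (C, H*r, W*r) image, upscaling by factor r."""
    c_rr, h, w = x.shape
    assert c_rr % (r * r) == 0
    c = c_rr // (r * r)
    x = x.reshape(c, r, r, h, w)    # split channels into (C, r, r)
    x = x.transpose(0, 3, 1, 4, 2)  # interleave: (C, H, r, W, r)
    return x.reshape(c, h * r, w * r)
```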
Referring to fig. 3, each Transformer module group in this embodiment includes two hybrid attention Transformer modules and one ordinary 3×3 convolution; in fig. 3, HATB is the hybrid attention Transformer module.
To prevent the computation of the lightweight image super-resolution network from becoming so large that processing slows down, the head numbers of the blueprint multi-head self-attention in the four groups are set to 1, 2, 4 and 8 respectively, instead of uniformly using a large head number as in common multi-head self-attention; this progressive head-number design ensures attention to the key feature information while preventing excessive computation.
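A minimal sketch of multi-head self-attention with a configurable head count is given below; it illustrates only the head-splitting idea behind the progressive 1/2/4/8 design, using identity Q/K/V projections rather than the patent's blueprint convolutions, and all names are ours:

```python
import numpy as np

def multi_head_self_attention(x: np.ndarray, heads: int) -> np.ndarray:
    """Self-attention over x of shape (tokens, dim), with the dim
    split evenly across `heads` heads. Q, K, V are identity
    projections here to keep the sketch small; each head computes
    softmax(Q K^T / sqrt(head_dim)) V on its slice of the channels."""
    n, d = x.shape
    assert d % heads == 0
    hd = d // heads
    out = np.empty((n, d), dtype=np.float64)
    for h in range(heads):
        q = k = v = x[:, h * hd:(h + 1) * hd].astype(np.float64)
        scores = q @ k.T / np.sqrt(hd)                      # (n, n) attention scores
        scores = np.exp(scores - scores.max(axis=1, keepdims=True))
        attn = scores / scores.sum(axis=1, keepdims=True)   # row-wise softmax
        out[:, h * hd:(h + 1) * hd] = attn @ v              # weight values by attention
    return out
```

More heads split the channels more finely, so the per-head attention is cheaper but more of them are computed; setting the counts progressively (1, 2, 4, 8) keeps the total cost small in the early groups.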
The embodiment of the invention also provides a light-weight image super-resolution reconstruction device based on micro-attention, which comprises the following components:
the construction module is used for constructing a hybrid attention Transformer module based on shifted group convolution, variable multi-head blueprint self-attention and a blueprint feedforward neural network, and for constructing a lightweight image super-resolution network based on a micro-attention mechanism and the hybrid attention Transformer module;
the collection module is used for collecting a preset number of high- and low-resolution image pairs, cropping the collected images to a preset pixel size, and rotating the cropped images to serve as training images;
the training module is used for training the lightweight image super-resolution network based on the training images, calculating gradients with an Adam optimizer and updating the parameters of the lightweight image super-resolution network;
the execution module is used for extracting shallow features and deep features of the image to be processed based on the lightweight image super-resolution network trained by the training module, and completing image super-resolution reconstruction through ordinary convolution and pixel reassembly.
The micro-attention-based lightweight image super-resolution reconstruction device provided by the embodiment of the invention comprises a construction module, a collection module, a training module and an execution module. The construction module is used for constructing a hybrid attention Transformer module from shifted group convolution, variable multi-head blueprint self-attention and a blueprint feedforward neural network, and for constructing a lightweight image super-resolution network based on the constructed hybrid attention Transformer module and a micro-attention mechanism. The collection module is used for collecting a large number of high- and low-resolution image pairs, cropping them to a preset pixel size, then rotating them, and taking the rotated images as training images. The training module is used for training the lightweight image super-resolution network constructed by the construction module based on the training images obtained by the collection module, calculating gradients and updating the network parameters with an Adam optimizer during training. The execution module is used for extracting shallow features and deep features of the image to be processed based on the lightweight image super-resolution network trained by the training module, and completing image super-resolution reconstruction through ordinary convolution and pixel reassembly.
The foregoing is merely a specific embodiment of the application to enable one skilled in the art to understand or practice the application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

Claims (10)

1. A micro-attention-based lightweight image super-resolution reconstruction method, characterized by comprising the following steps:
constructing a hybrid attention Transformer module based on shifted group convolution, variable multi-head blueprint self-attention and a blueprint feedforward neural network;
collecting a preset number of high- and low-resolution image pairs, cropping the collected images to a preset pixel size, and rotating the cropped images to serve as training images;
constructing a lightweight image super-resolution network based on a micro-attention mechanism and the hybrid attention Transformer module;
training the lightweight image super-resolution network based on the training images, and calculating gradients and updating the parameters of the lightweight image super-resolution network using an Adam optimizer;
performing shallow feature and deep feature extraction on the image to be processed based on the trained lightweight image super-resolution network, and completing image super-resolution reconstruction through ordinary convolution and pixel reassembly.
2. The micro-attention-based lightweight image super-resolution reconstruction method as claimed in claim 1, wherein the specific steps of constructing the shifted group convolution include:
initializing a 3×3 convolution weight with all values 0, dividing the channels into 5 parts, and recording the size of each part as g;
resetting the weight at the (0, 0) position of the [0, g] channels to 1, to focus on the upper-left corner information; resetting the weight at the (0, 2) position of the [g, 2g] channels to 1, to focus on the upper-right corner information; resetting the weight at the (2, 0) position of the [2g, 3g] channels to 1, to focus on the lower-left corner information; resetting the weight at the (2, 2) position of the [3g, 4g] channels to 1, to focus on the lower-right corner information; resetting the weight at the (1, 1) position of the [4g, 5g] channels to 1, to focus on the global information; and performing group convolution with a stride of 1.
3. The micro-attention-based light-weight image super-resolution reconstruction method as set forth in claim 2, wherein:
the variable multi-head blueprint self-attention is used for setting and controlling the number of heads, extracting Q, K and V, multiplying Q and K to obtain an attention matrix, and multiplying the attention matrix by V to strengthen the key information;
where Q is the query matrix, K is the key matrix, and V is the value matrix.
4. A micro-attention based light-weight image super-resolution reconstruction method as defined in claim 3, wherein:
the blueprint feedforward neural network is constructed based on blueprint convolution and a GELU activation function.
5. The micro-attention-based lightweight image super-resolution reconstruction method as claimed in claim 4, wherein said collecting a preset number of high- and low-resolution image pairs comprises the following steps:
downloading the high-resolution images in the DIV2K and Flickr2K data sets provided in the image restoration field, and downloading the corresponding low-resolution images at different downsampling factors.
6. The micro-attention-based lightweight image super-resolution reconstruction method as claimed in claim 5, wherein said training the lightweight image super-resolution network based on the training images and calculating gradients and updating the parameters of the lightweight image super-resolution network using an Adam optimizer comprises the following steps:
using an Adam optimizer to calculate gradients and update the parameters of the micro-attention-based lightweight image super-resolution network, iteratively training for a preset number of times during the calculation and updating process, evaluating with a loss function L1, setting the initial learning rate to a set value, and decaying it with a cosine (cos) schedule;
wherein the loss function L1 is:
L1 = (1/n) Σ_i |HR_i − SR_i|
where HR is the high-resolution image, SR is the image super-resolution reconstruction result, and n is the number of pixels of the image super-resolution reconstruction result.
7. The micro-attention-based light-weight image super-resolution reconstruction method as set forth in claim 6, wherein:
the lightweight image super-resolution network comprises 2 ordinary 3×3 convolutions and 4 Transformer module groups;
each Transformer module group includes 2 hybrid attention Transformer modules.
8. The micro-attention-based lightweight image super-resolution reconstruction method as claimed in claim 7, wherein said performing shallow feature and deep feature extraction on the image to be processed based on the trained lightweight image super-resolution network and completing image super-resolution reconstruction through ordinary convolution and pixel reassembly comprises the following steps:
inputting the image to be processed into the lightweight image super-resolution network, and extracting shallow features with an ordinary 3×3 convolution;
taking the resulting features as input to the current Transformer module group, where the first shifted group convolution expands the channels;
passing the features output by the first shifted group convolution through a ReLU activation function and shrinking the channels back with the next shifted group convolution;
strengthening channel attention through contrast-aware channel attention, adding the important channel information to the features output by the first shifted group convolution to obtain the feature information, and inputting the feature information to the blueprint feedforward neural network;
inputting the feature information output by the blueprint feedforward neural network into the blueprint multi-head self-attention, and extracting deep features;
feeding the extracted deep features as input to the next Transformer module group, and cycling in this order until the fourth Transformer module group outputs the deep features;
concatenating the deep features output by the fourth Transformer module group with the extracted shallow features, and completing image super-resolution reconstruction through an ordinary 3×3 convolution and the pixel reassembly operation.
9. The micro-attention-based light-weight image super-resolution reconstruction method as set forth in claim 8, wherein:
the head numbers of the blueprint multi-head self-attention are set to 1, 2, 4 and 8 respectively.
10. A light-weight image super-resolution reconstruction device based on micro-attention, characterized by comprising:
the construction module is used for constructing a hybrid attention Transformer module based on shifted group convolution, variable multi-head blueprint self-attention and a blueprint feedforward neural network, and for constructing a lightweight image super-resolution network based on a micro-attention mechanism and the hybrid attention Transformer module;
the collection module is used for collecting a preset number of high- and low-resolution image pairs, cropping the collected images to a preset pixel size, and rotating the cropped images to serve as training images;
the training module is used for training the lightweight image super-resolution network based on the training images, calculating gradients with an Adam optimizer and updating the parameters of the lightweight image super-resolution network;
the execution module is used for extracting shallow features and deep features of the image to be processed based on the lightweight image super-resolution network trained by the training module, and completing image super-resolution reconstruction through ordinary convolution and pixel reassembly.
CN202310400906.1A 2023-04-14 2023-04-14 Micro-attention-based light-weight image super-resolution reconstruction method and device Pending CN116485646A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310400906.1A CN116485646A (en) 2023-04-14 2023-04-14 Micro-attention-based light-weight image super-resolution reconstruction method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310400906.1A CN116485646A (en) 2023-04-14 2023-04-14 Micro-attention-based light-weight image super-resolution reconstruction method and device

Publications (1)

Publication Number Publication Date
CN116485646A true CN116485646A (en) 2023-07-25

Family

ID=87213129

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310400906.1A Pending CN116485646A (en) 2023-04-14 2023-04-14 Micro-attention-based light-weight image super-resolution reconstruction method and device

Country Status (1)

Country Link
CN (1) CN116485646A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117237197A (en) * 2023-11-08 2023-12-15 华侨大学 Image super-resolution method and device based on cross attention mechanism and Swin-Transformer
CN117237197B (en) * 2023-11-08 2024-03-01 华侨大学 Image super-resolution method and device based on cross attention mechanism
CN118052716A (en) * 2024-04-15 2024-05-17 山东黄海智能装备有限公司 Ovarian cyst image processing method

Similar Documents

Publication Publication Date Title
Remez et al. Class-aware fully convolutional Gaussian and Poisson denoising
Dong et al. Image super-resolution using deep convolutional networks
CN112308200B (en) Searching method and device for neural network
CN111861925B (en) Image rain removing method based on attention mechanism and door control circulation unit
CN116485646A (en) Micro-attention-based light-weight image super-resolution reconstruction method and device
CN111325165B (en) Urban remote sensing image scene classification method considering spatial relationship information
CN109871781A (en) Dynamic gesture identification method and system based on multi-modal 3D convolutional neural networks
CN109816011A (en) Generate the method and video key frame extracting method of portrait parted pattern
CN109543548A (en) A kind of face identification method, device and storage medium
CN108182670A (en) A kind of resolution enhancement methods and system of event image
CN112614136B (en) Infrared small target real-time instance segmentation method and device
CN111696110B (en) Scene segmentation method and system
CN110148088B (en) Image processing method, image rain removing method, device, terminal and medium
CN110929610A (en) Plant disease identification method and system based on CNN model and transfer learning
Dong et al. Learning spatially variant linear representation models for joint filtering
CN110991444A (en) Complex scene-oriented license plate recognition method and device
CN113065645A (en) Twin attention network, image processing method and device
CN116071668A (en) Unmanned aerial vehicle aerial image target detection method based on multi-scale feature fusion
CN115358932A (en) Multi-scale feature fusion face super-resolution reconstruction method and system
Xu et al. AutoSegNet: An automated neural network for image segmentation
CN115660955A (en) Super-resolution reconstruction model, method, equipment and storage medium for efficient multi-attention feature fusion
CN112419191A (en) Image motion blur removing method based on convolution neural network
CN114882278A (en) Tire pattern classification method and device based on attention mechanism and transfer learning
CN114067225A (en) Unmanned aerial vehicle small target detection method and system and storable medium
CN113139431A (en) Image saliency target detection method based on deep supervised learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination