CN110807752A

CN110807752A - Image attention mechanism processing method based on convolutional neural network

Info

Publication number: CN110807752A
Application number: CN201910896954.8A
Authority: CN
Inventors: 陈旋; 吕成云; 张玉立
Original assignee: Jiangsu Ai Jia Household Articles Co Ltd
Current assignee: Jiangsu Ai Jia Household Articles Co Ltd
Priority date: 2019-09-23
Filing date: 2019-09-23
Publication date: 2020-02-18
Anticipated expiration: 2039-09-23
Also published as: CN110807752B

Abstract

The invention relates to an image attention mechanism processing method based on a convolutional neural network. The method adopts a new control logic and, in combination with the convolutional neural network, introduces an attention mechanism on the basis of a residual module. To better handle redundant information, spatial position attention and channel attention processing are first applied to a target image, so that redundant information generated by image superposition is removed; the target image is then passed through preset convolutional layers for fusion processing; finally, the input and the output of the module are joined by a skip connection to obtain the processing result for the target image. In this way, spatial position attention and channel attention can be added more accurately.

Description

Image attention mechanism processing method based on convolutional neural network

Technical Field

The invention relates to an image attention mechanism processing method based on a convolutional neural network, and belongs to the technical field of image processing.

Background

With the advent of the big data era, deep neural networks (deep learning) have developed rapidly and have been applied successfully in many industrial fields. As one kind of deep neural network, the convolutional neural network is widely used in image processing. Traditional machine vision methods require features to be designed manually; such hand-designed features cannot cope with complex and variable lighting, color, texture and similar conditions, and the processing results are poor.

The convolutional neural network method builds the image processing model with a convolutional neural network. Compared with traditional machine vision methods, the features of the image processing model do not need to be designed manually but are learned by the network, so the model can cope with complex and variable lighting, color, texture and similar conditions. At present, convolutional neural networks have achieved great success in many areas of image processing, such as image recognition, semantic segmentation, object detection and human pose estimation, and in some tasks their performance exceeds that of humans. As with humans, a convolutional neural network can achieve better results when an attention mechanism is added to its image processing.

The attention mechanisms currently used in image processing concern two kinds of information: spatial position and feature map channels. A spatial attention mechanism lets the network handle spatial position information better, assigning large weights to regions of interest and small weights to regions that are not of interest. In current methods, however, the two mechanisms are either used separately, or a convolution operation is performed first and the spatial and channel attention operations follow in sequence, so that redundant information present in the feature map after superposition cannot be handled well.

Disclosure of Invention

The technical problem to be solved by the invention is to provide an image attention mechanism processing method based on a convolutional neural network which adopts a new control logic and, in combination with the convolutional neural network, adds spatial position attention and channel attention more accurately.

To solve this technical problem, the invention adopts the following technical scheme: an image attention mechanism processing method based on a convolutional neural network, used for performing attention processing on a target image, comprising the following steps:

Step A: copy the target image as the original target image, then proceed to step B;

Step B: add spatial attention information to the target image to obtain an updated target image, then proceed to step C;

Step C: add channel attention information to the target image to obtain an updated target image, then proceed to step D;

Step D: perform fusion processing on the target image using a preset number of convolution layers of preset types, update the target image, then proceed to step E;

Step E: add the pixel value of each pixel on the original target image to the pixel value of the pixel at the same position on the target image to obtain the final target image, which is the result of attention processing on the target image.
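The control flow of steps A to E can be sketched as a simple composition of stages. This is a minimal illustration only: the function name `attention_module`, the channels-first nested-list image layout, and the three stage callables are assumptions introduced here and are not specified in the original text.

```python
from typing import Callable, List

# Assumed layout: image[c][y][x], one 2-D plane per channel.
Image = List[List[List[float]]]


def attention_module(
    image: Image,
    spatial_attention: Callable[[Image], Image],
    channel_attention: Callable[[Image], Image],
    fusion: Callable[[Image], Image],
) -> Image:
    """Run steps A to E; the three stage callables are hypothetical placeholders."""
    # Step A: keep a copy of the input for the final addition.
    original = [[row[:] for row in channel] for channel in image]
    # Step B: spatial attention.
    updated = spatial_attention(image)
    # Step C: channel attention.
    updated = channel_attention(updated)
    # Step D: fusion through the convolution layers.
    updated = fusion(updated)
    # Step E: add original and processed images pixel by pixel.
    return [
        [[o + u for o, u in zip(o_row, u_row)] for o_row, u_row in zip(o_ch, u_ch)]
        for o_ch, u_ch in zip(original, updated)
    ]
```

Passing the stages as callables keeps the ordering (spatial attention, then channel attention, then fusion, then addition) visible in one place, independent of how each stage is realized.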

As a preferred technical scheme of the invention: the step B comprises the following steps B1 to B2;

Step B1: according to the preset weight of each preset division area on the target image, obtain an image of the same size as the target image as a weight image, in which the pixel value of each pixel is the preset weight of the preset division area containing the pixel at the same position on the target image, then proceed to step B2;

Step B2: for the channel image of each preset-type channel of the target image, multiply the attribute value of each pixel in the channel image by the pixel value of the pixel at the same position in the weight image, thereby updating each channel image and hence the target image, then proceed to step C.

As a preferred technical scheme of the invention: in step B1, a 1 × 1 convolution layer is used to output an image with the same size as the target image as a weight image, and the pixel value of each pixel point in the weight image is the preset weight of the preset division area where the pixel point at the same position on the target image is located.
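Steps B1 and B2 can be illustrated with a small sketch. It treats the 1 × 1 convolution as a per-pixel weighted sum over channels producing a single-channel weight image. The sigmoid that squashes the weights into (0, 1), the function names, and the image layout are assumptions made for illustration; the original text does not specify them.

```python
import math
from typing import List

# Assumed layout: image[c][y][x], one 2-D plane per channel.
Image = List[List[List[float]]]


def weight_image(image: Image, kernel: List[float], bias: float = 0.0) -> List[List[float]]:
    """Step B1: 1x1 convolution with one output channel, same size as the input.

    The sigmoid is an assumption; it keeps every weight in (0, 1).
    """
    height, width = len(image[0]), len(image[0][0])
    weights = []
    for y in range(height):
        row = []
        for x in range(width):
            s = bias + sum(k * channel[y][x] for k, channel in zip(kernel, image))
            row.append(1.0 / (1.0 + math.exp(-s)))
        weights.append(row)
    return weights


def apply_spatial_attention(image: Image, weights: List[List[float]]) -> Image:
    """Step B2: multiply every channel image by the weight image, pixel by pixel."""
    return [
        [[v * w for v, w in zip(ch_row, w_row)] for ch_row, w_row in zip(channel, weights)]
        for channel in image
    ]
```

The same weight image is applied to every channel, so the operation changes how much each position contributes without changing the relative balance between channels.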

As a preferred technical scheme of the invention, the step C comprises the following steps C1 to C3;

Step C1: from the pixel values of the pixels on the channel image of each preset-type channel of the target image, obtain an N-dimensional array corresponding to the target image, where N is the number of preset-type channels of the target image and the dimensions of the array correspond one-to-one to those channels, then proceed to step C2;

Step C2: obtain the weight of each preset-type channel of the target image from the N-dimensional array, then proceed to step C3;

Step C3: for the channel image of each preset-type channel of the target image, multiply the attribute value of each pixel in the channel image by the weight of the corresponding channel, thereby updating each channel image and hence the target image, then proceed to step D.

As a preferred technical scheme of the invention: in step C1, the target image is processed by global pooling over the pixel values of the pixels on the channel image of each preset-type channel, so as to obtain the N-dimensional array corresponding to the target image.
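Steps C1 to C3 can be sketched as follows. Two points are assumptions made for illustration: global pooling is taken to be global average pooling, and the mapping from the pooled array to channel weights in step C2 is a plain sigmoid, since the text does not say how the weights are derived. All function names are hypothetical.

```python
import math
from typing import List

# Assumed layout: image[c][y][x], one 2-D plane per channel.
Image = List[List[List[float]]]


def global_average_pool(image: Image) -> List[float]:
    """Step C1: reduce each channel image to one number (assumed: its mean)."""
    return [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in image
    ]


def channel_weights(pooled: List[float]) -> List[float]:
    """Step C2: hypothetical mapping from the N-dimensional array to weights."""
    return [1.0 / (1.0 + math.exp(-p)) for p in pooled]


def apply_channel_attention(image: Image, weights: List[float]) -> Image:
    """Step C3: scale every pixel of a channel image by that channel's weight."""
    return [
        [[v * w for v in row] for row in channel]
        for channel, w in zip(image, weights)
    ]
```

In a trained network the step C2 mapping would normally have learnable parameters; the fixed sigmoid here only keeps the sketch short.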

As a preferred technical scheme of the invention: in step D, three convolution layers of preset types are used to perform fusion processing on the target image and update the target image.

As a preferred technical solution of the present invention, the three preset types of convolutional layers in step D sequentially include the following:

the first layer uses a 1 × 1 convolution kernel, is followed by a batch normalization layer, and uses ReLU as its activation function; the second layer uses a 3 × 3 convolution kernel, is followed by a batch normalization layer, and uses ReLU as its activation function; the last layer uses a 1 × 1 convolution kernel and is followed by a batch normalization layer, with no activation function.
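The three-layer fusion stage can be sketched in plain Python as below. This is an illustrative approximation, not the patented implementation: the convolution is the zero-padded cross-correlation that deep learning frameworks usually call convolution, batch normalization is shown in its inference form with fixed per-channel statistics and without learnable scale and shift, and all names and the kernel layout are assumptions.

```python
from typing import List, Sequence, Tuple

Image = List[List[List[float]]]          # assumed layout: image[c][y][x]
Kernel = List[List[List[List[float]]]]   # assumed layout: kernel[out_c][in_c][ky][kx]
Stats = Tuple[List[float], List[float]]  # per-channel (mean, variance)


def conv2d(image: Image, kernel: Kernel) -> Image:
    """Same-size convolution with zero padding (odd square kernels only)."""
    height, width = len(image[0]), len(image[0][0])
    size = len(kernel[0][0])
    pad = size // 2
    out = []
    for k_out in kernel:
        plane = [[0.0] * width for _ in range(height)]
        for k_in, channel in zip(k_out, image):
            for y in range(height):
                for x in range(width):
                    for ky in range(size):
                        for kx in range(size):
                            yy, xx = y + ky - pad, x + kx - pad
                            if 0 <= yy < height and 0 <= xx < width:
                                plane[y][x] += k_in[ky][kx] * channel[yy][xx]
        out.append(plane)
    return out


def batch_norm(image: Image, mean: List[float], var: List[float], eps: float = 1e-5) -> Image:
    """Inference-style normalization with fixed per-channel statistics."""
    return [
        [[(v - m) / (s + eps) ** 0.5 for v in row] for row in channel]
        for channel, m, s in zip(image, mean, var)
    ]


def relu(image: Image) -> Image:
    return [[[max(0.0, v) for v in row] for row in channel] for channel in image]


def fusion(image: Image, k1: Kernel, k2: Kernel, k3: Kernel, stats: Sequence[Stats]) -> Image:
    """Step D: 1x1 conv + BN + ReLU, 3x3 conv + BN + ReLU, 1x1 conv + BN."""
    (m1, v1), (m2, v2), (m3, v3) = stats
    x = relu(batch_norm(conv2d(image, k1), m1, v1))
    x = relu(batch_norm(conv2d(x, k2), m2, v2))
    return batch_norm(conv2d(x, k3), m3, v3)  # no activation on the last layer
```

Leaving the last layer without an activation means its output can be negative, so the addition in step E can both raise and lower the original pixel values.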

As a preferred technical scheme of the invention: in step E, a skip connection is used to add the pixel value of each pixel on the original target image to the pixel value of the pixel at the same position on the target image, so as to obtain the final target image, which is the result of attention processing on the target image.

Advantageous effects

Compared with the prior art, the image attention mechanism processing method based on the convolutional neural network has the following technical effects by adopting the technical scheme:

The invention provides an image attention mechanism processing method based on a convolutional neural network. It adopts a new control logic and, in combination with the convolutional neural network, introduces an attention mechanism on the basis of a residual module. To better handle redundant information, spatial position attention and channel attention processing are first applied to a target image, so that redundant information generated by image superposition is removed; the target image is then passed through preset convolutional layers for fusion processing; finally, the input and the output of the module are joined by a skip connection to obtain the processing result for the target image. Spatial position attention and channel attention are thereby added more accurately.

Drawings

FIG. 1 is a flow chart of an image attention mechanism processing method based on a convolutional neural network according to the present invention.

Detailed Description

The following description will explain embodiments of the present invention in further detail with reference to the accompanying drawings.

The invention provides an image attention mechanism processing method based on a convolutional neural network, used for performing attention processing on a target image. As shown in FIG. 1, the following steps A to E are executed in practical application.

Spatial attention processes the spatial information of an image. It mimics the attention mechanism of the human visual system: more attention is given to the points of interest and less important information is blurred, so that processing concentrates on the important information. In image processing, the spatial attention mechanism sets a weight for each position in the image according to its importance.

Step A: copy the target image as the original target image, then proceed to step B.

Step B: add spatial attention information to the target image to obtain an updated target image, then proceed to step C.

In practical applications, the step B specifically includes the following steps B1 to B2.

Step B1: using a 1 × 1 convolution layer, output an image of the same size as the target image as a weight image according to the preset weight of each preset division area on the target image, in which the pixel value of each pixel is the preset weight of the preset division area containing the pixel at the same position on the target image, then proceed to step B2.

Step C: add channel attention information to the target image to obtain an updated target image, then proceed to step D.

In practical applications, the step C specifically performs the following steps C1 to C3.

Step C1: process the target image by global pooling over the pixel values of the pixels on the channel image of each preset-type channel of the target image to obtain an N-dimensional array corresponding to the target image, where N is the number of preset-type channels of the target image and the dimensions of the array correspond one-to-one to those channels, then proceed to step C2.

Step C2: obtain the weight of each preset-type channel of the target image from the N-dimensional array, then proceed to step C3.

Based on the operation of step C, redundant information between the channels of the feature map is removed by adding a weight to each channel.

Step D: perform fusion processing on the target image using a preset number of convolution layers of preset types, update the target image, then proceed to step E.

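Specifically, in step D, three convolution layers of preset types are used to perform fusion processing on the target image and update the target image. In order, the three layers are as follows: the first layer uses a 1 × 1 convolution kernel, is followed by a batch normalization layer, and uses ReLU as its activation function; the second layer uses a 3 × 3 convolution kernel, is followed by a batch normalization layer, and uses ReLU as its activation function; the last layer uses a 1 × 1 convolution kernel and is followed by a batch normalization layer, with no activation function.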

Step E: using a skip connection, add the pixel value of each pixel on the original target image to the pixel value of the pixel at the same position on the target image to obtain the final target image, which is the result of attention processing on the target image.
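Taken together, steps A to E can be written compactly. The notation is introduced here for illustration only and does not appear in the original text: X is the input target image with channel images X_c, M is the weight image of step B1, w_c is the weight of channel c from step C2, g is the unspecified mapping from the pooled array to the channel weights, and F denotes the three convolution layers of step D.

\[
Y = X + F(\tilde{X}), \qquad \tilde{X}_c(i,j) = w_c \, M(i,j) \, X_c(i,j), \qquad w = g\big(\operatorname{pool}(M \odot X)\big)
\]

Note that the channel weights w are computed from the spatially weighted image M ⊙ X rather than from X directly, because step C operates on the output of step B.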

The image attention mechanism processing method based on a convolutional neural network designed in this technical scheme adopts a new control logic and, in combination with the convolutional neural network, introduces an attention mechanism on the basis of a residual module. To better handle redundant information, spatial position attention and channel attention processing are first applied to a target image, so that redundant information generated by image superposition is removed; the target image is then passed through preset convolutional layers for fusion processing; finally, the input and the output of the module are joined by a skip connection to obtain the processing result for the target image, so that spatial position attention and channel attention can be added more accurately.

The embodiments of the present invention have been described in detail with reference to the drawings, but the present invention is not limited to the above embodiments, and various changes can be made within the knowledge of those skilled in the art without departing from the gist of the present invention.

Claims

1. An image attention mechanism processing method based on a convolutional neural network is used for realizing attention processing aiming at a target image, and is characterized by comprising the following steps:

2. The image attention mechanism processing method based on the convolutional neural network as claimed in claim 1, wherein said step B comprises the following steps B1 to B2;

3. The image attention mechanism processing method based on the convolutional neural network as claimed in claim 2, wherein: in step B1, a 1 × 1 convolution layer is used to output an image with the same size as the target image as a weight image, and the pixel value of each pixel point in the weight image is the preset weight of the preset division area where the pixel point at the same position on the target image is located.

4. The convolutional neural network-based image attention mechanism processing method as claimed in any one of claims 1 to 3, wherein said step C comprises the following steps C1 to C3;

5. The image attention mechanism processing method based on the convolutional neural network as claimed in claim 4, wherein: in step C1, the target image is processed by global pooling over the pixel values of the pixels on the channel image of each preset-type channel, so as to obtain the N-dimensional array corresponding to the target image.

6. The image attention mechanism processing method based on the convolutional neural network as claimed in claim 1, wherein: in step D, three convolution layers of preset types are used to perform fusion processing on the target image and update the target image.

7. The image attention mechanism processing method based on the convolutional neural network as claimed in claim 6, wherein the three layers of convolutional layers of preset type in step D sequentially include the following:

8. The image attention mechanism processing method based on the convolutional neural network as claimed in claim 1, wherein: in step E, a skip connection is used to add the pixel value of each pixel on the original target image to the pixel value of the pixel at the same position on the target image, so as to obtain the final target image, which is the result of attention processing on the target image.