CN112580614B - Hand-drawn sketch identification method based on attention mechanism - Google Patents
- Publication number: CN112580614B (application CN202110210499.9A)
- Authority: CN (China)
- Prior art keywords: attention, hand-drawn sketch, feature map, channel
- Legal status: Active (the legal status is an assumption, not a legal conclusion; Google has not performed a legal analysis)
Classifications
- G06V30/36 — Character recognition; digital ink; Matching; Classification
- G06F18/241 — Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06N3/045 — Neural networks; Combinations of networks
- G06N3/08 — Neural networks; Learning methods
- G06V10/462 — Salient features, e.g. scale invariant feature transforms [SIFT]
- G06V30/333 — Digital ink; Preprocessing; Feature extraction
Abstract
The invention discloses a hand-drawn sketch recognition method based on an attention mechanism. The method inputs an original hand-drawn sketch into a deep convolutional neural network to obtain the feature map output by the last convolutional layer; inputs this feature map into a channel attention module to obtain a feature map optimized by channel attention; trains a classification network to predict whether a hand-drawn sketch has been vertically flipped, and feeds the original sketch into the trained network to obtain a vertical-flip spatial attention map; combines the channel-attention-optimized feature map with the vertical-flip spatial attention map to compute the feature map after vertical-flip spatial attention optimization; and finally outputs the recognition result through a fully connected layer. The invention has the advantage that channel attention and vertical-flip spatial attention optimize the features of the convolutional neural network, so that the network focuses on the most discriminative parts during learning, effectively improving the recognition accuracy of hand-drawn sketches.
Description
Technical Field
The invention belongs to the field of computer vision, relates to the hand-drawn sketch classification task, and particularly relates to a hand-drawn sketch recognition method based on an attention mechanism.
Background
A hand-drawn sketch can be regarded as an abstract form on a two-dimensional plane: it conveys the information to be expressed while leaving boundless room for imagination. Sketches are a convenient way to depict objects or scenes, outline story lines, and design products or buildings, and are widely used in drawing and design work such as cartoon production, urban planning and design, architectural composition, industrial design and clothing design. Hand-drawn sketch recognition technology can be applied to many areas of computer vision, such as image retrieval and generation, and 3D shape retrieval and reconstruction, and has therefore received increasing attention in recent years.
A major difference between hand-drawn sketch recognition and the general object recognition task is that hand-drawn sketches lack prominent color and texture information. Moreover, sketch lines exhibit obvious shape variation and a high degree of abstraction when depicting objects, which makes the sketch recognition task extremely challenging. Early research on hand-drawn sketch recognition mainly focused on designing hand-crafted features within traditional object recognition frameworks; although it achieved some success, there remains very large room for improving recognition performance. In recent years, deep learning methods have been widely applied to the recognition of hand-drawn sketches. However, the high abstraction of hand-drawn sketches makes them difficult to model effectively, which sharply reduces the recognition accuracy of deep network models such as CNNs on sketches.
Disclosure of Invention
The embodiments of the invention aim to provide a hand-drawn sketch recognition method based on an attention mechanism, so as to effectively improve the accuracy of existing methods in hand-drawn sketch recognition.
In order to achieve this purpose, the technical scheme adopted by the invention is as follows:
the invention provides a hand-drawn sketch recognition method based on an attention mechanism, comprising the following steps:
inputting the original hand-drawn sketch into a deep convolutional neural network to obtain the feature map output by the last convolutional layer;
inputting the feature map into a channel attention module to obtain a feature map optimized based on channel attention;
training a classification network for predicting vertical flipping of the hand-drawn sketch, and inputting the original hand-drawn sketch into the trained classification network to obtain a vertical-flip spatial attention map;
combining the channel-attention-optimized feature map and the vertical-flip spatial attention map to compute the feature map after vertical-flip spatial attention optimization;
and inputting the optimized feature map into the global average pooling layer and the fully connected layer to finally obtain the recognition result of the hand-drawn sketch.
Further, inputting the original hand-drawn sketch into a deep convolutional neural network to obtain the feature map output by the last convolutional layer comprises:
inputting the original hand-drawn sketch into a residual network and taking the feature map $F \in \mathbb{R}^{W \times H \times C}$ output by the last convolutional layer Conv5, where $W$, $H$ and $C$ are respectively the width, height and channel dimensions of the feature map $F$.
Further, inputting the feature map into a channel attention module to obtain a feature map optimized based on channel attention comprises:
performing average pooling and maximum pooling on the feature map $F$ to obtain feature vectors $f_{avg}$ and $f_{max}$ respectively, each of dimension $1 \times 1 \times C$;
inputting $f_{avg}$ and $f_{max}$ respectively into a convolutional layer with kernel size $1 \times 1$ that reduces the channel dimension to $C/r$, where the reduction ratio $r$ is set to 16; activating the reduced feature vectors with the ReLU function; then inputting the result into another convolutional layer with kernel size $1 \times 1$ that restores the channel dimension to $C$, obtaining new feature vectors $f'_{avg}$ and $f'_{max}$, computed as:
$f'_{avg} = W_2(\mathrm{ReLU}(W_1 f_{avg})), \quad f'_{max} = W_2(\mathrm{ReLU}(W_1 f_{max}))$
adding $f'_{avg}$ and $f'_{max}$ and activating with the Sigmoid function $\sigma$ to obtain the channel attention map $M_c$, computed as:
$M_c = \sigma(f'_{avg} + f'_{max})$
based on the feature map $F$ and the channel attention map $M_c$, the channel-attention-optimized feature map $F_c$ is computed with the following formula, where $\otimes$ denotes element-wise multiplication:
$F_c = F + F \otimes M_c$
Further, training a classification network for predicting vertical flipping of the hand-drawn sketch and inputting the original hand-drawn sketch into the trained classification network to obtain a vertical-flip spatial attention map comprises:
selecting the representative TU-Berlin hand-drawn sketch dataset as training data; vertically flipping each hand-drawn sketch to obtain a flipped sketch; setting the label of each unflipped sketch to 0 and the label of each vertically flipped sketch to 1, thereby constructing a binary classification dataset containing the unflipped and the vertically flipped sketches; selecting a residual network as the classifier and training, on this binary dataset, a classification network for predicting vertical flipping of the hand-drawn sketch;
inputting the original hand-drawn sketch into the classification network to obtain the predicted class label $t$, and extracting the feature map $F' \in \mathbb{R}^{W \times H \times C}$ output by its last convolutional layer; denoting by $F'_k$ ($k = 1, \ldots, C$) the feature map of the $k$-th channel, the vertical-flip spatial attention map $M_s$ is computed as:
$M_s = \sum_{k=1}^{C} w_k^t \cdot F'_k$
wherein $\cdot$ denotes multiplication and $w_k^t$ is the weight, in the fully connected layer, connecting the $k$-th channel to the $t$-th category.
Further, computing the feature map after vertical-flip spatial attention optimization by combining the channel-attention-optimized feature map and the vertical-flip spatial attention map comprises:
after obtaining the channel-attention-optimized feature map $F_c$ and the vertical-flip spatial attention map $M_s$, computing the feature map after vertical-flip spatial attention optimization as:
$F_s = F_c + F_c \otimes M_s$
Further, the feature map $F_s$ is input into the subsequent global average pooling layer and fully connected layer, finally obtaining the recognition result for the original hand-drawn sketch.
Compared with the prior art, the design of the invention achieves the following beneficial effects:
1. The attention-based hand-drawn sketch recognition method enables the deep convolutional neural network to focus on the most discriminative parts of the feature representation, effectively improving the accuracy of hand-drawn sketch recognition.
2. The proposed vertical-flip spatial attention is learned in a self-supervised manner; it automatically evaluates the importance of different spatial positions in the feature map and complements the channel attention module in learning more effective feature representations.
3. The channel attention module and the vertical-flip spatial attention module can be embedded into most current deep convolutional neural networks as standard components for improving the feature expression of hand-drawn sketches, and can therefore also be applied to related fields such as sketch segmentation, sketch generation, and sketch-based retrieval.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention and not to limit the invention. In the drawings:
fig. 1 is a network structure diagram of a hand-drawn sketch identification method based on an attention mechanism according to an embodiment of the present invention;
FIG. 2 is a block diagram of a channel attention module in an embodiment of the present invention;
FIG. 3 is a block diagram of the vertical-flip spatial attention module in an embodiment of the invention;
FIG. 4 shows examples of recognition results for hand-drawn sketches of different categories.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the technical solutions of the present application will be described in detail and completely with reference to the following specific embodiments of the present application and the accompanying drawings. It should be apparent that the described embodiments are only some of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Fig. 1 is the network structure diagram of the hand-drawn sketch recognition method based on an attention mechanism provided by this embodiment; the method comprises the following steps:
Step S101, inputting the original hand-drawn sketch into a deep convolutional neural network to obtain the feature map output by the last convolutional layer. Specifically:
taking the residual network, the most representative deep convolutional neural network, as an example, the original hand-drawn sketch is input into the residual network, and the feature map $F \in \mathbb{R}^{W \times H \times C}$ output by the last convolutional layer Conv5 is obtained, where $W$, $H$ and $C$ are respectively the width, height and channel dimensions of $F$.
Step S103, inputting the feature map into a channel attention module to obtain a feature map optimized based on channel attention; as shown in fig. 2, specifically, the following sub-steps are included:
Step S1031, performing average pooling and maximum pooling on the feature map $F$ to obtain feature vectors $f_{avg}$ and $f_{max}$ respectively, each of dimension $1 \times 1 \times C$. The invention adopts the two different pooling operations in order to extract richer high-level features and thereby increase the expressive power of the features.
Step S1032, inputting $f_{avg}$ and $f_{max}$ respectively into a $1 \times 1$ convolutional layer that reduces the channel dimension to $C/r$, with the reduction ratio $r$ set to 16; then activating the reduced feature vectors with the ReLU function, which introduces additional non-linearity and better fits the complex correlations among channels; then inputting the result into another $1 \times 1$ convolutional layer that restores the channel dimension to $C$, obtaining new feature vectors $f'_{avg}$ and $f'_{max}$, computed as:
$f'_{avg} = W_2(\mathrm{ReLU}(W_1 f_{avg})), \quad f'_{max} = W_2(\mathrm{ReLU}(W_1 f_{max}))$
wherein $W_1$ and $W_2$ are respectively the parameters of the two $1 \times 1$ convolutional layers. Reducing the dimension first and then restoring it effectively reduces the number of learned parameters, thereby reducing the complexity of the model.
Step S1033, adding $f'_{avg}$ and $f'_{max}$ and activating with the Sigmoid function to obtain the channel attention map $M_c$ that fuses the two kinds of attention, computed as:
$M_c = \sigma(f'_{avg} + f'_{max})$
wherein $\sigma$ denotes the Sigmoid activation function and $+$ is element-wise addition. The importance of each feature channel is acquired automatically through learning, so that useful features are enhanced and features of little use to the current task are suppressed, which effectively improves the recognition performance of the model.
Step S1034, based on the feature map $F$ and the channel attention map $M_c$, the channel-attention-optimized feature map $F_c$ is computed with the following formula:
$F_c = F + F \otimes M_c$
wherein $\otimes$ denotes element-wise multiplication. The skip connection adopted by the invention retains the information of the feature map $F$ while adding the attention information brought by channel attention, which helps the network learn more effective hand-drawn sketch features.
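The channel attention computation of steps S1031 to S1034 can be sketched with NumPy as follows. This is a minimal illustration only: the matrices `W1` and `W2` are random stand-ins for the learned $1 \times 1$ convolution parameters, and the spatial layout (H, W, C) is an assumption for the example.

```python
import numpy as np

def channel_attention(F, W1, W2):
    """Channel attention over a feature map F of shape (H, W, C).

    W1 has shape (C//r, C) and reduces the channel dimension;
    W2 has shape (C, C//r) and restores it.
    Returns the optimized feature map F_c = F + F * M_c.
    """
    # Average pooling and max pooling over the spatial dims -> (C,) vectors
    f_avg = F.mean(axis=(0, 1))
    f_max = F.max(axis=(0, 1))

    # Shared bottleneck: reduce, ReLU, restore (the two 1x1 convolutions)
    def bottleneck(v):
        return W2 @ np.maximum(W1 @ v, 0.0)

    # Add the two branches and squash with the Sigmoid -> channel map M_c
    M_c = 1.0 / (1.0 + np.exp(-(bottleneck(f_avg) + bottleneck(f_max))))

    # Skip connection: keep F's information and add the attended features;
    # broadcasting multiplies every spatial position of channel k by M_c[k]
    return F + F * M_c

rng = np.random.default_rng(0)
H, W, C, r = 7, 7, 64, 16
F = rng.standard_normal((H, W, C))
W1 = rng.standard_normal((C // r, C)) * 0.1
W2 = rng.standard_normal((C, C // r)) * 0.1
F_c = channel_attention(F, W1, W2)
print(F_c.shape)  # (7, 7, 64)
```

Because $0 < M_c < 1$, the residual term $F \otimes M_c$ never exceeds $F$ in magnitude, so the skip connection can only rescale, not erase, the original features.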
Step S105, training a classification network for predicting vertical flipping of the hand-drawn sketch, and inputting the original hand-drawn sketch into the trained classification network to obtain a vertical-flip spatial attention map; as shown in fig. 3, this specifically comprises the following sub-steps:
Step S1051, taking the commonly used TU-Berlin hand-drawn sketch dataset as an example, vertically flipping each hand-drawn sketch in it, setting the label of each unflipped sketch to 0 and the label of each vertically flipped sketch to 1, thereby constructing a binary classification dataset containing the unflipped and the vertically flipped sketches; selecting a residual network as the classifier and training, on this binary dataset, a classification network for predicting vertical flipping of the hand-drawn sketch;
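Building the binary flip-prediction dataset of step S1051 amounts to pairing every sketch with its vertically flipped copy. A minimal array-based sketch follows; a real pipeline would load the TU-Berlin image files, which are assumed here to already be in memory as 2-D arrays:

```python
import numpy as np

def build_flip_dataset(sketches):
    """Given a list of sketch images as (H, W) arrays, return (images, labels):
    label 0 = original orientation, label 1 = vertically flipped."""
    images, labels = [], []
    for s in sketches:
        images.append(s)             # unflipped sketch -> label 0
        labels.append(0)
        images.append(np.flipud(s))  # vertically flipped copy -> label 1
        labels.append(1)
    return images, labels

rng = np.random.default_rng(1)
sketches = [rng.integers(0, 2, size=(28, 28)) for _ in range(3)]
images, labels = build_flip_dataset(sketches)
print(len(images), labels)  # 6 [0, 1, 0, 1, 0, 1]
```

The labels are free: they come from the flip transform itself, which is what makes the training of the flip classifier self-supervised.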
Step S1052, inputting the original hand-drawn sketch into the classification network to obtain the predicted class label $t$, and extracting the feature map $F' \in \mathbb{R}^{W \times H \times C}$ output by the last convolutional layer Conv5; denoting by $F'_k$ ($k = 1, \ldots, C$) the feature map of the $k$-th channel, the vertical-flip spatial attention map $M_s$ is computed as:
$M_s = \sum_{k=1}^{C} w_k^t \cdot F'_k$
wherein $\cdot$ denotes multiplication and $w_k^t$ is the weight, in the fully connected layer after the global average pooling layer GAP, connecting the $k$-th channel to the $t$-th category; it measures the importance of the $k$-th channel's feature map with respect to the $t$-th category. The global average pooling layer GAP adopted by the invention reduces the number of network parameters, mitigating overfitting, and the extracted features have a global receptive field, enhancing their expressive power.
The invention trains the network for predicting vertical flipping of the hand-drawn sketch in a self-supervised manner and automatically computes the response strength at different spatial positions of the feature map, so as to reflect the importance of those positions.
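The vertical-flip spatial attention map of step S1052 is a class-activation-style weighted sum of the channel feature maps. A minimal NumPy sketch follows; the matrix `W_fc` is a random stand-in for the trained flip classifier's fully connected weights after GAP:

```python
import numpy as np

def flip_spatial_attention(F_prime, W_fc):
    """F_prime: (H, W, C) feature map from the flip classifier's last conv layer.
    W_fc: (2, C) FC weights after GAP (two classes: unflipped / flipped).
    Returns the (H, W) attention map M_s for the predicted class t, and t."""
    # GAP followed by the FC layer gives the flip prediction t
    logits = W_fc @ F_prime.mean(axis=(0, 1))
    t = int(np.argmax(logits))
    # M_s = sum_k w_k^t * F'_k : weight each channel map by its class-t weight
    M_s = np.tensordot(F_prime, W_fc[t], axes=([2], [0]))
    return M_s, t

rng = np.random.default_rng(2)
F_prime = rng.standard_normal((7, 7, 64))
W_fc = rng.standard_normal((2, 64)) * 0.1
M_s, t = flip_spatial_attention(F_prime, W_fc)
print(M_s.shape, t in (0, 1))  # (7, 7) True
```

The `tensordot` contraction over the channel axis is exactly the per-channel weighted sum in the formula for $M_s$, yielding one scalar response per spatial position.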
Step S107, combining the channel-attention-optimized feature map and the vertical-flip spatial attention map to compute the feature map after vertical-flip spatial attention optimization. Specifically:
after obtaining the channel-attention-optimized feature map $F_c$ and the vertical-flip spatial attention map $M_s$, the feature map after vertical-flip spatial attention optimization is computed as:
$F_s = F_c + F_c \otimes M_s$
Step S109, the feature map $F_s$ is input into the subsequent global average pooling layer GAP and fully connected layer FC to obtain the final hand-drawn sketch recognition result.
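Steps S107 and S109 — combining $F_c$ with $M_s$ via the skip connection and then classifying through GAP and FC — can be sketched as follows. The matrix `W_cls` is a random stand-in for the learned classification head; the 250-way output matches the number of TU-Berlin categories.

```python
import numpy as np

def classify(F_c, M_s, W_cls):
    """F_c: (H, W, C) channel-optimized features; M_s: (H, W) flip attention map.
    W_cls: (n_classes, C) final FC weights. Returns the predicted class index."""
    # F_s = F_c + F_c * M_s : skip connection plus spatially attended features
    F_s = F_c + F_c * M_s[:, :, None]
    # Global average pooling collapses the spatial dims into a (C,) descriptor
    g = F_s.mean(axis=(0, 1))
    # The fully connected layer produces the class scores
    return int(np.argmax(W_cls @ g))

rng = np.random.default_rng(3)
F_c = rng.standard_normal((7, 7, 64))
M_s = rng.standard_normal((7, 7))
W_cls = rng.standard_normal((250, 64))  # TU-Berlin has 250 categories
pred = classify(F_c, M_s, W_cls)
print(0 <= pred < 250)  # True
```

Broadcasting `M_s[:, :, None]` across the channel axis applies the same spatial weight to every channel, mirroring how the spatial attention map reweights all of $F_c$'s channels at once.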
As shown in fig. 1, a hand-drawn sketch of the category "airplane" is input, and the attention-based recognition method outputs the result "airplane". Fig. 4 shows recognition results for more categories; it can be seen that the invention accurately recognizes the input hand-drawn sketches.
On the widely used TU-Berlin hand-drawn sketch dataset, the method achieves a 2.4% improvement in recognition accuracy over a ResNet-50 baseline, and about a 1% improvement over models with more layers such as ResNet-101, ResNet-152 and ResNeXt-101, demonstrating its effectiveness in hand-drawn sketch recognition. The specific results are shown in the following table:
| | ResNet-50 | ResNet-101 | ResNet-152 | ResNeXt-101 |
|---|---|---|---|---|
| Baseline | 77.3% | 79.5% | 80.3% | 80.5% |
| The invention | 79.7% | 80.6% | 81.2% | 81.5% |
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.
Claims (3)
1. A hand-drawn sketch recognition method based on an attention mechanism, characterized by comprising the following steps:
inputting the original hand-drawn sketch into a deep convolutional neural network to obtain the original feature map output by the last convolutional layer;
inputting the original feature map into a channel attention module to obtain a feature map optimized based on channel attention;
training a classification network for predicting vertical flipping of the hand-drawn sketch, and inputting the original hand-drawn sketch into the classification network to obtain a vertical-flip spatial attention map;
combining the channel-attention-optimized feature map and the vertical-flip spatial attention map to compute the feature map after vertical-flip spatial attention optimization;
inputting the feature map after vertical-flip spatial attention optimization into a global average pooling layer and a fully connected layer, finally obtaining the recognition result for the original hand-drawn sketch;
wherein inputting the original hand-drawn sketch into a deep convolutional neural network to obtain the original feature map output by the last convolutional layer comprises:
adopting a residual network as the backbone network for feature extraction, inputting the original hand-drawn sketch into the residual network, and extracting the original feature map $F \in \mathbb{R}^{W \times H \times C}$ output by the last convolutional layer Conv5, wherein $W$, $H$ and $C$ are respectively the width, height and channel dimensions of the original feature map $F$;
and wherein inputting the original feature map into a channel attention module to obtain a feature map optimized based on channel attention comprises:
a) performing average pooling and maximum pooling on the original feature map $F$ to obtain feature vectors $f_{avg}$ and $f_{max}$ respectively, each of dimension $1 \times 1 \times C$;
b) inputting $f_{avg}$ and $f_{max}$ respectively into a convolutional layer with kernel size $1 \times 1$ that reduces the channel dimension to $C/r$, wherein the reduction ratio $r$ is set to 16; activating the reduced feature vectors with the ReLU function; then inputting the result into another convolutional layer with kernel size $1 \times 1$ that restores the channel dimension to $C$, obtaining new feature vectors $f'_{avg}$ and $f'_{max}$, computed as:
$f'_{avg} = W_2(\mathrm{ReLU}(W_1 f_{avg})), \quad f'_{max} = W_2(\mathrm{ReLU}(W_1 f_{max}))$
wherein $W_1$ and $W_2$ are respectively the parameters of the two $1 \times 1$ convolutional layers;
c) adding $f'_{avg}$ and $f'_{max}$ and activating with the Sigmoid function $\sigma$ to obtain the channel attention map $M_c$, computed as:
$M_c = \sigma(f'_{avg} + f'_{max})$
d) based on the feature map $F$ and the channel attention map $M_c$, computing the channel-attention-optimized feature map $F_c$ with the following formula, wherein $\otimes$ denotes element-wise multiplication:
$F_c = F + F \otimes M_c$
2. The hand-drawn sketch recognition method based on an attention mechanism of claim 1, wherein training a classification network for predicting vertical flipping of the hand-drawn sketch and inputting the original hand-drawn sketch into the classification network to obtain a vertical-flip spatial attention map comprises:
a) vertically flipping each hand-drawn sketch in the TU-Berlin hand-drawn sketch dataset to obtain a vertically flipped sketch; setting the label of each unflipped sketch to 0 and the label of each vertically flipped sketch to 1, thereby constructing a binary classification dataset containing the unflipped and the vertically flipped sketches; selecting a residual network as the classifier and training, on the binary classification dataset, a classification network for predicting vertical flipping of the hand-drawn sketch;
b) inputting the original hand-drawn sketch into the classification network, outputting the predicted class label $t$, and extracting the feature map $F' \in \mathbb{R}^{W \times H \times C}$ output by the last convolutional layer; denoting by $F'_k$ ($k = 1, \ldots, C$) the feature map of the $k$-th channel, the vertical-flip spatial attention map $M_s$ is computed as:
$M_s = \sum_{k=1}^{C} w_k^t \cdot F'_k$
wherein $w_k^t$ is the weight in the fully connected layer connecting the $k$-th channel to the $t$-th category.
3. The hand-drawn sketch recognition method based on an attention mechanism of claim 2, wherein computing the feature map after vertical-flip spatial attention optimization by combining the channel-attention-optimized feature map and the vertical-flip spatial attention map comprises:
after obtaining the channel-attention-optimized feature map $F_c$ and the vertical-flip spatial attention map $M_s$, computing the feature map after vertical-flip spatial attention optimization as:
$F_s = F_c + F_c \otimes M_s$
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202110210499.9A | 2021-02-25 | 2021-02-25 | Hand-drawn sketch identification method based on attention mechanism |
Publications (2)

| Publication Number | Publication Date |
|---|---|
| CN112580614A | 2021-03-30 |
| CN112580614B | 2021-06-08 |
Legal Events

| Code | Title |
|---|---|
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |
| GR01 | Patent grant |