CN114897149A - Multitask multi-branch attention network structure - Google Patents

Multitask multi-branch attention network structure

Info

Publication number
CN114897149A
Authority
CN
China
Prior art keywords
attention
branch
task
network
layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210705174.2A
Other languages
Chinese (zh)
Inventor
范军俊
任晓宇
韩晓红
董于杰
王亮
冯晋首
朱鹏飞
李耀军
王丽娜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanxi Qingzhong Technology Co ltd
Original Assignee
Shanxi Qingzhong Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanxi Qingzhong Technology Co ltd filed Critical Shanxi Qingzhong Technology Co ltd
Priority to CN202210705174.2A priority Critical patent/CN114897149A/en
Publication of CN114897149A publication Critical patent/CN114897149A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/771Feature selection, e.g. selecting representative features from a multi-dimensional feature space
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/7715Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a multitask multi-branch attention network structure, belonging to the technical field of deep learning. It addresses the problem that existing multi-task network structures allow one model to process several tasks at the same time, but, because each task requires different features, they can suffer from negative transfer. The structure has three modules: a multi-branch feature extraction network, an attention-based feature selection module and an attention-based multi-branch prediction module. The multi-branch feature extraction network outputs feature maps of the picture from different stages of the network, the feature selection module weights these feature maps and provides task-related feature maps for each task, and the attention-based multi-branch prediction module integrates the prediction results of the same task from different branches, thereby improving the accuracy of the network. The invention is applied to image classification.

Description

Multitask multi-branch attention network structure
Technical Field
The invention provides a multitask multi-branch attention network structure and belongs to the technical field of deep learning within computer technology.
Background
A convolutional neural network is a deep neural network built from convolution, pooling and activation-function computations, and is one of the representative algorithms of deep learning. A large body of research has shown that it performs strongly in object classification, localization and detection, and that its multi-level feature learning and rich feature representation capability have led to breakthrough progress in object classification.
In recent years, several classical network models have emerged in the field of image classification. GoogLeNet proposed the Inception module to widen the network, strengthening its feature extraction capability from another direction; ResNet proposed the residual structure, which effectively alleviates problems such as vanishing gradients and network degradation and allows neural networks to reach depths of hundreds of layers; DenseNet proposed dense connections, feeding the input of each layer to all subsequent layers, which achieves efficient feature reuse while also alleviating vanishing gradients. In object classification, many researchers have applied these models to image classification, but most of the resulting models are single-task models that can perform only one task at a time, whereas in practice a picture usually needs to be judged by several tasks simultaneously. In that case it is usually necessary to train several single-task models and run a classification computation for each task, which leads to a huge amount of computation and slow detection.
At present, for pictures that involve several tasks, a number of multitask neural network frameworks exist. Multitask learning (MTL) is a joint learning approach in which several tasks are trained simultaneously and share some parameters. In a multitask network, multiple tasks share one structure and can exploit information from the other tasks; when the losses of all tasks flatten out, the shared structure has effectively fused the information of all tasks. In general, multitask networks generalize better than single-task networks. However, most multitask networks require a strong connection between the different tasks; otherwise negative transfer may occur. In practical applications the accuracy is therefore often not high, and there are many limitations.
Disclosure of Invention
In order to overcome the defects in the prior art, the technical problem to be solved by the invention is to provide an improved multitask multi-branch attention network structure.
In order to solve the above technical problems, the invention adopts the following technical scheme: a multitask multi-branch attention network structure comprising the following modules:
a multi-branch feature extraction network: used for extracting features from the preprocessed image and dividing the network into a plurality of branches, each branch outputting the feature map extracted by the feature extraction network at the current stage;
an attention-based feature selection module: used for weighting the channels of the feature map output by each branch of the feature extraction network with channel attention, and generating a task-related feature map for each task;
an attention-based multi-branch prediction module: used for inputting each channel-attention-weighted feature map into a fully-connected layer for multi-task prediction, then integrating the prediction results of the same task from different branches using a branch attention module and an intra-task attention module, and finally outputting the classification result of each task.
The feature extraction network uses a modified ResNet50 as the backbone network. The modified ResNet50 comprises 1 input part and 4 blocks in total; the input part comprises 3 stacked 3 × 3 convolutional layers, the blocks comprise 3, 4, 6 and 3 layers respectively, and each layer comprises a 1 × 1 convolutional layer, a batch normalization layer, a rectified linear unit (ReLU) layer, a 3 × 3 convolutional layer, a batch normalization layer, a ReLU layer, a 1 × 1 convolutional layer, a batch normalization layer and a ReLU layer.
The attention-based feature selection module is represented by the following formula:
f_i′ = σ( MLP(AvgPool(f_i)) + MLP(MaxPool(f_i)) ) ⊗ f_i
in the above formula: f_i represents the original feature map output by the i-th branch of the feature extraction network; AvgPool and MaxPool represent the pooling operations; MLP represents a multilayer perceptron with one hidden layer; σ represents the sigmoid function; and f_i′ represents the task-specific feature map generated by the attention module on the i-th branch.
The branch attention module assigns a weight to each branch by generating an attention mask, so that different branches receive different weights.
The calculation steps of the branch attention module for assigning different weights to different branches are as follows:
splicing the prediction results of each branch together, wherein the spliced result is presented in the form of a multi-channel feature map F;
performing a 1 × n convolution operation on the map F to obtain the compressed information of the branches;
feeding the obtained compressed information into a fully-connected layer, and then feeding the output of the fully-connected layer into an activation function to obtain the attention mask W1;
multiplying the attention mask W1 with the map F to obtain a weighted map F1.
The intra-task attention module generates different weights for the prediction results of the same subclass on different branches on the basis of the branch attention module.
The intra-task attention module calculates the weights as follows:
converting the multi-channel map F1 output by the branch attention module into a single-channel map F2;
performing a k × 1 convolution operation on the single-channel map F2 to obtain the attention mask W2;
multiplying the attention mask W2 with the single-channel map F2 to obtain a weighted map R, and adding the elements of the weighted map R column by column to obtain the final classification result of the task.
Compared with the prior art, the invention has the following beneficial effects. The invention provides a multitask multi-branch attention network structure comprising three modules: a multi-branch feature extraction network, an attention-based feature selection module and an attention-based multi-branch prediction module. The multi-branch feature extraction network outputs feature maps of the picture from different stages of the network, the feature selection module weights these feature maps and provides task-related feature maps for each task, and the attention-based multi-branch prediction module integrates the prediction results of the same task from different branches, thereby improving the accuracy of the network. Experiments show that, compared with single-task networks, this network structure can exploit feature information from different stages of the network while the different tasks promote each other, and its performance is superior to that of single-task networks.
Drawings
The invention is further described below with reference to the accompanying drawings:
FIG. 1 is a schematic diagram of the overall network architecture of the present invention;
FIG. 2 is a schematic diagram of an internal structure of an attention-based feature selection module according to the present invention;
FIG. 3 is a diagram illustrating an internal structure of a multi-branch prediction module based on attention of the present invention.
Detailed Description
As shown in fig. 1 to 3, the problems to be solved by the present invention are the following. 1. A multitask network generally provides the same feature map for all tasks, which leads to the negative transfer problem. The reasons for negative transfer may include: (1) different tasks require feature maps from different stages, some tasks needing low-level image features while others need high-level image features; (2) different tasks attend to different regions of the same image. 2. In a multi-branch structure there is also the problem of assigning an appropriate weight to the prediction result of each branch: a given task has different prediction results on different branches, and how to let the network use the prediction results of the different branches in a reasonable way while suppressing branches with low prediction accuracy, so as to obtain a better classification result, is the problem the invention aims to solve.
The technical scheme adopted by the invention for solving the technical problem comprises the following three modules as shown in figure 1:
module 1: multi-branch feature extraction network
In order to solve the problem that different tasks require feature maps from different stages, the invention provides a multi-branch feature extraction network. As shown in fig. 2, the feature extraction network extracts features from the preprocessed image and divides the network into five branches, numbered 1 to 5. Each branch outputs the feature map extracted by the feature extraction network at that stage. Such a multi-branch structure can output the features extracted by the network at different stages to meet the requirements of different tasks.
The feature extraction network uses a modified ResNet50 as the backbone network; the modified ResNet50 contains 1 input part and 4 blocks in total. The input part consists of 3 stacked 3 × 3 convolutional layers, the blocks consist of 3, 4, 6 and 3 layers respectively, and each layer comprises a 1 × 1 convolutional layer, a batch normalization (BN) layer, a rectified linear unit (ReLU) layer, a 3 × 3 convolutional layer, a batch normalization layer, a ReLU layer, a 1 × 1 convolutional layer, a batch normalization layer and a ReLU layer.
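For concreteness, a minimal PyTorch sketch of this multi-branch idea is given below: a ResNet-50-style backbone whose stem and four blocks each expose their output as one of the five branches. The class name MultiBranchBackbone and the use of torchvision's stock resnet50 (including its single 7 × 7 stem rather than the three stacked 3 × 3 convolutions described above) are illustrative assumptions rather than the reference implementation of the invention.

import torch
import torch.nn as nn
from torchvision.models import resnet50


class MultiBranchBackbone(nn.Module):
    """Expose the outputs of five successive backbone stages as branches."""

    def __init__(self):
        super().__init__()
        net = resnet50(weights=None)
        # Stem (branch 1). The invention replaces the single 7x7 convolution
        # with three stacked 3x3 convolutions; the stock stem is kept here
        # for brevity.
        self.stem = nn.Sequential(net.conv1, net.bn1, net.relu, net.maxpool)
        self.block1 = net.layer1  # 3 bottleneck layers
        self.block2 = net.layer2  # 4 bottleneck layers
        self.block3 = net.layer3  # 6 bottleneck layers
        self.block4 = net.layer4  # 3 bottleneck layers

    def forward(self, x):
        f1 = self.stem(x)
        f2 = self.block1(f1)
        f3 = self.block2(f2)
        f4 = self.block3(f3)
        f5 = self.block4(f4)
        # Five branches: feature maps extracted at five successive stages.
        return [f1, f2, f3, f4, f5]


if __name__ == "__main__":
    feats = MultiBranchBackbone()(torch.randn(1, 3, 224, 224))
    print([tuple(f.shape) for f in feats])

Each returned tensor corresponds to one branch and would be passed to the feature selection module described in module 2.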
The specific structure of the network is shown in table 1 below:
Table 1: feature extraction network structure.
Module 2: attention-based feature selection module
This module uses channel attention to weight the channels of the feature map output by each branch and generates a task-related feature map for each task, so as to improve the accuracy of the network.
The attention-based feature selection module may be represented by the following formula:
f_i′ = σ( MLP(AvgPool(f_i)) + MLP(MaxPool(f_i)) ) ⊗ f_i
in the above formula: f_i represents the original feature map output by the i-th branch of the feature extraction network; AvgPool and MaxPool represent the pooling operations; MLP represents a multilayer perceptron with one hidden layer; σ represents the sigmoid function; and f_i′ represents the task-specific feature map generated by the attention module on the i-th branch.
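A minimal sketch of this feature selection step, assuming a PyTorch implementation in the spirit of CBAM-style channel attention, is given below; the class and parameter names (ChannelAttentionSelect, reduction) are illustrative assumptions. One such module would be instantiated for each task on each branch to produce the task-related feature maps.

import torch
import torch.nn as nn


class ChannelAttentionSelect(nn.Module):
    """Weight the channels of one branch's feature map for one task."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Shared multilayer perceptron with one hidden layer, applied to the
        # average-pooled and max-pooled channel descriptors.
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, f):                      # f: (B, C, H, W)
        avg = self.mlp(f.mean(dim=(2, 3)))     # AvgPool over spatial dims
        mx = self.mlp(f.amax(dim=(2, 3)))      # MaxPool over spatial dims
        w = torch.sigmoid(avg + mx)            # channel weights, shape (B, C)
        return f * w[:, :, None, None]         # task-related feature map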
Module 3: attention-based multi-branch prediction module
As shown in FIG. 3, this module feeds each channel-attention-weighted feature map into a fully-connected layer for multi-task prediction, and then integrates the prediction results of the same task from the different branches using the branch attention module and the intra-task attention module. Finally, the module outputs the classification result of each task.
Branch attention module: the branch attention module assigns a weight to each branch by generating an attention mask. By assigning different weights to different branches, the network can make reasonable use of the prediction results of the different branches and pay more attention to the branches with better classification results, thereby improving the classification performance of the model. It is calculated by the following steps (a code sketch follows the steps):
(1) The prediction results of each branch are spliced together, and the spliced result is presented in the form of a feature map F with 5 channels.
(2) A 1 × n convolution operation is performed on the map F. This operation compresses the information of each branch, enabling an overall evaluation of the prediction result of each branch.
(3) The obtained compressed information is fed into a fully-connected (FC) layer, and the output of the FC layer is then fed into an activation function to obtain the attention mask W1.
(4) The attention mask W1 is multiplied with the map F to obtain a weighted map F1.
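The following sketch illustrates these four steps under assumed tensor shapes: the n-dimensional predictions of one task from each of the five branches are stacked into the multi-channel map F, compressed by a 1 × n convolution, passed through a fully-connected layer and a sigmoid to obtain the mask W1, and used to re-weight F. Class names, shapes and the choice of sigmoid as the activation function are assumptions for illustration only.

import torch
import torch.nn as nn


class BranchAttention(nn.Module):
    """Re-weight the per-branch predictions of a single task."""

    def __init__(self, num_branches: int = 5, num_classes: int = 10):
        super().__init__()
        # 1 x n convolution: compresses each branch's n-dimensional
        # prediction into a single evaluation score.
        self.compress = nn.Conv2d(1, 1, kernel_size=(1, num_classes))
        self.fc = nn.Linear(num_branches, num_branches)

    def forward(self, preds):                      # list of (B, n) predictions
        F = torch.stack(preds, dim=1)              # map F: (B, branches, n)
        s = self.compress(F.unsqueeze(1))          # compressed info: (B, 1, branches, 1)
        w1 = torch.sigmoid(self.fc(s.flatten(1)))  # attention mask W1: (B, branches)
        return F * w1.unsqueeze(-1)                # weighted map F1: (B, branches, n)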
Intra-task attention module: the branch attention module only evaluates the overall classification performance of each branch and ignores the accuracy of the individual subclasses of a task on that branch; the intra-task attention module is designed to address this. On the basis of the branch attention module, it generates different weights for the prediction results of the same subclass on different branches, which can further improve the accuracy of the model. It is calculated by the following steps (see the sketch after the steps):
(1) The multi-channel map F1 output by the branch attention module is converted into a single-channel map F2.
(2) To enable the network to analyze the prediction results of the same subclass on different branches, a k × 1 convolution operation is performed on the map F2 to obtain the attention mask W2.
(3) The attention mask W2 is multiplied with the map F2 to obtain a weighted map R, and the final classification result of this task is obtained by adding the elements of the weighted map R column by column.
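The sketch below illustrates these steps with the same assumed shapes as the branch attention sketch: F1 is viewed as a single-channel map of size (branches × classes), a k × 1 convolution over the branch axis produces the mask W2, and the re-weighted map R is summed column by column to give the task's final class scores. The use of sigmoid and of same-padding for the k × 1 convolution are assumptions.

import torch
import torch.nn as nn


class IntraTaskAttention(nn.Module):
    """Weight the per-subclass predictions of the same task across branches."""

    def __init__(self, k: int = 3):
        super().__init__()
        # k x 1 convolution: each class column is compared across k branches
        # (k assumed odd so that same-padding preserves the branch axis).
        self.conv = nn.Conv2d(1, 1, kernel_size=(k, 1), padding=(k // 2, 0))

    def forward(self, F1):                  # F1: (B, branches, n) from BranchAttention
        F2 = F1.unsqueeze(1)                # single-channel map F2: (B, 1, branches, n)
        W2 = torch.sigmoid(self.conv(F2))   # attention mask W2, same shape as F2
        R = (W2 * F2).squeeze(1)            # weighted map R: (B, branches, n)
        return R.sum(dim=1)                 # column-wise sum -> (B, n) class scores

In an assumed end-to-end pipeline, the outputs of MultiBranchBackbone would pass through ChannelAttentionSelect, per-branch fully-connected prediction heads, BranchAttention and IntraTaskAttention in turn, yielding one score vector per task.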
It should be noted that the connection relationships between the modules adopted in the invention are definite and realizable; except where otherwise specified in the embodiments, these connection relationships bring about the corresponding technical effects and solve the technical problem posed by the invention without depending on the execution of corresponding software programs.
Finally, it should be noted that the above embodiments are only used to illustrate the technical solution of the invention, not to limit it. Although the invention has been described in detail with reference to the foregoing embodiments, those skilled in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be equivalently replaced, and such modifications or substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the invention.

Claims (7)

1. A multitask multi-branch attention network structure, characterized by comprising the following modules:
a multi-branch feature extraction network: used for extracting features from the preprocessed image and dividing the network into a plurality of branches, each branch outputting the feature map extracted by the feature extraction network at the current stage;
an attention-based feature selection module: used for weighting the channels of the feature map output by each branch of the feature extraction network with channel attention, and generating a task-related feature map for each task;
an attention-based multi-branch prediction module: used for inputting each channel-attention-weighted feature map into a fully-connected layer for multi-task prediction, then integrating the prediction results of the same task from different branches using a branch attention module and an intra-task attention module, and finally outputting the classification result of each task.
2. The multitask multi-branch attention network structure of claim 1, characterized in that the feature extraction network uses a modified ResNet50 as the backbone network, the modified ResNet50 comprises 1 input part and 4 blocks in total, the input part comprises 3 stacked 3 × 3 convolutional layers, the blocks comprise 3, 4, 6 and 3 layers respectively, and each layer comprises a 1 × 1 convolutional layer, a batch normalization layer, a rectified linear unit (ReLU) layer, a 3 × 3 convolutional layer, a batch normalization layer, a ReLU layer, a 1 × 1 convolutional layer, a batch normalization layer and a ReLU layer.
3. The multitask multi-branch attention network structure of claim 1, characterized in that the attention-based feature selection module is represented by the following formula:
f_i′ = σ( MLP(AvgPool(f_i)) + MLP(MaxPool(f_i)) ) ⊗ f_i
in the above formula: f_i represents the original feature map output by the i-th branch of the feature extraction network; AvgPool and MaxPool represent the pooling operations; MLP represents a multilayer perceptron with one hidden layer; σ represents the sigmoid function; and f_i′ represents the task-specific feature map generated by the attention module on the i-th branch.
4. The multitask multi-branch attention network structure of claim 1, characterized in that the branch attention module assigns a weight to each branch by generating an attention mask, so that different branches receive different weights.
5. The multitask multi-branch attention network structure of claim 4, characterized in that the calculation steps by which the branch attention module assigns different weights to different branches are as follows:
splicing the prediction results of each branch together, wherein the spliced result is presented in the form of a multi-channel feature map F;
performing a 1 × n convolution operation on the map F to obtain the compressed information of the branches;
feeding the obtained compressed information into a fully-connected layer, and then feeding the output of the fully-connected layer into an activation function to obtain the attention mask W1;
multiplying the attention mask W1 with the map F to obtain a weighted map F1.
6. The multitask multi-branch attention network structure of claim 1, characterized in that the intra-task attention module, on the basis of the branch attention module, generates different weights for the prediction results of the same subclass on different branches.
7. The multitask multi-branch attention network structure of claim 6, characterized in that the intra-task attention module calculates the weights as follows:
converting the multi-channel map F1 output by the branch attention module into a single-channel map F2;
performing a k × 1 convolution operation on the single-channel map F2 to obtain the attention mask W2;
multiplying the attention mask W2 with the single-channel map F2 to obtain a weighted map R, and adding the elements of the weighted map R column by column to obtain the final classification result of the task.
CN202210705174.2A 2022-06-21 2022-06-21 Multitask multi-branch attention network structure Pending CN114897149A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210705174.2A CN114897149A (en) 2022-06-21 2022-06-21 Multitask multi-branch attention network structure

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210705174.2A CN114897149A (en) 2022-06-21 2022-06-21 Multitask multi-branch attention network structure

Publications (1)

Publication Number Publication Date
CN114897149A true CN114897149A (en) 2022-08-12

Family

ID=82727122

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210705174.2A Pending CN114897149A (en) 2022-06-21 2022-06-21 Multitask multi-branch attention network structure

Country Status (1)

Country Link
CN (1) CN114897149A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115794357A (en) * 2023-01-16 2023-03-14 山西清众科技股份有限公司 Device and method for automatically building multi-task network
CN117557397A (en) * 2023-12-08 2024-02-13 广州德威生物科技有限公司 Method and system for controlling disinfection based on intelligent AI monitoring of warehouse pests
CN117557397B (en) * 2023-12-08 2024-06-11 广州德威生物科技有限公司 Method and system for controlling disinfection based on intelligent AI monitoring of warehouse pests

Similar Documents

Publication Publication Date Title
JP6980958B1 (en) Rural area classification garbage identification method based on deep learning
CN111967468B (en) Implementation method of lightweight target detection neural network based on FPGA
CN114897149A (en) Multitask multi-branch attention network structure
CN111242844B (en) Image processing method, device, server and storage medium
Zhang et al. Lightweight and efficient asymmetric network design for real-time semantic segmentation
CN110084274A (en) Realtime graphic semantic segmentation method and system, readable storage medium storing program for executing and terminal
WO2021103731A1 (en) Semantic segmentation method, and model training method and apparatus
Yu et al. Real-time object detection towards high power efficiency
CN111210432A (en) Image semantic segmentation method based on multi-scale and multi-level attention mechanism
CN111985597B (en) Model compression method and device
CN117033609B (en) Text visual question-answering method, device, computer equipment and storage medium
CN110222607A (en) The method, apparatus and system of face critical point detection
CN115222950A (en) Lightweight target detection method for embedded platform
CN112288087A (en) Neural network pruning method and device, electronic equipment and storage medium
CN112130805B (en) Chip comprising floating point adder, device and control method of floating point operation
CN112528904A (en) Image segmentation method for sand particle size detection system
CN116580184A (en) YOLOv 7-based lightweight model
CN111368707A (en) Face detection method, system, device and medium based on feature pyramid and dense block
CN114972780A (en) Lightweight target detection network based on improved YOLOv5
WO2022001364A1 (en) Method for extracting data features, and related apparatus
CN112711985A (en) Fruit identification method and device based on improved SOLO network and fruit picking robot
Yang et al. Research and Implementation of Embedded Real-time Target Detection Algorithm Based on Deep Learning
CN116595133A (en) Visual question-answering method based on stacked attention and gating fusion
CN116524180A (en) Dramatic stage scene segmentation method based on lightweight backbone structure
CN115794357A (en) Device and method for automatically building multi-task network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination