CN114897149A - Multitask multi-branch attention network structure - Google Patents

Multitask multi-branch attention network structure

Info

Publication number
CN114897149A
Authority
CN
China
Prior art keywords
attention
branch
task
network
layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210705174.2A
Other languages
Chinese (zh)
Inventor
范军俊
任晓宇
韩晓红
董于杰
王亮
冯晋首
朱鹏飞
李耀军
王丽娜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanxi Qingzhong Technology Co ltd
Original Assignee
Shanxi Qingzhong Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanxi Qingzhong Technology Co ltd filed Critical Shanxi Qingzhong Technology Co ltd
Priority to CN202210705174.2A priority Critical patent/CN114897149A/en
Publication of CN114897149A publication Critical patent/CN114897149A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/771Feature selection, e.g. selecting representative features from a multi-dimensional feature space
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/7715Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a multitask multi-branch attention network structure, belonging to the technical field of deep learning. It addresses the problem that existing multi-task network structures allow one model to process several tasks at the same time, but, because each task requires different features, they can suffer from negative transfer. The structure has three modules: a multi-branch feature extraction network, an attention-based feature selection module and an attention-based multi-branch prediction module. The multi-branch feature extraction network outputs feature maps of the picture from different stages of the network, the feature selection module weights these feature maps and provides task-related feature maps for each task, and the attention-based multi-branch prediction module integrates the prediction results of the same task from different branches, thereby improving the accuracy of the network. The invention is applied to image classification.

Description

Multitask multi-branch attention network structure
Technical Field
The invention provides a multitask multi-branch attention network structure and belongs to the technical field of deep learning within computer technology.
Background
A convolutional neural network is a deep neural network built from convolution, pooling and activation-function computations, and is one of the representative algorithms of deep learning. A large body of research has shown that it performs strongly in object classification, localization and detection, and that its multi-level feature learning and rich feature representation capability have led to breakthrough progress in object classification.
In recent years, several classical network models have emerged in the field of image classification. GoogLeNet proposed the Inception module to widen the network, strengthening its feature extraction capability from another direction; ResNet proposed the residual structure, which effectively alleviates problems such as vanishing gradients and network degradation and allows neural networks to reach depths of hundreds of layers; DenseNet proposed dense connections, feeding the input of each layer to all subsequent layers, which achieves efficient feature reuse while also alleviating vanishing gradients. In object classification, many researchers have applied these models to image classification, but most of the resulting models are single-task models that can perform only one task at a time, whereas in practice a picture usually needs to be judged by several tasks simultaneously. In that case it is usually necessary to train several single-task models and run a classification computation for each task, which leads to a huge amount of computation and slow detection.
At present, for pictures that involve several tasks, a number of multitask neural network frameworks exist. Multitask learning (MTL) is a joint learning approach in which several tasks are trained simultaneously and share some parameters. In a multitask network, multiple tasks share one structure and can exploit information from the other tasks; when the losses of all tasks flatten out, the shared structure has effectively fused the information of all tasks. In general, multitask networks generalize better than single-task networks. However, most multitask networks require a strong connection between the different tasks; otherwise negative transfer may occur. In practical applications the accuracy is therefore often not high, and there are many limitations.
Disclosure of Invention
In order to overcome the defects in the prior art, the technical problem to be solved by the invention is to provide an improved multitask multi-branch attention network structure.
In order to solve the above technical problems, the invention adopts the following technical scheme: a multitask multi-branch attention network structure comprising the following modules:
a multi-branch feature extraction network: used for extracting features from the preprocessed image and dividing the network into a plurality of branches, each branch outputting the feature map extracted by the feature extraction network at the current stage;
an attention-based feature selection module: used for weighting the channels of the feature map output by each branch of the feature extraction network with channel attention, and generating a task-related feature map for each task;
an attention-based multi-branch prediction module: used for inputting each channel-attention-weighted feature map into a fully-connected layer for multi-task prediction, then integrating the prediction results of the same task from different branches using a branch attention module and an intra-task attention module, and finally outputting the classification result of each task.
The feature extraction network uses a modified ResNet50 as the backbone network. The modified ResNet50 comprises 1 input part and 4 blocks in total; the input part comprises 3 stacked 3 × 3 convolutional layers, the blocks comprise 3, 4, 6 and 3 layers respectively, and each layer comprises a 1 × 1 convolutional layer, a batch normalization layer, a rectified linear unit (ReLU) layer, a 3 × 3 convolutional layer, a batch normalization layer, a ReLU layer, a 1 × 1 convolutional layer, a batch normalization layer and a ReLU layer.
The attention-based feature selection module is represented by the following formula:
f_i′ = σ( MLP(AvgPool(f_i)) + MLP(MaxPool(f_i)) ) ⊗ f_i
in the above formula: f_i represents the original feature map output by the i-th branch of the feature extraction network; AvgPool and MaxPool represent the pooling operations; MLP represents a multilayer perceptron with one hidden layer; σ represents the sigmoid function; and f_i′ represents the task-specific feature map generated by the attention module on the i-th branch.
The branch attention module assigns a weight to each branch by generating an attention mask, so that different branches receive different weights.
The calculation steps of the branch attention module for assigning different weights to different branches are as follows:
splicing the prediction results of each branch together, wherein the spliced result is presented in the form of a multi-channel feature map F;
performing a 1 × n convolution operation on the map F to obtain the compressed information of the branches;
feeding the obtained compressed information into a fully-connected layer, and then feeding the output of the fully-connected layer into an activation function to obtain the attention mask W1;
multiplying the attention mask W1 with the map F to obtain a weighted map F1.
The intra-task attention module generates different weights for the prediction results of the same subclass on different branches on the basis of the branch attention module.
The intra-task attention module calculates the weights as follows:
converting the multi-channel map F1 output by the branch attention module into a single-channel map F2;
performing a k × 1 convolution operation on the single-channel map F2 to obtain the attention mask W2;
multiplying the attention mask W2 with the single-channel map F2 to obtain a weighted map R, and adding the elements of the weighted map R column by column to obtain the final classification result of the task.
Compared with the prior art, the invention has the following beneficial effects. The invention provides a multitask multi-branch attention network structure comprising three modules: a multi-branch feature extraction network, an attention-based feature selection module and an attention-based multi-branch prediction module. The multi-branch feature extraction network outputs feature maps of the picture from different stages of the network, the feature selection module weights these feature maps and provides task-related feature maps for each task, and the attention-based multi-branch prediction module integrates the prediction results of the same task from different branches, thereby improving the accuracy of the network. Experiments show that, compared with single-task networks, this network structure can exploit feature information from different stages of the network while the different tasks promote each other, and its performance is superior to that of single-task networks.
Drawings
The invention is further described below with reference to the accompanying drawings:
FIG. 1 is a schematic diagram of the overall network architecture of the present invention;
FIG. 2 is a schematic diagram of an internal structure of an attention-based feature selection module according to the present invention;
FIG. 3 is a diagram illustrating an internal structure of a multi-branch prediction module based on attention of the present invention.
Detailed Description
As shown in fig. 1 to 3, the problems to be solved by the present invention are the following. 1. A multitask network generally provides the same feature map for all tasks, which leads to the negative transfer problem. The reasons for negative transfer may include: (1) different tasks require feature maps from different stages, some tasks needing low-level image features while others need high-level image features; (2) different tasks attend to different regions of the same image. 2. In a multi-branch structure there is also the problem of assigning an appropriate weight to the prediction result of each branch: a given task has different prediction results on different branches, and how to let the network use the prediction results of the different branches in a reasonable way while suppressing branches with low prediction accuracy, so as to obtain a better classification result, is the problem the invention aims to solve.
The technical scheme adopted by the invention for solving the technical problem comprises the following three modules as shown in figure 1:
module 1: multi-branch feature extraction network
In order to solve the problem that different tasks require feature maps from different stages, the invention provides a multi-branch feature extraction network. As shown in fig. 2, the feature extraction network extracts features from the preprocessed image and divides the network into five branches, numbered 1 to 5. Each branch outputs the feature map extracted by the feature extraction network at that stage. Such a multi-branch structure can output the features extracted by the network at different stages to meet the requirements of different tasks.
The feature extraction network uses a modified ResNet50 as the backbone network; the modified ResNet50 contains 1 input part and 4 blocks in total. The input part consists of 3 stacked 3 × 3 convolutional layers, the blocks consist of 3, 4, 6 and 3 layers respectively, and each layer comprises a 1 × 1 convolutional layer, a batch normalization (BN) layer, a rectified linear unit (ReLU) layer, a 3 × 3 convolutional layer, a batch normalization layer, a ReLU layer, a 1 × 1 convolutional layer, a batch normalization layer and a ReLU layer.
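For concreteness, a minimal PyTorch sketch of this multi-branch idea is given below: a ResNet-50-style backbone whose stem and four blocks each expose their output as one of the five branches. The class name MultiBranchBackbone and the use of torchvision's stock resnet50 (including its single 7 × 7 stem rather than the three stacked 3 × 3 convolutions described above) are illustrative assumptions rather than the reference implementation of the invention.

import torch
import torch.nn as nn
from torchvision.models import resnet50


class MultiBranchBackbone(nn.Module):
    """Expose the outputs of five successive backbone stages as branches."""

    def __init__(self):
        super().__init__()
        net = resnet50(weights=None)
        # Stem (branch 1). The invention replaces the single 7x7 convolution
        # with three stacked 3x3 convolutions; the stock stem is kept here
        # for brevity.
        self.stem = nn.Sequential(net.conv1, net.bn1, net.relu, net.maxpool)
        self.block1 = net.layer1  # 3 bottleneck layers
        self.block2 = net.layer2  # 4 bottleneck layers
        self.block3 = net.layer3  # 6 bottleneck layers
        self.block4 = net.layer4  # 3 bottleneck layers

    def forward(self, x):
        f1 = self.stem(x)
        f2 = self.block1(f1)
        f3 = self.block2(f2)
        f4 = self.block3(f3)
        f5 = self.block4(f4)
        # Five branches: feature maps extracted at five successive stages.
        return [f1, f2, f3, f4, f5]


if __name__ == "__main__":
    feats = MultiBranchBackbone()(torch.randn(1, 3, 224, 224))
    print([tuple(f.shape) for f in feats])

Each returned tensor corresponds to one branch and would be passed to the feature selection module described in module 2.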
The specific structure of the network is shown in table 1 below:
Table 1: feature extraction network structure.
Module 2: attention-based feature selection module
This module uses channel attention to weight the channels of the feature map output by each branch and generates a task-related feature map for each task, so as to improve the accuracy of the network.
The attention-based feature selection module may be represented by the following formula:
f_i′ = σ( MLP(AvgPool(f_i)) + MLP(MaxPool(f_i)) ) ⊗ f_i
in the above formula: f_i represents the original feature map output by the i-th branch of the feature extraction network; AvgPool and MaxPool represent the pooling operations; MLP represents a multilayer perceptron with one hidden layer; σ represents the sigmoid function; and f_i′ represents the task-specific feature map generated by the attention module on the i-th branch.
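A minimal sketch of this feature selection step, assuming a PyTorch implementation in the spirit of CBAM-style channel attention, is given below; the class and parameter names (ChannelAttentionSelect, reduction) are illustrative assumptions. One such module would be instantiated for each task on each branch to produce the task-related feature maps.

import torch
import torch.nn as nn


class ChannelAttentionSelect(nn.Module):
    """Weight the channels of one branch's feature map for one task."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Shared multilayer perceptron with one hidden layer, applied to the
        # average-pooled and max-pooled channel descriptors.
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, f):                      # f: (B, C, H, W)
        avg = self.mlp(f.mean(dim=(2, 3)))     # AvgPool over spatial dims
        mx = self.mlp(f.amax(dim=(2, 3)))      # MaxPool over spatial dims
        w = torch.sigmoid(avg + mx)            # channel weights, shape (B, C)
        return f * w[:, :, None, None]         # task-related feature map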
Module 3: attention-based multi-branch prediction module
As shown in FIG. 3, this module feeds each channel-attention-weighted feature map into a fully-connected layer for multi-task prediction, and then integrates the prediction results of the same task from the different branches using the branch attention module and the intra-task attention module. Finally, the module outputs the classification result of each task.
Branch attention module: the branch attention module assigns a weight to each branch by generating an attention mask. By assigning different weights to different branches, the network can make reasonable use of the prediction results of the different branches and pay more attention to the branches with better classification results, thereby improving the classification performance of the model. It is calculated by the following steps (a code sketch follows the steps):
(1) The prediction results of each branch are spliced together, and the spliced result is presented in the form of a feature map F with 5 channels.
(2) A 1 × n convolution operation is performed on the map F. This operation compresses the information of each branch, enabling an overall evaluation of the prediction result of each branch.
(3) The obtained compressed information is fed into a fully-connected (FC) layer, and the output of the FC layer is then fed into an activation function to obtain the attention mask W1.
(4) The attention mask W1 is multiplied with the map F to obtain a weighted map F1.
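The following sketch illustrates these four steps under assumed tensor shapes: the n-dimensional predictions of one task from each of the five branches are stacked into the multi-channel map F, compressed by a 1 × n convolution, passed through a fully-connected layer and a sigmoid to obtain the mask W1, and used to re-weight F. Class names, shapes and the choice of sigmoid as the activation function are assumptions for illustration only.

import torch
import torch.nn as nn


class BranchAttention(nn.Module):
    """Re-weight the per-branch predictions of a single task."""

    def __init__(self, num_branches: int = 5, num_classes: int = 10):
        super().__init__()
        # 1 x n convolution: compresses each branch's n-dimensional
        # prediction into a single evaluation score.
        self.compress = nn.Conv2d(1, 1, kernel_size=(1, num_classes))
        self.fc = nn.Linear(num_branches, num_branches)

    def forward(self, preds):                      # list of (B, n) predictions
        F = torch.stack(preds, dim=1)              # map F: (B, branches, n)
        s = self.compress(F.unsqueeze(1))          # compressed info: (B, 1, branches, 1)
        w1 = torch.sigmoid(self.fc(s.flatten(1)))  # attention mask W1: (B, branches)
        return F * w1.unsqueeze(-1)                # weighted map F1: (B, branches, n)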
Intra-task attention module: the branch attention module only evaluates the overall classification performance of each branch and ignores the accuracy of the individual subclasses of a task on that branch; the intra-task attention module is designed to address this. On the basis of the branch attention module, it generates different weights for the prediction results of the same subclass on different branches, which can further improve the accuracy of the model. It is calculated by the following steps (see the sketch after the steps):
(1) The multi-channel map F1 output by the branch attention module is converted into a single-channel map F2.
(2) To enable the network to analyze the prediction results of the same subclass on different branches, a k × 1 convolution operation is performed on the map F2 to obtain the attention mask W2.
(3) The attention mask W2 is multiplied with the map F2 to obtain a weighted map R, and the final classification result of this task is obtained by adding the elements of the weighted map R column by column.
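The sketch below illustrates these steps with the same assumed shapes as the branch attention sketch: F1 is viewed as a single-channel map of size (branches × classes), a k × 1 convolution over the branch axis produces the mask W2, and the re-weighted map R is summed column by column to give the task's final class scores. The use of sigmoid and of same-padding for the k × 1 convolution are assumptions.

import torch
import torch.nn as nn


class IntraTaskAttention(nn.Module):
    """Weight the per-subclass predictions of the same task across branches."""

    def __init__(self, k: int = 3):
        super().__init__()
        # k x 1 convolution: each class column is compared across k branches
        # (k assumed odd so that same-padding preserves the branch axis).
        self.conv = nn.Conv2d(1, 1, kernel_size=(k, 1), padding=(k // 2, 0))

    def forward(self, F1):                  # F1: (B, branches, n) from BranchAttention
        F2 = F1.unsqueeze(1)                # single-channel map F2: (B, 1, branches, n)
        W2 = torch.sigmoid(self.conv(F2))   # attention mask W2, same shape as F2
        R = (W2 * F2).squeeze(1)            # weighted map R: (B, branches, n)
        return R.sum(dim=1)                 # column-wise sum -> (B, n) class scores

In an assumed end-to-end pipeline, the outputs of MultiBranchBackbone would pass through ChannelAttentionSelect, per-branch fully-connected prediction heads, BranchAttention and IntraTaskAttention in turn, yielding one score vector per task.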
It should be noted that the connection relationships between the modules adopted in the invention are definite and realizable; except where otherwise specified in the embodiments, these connection relationships bring about the corresponding technical effects and solve the technical problem posed by the invention without depending on the execution of corresponding software programs.
Finally, it should be noted that the above embodiments are only used to illustrate the technical solution of the invention, not to limit it. Although the invention has been described in detail with reference to the foregoing embodiments, those skilled in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be equivalently replaced, and such modifications or substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the invention.

Claims (7)

1. A multitask multi-branch attention network structure, characterized by comprising the following modules:
a multi-branch feature extraction network: used for extracting features from the preprocessed image and dividing the network into a plurality of branches, each branch outputting the feature map extracted by the feature extraction network at the current stage;
an attention-based feature selection module: used for weighting the channels of the feature map output by each branch of the feature extraction network with channel attention, and generating a task-related feature map for each task;
an attention-based multi-branch prediction module: used for inputting each channel-attention-weighted feature map into a fully-connected layer for multi-task prediction, then integrating the prediction results of the same task from different branches using a branch attention module and an intra-task attention module, and finally outputting the classification result of each task.
2. The multitask multi-branch attention network structure of claim 1, characterized in that the feature extraction network uses a modified ResNet50 as the backbone network, the modified ResNet50 comprises 1 input part and 4 blocks in total, the input part comprises 3 stacked 3 × 3 convolutional layers, the blocks comprise 3, 4, 6 and 3 layers respectively, and each layer comprises a 1 × 1 convolutional layer, a batch normalization layer, a rectified linear unit (ReLU) layer, a 3 × 3 convolutional layer, a batch normalization layer, a ReLU layer, a 1 × 1 convolutional layer, a batch normalization layer and a ReLU layer.
3. The multitask multi-branch attention network structure of claim 1, characterized in that the attention-based feature selection module is represented by the following formula:
f_i′ = σ( MLP(AvgPool(f_i)) + MLP(MaxPool(f_i)) ) ⊗ f_i
in the above formula: f_i represents the original feature map output by the i-th branch of the feature extraction network; AvgPool and MaxPool represent the pooling operations; MLP represents a multilayer perceptron with one hidden layer; σ represents the sigmoid function; and f_i′ represents the task-specific feature map generated by the attention module on the i-th branch.
4. The multitask multi-branch attention network structure of claim 1, characterized in that the branch attention module assigns a weight to each branch by generating an attention mask, so that different branches receive different weights.
5. The multitask multi-branch attention network structure of claim 4, characterized in that the calculation steps by which the branch attention module assigns different weights to different branches are as follows:
splicing the prediction results of each branch together, wherein the spliced result is presented in the form of a multi-channel feature map F;
performing a 1 × n convolution operation on the map F to obtain the compressed information of the branches;
feeding the obtained compressed information into a fully-connected layer, and then feeding the output of the fully-connected layer into an activation function to obtain the attention mask W1;
multiplying the attention mask W1 with the map F to obtain a weighted map F1.
6. The multitask multi-branch attention network structure of claim 1, characterized in that the intra-task attention module, on the basis of the branch attention module, generates different weights for the prediction results of the same subclass on different branches.
7. The multitask multi-branch attention network structure of claim 6, characterized in that the intra-task attention module calculates the weights as follows:
converting the multi-channel map F1 output by the branch attention module into a single-channel map F2;
performing a k × 1 convolution operation on the single-channel map F2 to obtain the attention mask W2;
multiplying the attention mask W2 with the single-channel map F2 to obtain a weighted map R, and adding the elements of the weighted map R column by column to obtain the final classification result of the task.
CN202210705174.2A 2022-06-21 2022-06-21 Multitask multi-branch attention network structure Pending CN114897149A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210705174.2A CN114897149A (en) 2022-06-21 2022-06-21 Multitask multi-branch attention network structure

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210705174.2A CN114897149A (en) 2022-06-21 2022-06-21 Multitask multi-branch attention network structure

Publications (1)

Publication Number Publication Date
CN114897149A true CN114897149A (en) 2022-08-12

Family

ID=82727122

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210705174.2A Pending CN114897149A (en) 2022-06-21 2022-06-21 Multitask multi-branch attention network structure

Country Status (1)

Country Link
CN (1) CN114897149A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115794357A (en) * 2023-01-16 2023-03-14 山西清众科技股份有限公司 Device and method for automatically building multi-task network
CN117557397A (en) * 2023-12-08 2024-02-13 广州德威生物科技有限公司 Method and system for controlling disinfection based on intelligent AI monitoring of warehouse pests
CN117557397B (en) * 2023-12-08 2024-06-11 广州德威生物科技有限公司 Method and system for controlling disinfection based on intelligent AI monitoring of warehouse pests

Similar Documents

Publication Publication Date Title
JP6980958B1 (en) Rural area classification garbage identification method based on deep learning
CN111967468B (en) Implementation method of lightweight target detection neural network based on FPGA
CN114897149A (en) Multitask multi-branch attention network structure
CN111242844B (en) Image processing method, device, server and storage medium
Zhang et al. Lightweight and efficient asymmetric network design for real-time semantic segmentation
CN110084274A (en) Realtime graphic semantic segmentation method and system, readable storage medium storing program for executing and terminal
WO2021103731A1 (en) Semantic segmentation method, and model training method and apparatus
Yu et al. Real-time object detection towards high power efficiency
CN111210432A (en) Image semantic segmentation method based on multi-scale and multi-level attention mechanism
CN111985597B (en) Model compression method and device
CN117033609B (en) Text visual question-answering method, device, computer equipment and storage medium
CN110222607A (en) The method, apparatus and system of face critical point detection
CN115222950A (en) Lightweight target detection method for embedded platform
CN112288087A (en) Neural network pruning method and device, electronic equipment and storage medium
CN112130805B (en) Chip comprising floating point adder, device and control method of floating point operation
CN112528904A (en) Image segmentation method for sand particle size detection system
CN116580184A (en) YOLOv 7-based lightweight model
CN111368707A (en) Face detection method, system, device and medium based on feature pyramid and dense block
CN114972780A (en) Lightweight target detection network based on improved YOLOv5
WO2022001364A1 (en) Method for extracting data features, and related apparatus
CN112711985A (en) Fruit identification method and device based on improved SOLO network and fruit picking robot
Yang et al. Research and Implementation of Embedded Real-time Target Detection Algorithm Based on Deep Learning
CN116595133A (en) Visual question-answering method based on stacked attention and gating fusion
CN116524180A (en) Dramatic stage scene segmentation method based on lightweight backbone structure
CN115794357A (en) Device and method for automatically building multi-task network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination