CN113537365B

CN113537365B - Information entropy dynamic weighting-based multi-task learning self-adaptive balancing method

Info

Publication number: CN113537365B
Application number: CN202110820646.4A
Authority: CN
Inventors: 王玉峰; 丁文锐; 肖京
Original assignee: Beihang University
Current assignee: Beihang University
Priority date: 2021-07-20
Filing date: 2021-07-20
Publication date: 2024-02-06
Anticipated expiration: 2041-07-20
Also published as: CN113537365A

Abstract

The invention discloses a multi-task learning method based on information entropy dynamic weighting, and belongs to the technical field of machine learning. Firstly, an initial multi-task learning model M is built, model inference is carried out on an input image to obtain a plurality of task output graphs, normalization processing is carried out on the input image to obtain a corresponding normalized probability graph; then, calculating a fixed weight multi-task loss function by utilizing each normalized probability map, and performing preliminary training on the multi-task learning model M; and finally, constructing a final self-adaptive multi-task loss function through an information entropy dynamic weighting algorithm on the basis of the preliminarily trained multi-task learning model M, performing iterative optimization training on the preliminarily trained multi-task learning model until the multi-task learning model reaches convergence, and stopping training to obtain an optimized multi-task learning model M1. The method can effectively cope with different types of tasks, adaptively balance the relative importance of each task, and has strong algorithm applicability, simplicity and high efficiency.

Description

Information entropy dynamic weighting-based multi-task learning self-adaptive balancing method

Technical Field

The invention belongs to the technical field of machine learning, and particularly relates to a multi-task learning self-adaptive balancing method based on information entropy dynamic weighting.

Background

Machine learning improves the performance of computer algorithms through empirical knowledge to achieve intelligent autonomous learning work, and is one of core technologies of artificial intelligence. Machine learning techniques typically require a large number of learning samples, and more recently, popular deep learning models typically require a large number of labeled samples to train the network. However, in many applications, certain task labels of training samples are difficult to collect or are labor and time consuming to manually annotate. In this case, multitasking learning may be utilized to maximize the utilization of the limited training samples in each task.

The multi-task learning aims to jointly learn a plurality of related tasks to improve the generalization performance of each task, and is widely applied to the fields of natural language processing, computer vision and the like. Each of which may be a general learning task such as a supervised task (e.g., classification or regression problem), an unsupervised task (e.g., clustering problem), a reinforcement learning task, or a multi-view learning task, etc.

In recent years, deep learning greatly improves the performance of various computer vision tasks, while multi-task learning jointly learns a plurality of tasks in one model so as to obtain better generalization performance and lower memory occupation, and the combination of the two, namely the deep multi-task learning research, has greatly progressed. However, the following problems still exist in deep multitasking learning: (1) The information exchange between different subtasks is insufficient and the advantage of multi-task learning is difficult to fully develop; (2) The loss function of most existing MTL studies is typically derived from the loss linear weighting of subtasks, which rely on human experience and lack adaptability.

Current deep multitasking studies focus mainly on the design of network architecture and optimization strategies:

in network structure research, there are two main ways of performing a multi-task learning mechanism in a deep neural network, namely hard parameter sharing and soft parameter sharing. Where hard parameter sharing typically shares a hidden layer among all tasks while preserving multiple task-specific output layers. Because the more tasks that are learned at the same time, the more the model needs to find expressions that are applicable to all tasks, hard parameter sharing greatly reduces the risk of overfitting. On the other hand, in soft parameter sharing, each task has its own model and corresponding parameters, and then regularized adjustment is performed on the model parameter distance to increase the phase degree of the parameters.

In the research of optimization strategies, most multi-task learning related works simply set the weights of the tasks to be fixed proportion, but the mode is seriously dependent on human experience, and in some cases, improper weights can lead to that some subtasks cannot work normally. Therefore, unlike the structure of designing a multi-tasking model, another part of research is focused on balancing the impact of different tasks on the network, including the researches of uncertainty weights, gradient normalization algorithms, dynamic weight averaging strategies, and the like.

In summary, since the multitasking model includes multiple learning tasks, how to adaptively balance the importance among different tasks has important research significance.

Disclosure of Invention

In order to improve generalization of a multi-task learning model, the invention designs a multi-task learning self-adaptive balance method based on information entropy dynamic weighting on a model optimization strategy through the characteristics of different tasks and the demand analysis of multi-task model application, namely, the relative weight of each task loss function is dynamically adjusted in the model training process, so that self-adaptive training and accurate prediction of the multi-task learning model are realized.

The information entropy dynamic weighting-based multi-task learning self-adaptive balancing method comprises the following specific steps:

firstly, constructing a multi-task learning model M, and carrying out model inference and normalization processing on an input image through the current multi-task learning model M to obtain different types of normalized probability diagrams;

the initial multi-task learning model M contains one shared encoder and three task-specific decoders.

The multi-task learning model M carries out model inference on an input image to generate three pixel-level task outputs which are respectively semantic segmentation output graphs P _s Depth estimation output map P _d And an edge detection output map P _b Respectively carrying out normalization processing on each task output graph to obtain different types of normalized probability graphs, wherein the normalization probability graphs specifically comprise:

1) Semantic segmentation output map P _s By usingProcessing the softmax function to obtain a normalized semantic segmentation probability map:

wherein M is the total category number of semantic segmentation, i represents the i-th layer semantic category in the predictive picture, and P _s，i Outputting a map P for the model _s Is divided into numerical graphs by the i-th layer of the semanteme, and P' _s，i Then represents normalized i-th layer semantic segmentation probability map P' _s 。

2) Edge detection output map P _b Processing by adopting a sigmoid function to obtain a normalized edge detection probability map P' _b ：

3) Depth estimation output map P _d Converting the depth regression task into a classification task by utilizing a logarithmic space discretization strategy, and obtaining a normalized depth classification probability map by adopting a softmax function;

firstly, discrete division of depth values of a continuous space into K categories corresponding to K subintervals is performed by adopting a logarithmic space discretization strategy;

the method comprises the following steps: depth value interval [ D ] ₁ ，D ₂ ]Mapping to [ D ] ₁ +1，D ₂ +1]Is marked as [ D ]' ₁ ，D′ ₂ ]And according to the discretized depth threshold d _k Dividing to obtain K sub-intervals { [ d ] ₀ ，d ₁ ]，[d ₁ ，d ₂ ]，...，[d _K-1 ，d _K ]}。

Discretized depth threshold d _k The definition is as follows:

then discretizing the depth estimation truth value into a depth classification truth value according to the strategy;

i.e. when the depth estimation truth value is at [ d ] _k-1 ，d _k ]Class k is assigned to it and the deep task branches are trained with deep classification truth values.

Finally, obtaining a depth classification predictive graph in a training stage, and processing by adopting a softmax function to obtain a normalized depth classification probability graph P' _d，k ；

The depth classification probability map is:

wherein K is the total category number of depth classification, K represents the kth depth category, and P _d，k Representing a k-th layer depth classification prediction graph, P' _d，k And (5) representing a normalized k-th layer depth classification probability map.

Calculating a multi-task loss function by using the normalized probability map, and performing preliminary training on the current multi-task learning model M;

the method comprises the following steps:

firstly, calculating the loss corresponding to each type of normalized probability map by adopting a cross entropy function;

cross entropy loss function L _t The method comprises the following steps:

wherein y is _t A supervision category label is provided for the one-hot form corresponding to each task; t is s, d or b, i.e. P' _t Is a normalized probability map of semantic segmentation, edge detection or depth estimation tasks; c is the total category number corresponding to each task, and i represents the category of the ith layer in the predictive diagram.

Then, an equal weight addition and multitasking loss function L is constructed according to the fixed weight of each task _mtl The method comprises the following steps:

finally, the multi-task loss function L is utilized _mtl And carrying out gradient back transmission and parameter updating of the network model, and carrying out iterative training to obtain the preliminarily trained multi-task learning model.

Thirdly, constructing a final self-adaptive multi-task loss function L 'by utilizing an information entropy dynamic weighting algorithm on the basis of a multi-task learning model M of preliminary training' _mtl 。

The method comprises the following steps:

first, the information entropy value E of each task is calculated by using each category multi-layer probability map _t ：

Wherein W and H are the probability map row and column coordinates, respectively, and W and H are the maximum values of the probability map row and column lengths, respectively; and C is the channel value number of the probability graph, and C is the total number of categories corresponding to each task.

Then, the relative weight w of each task is distributed by utilizing the information entropy value _t ；

Relative weight w _t The method comprises the following steps:

when the prediction result of the task is worse, the uncertainty of the output probability map is higher, and the entropy value of the corresponding information is also larger. Therefore, a large weight is assigned to the task with poor prediction performance, and the model can be used for training the corresponding task in a weight mode.

Finally, according to the relative weight of each task and the cross entropy loss function L _t And constructing a final self-adaptive multi-task loss function in a weighted summation mode.

Final adaptive multitasking loss function L' _mtl The method comprises the following steps:

step four, utilizing a final self-adaptive multitasking loss function L' _mtl Performing back propagation to obtain a parameter gradient of the current multi-task learning model M, and updating the parameter of the current multi-task learning model M by using a gradient descent algorithm to complete one-time iterative training;

and step five, after the iterative training is completed, a new multi-task learning model M1 is obtained, the next iteration is carried out in the step three until the multi-task learning model M1 reaches convergence, and the training is terminated.

The invention has the advantages that:

(1) The multi-task learning self-adaptive balancing method based on the information entropy dynamic weighting adopts the discretization strategy to convert the regression task into the classification task, can effectively cope with different types of tasks, and has strong algorithm applicability;

(2) According to the multi-task learning self-adaptive balance method based on the information entropy dynamic weighting, the information entropy is calculated by utilizing the predictive graph output by the task, the model structural design or the parameter updating process is not required to be changed, and the method is concise, efficient and plug-and-play;

(3) According to the multi-task learning self-adaptive balancing method based on the information entropy dynamic weighting, the weight of the task loss function is dynamically adjusted based on the information entropy value, the relative importance of each task can be self-adaptively balanced, and therefore overall task performance is improved.

(4) The self-adaptive balance method for multi-task learning based on information entropy dynamic weighting can effectively extract general shared features and task specific features of a model and quickly and uniformly complete training of the multi-task learning model.

Drawings

FIG. 1 is an overall flow chart of the method for adaptive balancing of multi-task learning based on dynamic weighting of information entropy of the present invention;

FIG. 2 is a schematic diagram of a multi-task learning model in accordance with the present invention;

FIG. 3 is a schematic representation of the discretization of the regression task in the present invention.

Detailed Description

The following describes the implementation method of the present invention in further detail by taking a multi-task learning network for realizing semantic segmentation, depth estimation and edge detection in combination in computer vision as an example in combination with the accompanying drawings.

The invention provides a multi-task learning self-adaptive balance method based on information entropy dynamic weighting, which adopts staged training, firstly uses a multi-task loss function with fixed weight to perform pre-training, and then uses a self-adaptive multi-task loss function with dynamic weighting to perform dynamic training. In the model training process, the information entropy algorithm can effectively evaluate the prediction result of each task, and the relative weight of the task is adjusted through a dynamic weighting strategy, so that the multi-task prediction model is more concerned with the task with relatively poor performance, and the self-adaptive balance learning of the performance of different tasks is realized.

The invention relates to a multi-task learning self-adaptive balancing method based on information entropy dynamic weighting, which is shown in figure 1 and comprises the following steps:

initializing network parameters, and training to obtain an initial multi-task learning model.

A multi-task learning network model based on a single encoder-multiple decoder is constructed, as shown in fig. 2, specifically:

the encoder contains network parameters shared by all tasks and is initialized with a skeletal network (e.g., resNet) pre-trained on ImageNet. The decoders contain task-specific network parameters, each task corresponds to one decoder, and a random parameter initialization mode is adopted. In this embodiment, three tasks are set to be solved: semantic segmentation, depth estimation and edge detection, the multi-task learning model contains one shared encoder and three task-specific decoders.

After the three tasks are output by the decoder, three cross entropy losses L are obtained ₁ 、L ₂ And L ₃ The relative weight w corresponding to each task ₁ 、w ₂ And w ₃ The cross entropy loss is summed by weighting to obtain a multi-task loss function L _mtl ：

L _mtl ＝w ₁ L ₁ +w ₂ L ₂ +w ₃ L ₃

Secondly, carrying out model inference and normalization processing on the input image through a multi-task learning model to obtain different types of normalized probability maps;

the multi-task learning model carries out model inference on an input image to generate three pixel-level task outputs, which are respectively used for outputting a graph P for semantic segmentation _s Depth estimation output map P _d And an edge detection output map P _b Normalizing each task output graph to obtain different types of normalized probability graphs, wherein the normalized probability graphs specifically comprise:

1) Semantic segmentation output map P _s Processing by adopting a softmax function to obtain a normalized multi-classification semantic segmentation probability map:

s is the total category number of semantic segmentation, i represents the ith semantic category in the predictive picture, and P _s，i Outputting a map P for the model _s Is divided into numerical graphs by the i-th layer of the semanteme, and P' _s，i The normalized i-th layer semantic segmentation probability map is represented.

2) Edge detection output map P _b Processing by adopting a sigmoid function (equivalent to a binary class softmax function) to obtain a normalized edge detection probability map P' _b ：

first, as shown in fig. 3, a logarithmic space discretization strategy is adopted to discretely divide depth values of a continuous space into K sub-intervals corresponding to K categories, specifically:

depth value interval [ D ] ₁ ，D ₂ ]Mapping to [ D ] ₁ +1，D ₂ +1]Is marked as [ D ]' ₁ ，D′ ₂ ]And according to the discretized depth threshold d _k Dividing to obtain K sub-intervals { [ d ] ₀ ，d ₁ ]，[d ₁ ，d ₂ ]，...，[d _K-1 ，d _K ]}。

Discretized depth threshold d _k The definition is as follows:

then discretizing the depth estimation truth value into depth classification truth value according to the strategy, namely when the depth estimation truth value is in [ d ] _k-1 ，d _k ]Class k is assigned to it and the deep task branches are trained with deep classification truth values.

The depth classification probability map is:

In an embodiment of the present invention, discretization of depth estimation is performed taking k=80. The supervised truth of the depth branches is in the form of classification, so the depth estimation task is here also trained directly in the form of depth classification.

Step three, performing preliminary training on the multi-task learning model;

since the error of each task result predicted by the initialization model is larger and unstable, a multitasking network model needs to be initially trained, specifically:

firstly, calculating the loss corresponding to each type of normalized probability map by adopting a cross entropy function:

wherein y is _t A supervision category label is provided for the one-hot form corresponding to each task; t corresponds to each task in step one and can be expressed as s, d or b, i.e. P' _t Is a normalized probability map of semantic segmentation, edge detection or depth estimation tasks; c is the total category number corresponding to each task, and i represents the category of the ith layer in the predictive diagram.

Second, construct an equal weight addition and multitask loss function L _mtl The method comprises the following steps:

during the preliminary training process, the loss function of each task is given an equal fixed weight.

Then, the multi-tasking loss function L is utilized _mtl And carrying out gradient back transmission and parameter updating of the network model, and carrying out preliminary task prediction on the multi-task learning model obtained after training for a certain number of iterations.

And step four, constructing a self-adaptive multi-task loss function by utilizing an information entropy dynamic weighting algorithm on the basis of the multi-task learning model obtained through preliminary training, and further optimizing and training the multi-task learning model.

The method comprises the following steps:

first, the information entropy value E of each task is calculated by using each type of multi-layer probability map _t ：

Wherein W and H are the probability map row and column coordinates, respectively, and W and H are the maximum values of the probability map row and column lengths, respectively; c is the channel number of the probability graph, and C is the total number of the corresponding classes of each task;

then, the relative weight w of each task is distributed by utilizing the information entropy value _t ：

The information entropy can reflect the uncertainty of the predictive probability map, so the information entropy of the task output probability map can be utilized to assign relative weights:

when the prediction result of the task is worse, the uncertainty of the output probability map is higher, and the entropy value of the corresponding information is also larger. Therefore, tasks with poor prediction performance are assigned a larger weight so that the model is stressed to train the corresponding tasks.

Finally, according to the relative weight of each task and the cross entropy loss function L _t And constructing an overall self-adaptive multitask loss function in a weighted summation mode.

Overall adaptive multitasking loss function L' _mtl The method comprises the following steps:

step five, utilizing the integral self-adaptive multitasking loss function L' _mtl Performing back propagation to obtain a model parameter gradient, and then updating model parameters by using a gradient descent algorithm to complete one-time iterative training;

and step six, after the model parameters are updated, a new multi-task learning model is obtained. And returning to the fourth step for the next iteration until the multi-task learning model reaches convergence, and ending training.

Because the prediction performance of each task can be changed after each network parameter is updated, the corresponding relative weight can be changed dynamically, and thus the self-adaptive adjustment of the loss function in the network model training is realized.

The above embodiment only describes three specific tasks of semantic segmentation, depth estimation and edge detection, but the application of the method is not limited to the three specific tasks, and the method can be applied to other tasks, can also be applied to more than three task conditions, and can be used for adjusting a multi-task learning model according to actual conditions. The situation that other tasks or three or more tasks are included belongs to the technical problem solved by the invention.

Claims

1. A multi-task learning self-adaptive balancing method based on information entropy dynamic weighting is characterized by comprising the following steps:

firstly, an initial multi-task learning model M is built, an input image is inferred through the multi-task learning model M, different types of output of different tasks are obtained, normalization processing is carried out respectively, and a normalization probability map corresponding to the different tasks is obtained;

the multi-task learning model M carries out model inference on an input image to generate three pixel-level task outputs which are respectively semantic segmentation output graphs P _s Depth estimation output map P _d And an edge detection output map P _b The corresponding normalized probability map is:

1) Semantic segmentation output map P _s Processing by adopting a softmax function to obtain a normalized semantic segmentation probability map:

s is the total category number of semantic segmentation, i represents the i-th layer semantic category in the predictive picture, and P _s,i Outputting a map P for the model _s Is divided into numerical graphs by the i-th layer of the semanteme, and P' _s,i Then represents normalized i-th layer semantic segmentation probability map P' _s ；

2) Classified edge detection output map P _b Processing by adopting a sigmoid function to obtain a normalized edge detection probability map P' _b ：

firstly, discrete division of depth values of a continuous space into K categories corresponding to K sub-intervals is performed by adopting a logarithmic space discretization strategy, and the method specifically comprises the following steps:

depth value interval [ D ] ₁ ,D ₂ ]Mapping to [ D ] ₁ +1,D ₂ +1]Is marked as [ D ]' ₁ ,D′ ₂ ]And according to the discretized depth threshold d _k Dividing to obtain K sub-intervals { [ d ] ₀ ,d ₁ ],[d ₁ ,d ₂ ],...,[d _K-1 ,d _K ]}；

Discretized depth threshold d _k The definition is as follows:

then discretizing the depth estimation truth value into depth classification truth value according to the strategy, namely when the depth estimation truth value is in [ d ] _k-1 ,d _k ]The class is assigned as k, and the deep task branches are trained by depth classification truth values;

finally, obtaining a depth classification predictive graph in a training stage, and processing by adopting a softmax function to obtain a normalized depth classification probability graph P' _d,k ；

The depth classification probability map is:

wherein K is the total category number of depth classification, K represents the kth depth category, and P _d,k Representing a k-th layer depth classification prediction graph, P' _d,k Representing a normalized k-th layer depth classification probability map;

then, calculating a multi-task loss function by utilizing each normalized probability map, and performing preliminary training on a multi-task learning model M through the multi-task loss function;

finally, on the basis of the preliminarily trained multi-task learning model M, constructing a final self-adaptive multi-task loss function through an information entropy dynamic weighting algorithm, obtaining a parameter gradient of the current multi-task learning model M by utilizing a back propagation algorithm, updating parameters, and completing one-time iterative training;

the specific process for constructing the final self-adaptive multitasking loss function is as follows:

step 501, calculating the information entropy value E of each task by using each class multi-layer probability map _t ：

step 502, assigning relative weights w of the tasks by using the information entropy values _t ；

Relative weight w _t The method comprises the following steps:

step 503, according to the relative weight of each task and the cross entropy loss function L _t Constructing a final self-adaptive multitask loss function in a weighted summation mode;

after iterative training, a new multi-task learning model M1 is obtained, the input image is inferred and normalized again, the next iteration is carried out by utilizing the self-adaptive multi-task loss function until the multi-task learning model M1 achieves convergence, and training is terminated.

2. The adaptive balancing method for multi-task learning based on dynamic weighting of information entropy according to claim 1, wherein the multi-task learning model comprises a shared encoder and decoders corresponding to specific tasks.

3. The adaptive balance method for multi-task learning based on information entropy dynamic weighting according to claim 1, wherein the specific process of calculating the multi-task loss function and performing preliminary training by the multi-task learning model is as follows:

cross entropy loss function L _t The method comprises the following steps:

wherein y is _t A supervision category label is provided for the one-hot form corresponding to each task; t is s, d or b, i.e. P _t ' is a normalized probability map for semantic segmentation, edge detection, or depth estimation tasks; c is the total category number corresponding to each task, i represents the category of the ith layer in the predictive diagram;