CN112966811B - Method and network for solving task conflict in MTL convolutional neural network

Info

Publication number
CN112966811B
CN112966811B
Authority
CN
China
Prior art keywords
task, layer, MTL, neural network, convolutional neural
Prior art date
Legal status
Active
Application number
CN202110155686.1A
Other languages
Chinese (zh)
Other versions
CN112966811A
Inventor
周傲
丁春涛
白乐金
马骁
徐梦炜
孙其博
王尚广
Current Assignee
Beijing University of Posts and Telecommunications
Original Assignee
Beijing University of Posts and Telecommunications
Priority date
Filing date
Publication date
Application filed by Beijing University of Posts and Telecommunications
Priority to CN202110155686.1A
Publication of CN112966811A
Application granted
Publication of CN112966811B
Legal status: Active
Anticipated expiration

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • G06N 3/084 - Backpropagation, e.g. using gradient descent
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The embodiment of the invention discloses a method and a network for resolving task conflicts in an MTL convolutional neural network. The shared shallow layer of the MTL convolutional neural network includes a modulation module obtained through training. The modulation module determines corresponding subnet structures in the shared shallow layer for different tasks; task information of a task is input into its subnet structure for convolution processing, modulated by the modulation module, and then output to the task-specific layer of the task, which produces the task result. The processing result is back-propagated by gradient back propagation of the MTL convolutional neural network loss function, adjusting the parameters of the MTL neural network. The training of the modulation module and the training of the shared shallow layer are carried out simultaneously by multi-task parallel learning, and this process is repeated until the parameters of the MTL convolutional neural network converge, yielding a trained MTL convolutional neural network. In this way, conflicts in multi-task parallel learning are avoided, and the performance of the trained MTL convolutional neural network when processing different tasks is improved.

Description

Method and network for solving task conflict in MTL convolutional neural network
Technical Field
The invention relates to a deep neural network technology, in particular to a method and a network for solving Task conflicts in a Multi-Task Learning (MTL) convolutional neural network.
Background
The MTL convolutional neural network is obtained by training multiple tasks simultaneously, a mode that enhances the representation and generalization capability of the trained model. The aim is to use the useful information contained in multiple related tasks to help each task learn within the MTL convolutional neural network, so that a more accurate MTL convolutional neural network is obtained by training.
As shown in fig. 1, fig. 1 is a schematic structural diagram of an MTL convolutional neural network provided in the prior art. It includes a shared shallow layer and task-specific layers. The shared shallow layer is shared by multiple tasks, consists of multiple serially connected convolutional layers, and performs convolution processing on the input task information. Each task-specific layer is set up for one task; task information that has passed through the convolution processing of the shared shallow layer is input into the corresponding task-specific layer for processing, yielding a task processing result.
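For concreteness, the following is a minimal sketch (ours, not taken from the patent) of this prior-art hard-parameter-sharing architecture; the class name HardSharedMTL, the layer widths, and the linear heads are illustrative assumptions.

```python
import torch
import torch.nn as nn

class HardSharedMTL(nn.Module):
    def __init__(self, in_channels: int, num_tasks: int, num_classes: int):
        super().__init__()
        # Shared shallow layer: serially connected convolutional layers that
        # every task passes through.
        self.shared = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # One task-specific layer (head) per task.
        self.heads = nn.ModuleList(
            nn.Linear(64, num_classes) for _ in range(num_tasks)
        )

    def forward(self, x: torch.Tensor, task: int) -> torch.Tensor:
        return self.heads[task](self.shared(x))
```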
When the MTL convolutional neural network is trained, the shared shallow parameters are determined by multi-task simultaneous learning: the shared task-information representations of multiple tasks are embedded into the same feature space and input into the shared shallow layer of the MTL convolutional neural network for learning. The task-specific layers of the MTL convolutional neural network are each learned from the specific task feature representation of the corresponding task. During training, information sharing in the shared shallow layer helps reduce the amount of computation, tasks with commonality can better exploit their correlation information, and each task-specific layer is trained independently for its task, unifying shared task information and task-specific feature information within the same MTL convolutional neural network.
The training process of the shared shallow layer of the MTL convolutional neural network is as follows: multiple related tasks are learned simultaneously in parallel, back propagation is performed according to the gradient of the loss function of the MTL convolutional neural network, the shared shallow parameters are adjusted in the process, and the MTL convolutional neural network is trained step by step until convergence. Here, the gradient is the direction along which the directional derivative of the loss function at the current point attains its maximum value, and back propagation computes the partial derivatives of the loss function with respect to the weight coefficients of the MTL convolutional neural network and then minimizes the loss function by gradient descent, gradually approaching the optimum of the shared shallow parameters.
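A hedged sketch of this training loop, reusing the HardSharedMTL sketch above: per-task losses are summed and one backward pass adjusts the shared shallow parameters by gradient descent. The random stand-in data, learning rate, and step count are illustrative assumptions.

```python
model = HardSharedMTL(in_channels=3, num_tasks=2, num_classes=10)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()

# Dummy mini-batches, one per task (random stand-ins for real data).
batches = [(torch.randn(8, 3, 32, 32), torch.randint(0, 10, (8,)))
           for _ in range(2)]

for step in range(100):                       # in practice: until convergence
    optimizer.zero_grad()
    loss = sum(criterion(model(x, t), y) for t, (x, y) in enumerate(batches))
    loss.backward()                           # gradient back propagation
    optimizer.step()                          # gradient descent on theta
```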
However, in the shared shallow training of the MTL convolutional neural network, the task information of multiple tasks may assist each other but may also interfere with each other. When two tasks are weakly correlated or in conflict, i.e. the dependency between them is weak, training them on the same shared shallow parameters works poorly: they may compete with each other during shared shallow training, the gradient directions of the MTL convolutional neural network loss function become inconsistent, training becomes difficult, and the trained MTL convolutional neural network performs poorly when processing the different tasks.
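To illustrate (with our own toy numbers, not the patent's) what inconsistent gradient directions mean, one can compare the per-task gradients on the shared parameters; a negative cosine similarity indicates that the tasks pull the shared parameters in opposing directions, so one task's update partly undoes the other's.

```python
import torch
import torch.nn.functional as F

g_t = torch.tensor([1.0, 0.5])       # toy back-propagated gradient of task t
g_u = torch.tensor([-0.9, 0.2])      # toy back-propagated gradient of task t'
cos = F.cosine_similarity(g_t, g_u, dim=0)
print(f"cosine similarity = {cos.item():.2f}")  # negative: conflicting directions
```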
Disclosure of Invention
In view of this, embodiments of the present invention provide a method for resolving task conflicts in an MTL convolutional neural network, which alleviates conflicts between weakly correlated tasks in multi-task parallel learning during the training of the MTL convolutional neural network and improves the performance of the trained MTL convolutional neural network when processing different tasks.
The embodiment of the invention also provides a network for resolving task conflicts in an MTL convolutional neural network, which likewise alleviates conflicts between weakly correlated tasks in multi-task parallel learning during training and improves the performance of the trained MTL convolutional neural network when processing different tasks.
The invention is realized in the following way:
A method for resolving task conflicts in a multi-task learning (MTL) convolutional neural network: the shared shallow layer of the MTL convolutional neural network includes a modulation module obtained by training, and the training of the modulation module and the training of the shared shallow layer are carried out simultaneously by multi-task parallel learning. The method further includes:
the modulation module determines corresponding subnet structures in a shared shallow layer aiming at different tasks, inputs task information of the tasks into the determined corresponding subnet structures for convolution processing, and outputs the task information to a task specific layer of the tasks for processing after modulation of the modulation module to obtain processing results; carrying out back propagation on the processing result by adopting a gradient back propagation mode of an MTL convolutional neural network loss function, and adjusting parameters of the MTL neural network;
the above process is repeated until the parameters of the MTL convolutional neural network converge, yielding the trained MTL convolutional neural network.
Preferably, including the modulation module obtained by training in the shared shallow layer of the MTL convolutional neural network comprises:
embedding the trained modulation modules in the convolutional layers of each layer in the shared shallow layer of the MTL convolutional neural network. After the task information of the task enters the convolution layer of each layer, the modulation module selects the subnet structure corresponding to the task and modulates the task information of the task, and then the task information enters the convolution layer of the next layer for processing.
Preferably, the determining, by the modulation module, corresponding subnet structures in the shared shallow layer for different tasks includes:
in each convolutional layer of the shared shallow layer, randomly selecting a plurality of convolutional layer channels from the layer's channels and assigning them to a task, the selected convolutional layer channels serving as the subnet structure for that task;
each convolutional layer channel in the shared shallow layer is identified by a binary mask BM, and the BMs of all convolutional layer channels are combined into a subnet selector B; with C denoting the number of convolutional layer channels, the subnet selector of task t is B_t = {BM_c}, BM_c = 0 or 1, c ∈ {1, 2, …, C}.
Preferably, the modulation by the modulation module includes:
modulating the task by adopting a scale vector, wherein the scale vector represents the contribution degree of each convolutional layer channel to the task;
the dimension of the scale vector is the number C_new of convolutional layer channels in the subnet structure corresponding to the task, and the value of each element lies between 0 and 1; the scale vector M_t of task t is denoted M_t = {M_c}, c ∈ {1, 2, …, C_new}.
Preferably, carrying out the training of the modulation module and the training of the shared shallow layer simultaneously by multi-task parallel learning includes:
the scale vector M_t used for task modulation in the modulation module is trained together with the parameters of the MTL convolutional neural network by back propagation using the gradient of the MTL convolutional neural network loss function, gradually adjusting the parameters of the MTL neural network and the scale vector M_t used to modulate the task, until the parameters of the MTL convolutional neural network converge;
the parameters of the MTL convolutional neural network F are denoted θ, and their update depends on the gradient ∇θ, given by:
∇θ = ∂L(f)/∂θ
wherein L is the loss function of the MTL convolutional neural network F, I is the input of F, f is the output of F, and f = F(I | θ, M_t).
Preferably, the training of the modulation module by gradient back propagation of the MTL convolutional neural network loss function further includes:
if task t conflicts with task t′, the modulation module modulates the gradient directions coming from the different tasks; after the modulation module is introduced, the gradient formula is:
∇θ = ∇θ^t + ∇θ^t′
wherein ∇θ^t denotes the back-propagated gradient of task t and ∇θ^t′ denotes the back-propagated gradient of task t′.
Preferably, the method further comprises:
processing received task information of a task with the trained MTL convolutional neural network, wherein, during processing, the modulation module embedded in each convolutional layer of the shared shallow layer of the MTL convolutional neural network selects the subnet structure corresponding to the task in the current convolutional layer for convolution processing, modulates the task information of the task with the set scale vector, and then inputs the result into the next convolutional layer for the same processing, until processing through the shared shallow layer is finished.
A network for resolving task conflicts in a multi-task learning MTL convolutional neural network, comprising a shared shallow layer and task-specific layers. The shared shallow layer has multiple convolutional layers, and a modulation module is embedded in each convolutional layer for selecting, for an incoming task, the convolutional layer channels to be processed in that layer and for modulating the task information of the task, wherein,
the shared shallow layer is used for the modulation module to determine, for different tasks, a corresponding subnet structure in each convolutional layer; task information of a task is input into the determined subnet structure for convolution processing, modulated by the modulation module, and output to the task-specific layer of the task for processing, wherein the training of the modulation module and the training of the shared shallow layer are carried out simultaneously by multi-task parallel learning;
the task specific layer is used for receiving the task information modulated by the modulation module to obtain a processing result;
performing back propagation on the processing result in the task specific layer and the shared shallow layer by adopting a gradient back propagation mode of an MTL convolutional neural network loss function, and adjusting parameters of the MTL neural network;
and the above process is repeated until the parameters of the MTL convolutional neural network converge, yielding the trained MTL convolutional neural network.
Preferably, the method further comprises the following steps:
and the received task information of a task is processed in the shared shallow layer and the task-specific layer by the trained MTL convolutional neural network, wherein, during processing, the modulation module embedded in each convolutional layer of the shared shallow layer selects the subnet structure corresponding to the task in the current convolutional layer for convolution processing of the task information, modulates it with the set scale vector, and then inputs it into the next convolutional layer for the same processing, until the task information has been processed through the shared shallow layer.
As can be seen from the above, in the embodiment of the present invention, the shared shallow layer of the MTL convolutional neural network includes a modulation module obtained through training. The modulation module determines corresponding subnet structures in the shared shallow layer for different tasks; task information of a task is input into its subnet structure for convolution processing, modulated by the modulation module, and output to the task-specific layer of the task, which produces the task result. The processing result is back-propagated by gradient back propagation of the MTL convolutional neural network loss function, adjusting the parameters of the MTL neural network; the training of the modulation module and the training of the shared shallow layer are carried out simultaneously by multi-task parallel learning, and the above process is repeated until the parameters of the MTL convolutional neural network converge, yielding a trained MTL convolutional neural network. In this way, during training the shared shallow layer applies different subnet structures to different tasks for convolution processing and modulation, which alleviates conflicts between weakly correlated tasks in multi-task parallel learning and improves the performance of the trained MTL convolutional neural network when processing different tasks.
Drawings
Fig. 1 is a schematic structural diagram of an MTL convolutional neural network provided in the prior art;
fig. 2 is a flowchart of a method for resolving task conflicts in an MTL convolutional neural network according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of a shared shallow layer of an MTL convolutional neural network with a modulation module according to an embodiment of the present invention;
fig. 4 is a schematic diagram of a network for resolving task conflicts in the MTL convolutional neural network according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings and examples.
As can be seen from the background art, multiple tasks conflict during the training of the MTL convolutional neural network because the shared task-information representations of different tasks are embedded into the same feature space and input into the shared shallow layer for learning, yet the shared shallow layer presents the same network architecture to the task information of every task; the task information of multiple tasks therefore both assists and interferes with each other, the gradient directions of the MTL convolutional neural network loss function become inconsistent, and training becomes difficult.
Therefore, to overcome the above problems, in the embodiments of the present invention a modulation module obtained by training is set in the shared shallow layer of the MTL convolutional neural network. The modulation module determines corresponding subnet structures in the shared shallow layer for different tasks; task information of a task is input into its subnet structure for convolution processing, modulated by the modulation module, and output to the task-specific layer of the task, which produces the task result. The processing result is back-propagated by gradient back propagation of the MTL convolutional neural network loss function, adjusting the parameters of the MTL neural network; the training of the modulation module and the training of the shared shallow layer are carried out simultaneously by multi-task parallel learning, and the above process is repeated until the parameters of the MTL convolutional neural network converge, yielding a trained MTL convolutional neural network.
In this way, during MTL convolutional neural network training the shared shallow layer applies different subnet structures to different tasks for convolution processing and modulation, which alleviates conflicts between weakly correlated tasks in multi-task parallel learning and improves the performance of the trained MTL convolutional neural network when processing different tasks.
Fig. 2 is a flowchart of a method for resolving task conflicts in an MTL convolutional neural network according to an embodiment of the present invention, which includes the following specific steps:
step 201, a modulation module obtained by training is included in a shared shallow layer of an MTL convolutional neural network, and the training process of the modulation module and the training of the shared shallow layer are simultaneously carried out by adopting a multi-task parallel learning method;
step 202, the modulation module determines corresponding sub-network structures in a shared shallow layer according to different tasks;
in this step, the subnet structures corresponding to different tasks are determined randomly in the shared shallow layer, which improves the generalization capability of the MTL convolutional neural network;
step 203, inputting task information of the task into the determined corresponding subnet structure for convolution processing, modulating the task information by the modulation module, and outputting the task information to a task specific layer of the task for processing to obtain a processing result;
step 204, performing back propagation on the processing result by adopting a gradient back propagation mode of the MTL convolutional neural network loss function, and adjusting parameters of the MTL neural network;
step 205, repeating steps 202 to 204 until the parameters of the MTL convolutional neural network converge, obtaining the trained MTL convolutional neural network.
In the method, the step of including a trained modulation module in a shared shallow layer of the MTL convolutional neural network comprises the following steps:
Embedding the trained modulation modules in each convolutional layer of the shared shallow layer of the MTL convolutional neural network: after the task information of a task enters each convolutional layer, the modulation module selects the subnet structure corresponding to the task and modulates the task information, which then enters the next convolutional layer for processing. Fig. 3 is a schematic structural diagram of an MTL convolutional neural network with a modulation module according to an embodiment of the present invention. As shown in fig. 3, the shared shallow layer of the MTL convolutional neural network has multiple convolutional layers, and a modulation module is embedded in each convolutional layer for selecting, for an incoming task, the convolutional layer channels to be processed in that layer and for modulating the task information of the task. Of course, depending on the network architecture, a pooling layer that reduces the feature expression matrix of the task information may be placed between convolutional layers; the modulated task information is then processed by the pooling layer before being input into the next convolutional layer. In this way, the modulated task information carries representation information related to the task, which weakens the coupling of conflicting tasks in the shared shallow layer.
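The following is a minimal sketch, under our reading of this paragraph, of such a modulated convolutional layer. The class name ModulatedConv, the keep ratio, and the sigmoid used to keep scale values in (0, 1) are our assumptions; for simplicity the scale vector is stored at full channel width C rather than the compressed width C_new, since the masked channels are zeroed anyway.

```python
import torch
import torch.nn as nn

class ModulatedConv(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, num_tasks: int,
                 keep_ratio: float = 0.5):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
        # Subnet selector B_t: one fixed random binary mask per task, drawn at
        # instantiation; stored as a buffer, so it is never trained.
        self.register_buffer(
            "bm", (torch.rand(num_tasks, out_ch) < keep_ratio).float()
        )
        # Scale vector M_t: learnable logits, squashed to (0, 1) by a sigmoid.
        self.scale_logits = nn.Parameter(torch.zeros(num_tasks, out_ch))

    def forward(self, x: torch.Tensor, task: int) -> torch.Tensor:
        y = self.conv(x)                                  # convolution processing
        bm = self.bm[task].view(1, -1, 1, 1)              # select the task's subnet
        m = torch.sigmoid(self.scale_logits[task]).view(1, -1, 1, 1)
        return y * bm * m        # zero non-subnet channels, scale the rest
```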
In the method, the determining, by the modulation module, corresponding subnet structures in a shared shallow layer for different tasks includes:
in each convolutional layer in the shared shallow layer, a plurality of convolutional layer channels are randomly selected from the plurality of convolutional layer channels to be distributed to one task, and the selected plurality of convolutional layer channels serve as subnet structures for the task.
Specifically, each convolutional layer channel in the shared shallow layer carries an identifier marking the task it serves; the identifier may be a binary mask (BM). The BM is generated randomly, specifically when the MTL convolutional neural network is instantiated. Once set, a BM is never modified and does not participate in the training of the MTL convolutional neural network; the assignment of shared shallow convolutional layer channels to tasks is therefore persistent.
The convolutional layer channels in the shared shallow layer are distributed among the different tasks, so that a corresponding subnet structure is set for each task and the flow of each task's information can be adjusted. The BMs of the convolutional layer channels are combined into a subnet selector B; with C denoting the number of convolutional layer channels, the subnet selector of task t is represented as follows:
B_t = {BM_c}, BM_c = 0 or 1, c ∈ {1, 2, …, C}.
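A tiny numeric illustration of this subnet selector; the values are random examples of our own, not patent data.

```python
import torch

C, num_tasks = 8, 2
torch.manual_seed(0)                          # drawn once, at instantiation
B = (torch.rand(num_tasks, C) < 0.5).int()    # BM_c in {0, 1}, never retrained
subnet_channels = B[0].nonzero().flatten()    # channel indices of task t = 0's subnet
print(B[0].tolist(), subnet_channels.tolist())
```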
in the method, the modulating by the modulating module includes:
The task is modulated with a corresponding scale vector, which represents the degree of contribution of each convolutional layer channel to the task. Specifically, after the subnet structure corresponding to the task is formed, the corresponding scale vector is used for modulation; that is, the modulated input is the non-zero convolutional layer channel data multiplied by the BM. The scale vector is task-dependent: its dimension is the number C_new of convolutional layer channels in the subnet structure corresponding to the task, and the value of each element lies between 0 and 1, so that, over the C_new channels of the subnet structure, the vector represents the contribution of each channel's feature representation to the task. The scale vector M_t of task t is represented as follows:
M_t = {M_c}, c ∈ {1, 2, …, C_new}.
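Continuing the toy example above, a sketch of a scale vector M_t of dimension C_new with every element in (0, 1); the sigmoid squashing is our assumption.

```python
C_new = int(B[0].sum())                  # number of channels kept for task t = 0
M_t = torch.sigmoid(torch.randn(C_new))  # during training this would be learnable
assert M_t.shape == (C_new,)
assert bool((M_t > 0).all()) and bool((M_t < 1).all())
```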
In the method, the training of the modulation module and the training of the shared shallow layer are carried out simultaneously by multi-task parallel learning, as follows:
the scale vector M_t used by the modulation module to modulate a task is trained together with the parameters of the MTL convolutional neural network by back propagation using the gradient of the MTL convolutional neural network loss function, gradually adjusting the parameters of the MTL neural network and the scale vector M_t used to modulate the task, until the parameters of the MTL convolutional neural network converge. The parameters of the MTL convolutional neural network F are denoted θ, and their update depends on the gradient ∇θ, given by:
∇θ = ∂L(f)/∂θ
wherein L is the loss function of the MTL convolutional neural network F, I is the input of F, f is the output of F, and f = F(I | θ, M_t).
Thus, the scale vector M_t used by the modulation module to modulate a task is introduced into the parameter set and the loss function of the MTL convolutional neural network, and each task is learned in parallel and continuously by gradient back propagation of the loss function, until the parameters of the MTL convolutional neural network converge.
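A sketch of this joint training setup, reusing the hypothetical ModulatedConv class from the earlier sketch: because the scale logits are registered as parameters, a single optimizer updates θ and every M_t together by back propagation, while the BM buffers stay fixed.

```python
layer = ModulatedConv(in_ch=3, out_ch=16, num_tasks=2)
optimizer = torch.optim.SGD(layer.parameters(), lr=0.01)

# The scale logits (hence every M_t) are in layer.parameters() and are updated
# together with the convolution weights theta; the binary masks live in a
# buffer and are never updated.
param_names = [name for name, _ in layer.named_parameters()]
assert "scale_logits" in param_names
assert "bm" not in param_names           # the subnet selector stays fixed
```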
Training the modulation module by gradient back propagation of the MTL convolutional neural network loss function further includes the following: assuming that tasks t and t′ conflict, the modulation module modulates the gradient directions coming from the different tasks, and after the modulation module is introduced the gradient update formula is:
∇θ = ∇θ^t + ∇θ^t′
wherein ∇θ^t denotes the back-propagated gradient of task t and ∇θ^t′ denotes the back-propagated gradient of task t′. Because each task back-propagates only through its own subnet structure and scale vector, the two terms act on largely disjoint channels, which weakens the interference between the conflicting gradient directions.
Therefore, in the embodiment of the invention, after the modulation module selects a corresponding subnet structure for each of the multiple tasks in the MTL convolutional neural network and the different tasks undergo convolution processing in their subnet structures, the convolved task information is modulated by the scale vector, so that the modulated task information carries information related to its task. This resolves the task-conflict problem and lets multi-task parallel learning achieve the desired effect.
In the method, after the MTL convolutional neural network is obtained by training using the process shown in fig. 2, the method further includes:
processing received task information of a task with the trained MTL convolutional neural network, wherein, during processing, the modulation module embedded in each convolutional layer of the shared shallow layer of the MTL convolutional neural network selects the subnet structure corresponding to the task in the current convolutional layer for convolution processing, modulates the task information of the task with the set scale vector, and then inputs the result into the next convolutional layer for the same processing, until processing through the shared shallow layer is finished.
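A usage sketch of this inference flow, again reusing the hypothetical ModulatedConv class; the two-layer depth and shapes are illustrative.

```python
shared = nn.ModuleList([
    ModulatedConv(3, 16, num_tasks=2),
    ModulatedConv(16, 32, num_tasks=2),
])

def run_shared(x: torch.Tensor, task: int) -> torch.Tensor:
    # Each layer selects the task's subnet, convolves, and modulates with the
    # task's scale vector before handing the result to the next layer.
    for conv in shared:
        x = torch.relu(conv(x, task))
    return x  # then passed on to the task-specific layer of `task`

features = run_shared(torch.randn(1, 3, 32, 32), task=1)
```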
Fig. 4 is a schematic diagram of a network for resolving task conflicts in an MTL convolutional neural network according to an embodiment of the present invention, comprising a shared shallow layer and task-specific layers. The shared shallow layer has multiple convolutional layers, and a modulation module is embedded in each convolutional layer for selecting, for an incoming task, the convolutional layer channels to be processed in that layer and for modulating the task information of the task, wherein,
the shared shallow layer is used for the modulation module to determine, for different tasks, a corresponding subnet structure in each convolutional layer; task information of a task is input into the determined subnet structure for convolution processing, modulated by the modulation module, and output to the task-specific layer of the task for processing, wherein the training of the modulation module and the training of the shared shallow layer are carried out simultaneously by multi-task parallel learning;
the task specific layer is used for receiving the task information modulated by the modulation module to obtain a processing result;
performing back propagation on the processing result in the task specific layer and the shared shallow layer by adopting a gradient back propagation mode of an MTL convolutional neural network loss function, and adjusting parameters of the MTL neural network;
and the above process is repeated until the parameters of the MTL convolutional neural network converge, yielding the trained MTL convolutional neural network.
In this structure, further: the received task information of a task is processed in the shared shallow layer and the task-specific layer by the trained MTL convolutional neural network, wherein, during processing, the modulation module embedded in each convolutional layer of the shared shallow layer selects the subnet structure corresponding to the task in the current convolutional layer for convolution processing of the task information, modulates it with the set scale vector, and then inputs it into the next convolutional layer for the same processing, until the task information has been processed through the shared shallow layer.
In this way, a modulation module is introduced into each convolutional layer of the shared shallow layer of the MTL convolutional neural network, so that convolution processing and modulation are carried out in the subnet structure corresponding to each task within the layer. This avoids conflicts between weakly correlated tasks during parallel learning, improves the learning effect, and makes the finally trained MTL convolutional neural network perform better when processing tasks.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (8)

1. A method for solving task conflict in a multi-task learning MTL convolutional neural network is characterized in that a shared shallow layer of the MTL convolutional neural network comprises a modulation module obtained by training, the training process of the modulation module and the training of the shared shallow layer are carried out simultaneously by adopting a multi-task parallel learning method, and the method also comprises the following steps:
the modulation module determines corresponding sub-network structures in a shared shallow layer aiming at different tasks, inputs task information of the tasks into the determined corresponding sub-network structures for convolution processing, and outputs the task information to a task specific layer of the tasks for processing after the task information is modulated by the modulation module to obtain processing results; performing back propagation on the processing result by adopting a gradient back propagation mode of an MTL convolutional neural network loss function, and adjusting parameters of the MTL neural network;
repeating the above processes until the parameters of the MTL convolutional neural network converge, to obtain a trained MTL convolutional neural network;
the modulation module determines corresponding sub-network structures in the shared shallow layer aiming at different tasks, and the sub-network structures comprise:
in each convolutional layer of the shared shallow layer, randomly selecting a plurality of convolutional layer channels from the layer's channels and assigning them to a task, the selected convolutional layer channels serving as the subnet structure for the task;
each convolutional layer channel in the shared shallow layer is identified by a binary mask BM, and the BMs of all convolutional layer channels are combined into a subnet selector B; with C denoting the number of convolutional layer channels, the subnet selector of task t is B_t = {BM_c}, BM_c = 0 or 1, c ∈ {1, 2, …, C}.
2. The method of claim 1, wherein including a trained modulation module in a shared shallow layer of an MTL convolutional neural network comprises:
embedding a modulation module obtained by training in a convolutional layer of each layer in a shared shallow layer of the MTL convolutional neural network; after the task information of the task enters the convolution layer of each layer, the modulation module selects the subnet structure corresponding to the task and modulates the task information of the task, and then the task information enters the convolution layer of the next layer for processing.
3. The method of claim 1, wherein the modulation by the modulation module comprises:
modulating the task by adopting a scale vector, wherein the scale vector represents the contribution degree of each convolutional layer channel to the task;
the dimension of the scale vector is the number C_new of convolutional layer channels in the subnet structure corresponding to the task, and the value of each element lies between 0 and 1; the scale vector M_t of task t is denoted M_t = {M_c}, c ∈ {1, 2, …, C_new}.
4. The method of claim 3, wherein the training process of the modulation module and the training of the shared shallow layer simultaneously adopt a multi-task parallel learning method, and comprises the following steps:
the scale vector M_t used for task modulation in the modulation module is trained together with the parameters of the MTL convolutional neural network by back propagation using the gradient of the MTL convolutional neural network loss function, gradually adjusting the parameters of the MTL neural network and the scale vector M_t used to modulate the task, until the parameters of the MTL convolutional neural network converge;
the parameters of the MTL convolutional neural network F are denoted θ, and their update depends on the gradient ∇θ, given by:
∇θ = ∂L(f)/∂θ
wherein L is the loss function of the MTL convolutional neural network F, I is the input of F, f is the output of F, and f = F(I | θ, M_t).
5. The method of claim 4, wherein training by gradient back propagation of the MTL convolutional neural network loss function further comprises:
if task t conflicts with task t′, the modulation module modulates the gradient directions coming from the different tasks, and after the modulation module is introduced the gradient formula is:
∇θ = ∇θ^t + ∇θ^t′
wherein ∇θ^t denotes the back-propagated gradient of task t and ∇θ^t′ denotes the back-propagated gradient of task t′.
6. The method of any one of claims 1 to 5, further comprising:
processing received task information of a task with the trained MTL convolutional neural network, wherein, during processing, the modulation module embedded in each convolutional layer of the shared shallow layer of the MTL convolutional neural network selects the subnet structure corresponding to the task in the current convolutional layer for convolution processing, modulates the task information of the task with the set scale vector, and then inputs the result into the next convolutional layer for the same processing, until processing through the shared shallow layer is finished.
7. A system for resolving task conflicts in a multi-task learning MTL convolutional neural network using the method of claim 1, comprising: a shared shallow layer and task-specific layers, the shared shallow layer having multiple convolutional layers, a modulation module being embedded in each convolutional layer for selecting the convolutional layer channels processed in that layer for an incoming task and for modulating the task information of the task, wherein,
the shared shallow layer is used for the modulation module to determine a corresponding sub-network structure in the convolution layer of each layer aiming at different tasks; inputting task information of a task into the determined corresponding subnet structure for convolution processing, modulating the task information by a modulation module, and outputting the task information to a task specific layer of the task for processing, wherein the training process of the modulation module and the training of the shared shallow layer are simultaneously carried out by adopting a multi-task parallel learning method;
the task specific layer is used for receiving the task information modulated by the modulation module to obtain a processing result;
performing back propagation on the processing result in the task specific layer and the shared shallow layer by adopting a gradient back propagation mode of an MTL convolutional neural network loss function, and adjusting parameters of the MTL neural network;
and repeating the above process until the parameters of the MTL convolutional neural network converge, to obtain the trained MTL convolutional neural network.
8. The network system of claim 7, further comprising:
and processing the received task information of a task in the shared shallow layer and the task-specific layer with the trained MTL convolutional neural network, wherein, during processing, the modulation module embedded in each convolutional layer of the shared shallow layer selects the subnet structure corresponding to the task in the current convolutional layer for convolution processing of the task information, modulates it with the set scale vector, and then inputs it into the next convolutional layer for the same processing, until the task information has been processed through the shared shallow layer.
CN202110155686.1A 2021-02-04 2021-02-04 Method and network for solving task conflict in MTL convolutional neural network Active CN112966811B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110155686.1A CN112966811B (en) 2021-02-04 2021-02-04 Method and network for solving task conflict in MTL convolutional neural network

Publications (2)

Publication Number Publication Date
CN112966811A 2021-06-15
CN112966811B 2023-04-14

Family

ID=76273897

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110155686.1A Active CN112966811B (en) 2021-02-04 2021-02-04 Method and network for solving task conflict in MTL convolutional neural network

Country Status (1)

Country Link
CN (1) CN112966811B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108701210A (en) * 2016-02-02 2018-10-23 北京市商汤科技开发有限公司 Method and system for CNN Network adaptations and object online tracing
CN110930356A (en) * 2019-10-12 2020-03-27 上海交通大学 Industrial two-dimensional code reference-free quality evaluation system and method


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Sparse Bayesian Multi-Task Learning for Predicting Cognitive Outcomes from Neuroimaging Measures in Alzheimer's Disease; Jing Wan et al.; Computer Society Conference on Computer Vision and Pattern Recognition; 2012-06-30; full text *
Multi-Task Learning; Zhang Yu et al.; Chinese Journal of Computers; 2020-07-31; Vol. 43, No. 7; full text *

Also Published As

Publication number Publication date
CN112966811A (en) 2021-06-15


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant