CN112001741A - Method and device for constructing multitask processing model, electronic equipment and storage medium - Google Patents


Info

Publication number
CN112001741A
CN112001741A (Application No. CN202010688258.0A)
Authority
CN
China
Prior art keywords
target
sub
network
model
data processing
Prior art date
Legal status
Pending
Application number
CN202010688258.0A
Other languages
Chinese (zh)
Inventor
高海涵
谢乾龙
王兴星
张小华
杨蕾
Current Assignee
Beijing Sankuai Online Technology Co Ltd
Original Assignee
Beijing Sankuai Online Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Sankuai Online Technology Co Ltd filed Critical Beijing Sankuai Online Technology Co Ltd
Priority to CN202010688258.0A
Publication of CN112001741A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/02Marketing; Price estimation or determination; Fundraising
    • G06Q30/0241Advertisements
    • G06Q30/0242Determining effectiveness of advertisements
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Strategic Management (AREA)
  • Finance (AREA)
  • Development Economics (AREA)
  • Physics & Mathematics (AREA)
  • Accounting & Taxation (AREA)
  • General Physics & Mathematics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Medical Informatics (AREA)
  • Economics (AREA)
  • Evolutionary Computation (AREA)
  • Game Theory and Decision Science (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The application provides a method and a device for constructing a multitask processing model, an electronic device, and a storage medium. The method comprises the following steps: acquiring a target task to be processed in a target scene and a plurality of data processing dimensions of the target scene; based on the data processing dimensions, building, within a preset multitask processing model framework, a first sub-network model corresponding to each data processing dimension and a second sub-network model for controlling the weight of each data processing dimension on each target task, to obtain a target model of the target scene; acquiring training samples of the target scene, determining the data processing dimension of each training sample, and training the target model based on the training samples. During training of the target model, the first sub-network model into which the sample features of each training sample are input is determined based on that sample's data processing dimension. The target scene comprises at least one of an object recommendation scene and a data prediction scene. The model as a whole is thereby learned more fully.

Description

Method and device for constructing multitask processing model, electronic equipment and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method and an apparatus for constructing a multitasking model, an electronic device, and a storage medium.
Background
With the continued advance of networking and computing, and the ever-richer variety of products and information content, the demand for quickly obtaining task processing results for different tasks in different scenes keeps growing. At present, in the field of multitask processing, multitask estimation is the main approach. It can process several different tasks at the same time: for example, when the enabling frequency of a communication device, the communication quality during each enabling, and the device quality of the communication device are predicted simultaneously, each indicator can be regarded as one branch, and the same feature matrix and network structure are shared for parallel prediction. In this way, three indicators can be estimated simultaneously by one model, and the same network structure is jointly optimized by the three indicators.
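As a minimal sketch of such parallel multitask estimation (all layer shapes, weights, and task names below are hypothetical illustrations, not the patented implementation), several task branches read the same shared trunk:

```python
def shared_trunk(features):
    # Hypothetical shared representation: a single scaling layer standing in
    # for the shared feature matrix and network structure.
    return [0.5 * x for x in features]

def make_head(weights):
    # Each task branch is reduced to a weighted sum over the shared output.
    def head(hidden):
        return sum(w * h for w, h in zip(weights, hidden))
    return head

# Three parallel branches, mirroring the three indicators in the example:
# enabling frequency, per-enabling communication quality, device quality.
heads = {
    "enable_freq": make_head([0.2, 0.3]),
    "comm_quality": make_head([0.1, 0.4]),
    "device_quality": make_head([0.5, 0.1]),
}

hidden = shared_trunk([1.0, 2.0])
predictions = {task: head(hidden) for task, head in heads.items()}
```

Because every branch reads the same hidden representation, a gradient from any one indicator updates the shared trunk, which is the joint optimization the passage describes.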
However, in related technical solutions, when multitask processing is performed, all features are generally managed uniformly, and the data required during task processing are not further subdivided according to scene characteristics, task characteristics, and the like. As a result, model learning is insufficient, some features strongly related to the business are not learned, and the model training process is inadequate.
Disclosure of Invention
The embodiments of the application provide a method and a device for constructing a multitask processing model, an electronic device, and a storage medium, aiming to solve the problem in the related art that the accuracy of model prediction results is not high.
In order to solve the technical problem, the present application is implemented as follows:
in a first aspect, an embodiment of the present application provides a method for constructing a multitasking model, including:
acquiring a target task to be processed in a target scene and a plurality of data processing dimensions aiming at the target scene;
on the basis of the data processing dimensions, constructing a first sub-network model corresponding to each data processing dimension and a second sub-network model used for controlling the weight of each data processing dimension on each target task in a preset multi-task processing model framework to obtain a target model of the target scene;
acquiring training samples of the target scene, determining the data processing dimension of each training sample, and training the target model based on the training samples;
during the training process of the target model, determining a first sub-network model used for inputting the sample characteristics of the training samples based on the data processing dimension of each training sample; the target scene comprises at least one of an object recommendation scene and a data prediction scene.
In a second aspect, an embodiment of the present application provides an apparatus for constructing a multitasking model, including:
the data acquisition module is used for acquiring a target task to be processed in a target scene and a plurality of data processing dimensions aiming at the target scene;
the model building module is used for building a first sub-network model corresponding to each data processing dimension and a second sub-network model used for controlling the weight of each data processing dimension on each target task in a preset multi-task processing model framework based on the data processing dimensions to obtain a target model of the target scene;
the model training module is used for acquiring training samples of the target scene, determining the data processing dimension of each training sample, and training the target model based on the training samples;
during the training process of the target model, determining a first sub-network model used for inputting the sample characteristics of the training samples based on the data processing dimension of each training sample; the target scene comprises at least one of an object recommendation scene and a data prediction scene.
In a third aspect, an embodiment of the present application additionally provides an electronic device, including: a memory, a processor and a computer program stored on the memory and executable on the processor, the computer program, when executed by the processor, implementing the steps of the method of constructing a multitasking model as described above.
In a fourth aspect, the present embodiment provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when executed by a processor, the computer program implements the steps of the method for building a multitasking model as described above.
In the embodiments of the application, when a plurality of target tasks are estimated, first sub-network models can be built according to the data processing dimensions to learn the distinct characteristics of each dimension, while switch networks serving as the second sub-network models control the weight of each data processing dimension on each target task, so that the model as a whole is learned more fully.
The foregoing description is only an overview of the technical solutions of the present application, and the present application can be implemented according to the content of the description in order to make the technical means of the present application more clearly understood, and the following detailed description of the present application is given in order to make the above and other objects, features, and advantages of the present application more clearly understandable.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed to be used in the description of the embodiments of the present application will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a flow chart of steps of a method of constructing a multitasking model in an embodiment of the present application;
FIG. 2 is a flow chart of steps in another method of constructing a multitasking model in an embodiment of the present application;
FIG. 3 is a schematic diagram of a structure of an object model in an embodiment of the present application;
FIG. 4 is a schematic structural diagram of an apparatus for constructing a multitasking model according to an embodiment of the present application;
FIG. 5 is a schematic structural diagram of an apparatus for constructing a multitasking model according to an embodiment of the present application;
fig. 6 is a schematic hardware structure diagram of an electronic device in an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some, but not all, embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Referring to fig. 1, a flowchart illustrating steps of a method for constructing a multitasking model in an embodiment of the present application is shown.
Step 110, a target task to be processed in a target scene and a plurality of data processing dimensions for the target scene are obtained.
In practical application, data in the same scene can have a plurality of different data processing dimensions. However, as described above, in the prior art, all features are generally managed uniformly, and the features are not further subdivided according to the service features and the data processing dimensions of the data, so that model learning is insufficient, and some features strongly related to the service are difficult to learn.
Therefore, in the embodiment of the present application, in order to merge features in different data processing dimensions in a service into a model structure, a sub-channel network may be constructed, and specifically, a network model may be constructed according to the data processing dimensions in a target scene. Then, in order to determine the structure of the model, a target task to be processed in a target scene and a plurality of data processing dimensions for the target scene may be obtained, so as to construct a training model according to the target task and the data processing dimensions.
The target task and the data processing dimension can be set by self according to requirements and specific application scenarios, and the embodiment of the application is not limited.
For example, in the field of advertisement prediction, multitask prediction is a primary approach. It can simultaneously predict several different indicators of an advertisement, such as click-through rate, order rate, and GMV (Gross Merchandise Volume). Moreover, regarding the data processing dimension, a client's input data may be obtained through channels such as a WeChat applet, an official APP, or a web page. The target tasks in this scenario may then be the click-through rate, order rate, GMV, and so on. If the data processing dimensions are divided according to the acquisition channel of the data, they may include a WeChat-applet dimension, an official-APP dimension, a web-page dimension, and the like.
Or in the field of communication technology, when the enabling frequency of the communication device, the communication quality in each enabling process, the device quality of the communication device, and other indicators are predicted at the same time, the target task in this scenario may be the enabling frequency of the communication device, the communication quality in each enabling process, the device quality of the communication device, and so on.
In the application scene of the website, the target task may be data traffic, access amount, and the like in a certain time period; at this time, if the data processing dimension is still divided according to different data acquisition channels, the data processing dimension may include a computer-side-based data processing dimension, a mobile phone-side-based data processing dimension, or a group-based data processing dimension, an individual-based data processing dimension, and so on. Alternatively, the data processing dimension may be divided with reference to a time period (e.g., early, middle, late) in which the data is generated, the data processing dimension may be divided with reference to a user age (e.g., teenager, middle age, old age) to which the data corresponds, and the like, and if the input data includes a plurality of types of data, the data processing dimension that may be adopted for different types of data may be different, the data processing dimension in the corresponding scene may be divided according to the data processing dimension that may be adopted for the data accordingly; and so on.
And 120, constructing a first sub-network model corresponding to each data processing dimension and a second sub-network model used for controlling the weight of each data processing dimension on each target task in a preset multi-task processing model framework based on the data processing dimensions to obtain a target model of the target scene.
After obtaining the target tasks and the data processing dimensions of the target scene, a first sub-network model corresponding to each data processing dimension and a second sub-network model used for controlling the weight of each data processing dimension on each target task are built in a preset multi-task processing model framework based on the data processing dimensions, so that the target model of the target scene is obtained.
The multitask processing model framework applicable to the target scene can also be set in a user-defined mode according to requirements, and the embodiment of the application is not limited. For example, multitasking model frames adapted to different scenes may be preset, and then after the target scene is determined, the multitasking model frame applicable to the target scene may be obtained according to the preset adaptation relationship between the scene and the model frames. The multitask model framework can define the number of layers contained in the model, the effect of each layer and any information related to the model.
When the target model corresponding to the target task is created, the interior of the model frame may be filled on the basis of the multitasking model frame, the models included therein are created, and the created models are combined to obtain the target model of the target scene. Specifically, after a multitasking model framework applicable to the target scene is obtained, for each data processing dimension in the target scene, a first sub-network model corresponding to each data processing dimension and a second sub-network model used for controlling the weight of each data processing dimension on each target task are built in the multitasking model framework, so as to obtain the target model of the target scene. The specific types and structures of the first sub-network model and the second sub-network model may be set by a user according to requirements, and the embodiment of the present application is not limited thereto.
For example, for the advertisement prediction field, three first sub-network models can be constructed according to different data processing dimensions of data, and are specially used for learning of each data processing dimension, namely, a small program sub-network, an official APP sub-network and a webpage sub-network. And corresponding to each pre-estimated task and each data processing dimension, constructing a second sub-network model for controlling the weight of each data processing dimension to each target task, and representing the weight proportion of each channel in different pre-estimated tasks.
For example, a corresponding switch network may be constructed for each predictor task, and the network may be a shallow network, and the number of output nodes of the network may be the same as that of the output nodes of the first sub-network model. For example, for the above advertisement prediction field, the number of the switching networks may be 3, which respectively correspond to the order placement rate, the click rate and the GMV, and represent the weight ratio of each data processing dimension in different target tasks.
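A shallow switch network of this kind can be sketched as a single linear layer followed by a softmax, producing one weight per data processing dimension. Everything below (feature size, weight values, channel names) is a hypothetical illustration, not the patented implementation:

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of logits.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def switch_network(features, gate_weights):
    # One linear unit per data processing dimension (channel), then softmax,
    # so the outputs form a weight distribution over the channels.
    logits = [sum(w * f for w, f in zip(row, features)) for row in gate_weights]
    return softmax(logits)

# Hypothetical: 3 channels (applet / official APP / web page), 2 input features.
gate_weights = [[0.1, 0.2], [0.3, 0.1], [0.0, 0.4]]
weights = switch_network([1.0, 1.0], gate_weights)
```

One such switch network per target task (e.g. order rate, click-through rate, GMV) would yield the per-task channel weights described above.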
Step 130, acquiring training samples of the target scene, determining a data processing dimension of each training sample, and training the target model based on the training samples; during the training process of the target model, determining a first sub-network model used for inputting the sample characteristics of the training samples based on the data processing dimension of each training sample; the target scene comprises at least one of an object recommendation scene and a data prediction scene.
After the target model of the target scene is constructed, a training sample of the target scene may be further obtained. Moreover, as described above, in the embodiment of the present application, the differences and the associations between different data processing dimensions are obtained for training, so as to perform more detailed differences between different data processing dimensions, thereby making the model estimation result more accurate. Then, when the model is trained, the sample characteristics of the corresponding training sample can be input through the first sub-network model corresponding to the corresponding data processing dimension based on the data processing dimension of each training sample, so that the difference and the association between different data processing dimensions are obtained in the training process, the different data processing dimensions are more finely distinguished, and the accuracy of the prediction result of the trained model is improved.
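The per-dimension routing of training samples can be illustrated with a small sketch (the sample format and dimension names are hypothetical): each sample is tagged with its data processing dimension, and its features are batched for the first sub-network model of that dimension only:

```python
# Hypothetical mini-batch: each sample carries its data processing dimension.
samples = [
    {"dimension": "applet", "features": [0.1, 0.2]},
    {"dimension": "web", "features": [0.3, 0.4]},
    {"dimension": "applet", "features": [0.5, 0.6]},
]

# Group the sample features by dimension; each group is then fed to the
# first sub-network model corresponding to that dimension.
batches = {}
for sample in samples:
    batches.setdefault(sample["dimension"], []).append(sample["features"])
```

Each first sub-network model thus only ever sees samples from its own dimension, which is how the difference between dimensions is learned.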
In the embodiment of the present invention, the target scenario may include, but is not limited to, at least one of an object recommendation scenario and a data prediction scenario. Moreover, the object recommendation scene and the data prediction scene may be further subdivided according to requirements, and the embodiment of the present invention is not limited.
For example, the object recommendation scene can be further divided into an object recommendation scene in the communication field, an object recommendation scene in the internet field, an object recommendation scene in the advertisement field, and the like according to different service fields; the data prediction scene can be further divided into a data prediction scene in a communication field, a data prediction scene in an internet field, a data prediction scene in an advertisement field, and the like according to different service fields.
And for different target scenes, the target tasks under the target scenes can be different, so that the training samples can be different correspondingly, and the training samples can be specifically set by self-definition according to requirements, so that the embodiment of the invention is not limited.
Referring to fig. 2, in the embodiment of the present application, before the step 130, the method may further include:
step 140, obtaining a logical relationship between the target tasks in the target scene.
And 150, establishing a connection relation between the network branches corresponding to the target tasks in the target model according to the logic relation.
As described above, in the prior art, each predicted task is generally used as a branch to perform prediction in parallel, and the logical relationship and the sequence of each task in the service are not well considered, which wastes some potential factors. In the embodiment of the present application, in order to avoid the above problem, service logic may be considered on the basis of a multi-task learning framework, and the branch networks corresponding to each target task are assembled according to a service logic relationship, so as to fully utilize service information. Specifically, a logical relationship between the target tasks in the target scene may be obtained, and then a connection relationship between network branches corresponding to the target tasks is established in the target model according to the logical relationship.
For example, assume that the target tasks to be processed in the target scene include Click Through Rate (CTR), place rate (CVR), GMV.
In business logic, GMV may be calculated as GMV = ctr × cvr × price, where ctr is the click-through rate, cvr is the order rate, and price is the unit price, with ctr = clicks / exposures and cvr = orders / clicks. The logic determining the actual revenue a user can generate is therefore: click -> place order -> unit price of the clicked commodity.
Then, at this time, according to the logical relationship between the target tasks in the target scene, the branch networks corresponding to the originally parallel target tasks can be reassembled: the click-through rate P_ctr is predicted first and used as an input parameter for estimating the order rate P_ctcvr; finally, both are used as input parameters together with the commodity unit price P_price to estimate P_gmv. In this way, the network can learn the logical relationships among the target tasks during learning.
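Numerically, the chained estimation reduces to multiplying the predicted rates along the business-logic order; the rates and price below are hypothetical examples:

```python
def estimate_gmv(p_ctr, p_cvr, price):
    # Chain the heads in business-logic order:
    # click -> place order (conditioned on click) -> unit price.
    p_order = p_ctr * p_cvr   # probability of a click followed by an order
    return p_order * price    # expected GMV per exposure

# E.g. a 10% click-through rate, a 20% order rate, and a unit price of 50.
gmv = estimate_gmv(p_ctr=0.1, p_cvr=0.2, price=50.0)
```

In the assembled network, the first two factors are the outputs of the upstream branches rather than constants, so the GMV branch receives them as inputs.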
Or, when it is assumed that the target tasks to be processed in the target scene include call related parameters such as call quality and call time, the logical relationship between the target tasks may be to predict call time first, and then predict call quality based on the call time. Then, at this time, according to the logical relationship between the target tasks in the target scene, the branch network corresponding to the originally parallel target tasks can be reconstructed, and the call time is predicted first and then used as an input parameter to perform call quality estimation.
Or, it is assumed that the target task to be processed in the target scene includes an enabling frequency of the communication device, communication quality in each enabling process, device quality of the communication device, and other indicators. In the service logic, the device quality of the communication device is related to the communication quality of the communication device each time the communication device is enabled, i.e. the device quality of the communication device is related to its enabling frequency and the communication quality during each enabling.
Then, according to the logical relationship between the target tasks in the target scene, the branch networks corresponding to the originally parallel target tasks can be reassembled: the enabling frequency of the communication device is predicted first and used as an input parameter for predicting the communication quality of the device during each enabling; finally, both are used as input parameters, in combination with the logical relationship linking the device quality to the enabling frequency and the communication quality during each enabling, to predict the device quality of the communication device. The network can thus learn the logical relationships among the target tasks during learning.
If other target tasks exist in the target scene, different structures can be assembled according to different logical relations among the target tasks. If there is no obvious relationship, it can be degenerated into a parallel structure.
Optionally, in this embodiment of the present application, the model framework may include a feature input layer, a shared network layer, and a task branch layer; the feature input layer is configured to obtain sample features of the training sample, and input the sample features into the shared network layer, where the shared network layer is a network layer shared by each target task, the shared network layer is connected to the task branch layer, and the task branch layer includes network branches corresponding to each target task.
In the embodiment of the application, in order to facilitate the multitask learning, a task framework of the multitask learning may be referred to, and the multitask processing model framework includes a feature input layer, a shared network layer and a task branch layer. The feature input layer is configured to obtain sample features of the training sample, and input the sample features into the shared network layer, where the shared network layer is a network layer shared by each target task, the shared network layer is connected to the task branch layer, and the task branch layer includes network branches corresponding to each target task.
At this time, when the connection relationship between the network branches corresponding to the target tasks is established, the original general parallel structure may be logically assembled directly in the task branch layer according to the logical relationship between the target tasks, so as to establish the connection relationship between the network branches corresponding to the target tasks.
Referring to fig. 2, in this embodiment of the present application, the step 120 may further include: constructing a first sub-network model corresponding to each data processing dimension and a second sub-network model used for controlling the weight of each data processing dimension on each target task, and constructing the shared network layer based on the first sub-network model and the second sub-network model; and the shared network layer is connected with the task branch layer so as to input the output data corresponding to each target task in the shared network layer to the network branch corresponding to the target task.
Correspondingly, the shared network layer can be subjected to channel separation processing, so that the model can be used for more finely distinguishing different data processing dimensions. Then a first sub-network model corresponding to each of the data processing dimensions and a second sub-network model for controlling the weight of each of the data processing dimensions on each of the target tasks may be constructed, and the shared network layer may be constructed based on the first sub-network model and the second sub-network model. The number of the first sub-network models in the shared network layer may be the same as the number of the data processing dimensions, and each first sub-network model may be used for training and learning sample features of training samples in the data processing dimensions corresponding to the first sub-network model, so as to more finely distinguish different data processing dimensions.
Moreover, when constructing the second sub-network model for controlling the weight of each data processing dimension to each target task, a second sub-network model may be respectively set for each combination of the data processing dimension and the target task, and then the number of the second sub-network models in the shared network layer may be the product of the number of the data processing dimensions and the number of the target tasks; if the same second sub-network model is adopted for different target tasks under the same data processing dimension, the number of the second sub-network models can be the same as that of the data processing dimension, each second sub-network model corresponds to one data processing dimension, and the data processing dimensions corresponding to the second sub-network models are different; if the same second sub-network model is adopted for the same target tasks under different data processing dimensions, the number of the second sub-network models can be the same as that of the target tasks, each second sub-network model corresponds to one target task, and the target tasks corresponding to the second sub-network models are different.
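The three sharing schemes above lead to different gate counts; for a hypothetical scene with three data processing dimensions and three target tasks, the counting works out as follows:

```python
num_dimensions = 3  # e.g. applet / official APP / web page (hypothetical)
num_tasks = 3       # e.g. click-through rate / order rate / GMV (hypothetical)

# One second sub-network model per (dimension, task) combination:
gates_per_combination = num_dimensions * num_tasks
# One second sub-network model shared across tasks, per dimension:
gates_per_dimension = num_dimensions
# One second sub-network model shared across dimensions, per task:
gates_per_task = num_tasks
```

The first scheme gives the finest control at the highest parameter cost; the other two trade expressiveness for fewer gates.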
The specific model type and model structure of the first sub-network model and the specific model type and model structure of the second sub-network model may be set by user according to requirements, which is not limited in this embodiment of the present application. Moreover, the number of the second sub-network models and the corresponding relationship between each second sub-network model and the target task and the data processing dimension can be set by self-definition according to requirements, and the embodiment of the present application is not limited. For example, the first sub-network model and the second sub-network model may be any available machine learning model, and the embodiments of the present application are not limited thereto.
Fig. 3 is a schematic diagram of a target model. Here the number of first sub-network models (a in fig. 3) is the same as the number of data processing dimensions, each first sub-network model corresponding to one data processing dimension, and the number of second sub-network models (b in fig. 3) is the same as the number of target tasks, each second sub-network model corresponding to one target task. The feature input layer receives input training data containing a plurality of training samples, and the data preprocessing layer extracts the sample features of each training sample. The sample features of each training sample are then input, according to that sample's data processing dimension, into the first sub-network model corresponding to the dimension, and are also input into each second sub-network model to determine the weights of the different data processing dimensions for the different target tasks. Further, for each target task, the weights of the data processing dimensions for that task are combined with the outputs of the first sub-network models in the corresponding data processing dimensions by matrix multiplication and similar operations, so that the outputs of the first sub-network models and the second sub-network models are merged and passed to the branch network corresponding to that target task in the task branch layer. Each branch network in the task branch layer may be any machine learning model, and the embodiment of the present application is not limited thereto.
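The combination of per-task gate weights with per-dimension outputs described above can be illustrated with a minimal numpy sketch. This is a hypothetical mixture-of-experts-style layer: the names (`expert_forward`, `gate_forward`), the layer sizes, and the tanh/softmax choices are all illustrative assumptions, not the patent's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def expert_forward(x, w):
    # A first sub-network model ("expert") for one data processing dimension.
    return np.tanh(x @ w)

def gate_forward(x, w):
    # A second sub-network model ("gate") for one target task: softmax
    # weights over the data processing dimensions.
    logits = x @ w
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

n_features, n_dims, n_tasks, n_hidden = 8, 3, 2, 4
expert_ws = [rng.normal(size=(n_features, n_hidden)) for _ in range(n_dims)]
gate_ws = [rng.normal(size=(n_features, n_dims)) for _ in range(n_tasks)]

x = rng.normal(size=(5, n_features))  # sample features of 5 training samples
# Outputs of all first sub-network models, stacked: (batch, n_dims, n_hidden).
expert_out = np.stack([expert_forward(x, w) for w in expert_ws], axis=1)

# For each target task, weight the per-dimension expert outputs by that
# task's gate weights (a batched matrix contraction); each result would be
# fed to the task's branch network in the task branch layer.
task_inputs = [np.einsum("bd,bdh->bh", gate_forward(x, w), expert_out)
               for w in gate_ws]
```

Each entry of `task_inputs` has shape `(5, n_hidden)`, one merged representation per target task.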
Optionally, in this embodiment of the present application, step 150 may further include: establishing, according to the logical relationship, a connection relationship between the network branches corresponding to the target tasks in the task branch layer.
As described above, if the multitasking model framework includes the feature input layer, the shared network layer and the task branch layer, the task branch layer contains a network branch corresponding to each target task, and the network branches in the task branch layer may output the model prediction result of each target task.
In the schematic diagram of the target model shown in fig. 3, in an application scenario in which click-through rate (CTR), order placement rate (CVR) and GMV are the target tasks, the originally parallel branch networks in the task branch layer may be recombined with reference to the logical relationships among the target tasks, yielding the task branch layer of the target model. Similarly, when the target tasks include indicators such as the enabling frequency of a communication device, the communication quality during each enabling, and the device quality of the communication device, the task branch layer of the target model may be obtained by recombining the originally parallel branch networks with reference to the logical relationships among those target tasks.
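The chain-style assembly of the CTR, CVR and GMV branches can be sketched as follows. This is a hypothetical illustration of the Bayesian chaining: the probabilities and prices are made-up values, and treating the CVR branch output as P(order | click) is an assumption of this sketch rather than a detail stated in the patent.

```python
import numpy as np

# Hypothetical branch-network outputs for two candidate items.
p_ctr = np.array([0.10, 0.05])              # P(click)
p_cvr_given_click = np.array([0.20, 0.40])  # P(order | click)
expected_price = np.array([30.0, 50.0])     # expected order value

# Bayesian chaining: P(click and order) = P(click) * P(order | click),
# and expected GMV = P(click and order) * expected order value.
p_order = p_ctr * p_cvr_given_click
expected_gmv = p_order * expected_price
```

Connecting the branches this way lets each downstream output be supervised through the upstream ones, which is how the recombined task branch layer encodes the business logic.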
Optionally, in this embodiment of the application, the number of the second sub-network models is the same as the number of the target tasks, each of the second sub-network models corresponds to any one of the target tasks, and the target tasks corresponding to the second sub-network models are different from each other; or,
the number of the second sub-network models is the same as that of the data processing dimensions, each second sub-network model corresponds to any data processing dimension, and the data processing dimensions corresponding to the second sub-network models are different from each other; or,
the number of the second sub-network models is a product of M and N, where M is the number of the target tasks and N is the number of the data processing dimensions, each of the second sub-network models corresponds to a combination of any one of the target tasks and any one of the data processing dimensions, and the target tasks and the data processing dimensions corresponding to the respective second sub-network models are different from each other.
Optionally, in this embodiment of the present application, if the number of the second sub-network models is the same as the number of the target tasks, the input of each of the second sub-network models includes sample features of all the training samples;
if the number of the second sub-network models is the same as the number of the data processing dimensions, or the number of the second sub-network models is a product of M and N, the input of each second sub-network model comprises the sample characteristics of the training sample in the data processing dimension corresponding to the second sub-network model.
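The three counting schemes and their input rules can be summarized in a small sketch. The scheme names (`per_task`, `per_dim`, `per_pair`) and the example task and dimension labels are illustrative assumptions.

```python
from itertools import product

tasks = ["ctr", "cvr", "gmv"]   # M target tasks
dims = ["text", "video"]        # N data processing dimensions

# One second sub-network model per (task, dimension) combination: M * N gates.
per_pair = {(t, d) for t, d in product(tasks, dims)}
assert len(per_pair) == len(tasks) * len(dims)

def gate_input(scheme, samples_by_dim, dim=None):
    # Per-task gates receive the sample features of all training samples;
    # per-dimension and per-pair gates receive only the samples of their
    # own data processing dimension.
    if scheme == "per_task":
        return [s for samples in samples_by_dim.values() for s in samples]
    return samples_by_dim[dim]
```

For example, with `samples_by_dim = {"text": ["t1", "t2"], "video": ["v1"]}`, a per-task gate sees `["t1", "t2", "v1"]`, while a per-dimension gate for `"video"` sees only `["v1"]`.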
Optionally, in this embodiment of the application, the number of output nodes of the first sub-network model and the second sub-network model is the same, and the first sub-network model is a deep neural network and the second sub-network model is a shallow neural network.
In practical application, the second sub-network model is used to train the weights of the different data processing dimensions for the different target tasks, while the first sub-network model is used to train on and learn the relationship between the sample features of the training samples under the different data processing dimensions and the target tasks, thereby achieving channel-specific learning. Since the learning accuracy required of the second sub-network model may be relatively low compared with the first sub-network model, the first sub-network model may be set as a deep neural network and the second sub-network model as a shallow neural network.
A shallow neural network is, as the name implies, a neural network with few layers, for example with a single hidden layer, whereas a deep neural network is one with many layers. In a neural network, neurons at shallow positions extract simpler feature information from the input; as the hierarchy deepens, the neurons extract increasingly complex feature information, enabling the network to make more accurate judgments. Of course, in the embodiment of the present application, it is sufficient to set the number of layers of the second sub-network model to be smaller than that of the first sub-network model as required, and the embodiment of the present application is not limited thereto.
Moreover, to facilitate operations between each first sub-network model and each second sub-network model, and to assign a weight to the training process under each target task in each data processing dimension, the number of output nodes of the first sub-network model may be set to be the same as that of the second sub-network model; the specific number of output nodes may be customized as required, which is not limited in the embodiment of the present application.
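A minimal sketch of the deep/shallow pairing with matching output widths follows; the layer sizes, ReLU activations and random weights are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n_in, n_out = 8, 4

def make_mlp(sizes):
    # Weight matrices of a fully connected network with the given layer sizes.
    return [rng.normal(size=(a, b)) for a, b in zip(sizes[:-1], sizes[1:])]

def forward(x, layers):
    for w in layers[:-1]:
        x = np.maximum(x @ w, 0.0)  # ReLU hidden layers
    return x @ layers[-1]

# First sub-network model: deep (three hidden layers).
deep_expert = make_mlp([n_in, 32, 32, 32, n_out])
# Second sub-network model: shallow (one hidden layer).
shallow_gate = make_mlp([n_in, 8, n_out])

x = rng.normal(size=(2, n_in))
# Equal output-node counts make element-wise and matrix operations between
# the two models' outputs straightforward.
assert forward(x, deep_expert).shape == forward(x, shallow_gate).shape
```

The gate network stays cheap while the expert network retains the capacity to learn the per-dimension feature relationships.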
Optionally, in this embodiment of the present application, when the target scene is the object recommendation scene, the target task includes a recommendation task for a target object dimension, where the target object dimension includes at least one of a text dimension, a video dimension, an audio dimension, a time dimension, a geographic location dimension, and a commodity dimension.
For example, when pushing messages, in order to determine the push messages adapted to different users and simultaneously recommend audio data, video data and the like to the user, the target scene is the object recommendation scene, and the target tasks may include a recommendation task for the text dimension, a recommendation task for the video dimension, a recommendation task for the audio dimension, and so on.
In the case that the target scene is the data prediction scene, the target task includes a prediction task for a target parameter. The target parameters include at least one of quality parameters, attribute parameters, and operation parameters; the quality parameters include at least one of text quality, audio and video quality, communication quality, and network quality; the attribute parameters include at least one of a profit value and an audio/video watching duration; and the operation parameters include at least one of exposure rate, click rate, order placement rate, recommendation rate, browsing rate, and forwarding rate.
For example, in the advertisement prediction field, the target scene at this time may be the data prediction scene, and the target task at this time may be a prediction task for click rate, a prediction task for order placement rate, a prediction task for GMV, or the like.
Alternatively, in the above-mentioned communication technology field, the target scene may also be the above-mentioned data prediction scene, and the target tasks may be a prediction task for the enabling frequency, a prediction task for the communication quality, a prediction task for the device quality, and the like. Here the enabling frequency is an operation parameter, while the communication quality and the device quality are both quality parameters.
It should be noted that, in the embodiment of the present application, the target scene and the target tasks in the target scene may be set as required. When the target scene includes a plurality of target tasks, the target object dimensions or target parameters corresponding to different target tasks may differ from one another, or, as required, may be partially the same, which is not limited in the embodiment of the present application.
In the embodiment of the present application, when a plurality of target tasks are estimated, the business logic is taken into account, and a Bayesian formula is used to assemble the target tasks according to the business, so that the model can fully learn the business logic. When the shared network layer is constructed, the first sub-network models can be built by data processing dimension to learn the distinct characteristics of each dimension, while the switch network serving as the second sub-network model controls the weight of each data processing dimension on each target task, so that the whole model learns more sufficiently.
Referring to fig. 4, a schematic structural diagram of an apparatus for constructing a multitask processing model in the embodiment of the present application is shown.
The apparatus for constructing a multitask processing model in the embodiment of the present application includes: a data acquisition module 210, a model building module 220, and a model training module 230.
The functions of the modules and the interaction relationship between the modules are described in detail below.
A data obtaining module 210, configured to obtain a target task to be processed in a target scene and multiple data processing dimensions for the target scene;
a model building module 220, configured to build, based on the data processing dimensions, a first sub-network model corresponding to each data processing dimension and a second sub-network model for controlling a weight of each data processing dimension on each target task in a preset multi-task processing model framework, so as to obtain a target model of the target scene;
a model training module 230, configured to obtain training samples of the target scene, determine a data processing dimension of each training sample, and train the target model based on the training samples; during the training process of the target model, determining a first sub-network model used for inputting the sample characteristics of the training samples based on the data processing dimension of each training sample; the target scene comprises at least one of an object recommendation scene and a data prediction scene.
Referring to fig. 5, in an embodiment of the present application, the apparatus may further include:
a logical relationship obtaining module 240, configured to obtain a logical relationship between the target tasks in the target scene;
a connection relationship establishing module 250, configured to establish, according to the logical relationship, a connection relationship between network branches corresponding to the target tasks in the target model.
Optionally, in this embodiment of the present application, the multitasking model framework includes a feature input layer, a shared network layer, and a task branch layer; the characteristic input layer is used for obtaining sample characteristics of the training samples and inputting the sample characteristics into the shared network layer; the shared network layer is a network layer shared by all the target tasks, and the shared network layer is connected with the task branch layer; the task branch layer comprises network branches corresponding to each target task.
Referring to fig. 5, in the embodiment of the present application, the model building module 220 is further configured to: constructing a first sub-network model corresponding to each data processing dimension and a second sub-network model used for controlling the weight of each data processing dimension on each target task, and constructing the shared network layer based on the first sub-network model and the second sub-network model; and the shared network layer is connected with the task branch layer so as to input the output data corresponding to each target task in the shared network layer to the network branch corresponding to the target task.
Optionally, in this embodiment of the present application, the connection relationship establishing module 250 is further configured to establish, according to the logical relationship, a connection relationship between network branches corresponding to each target task in the task branch layer.
Optionally, in this embodiment of the application, the number of the second sub-network models is the same as the number of the target tasks, each of the second sub-network models corresponds to any one of the target tasks, and the target tasks corresponding to the second sub-network models are different from each other;
or the number of the second sub-network models is the same as the number of the data processing dimensions, each second sub-network model corresponds to any one data processing dimension, and the data processing dimensions corresponding to the second sub-network models are different from each other;
or the number of the second sub-network models is a product of M and N, where M is the number of the target tasks and N is the number of the data processing dimensions, each of the second sub-network models corresponds to a combination of any one of the target tasks and any one of the data processing dimensions, and the target tasks and the data processing dimensions corresponding to the respective second sub-network models are different from each other.
Optionally, in this embodiment of the present application, if the number of the second sub-network models is the same as the number of the target tasks, the input of each of the second sub-network models includes sample features of all the training samples;
if the number of the second sub-network models is the same as the number of the data processing dimensions, or the number of the second sub-network models is a product of M and N, the input of each second sub-network model comprises the sample characteristics of the training sample in the data processing dimension corresponding to the second sub-network model.
Optionally, in this embodiment of the application, the number of output nodes of the first sub-network model and the second sub-network model is the same, and the first sub-network model is a deep neural network and the second sub-network model is a shallow neural network.
Optionally, in this embodiment of the application, when the target scene is the object recommendation scene, the target task includes a recommendation task for a target object dimension, where the target object dimension includes at least one of a text dimension, a video dimension, an audio dimension, a time dimension, a geographic location dimension, and a commodity dimension;
in the case that the target scene is the data prediction scene, the target task includes a prediction task for a target parameter. The target parameters include at least one of quality parameters, attribute parameters, and operation parameters; the quality parameters include at least one of text quality, audio and video quality, communication quality, and network quality; the attribute parameters include at least one of a profit value and an audio/video watching duration; and the operation parameters include at least one of exposure rate, click rate, order placement rate, recommendation rate, and browsing rate.
The apparatus provided in the embodiment of the present application can implement each process implemented in the method embodiments of fig. 1 to fig. 2; details are not repeated here to avoid repetition.
Fig. 6 is a schematic diagram of a hardware structure of an electronic device implementing various embodiments of the present application.
The electronic device 500 includes, but is not limited to: a radio frequency unit 501, a network module 502, an audio output unit 503, an input unit 504, a sensor 505, a display unit 506, a user input unit 507, an interface unit 508, a memory 509, a processor 510, and a power supply 511. Those skilled in the art will appreciate that the electronic device configuration shown in fig. 6 does not constitute a limitation of the electronic device, and that the electronic device may include more or fewer components than shown, or some components may be combined, or a different arrangement of components. In the embodiment of the present application, the electronic device includes, but is not limited to, a mobile phone, a tablet computer, a notebook computer, a palm computer, a vehicle-mounted terminal, a wearable device, a pedometer, and the like.
It should be understood that, in the embodiment of the present application, the radio frequency unit 501 may be used for receiving and sending signals during a message sending/receiving process or a call; specifically, downlink data from a base station is received and then delivered to the processor 510 for processing, and uplink data is transmitted to the base station. In general, the radio frequency unit 501 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low noise amplifier, a duplexer, and the like. In addition, the radio frequency unit 501 can also communicate with a network and other devices through a wireless communication system.
The electronic device provides wireless broadband internet access to the user via the network module 502, such as assisting the user in sending and receiving e-mails, browsing web pages, and accessing streaming media.
The audio output unit 503 may convert audio data received by the radio frequency unit 501 or the network module 502 or stored in the memory 509 into an audio signal and output as sound. Also, the audio output unit 503 may also provide audio output related to a specific function performed by the electronic apparatus 500 (e.g., a call signal reception sound, a message reception sound, etc.). The audio output unit 503 includes a speaker, a buzzer, a receiver, and the like.
The input unit 504 is used to receive audio or video signals. The input unit 504 may include a graphics processing unit (GPU) 5041 and a microphone 5042; the graphics processor 5041 processes image data of still pictures or video obtained by an image capture device (such as a camera) in a video capture mode or an image capture mode. The processed image frames may be displayed on the display unit 506. The image frames processed by the graphics processor 5041 may be stored in the memory 509 (or other storage medium) or transmitted via the radio frequency unit 501 or the network module 502. The microphone 5042 may receive sounds and may process such sounds into audio data. In the phone call mode, the processed audio data may be converted into a format transmittable to a mobile communication base station via the radio frequency unit 501 for output.
The electronic device 500 also includes at least one sensor 505, such as light sensors, motion sensors, and other sensors. Specifically, the light sensor includes an ambient light sensor that can adjust the brightness of the display panel 5061 according to the brightness of ambient light, and a proximity sensor that can turn off the display panel 5061 and/or a backlight when the electronic device 500 is moved to the ear. As one type of motion sensor, an accelerometer sensor can detect the magnitude of acceleration in each direction (generally three axes), detect the magnitude and direction of gravity when stationary, and can be used to identify the posture of an electronic device (such as horizontal and vertical screen switching, related games, magnetometer posture calibration), and vibration identification related functions (such as pedometer, tapping); the sensors 505 may also include fingerprint sensors, pressure sensors, iris sensors, molecular sensors, gyroscopes, barometers, hygrometers, thermometers, infrared sensors, etc., which are not described in detail herein.
The display unit 506 is used to display information input by the user or information provided to the user. The Display unit 506 may include a Display panel 5061, and the Display panel 5061 may be configured in the form of a Liquid Crystal Display (LCD), an Organic Light-Emitting Diode (OLED), or the like.
The user input unit 507 may be used to receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic device. Specifically, the user input unit 507 includes a touch panel 5071 and other input devices 5072. Touch panel 5071, also referred to as a touch screen, may collect touch operations by a user on or near it (e.g., operations by a user on or near touch panel 5071 using a finger, stylus, or any suitable object or attachment). The touch panel 5071 may include two parts of a touch detection device and a touch controller. The touch detection device detects the touch direction of a user, detects a signal brought by touch operation and transmits the signal to the touch controller; the touch controller receives touch information from the touch sensing device, converts the touch information into touch point coordinates, sends the touch point coordinates to the processor 510, and receives and executes commands sent by the processor 510. In addition, the touch panel 5071 may be implemented in various types such as a resistive type, a capacitive type, an infrared ray, and a surface acoustic wave. In addition to the touch panel 5071, the user input unit 507 may include other input devices 5072. In particular, other input devices 5072 may include, but are not limited to, a physical keyboard, function keys (e.g., volume control keys, switch keys, etc.), a trackball, a mouse, and a joystick, which are not described in detail herein.
Further, the touch panel 5071 may be overlaid on the display panel 5061, and when the touch panel 5071 detects a touch operation thereon or nearby, the touch operation is transmitted to the processor 510 to determine the type of the touch event, and then the processor 510 provides a corresponding visual output on the display panel 5061 according to the type of the touch event. Although in fig. 6, the touch panel 5071 and the display panel 5061 are two independent components to implement the input and output functions of the electronic device, in some embodiments, the touch panel 5071 and the display panel 5061 may be integrated to implement the input and output functions of the electronic device, and is not limited herein.
The interface unit 508 is an interface for connecting an external device to the electronic apparatus 500. For example, the external device may include a wired or wireless headset port, an external power supply (or battery charger) port, a wired or wireless data port, a memory card port, a port for connecting a device having an identification module, an audio input/output (I/O) port, a video I/O port, an earphone port, and the like. The interface unit 508 may be used to receive input (e.g., data information, power, etc.) from external devices and transmit the received input to one or more elements within the electronic apparatus 500 or may be used to transmit data between the electronic apparatus 500 and external devices.
The memory 509 may be used to store software programs as well as various data. The memory 509 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the storage data area may store data (such as audio data, a phonebook, etc.) created according to the use of the cellular phone, and the like. Further, the memory 509 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid-state storage device.
The processor 510 is a control center of the electronic device, connects various parts of the whole electronic device by using various interfaces and lines, performs various functions of the electronic device and processes data by running or executing software programs and/or modules stored in the memory 509 and calling data stored in the memory 509, thereby performing overall monitoring of the electronic device. Processor 510 may include one or more processing units; preferably, the processor 510 may integrate an application processor, which mainly handles operating systems, user interfaces, application programs, etc., and a modem processor, which mainly handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into processor 510.
The electronic device 500 may further include a power supply 511 (e.g., a battery) for supplying power to various components, and preferably, the power supply 511 may be logically connected to the processor 510 via a power management system, so as to implement functions of managing charging, discharging, and power consumption via the power management system.
In addition, the electronic device 500 includes some functional modules that are not shown, and are not described in detail herein.
Preferably, an embodiment of the present application further provides an electronic device, including: the processor 510, the memory 509, and a computer program stored in the memory 509 and capable of running on the processor 510, where the computer program, when executed by the processor 510, implements each process of the above embodiment of the method for constructing a multitask processing model, and can achieve the same technical effect; details are not repeated here to avoid repetition.
The embodiment of the present application further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium; when executed by a processor, the computer program implements each process of the above embodiment of the method for constructing a multitask processing model, and can achieve the same technical effect; details are not repeated here to avoid repetition. The computer-readable storage medium may be a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present application.
While the present embodiments have been described with reference to the accompanying drawings, it is to be understood that the invention is not limited to the precise embodiments described above, which are meant to be illustrative and not restrictive, and that various changes may be made therein by those skilled in the art without departing from the spirit and scope of the invention as defined by the appended claims.
Those of ordinary skill in the art would appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed in the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into units is only one logical division, and other divisions are possible in practice; for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through interfaces, devices, or units, and may be electrical, mechanical, or in another form.
The units described as separate parts may or may not be physically separate, and the parts shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as a stand-alone product, they may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, in essence, or the part that contributes to the prior art, may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disk.
The above description covers only specific embodiments of the present application, but the scope of the present application is not limited thereto. Any changes or substitutions that a person skilled in the art can readily conceive within the technical scope disclosed in the present application shall be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (12)

1. A method for constructing a multitask processing model, characterized by comprising:
acquiring target tasks to be processed in a target scene and a plurality of data processing dimensions for the target scene;
constructing, in a preset multitask processing model framework and based on the data processing dimensions, a first sub-network model corresponding to each data processing dimension and a second sub-network model for controlling the weight of each data processing dimension on each target task, to obtain a target model of the target scene; and
acquiring training samples of the target scene, determining the data processing dimension of each training sample, and training the target model based on the training samples;
wherein, during the training of the target model, the first sub-network model into which the sample features of each training sample are input is determined based on the data processing dimension of that training sample; and the target scene comprises at least one of an object recommendation scene and a data prediction scene.
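The construction step of claim 1 — one first sub-network model per data processing dimension, plus second sub-network models that weight each dimension for each target task — reads like a mixture-of-experts layout. A minimal NumPy sketch under that reading (the function names, layer sizes, and softmax gating are illustrative assumptions, not the patented implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

def make_expert(in_dim, out_dim):
    # "first sub-network model": one expert per data processing dimension
    W = rng.normal(size=(in_dim, out_dim)) * 0.1
    return lambda x: np.maximum(x @ W, 0.0)  # single ReLU layer

def make_gate(in_dim, n_dims):
    # "second sub-network model": weights each dimension for one target task
    W = rng.normal(size=(in_dim, n_dims)) * 0.1
    def gate(x):
        z = x @ W
        e = np.exp(z - z.max(axis=-1, keepdims=True))
        return e / e.sum(axis=-1, keepdims=True)  # softmax over dimensions
    return gate

def build_target_model(dims, tasks, in_dim, hidden=8):
    experts = {d: make_expert(in_dim, hidden) for d in dims}
    gates = {t: make_gate(in_dim, len(dims)) for t in tasks}
    heads = {t: rng.normal(size=(hidden, 1)) * 0.1 for t in tasks}
    def forward(x):
        stacked = np.stack([experts[d](x) for d in dims], axis=1)  # (B, D, H)
        out = {}
        for t in tasks:
            w = gates[t](x)[:, :, None]        # (B, D, 1) dimension weights
            mixed = (stacked * w).sum(axis=1)  # task-specific mixture
            out[t] = mixed @ heads[t]          # per-task output
        return out
    return forward

dims = ["text", "video"]
tasks = ["click", "convert"]
model = build_target_model(dims, tasks, in_dim=4)
preds = model(rng.normal(size=(3, 4)))
```

Here every sample passes through every dimension's expert; routing a sample only to the expert matching its labeled dimension, as the final step of the claim suggests, would amount to masking the stacked expert outputs before mixing.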
2. The method of claim 1, wherein before the acquiring training samples of the target scene, determining the data processing dimension of each training sample, and training the target model based on the training samples, the method further comprises:
acquiring a logical relationship between the target tasks in the target scene; and
establishing, according to the logical relationship, a connection relationship between the network branches corresponding to the target tasks in the target model.
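The "logical relationship between the target tasks" of claim 2 can be read as a precedence constraint, e.g. a conversion only happens after a click, so the connection between branches feeds one branch's output into another. A hypothetical sketch (the click/conversion tasks and the concatenation wiring are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid_head(in_dim, seed):
    # one task branch: a single sigmoid output unit
    W = np.random.default_rng(seed).normal(size=(in_dim, 1)) * 0.1
    return lambda h: 1.0 / (1.0 + np.exp(-(h @ W)))

W_shared = rng.normal(size=(5, 8)) * 0.1   # stand-in shared network layer
click_branch = sigmoid_head(8, seed=2)
# logical relationship "click precedes conversion": the conversion branch
# additionally consumes the click branch's output (8 shared + 1 extra input)
convert_branch = sigmoid_head(8 + 1, seed=3)

x = rng.normal(size=(4, 5))                # 4 samples, 5 raw features
h = np.maximum(x @ W_shared, 0.0)          # shared representation
p_click = click_branch(h)                  # (4, 1)
p_convert = convert_branch(np.concatenate([h, p_click], axis=1))
```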
3. The method of claim 1, wherein the multitask processing model framework comprises a feature input layer, a shared network layer, and a task branch layer;
the feature input layer is configured to obtain the sample features of the training samples and input the sample features into the shared network layer; the shared network layer is shared by all the target tasks and is connected with the task branch layer; and the task branch layer comprises the network branch corresponding to each target task;
wherein the constructing, in a preset multitask processing model framework and based on the data processing dimensions, a first sub-network model corresponding to each data processing dimension and a second sub-network model for controlling the weight of each data processing dimension on each target task, to obtain a target model of the target scene comprises:
constructing the first sub-network model corresponding to each data processing dimension and the second sub-network model for controlling the weight of each data processing dimension on each target task, and building the shared network layer based on the first sub-network models and the second sub-network models; and
connecting the shared network layer with the task branch layer, so that the output data corresponding to each target task in the shared network layer is input to the network branch corresponding to that target task.
4. The method according to claim 3, wherein the establishing, according to the logical relationship, a connection relationship between the network branches corresponding to the target tasks in the target model comprises:
establishing, according to the logical relationship, a connection relationship between the network branches corresponding to the target tasks in the task branch layer.
5. The method according to any one of claims 1 to 4, wherein the number of the second sub-network models is the same as the number of the target tasks, each second sub-network model corresponds to one of the target tasks, and the target tasks corresponding to the respective second sub-network models are different from each other;
or the number of the second sub-network models is the same as the number of the data processing dimensions, each second sub-network model corresponds to one of the data processing dimensions, and the data processing dimensions corresponding to the respective second sub-network models are different from each other;
or the number of the second sub-network models is the product of M and N, where M is the number of the target tasks and N is the number of the data processing dimensions, each second sub-network model corresponds to a combination of one target task and one data processing dimension, and the combinations of target task and data processing dimension corresponding to the respective second sub-network models are different from each other.
6. The method of claim 5, wherein, if the number of the second sub-network models is the same as the number of the target tasks, the input of each second sub-network model comprises the sample features of all the training samples;
and if the number of the second sub-network models is the same as the number of the data processing dimensions, or is the product of M and N, the input of each second sub-network model comprises the sample features of the training samples in the data processing dimension corresponding to that second sub-network model.
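Claims 5 and 6 together enumerate three layouts for the second sub-network models and the input each layout receives. A small bookkeeping helper summarising them (the mode names and returned rule strings are assumptions):

```python
def gate_layout(n_tasks, n_dims, mode):
    """Number of second sub-network models and their input rule,
    for the three configurations of claims 5 and 6."""
    if mode == "per_task":
        # one gate per target task; input: sample features of all training samples
        return n_tasks, "features_of_all_samples"
    if mode == "per_dimension":
        # one gate per data processing dimension; input: that dimension's features
        return n_dims, "features_of_own_dimension"
    if mode == "per_pair":
        # one gate per (target task, data processing dimension) combination
        return n_tasks * n_dims, "features_of_own_dimension"
    raise ValueError(mode)

# e.g. M = 3 target tasks, N = 4 data processing dimensions
print(gate_layout(3, 4, "per_pair"))  # 12 gates in the M x N layout
```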
7. The method according to any one of claims 1 to 4, wherein the number of output nodes of the first sub-network models is the same as the number of output nodes of the second sub-network models, the first sub-network model being a deep neural network and the second sub-network model being a shallow neural network.
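The pairing in claim 7 — a deep first sub-network and a shallow second sub-network with matching output node counts — can be sketched as two MLPs that differ only in depth (the layer widths below are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(7)

def mlp(sizes):
    # fully connected network: ReLU on hidden layers, linear output layer
    Ws = [rng.normal(size=(a, b)) * 0.1 for a, b in zip(sizes, sizes[1:])]
    def f(x):
        for W in Ws[:-1]:
            x = np.maximum(x @ W, 0.0)
        return x @ Ws[-1]
    return f

out_nodes = 6
deep_first = mlp([10, 32, 32, 16, out_nodes])  # deep neural network
shallow_second = mlp([10, out_nodes])          # shallow: one linear layer

x = rng.normal(size=(2, 10))
assert deep_first(x).shape == shallow_second(x).shape == (2, out_nodes)
```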
8. The method according to any one of claims 1 to 4, wherein, in the case that the target scene is the object recommendation scene, the target tasks comprise recommendation tasks for target object dimensions, the target object dimensions comprising at least one of a text dimension, a video dimension, an audio dimension, a time dimension, and a geographic location dimension; and
in the case that the target scene is the data prediction scene, the target tasks comprise a prediction task for a target parameter, the target parameter comprising at least one of a quality parameter, an attribute parameter, and an operation parameter.
9. The method of claim 8, wherein the attribute parameter comprises at least one of an income value and an audio/video watching duration; the operation parameter comprises at least one of an exposure rate, a click-through rate, an order placement rate, a recommendation rate, and a browsing rate; and the quality parameter comprises at least one of a text quality, an audio/video quality, a communication quality, and a network quality.
10. An apparatus for constructing a multitask processing model, characterized by comprising:
a data acquisition module, configured to acquire target tasks to be processed in a target scene and a plurality of data processing dimensions for the target scene;
a model building module, configured to construct, in a preset multitask processing model framework and based on the data processing dimensions, a first sub-network model corresponding to each data processing dimension and a second sub-network model for controlling the weight of each data processing dimension on each target task, to obtain a target model of the target scene; and
a model training module, configured to acquire training samples of the target scene, determine the data processing dimension of each training sample, and train the target model based on the training samples;
wherein, during the training of the target model, the first sub-network model into which the sample features of each training sample are input is determined based on the data processing dimension of that training sample; and the target scene comprises at least one of an object recommendation scene and a data prediction scene.
11. An electronic device, characterized by comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the steps of the method for constructing a multitask processing model according to any one of claims 1 to 9.
12. A computer-readable storage medium, characterized in that a computer program is stored thereon, wherein the computer program, when executed by a processor, implements the steps of the method for constructing a multitask processing model according to any one of claims 1 to 9.
CN202010688258.0A 2020-07-16 2020-07-16 Method and device for constructing multitask processing model, electronic equipment and storage medium Pending CN112001741A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010688258.0A CN112001741A (en) 2020-07-16 2020-07-16 Method and device for constructing multitask processing model, electronic equipment and storage medium


Publications (1)

Publication Number Publication Date
CN112001741A true CN112001741A (en) 2020-11-27

Family

ID=73467484

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010688258.0A Pending CN112001741A (en) 2020-07-16 2020-07-16 Method and device for constructing multitask processing model, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112001741A (en)


Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112597361A (en) * 2020-12-16 2021-04-02 北京五八信息技术有限公司 Sorting processing method and device, electronic equipment and storage medium
CN112597361B (en) * 2020-12-16 2023-12-12 北京五八信息技术有限公司 Ordering processing method and device, electronic equipment and storage medium
CN112883256A (en) * 2021-01-11 2021-06-01 北京达佳互联信息技术有限公司 Multitasking method and device, electronic equipment and storage medium
CN112883256B (en) * 2021-01-11 2024-05-17 北京达佳互联信息技术有限公司 Multitasking method, apparatus, electronic device and storage medium
CN112860534A (en) * 2021-03-17 2021-05-28 上海壁仞智能科技有限公司 Hardware architecture performance evaluation and performance optimization method and device
CN114082195A (en) * 2021-11-11 2022-02-25 珠海格力电器股份有限公司 Task processing method and device, electronic equipment and storage medium
CN114082195B (en) * 2021-11-11 2024-05-07 珠海格力电器股份有限公司 Task processing method and device, electronic equipment and storage medium
CN114970882A (en) * 2022-05-19 2022-08-30 支付宝(杭州)信息技术有限公司 Model prediction method and model system suitable for multiple scenes and multiple tasks
CN115543773A (en) * 2022-08-17 2022-12-30 睿智合创(北京)科技有限公司 Automatic comparison method for test results

Similar Documents

Publication Publication Date Title
CN111049979B (en) Application sharing method, electronic equipment and computer readable storage medium
CN112001741A (en) Method and device for constructing multitask processing model, electronic equipment and storage medium
CN108255382B (en) Method and device for recommending floating menu content
CN111010332A (en) Group chat method and electronic equipment
CN109543099B (en) Content recommendation method and terminal equipment
CN107808107B (en) Application message display method and mobile terminal
CN110109604B (en) Application interface display method and mobile terminal
CN107734170B (en) Notification message processing method, mobile terminal and wearable device
CN109213407B (en) Screenshot method and terminal equipment
CN108984066B (en) Application icon display method and mobile terminal
CN108874906B (en) Information recommendation method and terminal
CN110913067A (en) Information sending method and electronic equipment
CN111444425B (en) Information pushing method, electronic equipment and medium
CN108600079B (en) Chat record display method and mobile terminal
CN108196781B (en) Interface display method and mobile terminal
CN107765954B (en) Application icon updating method, mobile terminal and server
CN112597361A (en) Sorting processing method and device, electronic equipment and storage medium
CN108009031B (en) Application program control method and mobile terminal
CN109669710B (en) Note processing method and terminal
CN110825475A (en) Input method and electronic equipment
CN108628534B (en) Character display method and mobile terminal
CN107995590B (en) Information recording method and mobile terminal
CN108491143B (en) Object movement control method and mobile terminal
CN112464831B (en) Video classification method, training method of video classification model and related equipment
CN110378798B (en) Heterogeneous social network construction method, group recommendation method, device and equipment

Legal Events

Date Code Title Description
PB01 Publication
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20201127