CN117093862A - Model training method and device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN117093862A
Authority
CN
China
Prior art keywords
sample
cross
target
model
samples
Prior art date
Legal status
Pending
Application number
CN202310983254.9A
Other languages
Chinese (zh)
Inventor
吴若凡
刘腾飞
张天翼
王维强
Current Assignee
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd filed Critical Alipay Hangzhou Information Technology Co Ltd
Priority to CN202310983254.9A priority Critical patent/CN117093862A/en
Publication of CN117093862A publication Critical patent/CN117093862A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The specification discloses a model training method, a device, an electronic device, and a storage medium. Based on the idea of federated learning, cross samples and non-cross samples are determined among the first samples stored in a first node; auxiliary labels for the cross samples are then determined through an initial model trained on the first samples and the second samples; each first sample is input into the classification layer of a target model, and a target prediction result of the first sample is obtained according to the classification result of the first sample and the prediction layers corresponding to the preset classifications in the target model; finally, the target model is trained based on the labels and auxiliary labels of the cross samples and the labels of the non-cross samples. Even when the first samples and the second samples contain few cross samples, an accurate target model can be trained from the knowledge of the second samples carried by the pre-trained initial model and the first samples stored in the first node, so the accuracy of the trained target model is guaranteed while private data are protected.

Description

Model training method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a method and apparatus for model training, an electronic device, and a storage medium.
Background
With the development of computer technology and growing attention to personal privacy data, training models by federated learning has become widely used in the field of model training.
A federated learning system typically includes a parameter server and a plurality of working nodes; each working node stores different sample data, while the model parameters are stored in the parameter server. When a model training task needs to be executed, the parameter server issues the model parameters and the model structure to each working node; each working node deploys the model according to the issued parameters and structure, determines model gradients from the sample data it stores using the deployed model, and returns the gradients to the parameter server. The parameter server updates the stored model parameters according to the model gradients received from the working nodes and transmits the updated parameters to the working nodes, so as to complete the model training task.
However, when the sample data stored in the working nodes differ greatly from one another, an accurate model may not be obtainable through training, resulting in low model training efficiency and low model accuracy.
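As background, one round of the parameter-server workflow described above might look like the following minimal Python sketch; the function names, the plain gradient averaging, and the fixed learning rate are illustrative assumptions, not part of the method proposed in this specification.

```python
import numpy as np

def server_round(global_params, worker_datasets, compute_gradient, lr=0.1):
    """One parameter-server round: broadcast the current parameters, collect a
    gradient from each working node computed on its local samples, average the
    gradients, and return the updated parameters."""
    gradients = [compute_gradient(global_params, local_data)
                 for local_data in worker_datasets]
    avg_gradient = np.mean(gradients, axis=0)  # simple averaging; one possible aggregation rule
    return global_params - lr * avg_gradient   # updated parameters are sent back to the nodes
```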
Based on this, the present specification provides a model training method.
Disclosure of Invention
The specification provides a method, a device, electronic equipment and a storage medium for model training, so as to improve model training efficiency and accuracy of a trained model.
The technical scheme adopted in the specification is as follows:
the present specification provides a method of model training for use in a first node in a federated learning system, the system comprising the first node storing first samples and a second node storing second samples, at least part of the first samples and at least part of the second samples corresponding to the same user, the method comprising:
determining a cross sample and a non-cross sample from the first samples, wherein the cross sample is used for representing that a second sample corresponding to the same user as the cross sample exists;
determining an initial model obtained by training based on each first sample and each second sample, and determining auxiliary labels of the crossed samples through the initial model;
inputting each first sample into a classification layer of a target model to be trained to obtain a classification result output by the classification layer, wherein the classification result is used for representing whether the first sample is a cross sample or not;
Determining a target prediction result of the first sample according to the classification result and prediction layers corresponding to preset classifications in the target model, wherein the preset classifications comprise a cross type and a non-cross type;
and training the target model according to the target prediction result of the cross sample, the labeling and the auxiliary labeling of the cross sample, and the target prediction result of the non-cross sample and the labeling thereof.
The present specification provides an apparatus for model training, the apparatus being applied to a first node in a federated learning system, the system comprising the first node and a second node, the first node storing respective first samples, the second node storing respective second samples, at least part of the first samples and at least part of the second samples corresponding to the same user, the apparatus comprising:
a sample determining module, configured to determine a cross sample and a non-cross sample from the first samples, where the cross sample is used to characterize that there is a second sample corresponding to the same user as the cross sample;
the label determining module is used for determining an initial model trained based on each first sample and each second sample, and determining auxiliary labels of the cross samples through the initial model;
The classification module is used for inputting each first sample into a classification layer of a target model to be trained to obtain a classification result output by the classification layer, and the classification result is used for representing whether the first sample is a cross sample or not;
the prediction module is used for determining a target prediction result of the first sample according to the classification result and a prediction layer corresponding to each preset classification in the target model, wherein each preset classification at least comprises a cross type and a non-cross type;
and the training module is used for training the target model according to the target prediction result of the cross sample, the label and the auxiliary label of the cross sample, and the target prediction result of the non-cross sample and its label.
The present specification provides a computer readable storage medium storing a computer program which when executed by a processor implements the method of model training described above.
The present specification provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing a method of model training as described above when executing the program.
At least one of the technical solutions adopted in the present specification can achieve the following beneficial effects:
in the model training method provided in this specification, the first samples stored in the first node are divided into cross samples and non-cross samples, auxiliary labels of the cross samples are determined through an initial model trained based on the first samples and the second samples, each first sample is input into the classification layer of the target model, a target prediction result of the first sample is obtained according to the classification result of the first sample and the prediction layers corresponding to the preset classifications in the target model, and finally the target model is trained based on the labels and auxiliary labels of the cross samples and the labels of the non-cross samples.
Even when the first samples and the second samples contain few cross samples, an accurate target model can be trained from the knowledge of the second samples carried by the pre-trained initial model and the first samples stored in the first node, which guarantees both the model training efficiency and the accuracy of the trained target model.
Drawings
The accompanying drawings, which are included to provide a further understanding of the specification, illustrate exemplary embodiments of the present specification and, together with the description, serve to explain the specification; they are not intended to limit the specification unduly. In the accompanying drawings:
FIG. 1 is a flow chart of a method of model training of the present disclosure;
FIG. 2 is a schematic diagram of a target model in the present specification;
FIG. 3 is a schematic diagram of a model training apparatus provided in the present specification;
fig. 4 is a schematic view of an electronic device corresponding to fig. 1 provided in the present specification.
Detailed Description
For the purposes of making the objects, technical solutions and advantages of the present specification more apparent, the technical solutions of the present specification will be clearly and completely described below with reference to specific embodiments of the present specification and corresponding drawings. It will be apparent that the described embodiments are only some, but not all, of the embodiments of the present specification. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are intended to be within the scope of the present disclosure.
It should be noted that all actions of acquiring signals, information or data in this specification are performed in compliance with the corresponding local data protection regulations and with authorization from the owner of the corresponding device.
At present, with the development of computer technology and the increasing attention people pay to their own private data, how different institutions can share data to achieve model-level win-win cooperation while guaranteeing the information security of the user data each of them stores is one of the technical problems to be solved.
One of the most common ways to solve the above problem is to train models through federated learning. Federated learning enables multi-party joint modeling without the local data stored by each participating party leaving that party, and realizes joint training of a model by exchanging intermediate model results among the different parties.
Based on this, the present specification provides a model training method which, based on the idea of federated learning, deploys a target model with a mixture-of-experts structure in the first node, and performs distillation training on the target model based on an initial model trained with knowledge from both the first node and the second node, so that even when the sample data stored in the working nodes differ greatly, an accurate target model can be trained from the samples stored in the first node. Here, a large difference between the sample data stored by the working nodes means that the number of stored samples corresponding to the same user across nodes is small.
The following describes in detail the technical solutions provided by the embodiments of the present specification with reference to the accompanying drawings.
Fig. 1 is a schematic flow chart of a model training method in the present specification, which specifically includes the following steps:
S100: from the first samples, a cross sample and a non-cross sample are determined, wherein the cross sample is used for representing that a second sample corresponding to the same user as the cross sample exists.
In one or more embodiments provided in this specification, the model training method may be applied to a first node in a federated learning system. That is, the model training method may be executed by the first node, which trains the model. The first node may be an electronic device such as a server or an intelligent terminal.
In addition, after the training of the target model is completed, in order to protect the information security of the model structure and model parameters of the target model, the node that uses the target model to predict data to be predicted may be the first node or another node trusted by the first node. Which node the trained target model is deployed on to execute prediction tasks can be set as needed, and this specification does not limit it.
For convenience of description, the following takes as an example the process in which the first node executes the model training method and then executes tasks with the trained target model.
Generally, in many scenarios there is a need for privacy-preserving computation and data security. For a single data holder, the characteristics of the data it holds are limited by the type of service conducted between the data holder and its users; therefore, even if a data holder has a broad user base, the data it holds suffers from limited feature dimensions. That is, for each data holder, its stored sample data may correspond to the same users as the sample data of other data holders, but the feature dimensions of its sample data differ from the feature dimensions of the sample data stored by the other data holders.
In this case, a federated learning approach may be adopted: based on the sample data stored by the various data holders, the cross samples corresponding to the same users across the data holders are determined, and a more accurate model is trained based on the feature enhancement brought by these cross samples.
In the model training method provided in this specification, the first node and the second node may each be a data holder storing sample data. When the model training method is applied to different scenarios, the function of the trained target model may differ. When the method is applied to a risk-control scenario, the sample data stored in the first node may be the verification method a user uses when logging in to a platform, and the sample data stored in the second node may be the user's data when executing a service. When the method is applied to a text evaluation scenario, the sample data stored in the first node may be text data written by a user, and the sample data stored in the second node may be data such as the user's profile information.
It can be seen that the data respectively stored by different data holders may have a difference in the dimension of the data characteristics, and also have a crossover in the dimension of the user corresponding to the data. That is, sample data stored separately for each node in the model training system may correspond to the same user, but for different data feature dimensions. As previously described, more accurate models may be trained based on cross samples between nodes corresponding to the same user.
Specifically, take the case where the federated learning system contains only one first node and one second node as an example. Both the first node and the second node may receive a model training request.
The second node can determine sample characteristics corresponding to each second sample stored by the second node according to the model training request, and send the determined sample characteristics of each second sample to the first node.
The first node may receive sample characteristics of each second sample sent by the second node and determine a sample identification of each second sample.
Then, based on the sample identifier of each second sample, the first node determines, from among the first samples it stores, the first samples whose sample identifiers are the same as those of second samples, that is, the first samples and second samples corresponding to the same user.
Finally, the first node may determine that the first sample corresponding to the same user as the second sample is a cross sample, and may use the first sample not corresponding to the same user as the second sample as a non-cross sample. That is, a cross sample is used to characterize the presence of a second sample corresponding to the same user as the cross sample, and a non-cross sample is used to characterize the absence of a second sample corresponding to the same user as the cross sample.
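A minimal sketch of this splitting step follows; the `user_id` field and the list-of-dict sample format are assumptions for illustration, and in practice the identifiers would typically be aligned in a privacy-preserving way.

```python
def split_cross_samples(first_samples, second_sample_ids):
    """Split the first node's samples into cross samples (a second sample with the same
    user identifier exists) and non-cross samples (no matching second sample exists)."""
    second_ids = set(second_sample_ids)
    cross, non_cross = [], []
    for sample in first_samples:
        # 'user_id' stands in for the sample identifier shared by both nodes.
        if sample["user_id"] in second_ids:
            cross.append(sample)
        else:
            non_cross.append(sample)
    return cross, non_cross
```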
S102: and determining an initial model trained based on the first samples and the second samples, and determining auxiliary labels of the crossed samples through the initial model.
In one or more embodiments provided in this specification, to avoid the impact of network failures when different portions of a model are deployed across the first node and the second node, the model may not be split across the two nodes; thus, the initial model may be deployed only in the first node. In addition, because an accurate initial model cannot be trained when the difference between the sample data stored in the first node and in the second node is large, in this specification the target model is constructed with a mixture-of-experts network structure, and distillation learning is performed on the target model based on the trained initial model, so that the first node can train an accurate target model based on the knowledge of both the first samples and the second samples.
The first node may then determine a pre-trained initial model and determine auxiliary annotations for the intersecting samples from the initial model.
Specifically, the first node may have stored therein an initial model trained based on each first sample in the first node and each second sample in the second node. The initial model may be trained with the help of the cross samples between the first samples and the second samples.
The first node may then input the cross sample as input to the initial model after determining the cross sample, resulting in an auxiliary annotation of the cross sample.
Thus, for each first sample in the first node, if the first sample is a cross sample, the first sample has both the original label and the auxiliary label. If the first sample is a non-intersecting sample, the first sample has only one label, the original label.
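A minimal sketch of this step, assuming the initial model is a trained PyTorch module whose output for a cross sample is used directly as that sample's auxiliary (soft) label for later distillation:

```python
import torch

@torch.no_grad()
def auxiliary_labels(initial_model, cross_features):
    """Run the pre-trained initial model on the cross samples and return its outputs,
    which serve as auxiliary labels; the initial model itself is not updated."""
    initial_model.eval()
    return initial_model(cross_features)
```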
S104: and inputting each first sample into a classification layer of the target model to be trained, and obtaining a classification result output by the classification layer, wherein the classification result is used for representing whether the first sample is a cross sample or not.
In one or more embodiments provided in this specification, as described above, the target model deployed in the first node may have a mixture-of-experts network structure; thus, the target model may at least include a classification layer and a prediction layer corresponding to each preset classification, as shown in FIG. 2.
FIG. 2 is a schematic structural diagram of the target model provided in this specification. In the figure, the target model includes a classification layer, a prediction layer corresponding to preset classification 1, and a prediction layer corresponding to preset classification 2. For each first sample, the first node may then input the first sample into the classification layer of the target model to be trained to obtain the classification result output by the classification layer.
The classification result may be used to indicate whether the first sample is a cross sample; that is, the classification result may be the sample type corresponding to the first sample, i.e., cross sample or non-cross sample. Alternatively, the classification result may be the probability that the first sample belongs to each preset classification. Taking the case where the preset classifications include a cross type and a non-cross type, the classification result may then be the probability that the first sample belongs to the cross type and the probability that it belongs to the non-cross type.
S106: and determining a target prediction result of the first sample according to the classification result and a prediction layer corresponding to each preset classification in the target model, wherein each preset classification comprises a cross type and a non-cross type.
In one or more embodiments provided herein, after determining the classification result, the first node may determine the target prediction result of the first sample based on the classification result and a prediction layer corresponding to each preset classification in the target model.
Specifically, first take the case where the classification result directly indicates whether the first sample is a cross sample.
When the first sample is a cross sample, the first node may determine a sample feature of the first sample, and input the sample feature as an input into a prediction layer corresponding to a cross type in the target model. And obtaining an output result of the prediction layer corresponding to the cross type as a target prediction result of the first sample.
Similarly, when the first sample is a non-intersecting sample, the first node may determine a sample feature of the first sample, and input the sample feature as an input into a prediction layer corresponding to a non-intersecting type in the target model. And obtaining an output result of the prediction layer corresponding to the non-intersection type as a target prediction result of the first sample.
Of course, the classification result may instead be the probability that the first sample belongs to each preset classification. After determining the classification result, the first node may determine the sample feature of the first sample and input it into the prediction layers corresponding to the preset classifications in the target model respectively, so as to obtain the prediction result output by each prediction layer. The first node may then determine the target prediction result of the first sample according to the probability that the first sample belongs to each preset classification and the prediction results output by the prediction layers of the preset classifications (as shown in FIG. 2).
How exactly the target prediction result of the first sample is determined from the classification result and the prediction layers corresponding to the preset classifications in the target model can be set as needed, and this specification does not limit it.
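One possible realization of this routing, sketched with PyTorch: the layer sizes, the two-expert setup, and the soft probability weighting are assumptions, since, as noted above, the exact combination rule is left open.

```python
import torch
import torch.nn as nn

class TargetModel(nn.Module):
    """Mixture-of-experts style target model: a classification (gating) layer plus one
    prediction layer (expert) per preset classification (cross / non-cross)."""
    def __init__(self, in_dim, hidden_dim, out_dim):
        super().__init__()
        self.classifier = nn.Sequential(nn.Linear(in_dim, 2), nn.Softmax(dim=-1))
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.ReLU(),
                          nn.Linear(hidden_dim, out_dim))
            for _ in range(2)
        ])

    def forward(self, x):
        gate = self.classifier(x)                                           # [batch, 2]: P(cross), P(non-cross)
        preds = torch.stack([expert(x) for expert in self.experts], dim=1)  # [batch, 2, out_dim]
        target_pred = (gate.unsqueeze(-1) * preds).sum(dim=1)               # probability-weighted combination
        return gate, target_pred
```

For hard routing, one could instead feed the sample only to the expert whose gate probability is larger; both variants are consistent with the description above.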
S108: and training the target model according to the target prediction result of the cross sample, the labeling and the auxiliary labeling of the cross sample, and the target prediction result of the non-cross sample and the labeling thereof.
In one or more embodiments provided herein, after determining and processing the training sample, the first node may train the model based on the annotation of the training sample.
Specifically, for each first sample, if the first sample is a cross sample, the first node may determine the gap between the target prediction result of the first sample and the label of the first sample as a first gap, determine the gap between the target prediction result of the first sample and the auxiliary label of the first sample as a second gap, and determine the sum of the first gap and the second gap as a first loss.
Meanwhile, if the first sample is a non-intersecting sample, the first node may determine a second loss according to a gap between a target prediction result of the first sample and a label of the first sample.
Finally, the first node may determine a total loss based on the first loss and the second loss, and train the target model with a minimum total loss as an optimization target.
Taking l_1 as the first loss and l_2 as the second loss, the total loss may be l_z = l_1 + l_2, where l_1 = l_{τ=1}(Y_A, f) + l_{τ=1}(Y_B, f), τ = 1 indicates that the first sample is a cross sample, Y_A denotes the original label of the first sample, Y_B denotes the auxiliary label of the first sample, f denotes the target prediction result of the first sample, and l denotes the loss function. Likewise, l_2 = l_{τ=0}(Y_A, f), where τ = 0 indicates that the first sample is a non-cross sample, Y_A denotes the original label of the first sample, f denotes the target prediction result of the first sample, and l denotes the loss function.
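A sketch of this loss under the assumption that the loss function l is mean-squared error and that the batch has already been split into cross and non-cross samples (both choices are assumptions; the specification does not fix the concrete form of l):

```python
import torch.nn.functional as F

def distillation_loss(pred_cross, y_a_cross, y_b_cross, pred_non_cross, y_a_non_cross):
    """l_z = l_1 + l_2: cross samples are fitted to both the original label Y_A and the
    auxiliary label Y_B, while non-cross samples are fitted only to their original label."""
    l1 = F.mse_loss(pred_cross, y_a_cross) + F.mse_loss(pred_cross, y_b_cross)
    l2 = F.mse_loss(pred_non_cross, y_a_non_cross)
    return l1 + l2
```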
After the training of the target model is completed, the target model is deployed at the first node, so that the first node can directly predict the data to be predicted based on the trained target model, and therefore the trained target model is not required to be transmitted, and the leakage of the model structure and model parameters of the target model is avoided.
To predict the data to be predicted based on the target model, the data to be predicted may be input into the target model, the classification result of the data to be predicted determined through the classification layer of the target model, and the target prediction result corresponding to the data to be predicted then determined through the prediction layers corresponding to the preset classifications in the target model and the classification result of the data to be predicted.
In the model training method shown in FIG. 1, the first samples stored in the first node are divided into cross samples and non-cross samples, auxiliary labels of the cross samples are determined through an initial model trained in advance on the first samples and the second samples, each first sample is input into the classification layer of the target model to be trained, a target prediction result of the first sample is obtained through the classification result of the first sample and the prediction layers corresponding to the preset classifications in the target model, and finally distillation training is performed on the target model based on the original labels and auxiliary labels of the cross samples and the original labels of the non-cross samples. Even when the first samples and the second samples contain few cross samples, an accurate target model can be trained from the knowledge of the second samples carried by the pre-trained initial model and the first samples stored in the first node, which guarantees both the model training efficiency and the accuracy of the trained target model.
In other words, in this model training method an initial model is first trained based on the first samples stored in the first node and the second samples stored in the second node, and the target model with a mixture-of-experts network structure is then obtained by distillation learning from that initial model. Even when the first samples and the second samples contain few cross samples, an accurate target model can still be trained from the knowledge of the second samples in the initial model and the first samples. This solves the technical problem that a mixture-of-experts model could otherwise only be trained on data of the same dimensions, so that a target model containing knowledge of both the first samples and the second samples could not be obtained. It also avoids the problem that, when the target model is obtained merely by distilling the initial model without adopting this model structure, the trained model cannot accurately learn the knowledge of the second samples because only part of the first samples carry auxiliary labels.
Further, the initial model in step S102 may be trained as follows:
specifically, the first node may be deployed with a complete initial model, and the second node may be deployed with a feature extraction layer of the initial model.
The first node may then determine, for each first sample, a sample feature of the first sample through a feature extraction layer of the initial model.
Meanwhile, the second node may determine, for each second sample, a sample feature of the second sample through a feature extraction layer of the initial model.
The second node may then send the determined sample characteristics for each second sample to the first node.
The first node may receive sample characteristics of each second sample and determine a sample identifier corresponding to each second sample.
The first node may then determine, from among the first samples and the second samples, a first sample and a second sample corresponding to the same user as a first cross sample and a second cross sample, based on the sample identifiers of the second samples and the sample identifiers of the first samples.
The first node may then take the annotation of the first cross sample as the annotation of the second cross sample. Thus, the first node may have stored therein a first sample with a label and a second cross sample with a label.
Finally, the first node can take each first sample and each second cross sample as samples in a training set for training the initial model, input sample characteristics of each sample in the training set into a prediction layer of the initial model to obtain an initial prediction result output by the initial model, determine loss according to the initial prediction result and labels of the samples in the training set, and adjust model parameters of the initial model according to the loss. Meanwhile, the first node can send the determined loss to the second node, and the second node adjusts the model parameters of the feature extraction layer stored by the second node according to the loss.
The above process is repeated until the preset iteration termination condition is reached, and then an initial model with completed training can be obtained.
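A compressed sketch of one such joint step follows; node-to-node communication is abstracted into plain function arguments, and the attribute names `feature_extractor` / `prediction_layer`, the optimizers, and the mean-squared-error loss are all illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def initial_model_step(first_model, second_extractor, first_inputs, first_labels,
                       second_cross_inputs, second_cross_labels, opt_first, opt_second):
    """One training step of the initial model: features of the first samples and of the
    second cross samples (labelled with the matching first cross samples' labels) are fed
    into the shared prediction layer, and the resulting loss updates both nodes' parameters."""
    first_feats = first_model.feature_extractor(first_inputs)    # computed at the first node
    second_feats = second_extractor(second_cross_inputs)         # computed at the second node
    feats = torch.cat([first_feats, second_feats], dim=0)
    labels = torch.cat([first_labels, second_cross_labels], dim=0)
    loss = F.mse_loss(first_model.prediction_layer(feats), labels)
    opt_first.zero_grad(); opt_second.zero_grad()
    loss.backward()   # in a real deployment only the loss / gradients would cross node boundaries
    opt_first.step(); opt_second.step()
    return loss.item()
```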
Furthermore, the target model in this specification may further include a feature extraction layer corresponding to each preset classification.
Specifically, the first node may input, for each first sample, the first sample as an input, and respectively input the feature extraction layers corresponding to each preset classification in the target model, so as to obtain each feature of the first sample output by each feature extraction layer.
And then, the first node can combine the determined features, and input the combined result into a classification layer of the target model to obtain a classification result output by the classification layer.
Finally, the sample feature of the first sample is determined according to the classification result and the features of the first sample. Taking the case where the classification result is the probability that the first sample belongs to each preset classification, the first node may use the probability that the first sample belongs to a preset classification as the weight of that preset classification, compute a weighted average of the features corresponding to the preset classifications with these weights, and use the result as the sample feature of the first sample. Of course, the first node may instead determine the sample type corresponding to the first sample from the classification result and use the feature output by the feature extraction layer corresponding to that sample type as the sample feature of the first sample. How the sample feature of the first sample is determined can be set as needed, and this specification does not limit it.
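A sketch of the probability-weighted variant just described, as a hypothetical extension of the earlier TargetModel sketch; the layer shapes and the concatenation of per-class features before the classification layer follow the description above, everything else is an assumption.

```python
import torch
import torch.nn as nn

class PerClassExtractors(nn.Module):
    """Each preset classification has its own feature extraction layer; the classification
    probabilities weight the per-class features into a single sample feature."""
    def __init__(self, in_dim, feat_dim):
        super().__init__()
        self.extractors = nn.ModuleList([nn.Linear(in_dim, feat_dim) for _ in range(2)])
        self.classifier = nn.Sequential(nn.Linear(2 * feat_dim, 2), nn.Softmax(dim=-1))

    def forward(self, x):
        feats = [extractor(x) for extractor in self.extractors]     # one feature per preset class
        gate = self.classifier(torch.cat(feats, dim=-1))            # classify from the combined features
        stacked = torch.stack(feats, dim=1)                         # [batch, 2, feat_dim]
        sample_feature = (gate.unsqueeze(-1) * stacked).sum(dim=1)  # weighted-average sample feature
        return sample_feature, gate
```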
In addition, when determining the loss, in order to avoid the situation where the gap between the label of a cross sample and its auxiliary label is too large and the resulting first loss becomes too large, the first node may assign weights to the first gap and the second gap. In that case, l_1 = α·l_{τ=1}(Y_A, f) + (1 − α)·l_{τ=1}(Y_B, f), where α denotes the weight of the first gap and 1 − α denotes the weight of the second gap. The specific values of the weights of the first gap and the second gap may be set as needed, and this specification does not limit them.
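In code, the weighting could be expressed as follows (α is a hyperparameter, and mean-squared error is again only an assumed choice of l):

```python
import torch.nn.functional as F

def weighted_cross_loss(pred_cross, y_a_cross, y_b_cross, alpha=0.5):
    """l_1 = alpha * l(Y_A, f) + (1 - alpha) * l(Y_B, f) for the cross samples."""
    return (alpha * F.mse_loss(pred_cross, y_a_cross)
            + (1 - alpha) * F.mse_loss(pred_cross, y_b_cross))
```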
Further, in the present specification, for each first sample, if the difference between the classification result of the first sample and the sample type corresponding to the first sample is large, the difference between the corresponding target prediction result and the original label of the first sample is also large. Thus, in determining the total loss of the target model, the first node may also determine the total loss based on the gap between the classification result and the sample type.
Specifically, the first node may determine, for each first sample, a sample type of the first sample according to whether the first sample is a cross sample. Wherein the sample types may include a cross type and a non-cross type.
The first node may then determine a third loss based on a gap between the sample type of the first sample and the classification result of the first sample.
Finally, the first node may determine the total loss based on the determined first, second, and third losses. Take the total loss l_z = l_1 + l_2 + l_3 as an example, where l_3 = −[1_{τ=1}·log P_A + 1_{τ=0}·log(1 − P_A)], P_A is the probability that the first sample belongs to the cross type, and 1 − P_A is the probability that the first sample belongs to the non-cross type.
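A sketch of this third term, assuming `p_cross` is the classification layer's probability that a sample belongs to the cross type and `tau` is a 0/1 tensor indicating whether the sample really is a cross sample:

```python
import torch
import torch.nn.functional as F

def classification_loss(p_cross, tau):
    """l_3 = -[tau * log(P_A) + (1 - tau) * log(1 - P_A)]: binary cross-entropy between
    the predicted cross-type probability and the sample's true type."""
    return F.binary_cross_entropy(p_cross, tau.float())
```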
Furthermore, data to be predicted can be predicted based on the target model trained with the model training method in this specification.
Specifically, the first node may receive a prediction request and determine the data to be predicted carried in the prediction request. The data to be predicted is of the same kind as the first samples; that is, the target model can make accurate predictions for data to be predicted that is similar to the first samples.
Then, the first node can take the data to be predicted as input, and input the data to be predicted into a classification layer of the target model which is trained in advance to obtain a target classification result of the data to be predicted.
And then, the first node can obtain a target prediction result of the data to be predicted according to the target classification result and prediction layers corresponding to preset classifications in the target model.
Finally, the first node may return the target prediction result according to the prediction request.
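Using the earlier TargetModel sketch, serving such a prediction request could look like the following (again only an assumed illustration):

```python
import torch

@torch.no_grad()
def predict(target_model, data_to_predict):
    """Route the data through the trained target model: the classification (gating) layer
    first, then the prediction layers of the preset classifications, returning the combined
    target prediction result."""
    target_model.eval()
    gate, target_pred = target_model(data_to_predict)  # gate is the classification result
    return target_pred
```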
Based on the same idea as the model training method provided above for one or more embodiments of this specification, this specification further provides a model training apparatus, as shown in FIG. 3.
Fig. 3 is a schematic diagram of a model training apparatus provided in the present specification, where the apparatus is applied to a first node in a federated learning system, the system includes the first node and a second node, the first node stores first samples, the second node stores second samples, and at least a portion of the first samples and at least a portion of the second samples correspond to the same user, where:
a sample determining module 200, configured to determine a cross sample and a non-cross sample from the first samples, where the cross sample is used to characterize that there is a second sample corresponding to the same user as the cross sample.
The label determining module 202 is configured to determine an initial model trained based on each first sample and each second sample, and to determine, through the initial model, an auxiliary label for each cross sample.
The classification module 204 is configured to input, for each first sample, the first sample into a classification layer of a target model to be trained, and obtain a classification result output by the classification layer, where the classification result is used to characterize whether the first sample is a cross sample.
And the prediction module 206 is configured to determine a target prediction result of the first sample according to the classification result and a prediction layer corresponding to each preset classification in the target model, where each preset classification at least includes a cross type and a non-cross type.
The training module 208 is configured to train the target model according to the target prediction result of the cross sample, the label and the auxiliary label of the cross sample, and the target prediction result of the non-cross sample and the label thereof.
Optionally, the label determining module 202 is configured to receive the sample characteristics of each second sample sent by the second node; determine, according to the sample identifiers of the first samples and the sample identifiers of the second samples, from the first samples and the second samples, a first sample and a second sample corresponding to the same user as a first cross sample and a second cross sample; use the label of the first cross sample as the label of the second cross sample; and train the initial model with the first samples and their labels and the second cross samples and their labels, to obtain a trained initial model.
Optionally, the prediction module 206 is configured to determine a sample feature of the first sample, input the sample feature into prediction layers corresponding to preset classifications in the target model, obtain prediction results of the first sample output by the prediction layers, and determine a target prediction result of the first sample according to the classification result and the prediction results of the first sample, where the classification result is used to characterize a probability that the first sample belongs to the preset classifications.
Optionally, the prediction module 206 is configured to obtain, through the feature extraction layers corresponding to the preset classifications in the target model, each feature of the first sample output by each feature extraction layer, and determine, according to the classification result and each feature of the first sample, a sample feature of the first sample, where the target model further includes a feature extraction layer corresponding to each preset classification.
Optionally, the training module 208 is configured to: for each first sample, if the first sample is a cross sample, determine a first loss according to the gap between the target prediction result of the first sample and the label of the first sample and the gap between the target prediction result of the first sample and the auxiliary label of the first sample; if the first sample is a non-cross sample, determine a second loss according to the gap between the target prediction result of the first sample and the label of the first sample; determine a total loss according to each determined first loss and each determined second loss; and train the target model with minimizing the total loss as the optimization target.
Optionally, the training module 208 is configured to determine, for each first sample, a sample type of the first sample according to whether the first sample is a cross sample, determine a third loss according to a gap between the sample type of the first sample and a classification result of the first sample, and determine a total loss according to each determined first loss, each determined second loss, and each determined third loss.
Optionally, the prediction module 206 is configured to receive a prediction request, determine data to be predicted carried in the prediction request, input the data to be predicted into a classification layer of a target model that is trained in advance, obtain a target classification result of the data to be predicted, obtain a target prediction result of the data to be predicted according to the target classification result and prediction layers corresponding to preset classifications in the target model, and return the target prediction result according to the prediction request.
The specification also provides a computer readable storage medium storing a computer program operable to perform the method of model training described above.
The present specification also provides a schematic structural diagram of the electronic device shown in fig. 4. At the hardware level, the electronic device includes a processor, an internal bus, a network interface, a memory, and a non-volatile storage, as described in fig. 4, although other hardware required by other services may be included. The processor reads the corresponding computer program from the nonvolatile memory into the memory and then runs the computer program to realize the model training method. Of course, other implementations, such as logic devices or combinations of hardware and software, are not excluded from the present description, that is, the execution subject of the following processing flows is not limited to each logic unit, but may be hardware or logic devices.
In the 1990s, an improvement to a technology could be clearly distinguished as an improvement in hardware (for example, an improvement to a circuit structure such as a diode, a transistor, or a switch) or an improvement in software (an improvement to a method flow). However, with the development of technology, improvements to many of today's method flows can be regarded as direct improvements to hardware circuit structures. Designers almost always obtain a corresponding hardware circuit structure by programming an improved method flow into a hardware circuit. Therefore, it cannot be said that an improvement of a method flow cannot be realized by a hardware entity module. For example, a programmable logic device (Programmable Logic Device, PLD) (e.g., a field programmable gate array (Field Programmable Gate Array, FPGA)) is an integrated circuit whose logic function is determined by the user's programming of the device. A designer programs to "integrate" a digital system onto a PLD without requiring the chip manufacturer to design and fabricate an application-specific integrated circuit chip. Moreover, nowadays, instead of manually manufacturing integrated circuit chips, such programming is mostly implemented with "logic compiler" software, which is similar to the software compiler used in program development; the original code before compiling also has to be written in a specific programming language, called a hardware description language (Hardware Description Language, HDL). There is not just one HDL but many kinds, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language); VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog are currently the most commonly used. It will also be apparent to those skilled in the art that a hardware circuit implementing the logic method flow can readily be obtained by merely slightly programming the method flow into an integrated circuit using several of the hardware description languages described above.
The controller may be implemented in any suitable manner, for example, the controller may take the form of, for example, a microprocessor or processor and a computer readable medium storing computer readable program code (e.g., software or firmware) executable by the (micro) processor, logic gates, switches, application specific integrated circuits (Application Specific Integrated Circuit, ASIC), programmable logic controllers, and embedded microcontrollers, examples of which include, but are not limited to, the following microcontrollers: ARC 625D, atmel AT91SAM, microchip PIC18F26K20, and Silicone Labs C8051F320, the memory controller may also be implemented as part of the control logic of the memory. Those skilled in the art will also appreciate that, in addition to implementing the controller in a pure computer readable program code, it is well possible to implement the same functionality by logically programming the method steps such that the controller is in the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers, etc. Such a controller may thus be regarded as a kind of hardware component, and means for performing various functions included therein may also be regarded as structures within the hardware component. Or even means for achieving the various functions may be regarded as either software modules implementing the methods or structures within hardware components.
The system, apparatus, module or unit set forth in the above embodiments may be implemented in particular by a computer chip or entity, or by a product having a certain function. One typical implementation is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
For convenience of description, the above devices are described as being functionally divided into various units, respectively. Of course, the functions of each element may be implemented in one or more software and/or hardware elements when implemented in the present specification.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, random Access Memory (RAM) and/or nonvolatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). Memory is an example of computer-readable media.
Computer readable media, including both non-transitory and non-transitory, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of storage media for a computer include, but are not limited to, phase change memory (PRAM), static Random Access Memory (SRAM), dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), read Only Memory (ROM), electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape magnetic disk storage or other magnetic storage devices, or any other non-transmission medium, which can be used to store information that can be accessed by a computing device. Computer-readable media, as defined herein, does not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article or apparatus that comprises the element.
It will be appreciated by those skilled in the art that embodiments of the present description may be provided as a method, system, or computer program product. Accordingly, the present specification may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present description can take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
The description may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The specification may also be practiced in distributed computing environments where tasks are performed by remote processing nodes that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including storage nodes.
In this specification, each embodiment is described in a progressive manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments. In particular, for system embodiments, since they are substantially similar to method embodiments, the description is relatively simple, as relevant to see a section of the description of method embodiments.
The foregoing is merely exemplary of the present disclosure and is not intended to limit the disclosure. Various modifications and alterations to this specification will become apparent to those skilled in the art. Any modifications, equivalent substitutions, improvements, or the like, which are within the spirit and principles of the present description, are intended to be included within the scope of the claims of the present description.

Claims (10)

1. A method of model training, the method being applied to a first node in a federated learning system, the system comprising the first node storing first samples and a second node storing second samples, at least a portion of the first samples and at least a portion of the second samples corresponding to the same user, the method comprising:
determining a cross sample and a non-cross sample from the first samples, wherein the cross sample is used for representing that a second sample corresponding to the same user as the cross sample exists;
Determining an initial model obtained by training based on each first sample and each second sample, and determining auxiliary labels of the crossed samples through the initial model;
inputting each first sample into a classification layer of a target model to be trained to obtain a classification result output by the classification layer, wherein the classification result is used for representing whether the first sample is a cross sample or not;
determining a target prediction result of the first sample according to the classification result and prediction layers corresponding to preset classifications in the target model, wherein the preset classifications comprise a cross type and a non-cross type;
and training the target model according to the target prediction result of each cross sample together with the label and the auxiliary label of the cross sample, and the target prediction result of each non-cross sample together with the label thereof.
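For orientation, a minimal PyTorch-style Python sketch of the structure recited in claim 1 follows: a classification layer scoring whether a first sample is a cross sample, and one prediction layer per preset classification, whose outputs are combined according to the classification result. The class and parameter names (TargetModel, feature_dim, num_labels) and the use of linear layers are illustrative assumptions rather than limitations of the claim.

    import torch
    import torch.nn as nn

    class TargetModel(nn.Module):
        def __init__(self, feature_dim: int, num_labels: int):
            super().__init__()
            # Classification layer: probabilities over the preset classifications
            # (index 1 is taken as the cross type in this sketch).
            self.classifier = nn.Sequential(nn.Linear(feature_dim, 2), nn.Softmax(dim=-1))
            # One prediction layer per preset classification.
            self.predictors = nn.ModuleList([nn.Linear(feature_dim, num_labels) for _ in range(2)])

        def forward(self, x: torch.Tensor):
            cls_probs = self.classifier(x)                                # (batch, 2)
            preds = torch.stack([p(x) for p in self.predictors], dim=1)  # (batch, 2, num_labels)
            # Target prediction result: per-classification predictions weighted by the classification result.
            target_pred = (cls_probs.unsqueeze(-1) * preds).sum(dim=1)   # (batch, num_labels)
            return cls_probs, target_pred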
2. The method of claim 1, wherein determining an initial model trained based on each first sample and each second sample specifically comprises:
receiving sample features of each second sample sent by the second node;
determining, from the first samples and the second samples, first cross samples and second cross samples corresponding to the same user according to the sample identifiers of the first samples and the sample identifiers of the second samples, and taking the label of each first cross sample as the label of the corresponding second cross sample;
and training the initial model using the first samples and the labels thereof, and the second cross samples and the labels thereof, to obtain the trained initial model.
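As a rough illustration of the alignment step in claim 2, the Python sketch below matches first and second samples by sample identifier and reuses the label of each first cross sample as the label of the matching second cross sample; the dictionary-based data layout and the function name are assumptions made only for this example.

    def align_and_label(first_samples, first_labels, second_samples):
        # first_samples / second_samples: dicts mapping sample identifier -> feature vector.
        # first_labels: dict mapping sample identifier -> label held by the first node.
        shared_ids = first_samples.keys() & second_samples.keys()
        second_cross = {sid: second_samples[sid] for sid in shared_ids}
        # The label of each first cross sample serves as the label of the matching second cross sample.
        second_cross_labels = {sid: first_labels[sid] for sid in shared_ids}
        return second_cross, second_cross_labels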
3. The method of claim 1, wherein determining the target prediction result of the first sample according to the classification result and the prediction layer corresponding to each preset classification in the target model, specifically includes:
determining sample features of the first sample, and respectively inputting the sample features into the prediction layers corresponding to the preset classifications in the target model to obtain prediction results of the first sample output by the prediction layers;
and determining the target prediction result of the first sample according to the classification result and each prediction result of the first sample, wherein the classification result represents the probability that the first sample belongs to each preset classification.
4. The method of claim 3, wherein the target model further comprises a feature extraction layer corresponding to each preset classification;
determining the sample features of the first sample specifically comprises:
obtaining, through the feature extraction layer corresponding to each preset classification in the target model, the feature of the first sample output by that feature extraction layer;
and determining the sample features of the first sample according to the classification result and each feature of the first sample.
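The claim 4 variant can be sketched by giving each preset classification its own feature extraction layer and weighting the extracted features by the classification result before prediction; the layer types, sizes, and names below are assumptions made for illustration.

    import torch
    import torch.nn as nn

    class TargetModelWithExtractors(nn.Module):
        def __init__(self, input_dim: int, hidden_dim: int, num_labels: int):
            super().__init__()
            self.classifier = nn.Sequential(nn.Linear(input_dim, 2), nn.Softmax(dim=-1))
            # One feature extraction layer per preset classification.
            self.extractors = nn.ModuleList(
                [nn.Sequential(nn.Linear(input_dim, hidden_dim), nn.ReLU()) for _ in range(2)])
            # One prediction layer per preset classification.
            self.predictors = nn.ModuleList([nn.Linear(hidden_dim, num_labels) for _ in range(2)])

        def forward(self, x: torch.Tensor):
            cls_probs = self.classifier(x)                                # (batch, 2)
            feats = torch.stack([e(x) for e in self.extractors], dim=1)   # (batch, 2, hidden_dim)
            # Sample features: per-classification features weighted by the classification result.
            sample_feat = (cls_probs.unsqueeze(-1) * feats).sum(dim=1)    # (batch, hidden_dim)
            preds = torch.stack([p(sample_feat) for p in self.predictors], dim=1)
            target_pred = (cls_probs.unsqueeze(-1) * preds).sum(dim=1)    # (batch, num_labels)
            return cls_probs, target_pred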
5. The method according to claim 1, wherein training the target model according to the target prediction result of the cross sample, the label and the auxiliary label of the cross sample, and the target prediction result of the non-cross sample and the label thereof specifically comprises:
for each first sample, if the first sample is a cross sample, determining a first loss according to a gap between the target prediction result of the first sample and the label of the first sample and a gap between the target prediction result of the first sample and the auxiliary label of the first sample;
if the first sample is a non-cross sample, determining a second loss according to a gap between the target prediction result of the first sample and the label of the first sample;
and determining the total loss according to the determined first loss and second loss, and training the target model with minimizing the total loss as the optimization objective.
6. The method of claim 5, wherein determining the total loss based on the determined first and second losses comprises:
for each first sample, determining the sample type of the first sample according to whether the first sample is a cross sample;
determining a third loss based on a gap between a sample type of the first sample and a classification result of the first sample;
and determining the total loss according to the determined first loss, the determined second loss and the determined third loss.
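Claims 5 and 6 together define the training objective; one possible rendering of the first, second, and third losses is sketched below in Python. The use of cross-entropy for each gap and the unweighted sum of the three terms are assumptions rather than requirements of the claims.

    import torch
    import torch.nn.functional as F

    def total_loss(cls_probs, target_pred, labels, aux_labels, is_cross):
        # is_cross: boolean tensor of shape (batch,); aux_labels must hold valid class
        # indices everywhere but are only meaningful for cross samples.
        gap_label = F.cross_entropy(target_pred, labels, reduction="none")
        gap_aux = F.cross_entropy(target_pred, aux_labels, reduction="none")
        # First loss: cross samples, gap to the label plus gap to the auxiliary label.
        first = ((gap_label + gap_aux) * is_cross.float()).sum()
        # Second loss: non-cross samples, gap to the label only.
        second = (gap_label * (~is_cross).float()).sum()
        # Third loss: gap between the sample type (cross / non-cross) and the classification result.
        third = F.nll_loss(torch.log(cls_probs + 1e-8), is_cross.long(), reduction="sum")
        return first + second + third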
7. The method of claim 1, the method further comprising:
receiving a prediction request and determining data to be predicted carried in the prediction request;
inputting the data to be predicted into the classification layer of the pre-trained target model to obtain a target classification result of the data to be predicted;
obtaining a target prediction result of the data to be predicted according to the target classification result and the prediction layers respectively corresponding to the preset classifications in the target model;
and returning the target prediction result according to the prediction request.
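A possible inference flow for claim 7, assuming a target model trained as in the earlier sketches, could look as follows; handle_prediction_request and the request format are hypothetical names used only for illustration.

    import torch

    def handle_prediction_request(model, request):
        features = torch.tensor(request["data_to_predict"], dtype=torch.float32)
        model.eval()
        with torch.no_grad():
            # The target classification result and the per-classification prediction layers
            # are combined inside the model, in the same way as during training.
            _, target_pred = model(features)
        # Return the target prediction result in response to the prediction request.
        return target_pred.tolist()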
8. An apparatus for model training, the apparatus being applied to a first node in a federated learning system, the system comprising the first node and a second node, the first node storing first samples and the second node storing second samples, at least a portion of the first samples and at least a portion of the second samples corresponding to the same user, the apparatus comprising:
a sample determining module, configured to determine cross samples and non-cross samples from the first samples, wherein a cross sample is a first sample for which there exists a second sample corresponding to the same user;
a label determining module, configured to determine an initial model trained based on each first sample and each second sample, and determine auxiliary labels of the cross samples through the initial model;
a classification module, configured to input each first sample into a classification layer of a target model to be trained to obtain a classification result output by the classification layer, wherein the classification result indicates whether the first sample is a cross sample;
a prediction module, configured to determine a target prediction result of the first sample according to the classification result and a prediction layer corresponding to each preset classification in the target model, wherein the preset classifications comprise a cross type and a non-cross type;
and a training module, configured to train the target model according to the target prediction result of the cross sample, the label and the auxiliary label of the cross sample, and the target prediction result of the non-cross sample and the label thereof.
9. A computer readable storage medium storing a computer program which, when executed by a processor, implements the method of any of the preceding claims 1-7.
10. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method of any of the preceding claims 1-7 when the program is executed.
CN202310983254.9A 2023-08-04 2023-08-04 Model training method and device, electronic equipment and storage medium Pending CN117093862A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310983254.9A CN117093862A (en) 2023-08-04 2023-08-04 Model training method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310983254.9A CN117093862A (en) 2023-08-04 2023-08-04 Model training method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN117093862A (en) 2023-11-21

Family

ID=88781422

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310983254.9A Pending CN117093862A (en) 2023-08-04 2023-08-04 Model training method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117093862A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117527449A (en) * 2024-01-05 2024-02-06 之江实验室 Intrusion detection method, device, electronic equipment and storage medium


Similar Documents

Publication Publication Date Title
CN115828162B (en) Classification model training method and device, storage medium and electronic equipment
CN117093862A (en) Model training method and device, electronic equipment and storage medium
CN115238826B (en) Model training method and device, storage medium and electronic equipment
CN114997472A (en) Model training method, business wind control method and business wind control device
CN117409466B (en) Three-dimensional dynamic expression generation method and device based on multi-label control
CN116578877B (en) Method and device for model training and risk identification of secondary optimization marking
CN117235530A (en) Method and device for training intention prediction model and electronic equipment
CN115758141A (en) Method and device for model training and business wind control
CN116340852B (en) Model training and business wind control method and device
CN116028820B (en) Model training method and device, storage medium and electronic equipment
CN116109008B (en) Method and device for executing service, storage medium and electronic equipment
CN115827880B (en) Business execution method and device based on emotion classification
CN116842570A (en) Model training method and business wind control method and device
CN117786061B (en) Large language model prediction method and device based on space-time attention mechanism
CN117313739A (en) Training method, device, equipment and storage medium of language model
CN117933707A (en) Wind control model interpretation method and device, storage medium and electronic equipment
CN117876114A (en) Method and device for service execution and model training
CN116401541A (en) Model training method and device, storage medium and electronic equipment
CN117079274A (en) Training method and device for recognition model, storage medium and electronic equipment
CN117591217A (en) Information display method, device, equipment and storage medium
CN118194974A (en) Model training method, service wind control method, device, storage medium and equipment
CN117591703A (en) Graph data optimization method and device, storage medium and electronic equipment
CN117575611A (en) Risk identification method and device, storage medium and electronic equipment
CN117592102A (en) Service execution method, device, equipment and storage medium
CN116663676A (en) Model training method and device, storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination