CN111461215A - Multi-party joint training method, apparatus, system and device of a business model


Info

Publication number: CN111461215A
Application number: CN202010244168.2A
Granted publication: CN111461215B
Authority: CN (China)
Legal status: Active (granted)
Other languages: Chinese (zh)
Inventor: 蒋晨之
Assignee (current and original): Alipay Hangzhou Information Technology Co Ltd
Prior art keywords: gradient, noise, target, loss function, data holder

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; extraction of features in feature space; blind source separation
    • G06F18/214 - Generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F21/00 - Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 - Protecting data
    • G06F21/62 - Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 - Protecting access to data via a platform to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 - Protecting personal data, e.g. for financial or medical purposes


Abstract

The present specification provides a method, apparatus, system and device for multi-party joint training of a business model. During the joint training, each data holder truncates its gradient for the current iteration and sends the truncated gradient to the collaborator; the collaborator computes a first target gradient based on the truncated gradients sent by the data holders and adds noise to it to obtain a second target gradient; and each data holder determines the model parameters of the current iteration based on the second target gradient and the model parameters of the previous iteration. Because the data holders truncate the computed gradients and the collaborator adds noise to the aggregated first target gradient, personal data is prevented from leaking during model training by means of differential privacy.

Description

Multi-party joint training method, apparatus, system and device of a business model
Technical Field
The present specification relates to the field of computer communications, and in particular to a method, apparatus, system and device for multi-party joint training of a business model.
Background
Model training is an important component of artificial intelligence technology. Model training typically relies on user samples: in general, the more samples and the richer their feature dimensions, the more accurate the trained model.
However, the samples held by any single data holder are limited, which constrains model training. To improve model accuracy, the industry has proposed federated learning, in which multiple parties jointly train a business model using their respective samples. Federated learning comes in two forms: horizontal federated learning and vertical federated learning.
A horizontal federated learning system comprises multiple data holders and a collaborator, and each data holder must exchange gradients with the collaborator to iteratively update the model parameters of the business model. However, because the gradients are computed from private user data, that data can easily be recovered by inverting the gradients, leading to leakage of user privacy.
Disclosure of Invention
In view of this, the present specification provides a multi-party joint training method, apparatus, system and device for a business model.
Specifically, the present specification is implemented through the following technical solutions:
according to a first aspect of the present specification, there is provided a multi-party joint training method of a business model, the multiple parties including: a plurality of data holders and collaborators, each of the plurality of data holders having configured the business model; the method is applied to any data holder, and comprises the following steps:
and circularly iterating the model parameters of the following service models until an iteration stop condition is met:
selecting a target sample from the locally held samples;
determining the gradient of the current iteration of the service model based on the target sample, the model parameters of the previous iteration of the service model and a preset gradient algorithm;
truncating the gradient to generate a truncated gradient, sending the truncated gradient to the cooperator, determining a first target gradient by the cooperator based on the truncated gradient sent by each data holder, and adding a preset type of noise in the first target gradient to generate a second target gradient;
and receiving a second target gradient sent by the cooperative party, and determining the model parameters of the current iteration of the business model based on the second target gradient and the model parameters of the last iteration of the business model.
According to a second aspect of the present specification, there is provided a multi-party joint training method of a business model, the multiple parties including a plurality of data holders and a collaborator, each of the data holders being configured with the business model; the method is applied to the collaborator and includes:
determining a first target gradient based on the truncated gradients sent by the data holders, each truncated gradient being calculated by a data holder based on a target sample selected from its locally held samples, the model parameters of the previous iteration of the business model, and a preset gradient algorithm;
adding a preset type of noise to the first target gradient to generate a second target gradient; and
sending the second target gradient to each data holder, so that each data holder determines the model parameters of the current iteration of the business model based on the second target gradient and the model parameters of the previous iteration of the business model.
According to a third aspect of the present specification, there is provided a multi-party joint training system of a business model, the multiple parties including a plurality of data holders and a collaborator, each of the data holders being configured with the business model; the samples held by the data holders are not identical, but the feature dimensions of those samples are the same;
each data holder is configured to select a target sample from its locally held samples; determine the gradient of the current iteration of the business model based on the target sample, the model parameters of the previous iteration of the business model, and a preset gradient algorithm; and truncate the gradient to generate a truncated gradient and send the truncated gradient to the collaborator;
the collaborator is configured to determine a first target gradient based on the truncated gradients sent by the data holders, add a preset type of noise to the first target gradient to generate a second target gradient, and send the second target gradient to each data holder; and
each data holder is further configured to receive the second target gradient sent by the collaborator, determine the model parameters of the current iteration of the business model based on the second target gradient and the model parameters of the previous iteration of the business model, and detect whether the iteration stop condition is met: if not, return to the step of selecting a target sample from the locally held samples; if so, stop iterating the model parameters and determine that training of the business model is complete.
According to a fourth aspect of the present specification, there is provided a multi-party joint training apparatus of a business model, the multiple parties including a plurality of data holders and a collaborator, each of the data holders being configured with the business model; the apparatus is applied to any data holder, and its modules iterate the model parameters of the business model in a loop until an iteration stop condition is met. The apparatus includes:
a selection module that selects a target sample from locally held samples;
a determining module that determines the gradient of the current iteration of the business model based on the target sample, the model parameters of the previous iteration of the business model, and a preset gradient algorithm;
a truncation module that truncates the gradient to generate a truncated gradient and sends the truncated gradient to the collaborator, the collaborator determining a first target gradient based on the truncated gradients sent by the data holders and adding a preset type of noise to the first target gradient to generate a second target gradient; and
an updating module that receives the second target gradient sent by the collaborator and determines the model parameters of the current iteration of the business model based on the second target gradient and the model parameters of the previous iteration of the business model.
According to a fifth aspect of the present specification, there is provided a multi-party joint training apparatus of a business model, the multiple parties including a plurality of data holders and a collaborator, each of the data holders being configured with the business model; the apparatus is applied to the collaborator; the samples held by the data holders are not identical, but the feature dimensions of those samples are the same. The apparatus includes:
a determining module that determines a first target gradient based on the truncated gradients sent by the data holders, each truncated gradient being calculated by a data holder based on a target sample selected from its locally held samples, the model parameters of the previous iteration of the business model, and a preset gradient algorithm;
a generating module that adds a preset type of noise to the first target gradient to generate a second target gradient; and
a sending module that sends the second target gradient to each data holder, so that each data holder determines the model parameters of the current iteration of the business model based on the second target gradient and the model parameters of the previous iteration of the business model.
According to a sixth aspect of the present specification, there is provided an electronic apparatus comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor implements the method of the first aspect by executing the executable instructions.
According to a seventh aspect of the present description, there is provided a computer readable storage medium having stored thereon computer instructions which, when executed by a processor, carry out the steps of the method of the first aspect.
According to an eighth aspect of the present specification, there is provided an electronic apparatus comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor implements the method of the second aspect by executing the executable instructions.
According to a ninth aspect of the present description, there is provided a computer readable storage medium having stored thereon computer instructions which, when executed by a processor, carry out the steps of the method of the second aspect.
It can be seen from the above description that, during multi-party joint training of the business model, the data holders and the collaborator do not exchange raw gradients directly. Instead, each data holder truncates its gradient for the current iteration and sends the truncated gradient to the collaborator; the collaborator computes a first target gradient based on the truncated gradients sent by the data holders and adds noise to it to obtain a second target gradient; and each data holder determines the model parameters of the current iteration based on the second target gradient and the model parameters of the previous iteration. Because the data holders truncate the computed gradients and the collaborator adds noise to the aggregated first target gradient, private user data is protected from leakage by means of differential privacy.
Drawings
FIG. 1 is a diagram of a multi-party joint training system for a business model, shown in an exemplary embodiment of the present description;
FIG. 2 is an interaction diagram illustrating a multi-party joint training method for a business model in an exemplary embodiment of the present specification;
FIG. 3 is a flow chart illustrating a method for multi-party joint training of a business model in an exemplary embodiment of the present description;
FIG. 4 is a flow chart illustrating a method for multi-party joint training of a business model in an exemplary embodiment of the present description;
FIG. 5 is an interaction diagram illustrating another method of multiparty joint training of business models in accordance with an exemplary embodiment of the present description;
FIG. 6 is a diagram illustrating a hardware configuration of an electronic device in accordance with an exemplary embodiment of the present disclosure;
FIG. 7 is a block diagram of a multi-party joint training apparatus for a business model according to an exemplary embodiment of the present disclosure;
FIG. 8 is a block diagram of a multi-party joint training apparatus for another business model, as shown in an exemplary embodiment of the present specification.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present specification. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the specification, as detailed in the appended claims.
The terminology used in the description herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the description. As used in this specification and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used herein to describe various kinds of information, this information should not be limited by these terms. These terms are only used to distinguish one kind of information from another. For example, without departing from the scope of the present specification, first information may also be referred to as second information, and similarly, second information may also be referred to as first information. The word "if" as used herein may be interpreted as "when", "upon", or "in response to determining", depending on the context.
This specification aims to provide a privacy protection method based on differential privacy. During multi-party joint training of a business model, the data holders and the collaborator do not exchange raw gradients directly: each data holder truncates its gradient for the current iteration and sends the truncated gradient to the collaborator; the collaborator computes a first target gradient based on the truncated gradients sent by the data holders and adds noise to it to obtain a second target gradient; and each data holder determines the model parameters of the current iteration based on the second target gradient and the model parameters of the previous iteration. Because the data holders truncate the computed gradients and the collaborator adds noise to the aggregated first target gradient, private user data is protected from leakage by means of differential privacy.
Before introducing the business model training method provided by the present specification, concepts related to the present specification will be introduced.
1) Business model
Business model: a model that can perform business processing. The business model may be a neural network, a deep learning model, or the like. Of course, the business model may also be another linear model that can be trained with supervision; the business model is described here only by way of example and is not specifically limited.
Furthermore, from a business perspective, the business model may be an object prediction model, an object classification model, an object recognition model, and so on; it is not specifically limited here.
For example, when the business model is an object classification model, it may check a user's identity information to determine whether the user is a valid user or an invalid user.
Furthermore, the business model may also be a sensor model, i.e., a model whose input data is sensor data. The sensor data may originate from a sensor of a mobile device or of a wearable device; the source of the sensor data is not specifically limited here.
2) Sample object, sample
In this embodiment, the sample object may be a user or another kind of object; for example, in the security field, the sample object may be a vehicle or another means of transport. The sample object is described here only by way of example and is not specifically limited.
A sample, which may also be referred to as sample data, is the data describing a sample object. Sample data includes multiple feature dimensions.
For example, assuming the sample object is a user, the feature dimensions of the sample data may include the user's age, gender, occupation, educational background, operational behavior, and so on. The feature dimensions are described only by way of example and are not specifically limited.
In addition, in this specification the sample may be user data, vehicle data, or the like; the sample data is described only by way of example and is not specifically limited.
Referring to fig. 1, fig. 1 is a schematic diagram of a multi-party joint training system of a business model according to an exemplary embodiment of the present disclosure.
The multi-party joint training system of the business model comprises a plurality of data holders and a collaborator.
1) Data holder
Each data holder is configured with a complete business model, and each data holder holds samples and labels.
The samples held by the data holders are not identical, but their feature dimensions are the same. In other words, the feature dimensions of the samples held by the data holders are the same, but the corresponding users differ.
For example, the data holders may be banks in different regions. Because each bank's users come mostly from its own region, the intersection of the banks' user groups is very small; but because the banks' businesses are very similar, the feature dimensions of the user data (i.e., the samples) they hold are the same, while the user groups are not identical.
A data holder may be a server, a server cluster, a data center, a computer device, or a virtual machine in a server, etc.; the hardware entity of the data holder is not specifically limited here.
2) Collaborator
The collaborator may be a cloud server, a server cluster, a data center, a computer device independent of the data holders, or a virtual machine in a server, etc.; the hardware entity of the collaborator is not specifically limited here.
In the embodiments of this specification, each data holder calculates the gradient of the current iteration based on the model parameters of the previous iteration of the business model, the target sample selected for the current iteration, and a preset gradient algorithm, truncates the gradient, and sends the truncated gradient to the collaborator.
The collaborator receives the truncated gradients sent by the data holders, determines a first target gradient based on them, and adds a preset type of noise to the first target gradient to obtain a second target gradient. The collaborator then sends the second target gradient to the data holders, and each data holder obtains the model parameters of the current iteration based on the second target gradient and the model parameters of the previous iteration.
Referring to fig. 2, fig. 2 is an interaction diagram of a multi-party joint training method of a business model according to an exemplary embodiment of the present disclosure, where the multiple parties include: a plurality of data holders and a collaborator, each of the data holders being configured with the business model; the samples held by the data holders are not identical, but their feature dimensions are the same. The method may include the steps shown below.
It should be noted that the data holders and the collaborator iterate the model parameters multiple times according to steps 202 to 216 until the iteration stop condition is satisfied; steps 202 to 216 constitute one iteration.
Step 202: the data holder selects a target sample from the locally held samples.
In this embodiment, the data holder may use data collected by sensors of users' terminal devices as its held samples (the manner of collecting user data is not specifically limited here). The data holder may store the samples in a local database or in a database of another device, and retrieve them from there when they are used; the storage location of the locally held samples is described only by way of example and is not specifically limited.
In the embodiments of this specification, at each iteration the data holder may select a target sample from the locally held samples.
In an optional implementation, the data holder may randomly select at least one sample from the locally held samples as the target sample.
Of course, the data holder may also select the target sample from the locally held samples according to a preset selection rule, for example in a round-robin manner. The manner in which the data holder selects the target sample is not specifically limited here.
Step 204: the data holder determines the gradient of the current iteration of the business model based on the target sample, the model parameters of the previous iteration of the business model, and a preset gradient algorithm.
In the embodiments of this specification, the data holder is configured not only with the business model but also with the loss function of the business model.
The data holder can obtain a gradient formula based on the loss function of the business model and a preset gradient algorithm.
For example, when the gradient algorithm is gradient descent, the data holder obtains the gradient formula by taking the partial derivative of the loss function with respect to the model parameters. Of course, other gradient algorithms may also be used; the gradient algorithm is described only by way of example and is not specifically limited here.
After obtaining the gradient formula, the data holder may substitute the model parameters of the previous iteration of the business model and the target sample into the gradient formula to obtain the gradient of the current iteration.
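By way of illustration of this step, the following Python sketch computes the gradient of the current iteration for an assumed logistic-regression business model; the specification does not fix a concrete loss function or model, so the loss and the helper name batch_gradient are assumptions made for the example only.

    import numpy as np

    def batch_gradient(theta, X, y):
        """Gradient of an assumed logistic-regression loss at theta,
        evaluated on the target sample batch (X, y) of this iteration."""
        z = X @ theta                     # linear scores for the batch
        p = 1.0 / (1.0 + np.exp(-z))      # predicted probabilities
        # Partial derivative of the loss with respect to the model
        # parameters, i.e. the "gradient formula" derived from the loss.
        return X.T @ (p - y) / len(y)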
Step 206: the data holder truncates the gradient to generate a truncated gradient.
In the embodiments of this specification, to protect private user data, the data holder does not directly send the computed gradient of the current iteration to the collaborator. Instead, it truncates the gradient and sends the resulting truncated gradient to the collaborator, preventing the private user data from being inferred in reverse from the gradient and thereby leaked.
In an optional implementation, the data holder may truncate the gradient using a gradient-truncation technique. Of course, other truncation methods may also be used; the truncation method is described only by way of example and is not specifically limited.
Taking gradient truncation as an example:
the data holder may calculate a norm of the gradient according to a preset norm type, and then determine the truncated gradient according to a preset gradient-truncation boundary value and the calculated norm.
For example, the norm type may be the L2 norm; the norm type is described here only by way of example and is not specifically limited.
"Determining the truncated gradient according to the preset gradient-truncation boundary value and the calculated norm" may follow an existing gradient-truncation method and is not described again here.
Step 208: the data holder sends the truncated gradient to the collaborator.
Step 210: the collaborator determines a first target gradient based on the truncated gradients sent by the data holders.
In this embodiment of the specification, the collaborator may receive the truncated gradient sent by each data holder and determine from these a first target gradient for updating the model parameters.
In an optional determination manner, the collaborator may obtain the number of target samples selected by each data holder and, from these numbers, determine the total number of target samples across the data holders. The collaborator may also sum the truncated gradients of the data holders, and then determine the first target gradient based on the sum of the truncated gradients and the total number of target samples.
For example, the collaborator may determine the first target gradient by the following formula:
Δ_t = (1 / |B_t|) · Σ_c Δ'_c
where Δ_t is the first target gradient;
|B_t| is the total number of target samples across the data holders;
Δ'_c is the truncated gradient sent by the c-th data holder.
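Under this formula, the collaborator's aggregation reduces to a few lines; the function name and call signature below are illustrative only.

    import numpy as np

    def first_target_gradient(clipped_grads, sample_counts):
        """First target gradient Δ_t: the sum of the truncated per-holder
        gradients divided by the total number of target samples |B_t|."""
        total = sum(sample_counts)                    # |B_t|
        return np.sum(clipped_grads, axis=0) / total  # Δ_t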
Step 212: the collaborator adds a preset type of noise to the first target gradient to generate a second target gradient.
The preset type of noise may be Gaussian noise. Of course, in practical applications other noise, such as Laplacian noise, may also be used; the type of noise is described only by way of example and is not specifically limited.
In one optional implementation, the collaborator may randomly generate noise of the preset type and add it to the first target gradient to generate the second target gradient.
To make the second target gradient better suited to updating the model parameters, the collaborator may also generate it in another way. Specifically, the collaborator is preconfigured with a noise factor set containing multiple noise factors. The collaborator generates one noise term per noise factor and adds each to the first target gradient to obtain multiple noise gradients; it then calculates the selected probability of each noise gradient to determine which one benefits the iteration of the model parameters most, and takes the noise gradient with the best iteration effect as the second target gradient.
The second implementation is described below with reference to steps 2121 to 2123.
Step 2121: the collaborator adds the noise corresponding to each preset noise factor to the first target gradient to obtain a noise gradient set.
In implementation, the collaborator is preconfigured with a noise factor set containing multiple noise factors. The collaborator generates one noise term per noise factor and adds each to the first target gradient, obtaining multiple noise gradients that together constitute the noise gradient set.
For example, assume that the noise factor set includes two noise factors, noise factor 1 and noise factor 2, respectively.
The collaborator may generate noise 1 corresponding to noise factor 1 and add it to the first target gradient to obtain noise gradient 1.
The collaborator may generate noise 2 corresponding to noise factor 2 and add it to the first target gradient to obtain noise gradient 2.
Optionally, in this embodiment of the specification, the collaborator may compute the noise gradient set by the following formulas:
Ω_σ = { σ = z·C1 / |B_t| ; z ∈ Ω_z };
Ω_Δ = { Δ' = Δ_t + N(0, σ²I) ; σ ∈ Ω_σ };
where Ω_z is the preset noise factor set, and z is a noise factor in that set;
C1 is the preset boundary value of gradient truncation;
σ is an intermediate result, and Ω_σ is the set of intermediate results;
|B_t| is the total number of target samples across the data holders;
N(0, σ²I) is a normal distribution with expectation 0 and variance σ²I, where I is the identity matrix;
Ω_Δ is the set of noise gradients corresponding to the noise factors, and Δ' is a noise gradient.
Of course, in practical applications, the collaborator may also use other ways to obtain the noise gradient set, and the way to obtain the noise gradient set is not specifically limited here.
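The construction of the noise gradient set can be sketched as follows; the noise scale σ = z·C1/|B_t| is reconstructed from the symbol definitions above rather than quoted verbatim, so it should be read as an assumption.

    import numpy as np

    def noise_gradient_set(delta_t, noise_factors, c1, total_samples, rng=None):
        """Ω_Δ: one Gaussian-perturbed copy of the first target gradient
        Δ_t per noise factor z in the preset set Ω_z."""
        rng = rng or np.random.default_rng()
        noise_grads = []
        for z in noise_factors:                 # z ∈ Ω_z
            sigma = z * c1 / total_samples      # assumed σ = z·C1/|B_t|
            noise = rng.normal(0.0, sigma, size=delta_t.shape)  # N(0, σ²I)
            noise_grads.append(delta_t + noise)                 # Δ' = Δ_t + noise
        return noise_grads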
Step 2122: the collaborator calculates the selected probability of each noise gradient in the noise gradient set.
In one implementation, the collaborator can determine the selected probability of each noise gradient through interaction with the data holders.
In implementation, the collaborator may send the noise gradient set to each data holder. Each data holder may calculate, based on the noise gradient set, the model parameters of the previous iteration, and its local target sample, the loss function value corresponding to each noise gradient in the set, and send these loss function values to the collaborator. The collaborator then determines the selected probability of each noise gradient based on the loss function values sent by the data holders.
This implementation is described below with reference to steps A to C.
Step A: the collaborator sends the noise gradient set to each data holder.
Step B: each data holder calculates the loss function value corresponding to each noise gradient in the noise gradient set based on the noise gradient set, the model parameters of the previous iteration, and its local target sample and label, and sends the loss function values to the collaborator.
In implementation, for each noise gradient in the set, the data holder determines a temporary model parameter based on the model parameters of the previous iteration and the noise gradient, and substitutes the temporary model parameter together with the local target sample and its label into the loss function of the business model to obtain the loss function value corresponding to that noise gradient.
For example, assume there are three noise gradients: noise gradient 1, noise gradient 2, and noise gradient 3.
For noise gradient 1, the data holder may determine temporary model parameter 1 based on the model parameters of the previous iteration of the business model and noise gradient 1, and substitute temporary model parameter 1, the local target sample and its label into the loss function of the business model to obtain loss function value 1 corresponding to noise gradient 1.
Similarly, the data holder can obtain loss function value 2 corresponding to noise gradient 2 and loss function value 3 corresponding to noise gradient 3.
Alternatively, the data holder may obtain the loss function value corresponding to each noise gradient by the following formula.
Ω_{l_i} = { l_i = L(d_i, θ + Δ') ; Δ' ∈ Ω_Δ };
where Ω_Δ is the noise gradient set, and Δ' is a noise gradient;
d_i is the target sample selected by the i-th data holder; θ is the model parameter of the previous iteration, and θ + Δ' is the temporary model parameter;
L(·) is the loss function;
l_i is a loss function value calculated by the i-th data holder; Ω_{l_i} is the set of loss function values corresponding to the noise gradients calculated by the i-th data holder.
In this embodiment, after calculating the loss function value corresponding to each noise gradient, the data holder may send these values to the collaborator.
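A data holder's evaluation of the candidate noise gradients can be sketched as follows, with loss_fn standing in for the business model's loss function L(·); the function name and signature are illustrative assumptions.

    def local_loss_values(theta, noise_gradients, X, y, loss_fn):
        """l_i = L(d_i, θ + Δ') for every noise gradient Δ' in Ω_Δ, where
        d_i = (X, y) is this holder's local target sample and its label."""
        values = []
        for delta in noise_gradients:
            temp_theta = theta + delta   # temporary model parameter θ + Δ'
            values.append(loss_fn(temp_theta, X, y))
        return values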
Step C: the collaborator determines the selected probability of each noise gradient based on the loss function values corresponding to the noise gradients sent by the data holders.
For example, assume that there are two data holders, data holder 1 and data holder 2, respectively.
Assume that the set of noise gradients includes: noise gradient 1, noise gradient 2, and noise gradient 3.
Assume that the data holders send the loss function values shown in Table 1.
Data holder | Noise gradient 1 | Noise gradient 2 | Noise gradient 3
Data holder 1 | Loss function value 11 | Loss function value 12 | Loss function value 13
Data holder 2 | Loss function value 21 | Loss function value 22 | Loss function value 23
TABLE 1
After receiving the loss function values corresponding to the noise gradients from the data holders, the collaborator may determine the selected probability of each noise gradient based on those values.
Specifically, the collaborator may determine the selected probability of noise gradient 1 based on loss function values 11 and 21.
The collaborator may determine the selected probability of noise gradient 2 based on loss function values 12 and 22.
The collaborator may determine the selected probability of noise gradient 3 based on loss function values 13 and 23.
A specific implementation of determining the selected probability of each noise gradient from the loss function values sent by the data holders is described below.
For each noise gradient in the noise gradient set, the collaborator may determine a total loss function value for that noise gradient based on the loss function values of the noise gradient sent by the data holders, truncate the total loss function value, and determine the selected probability of the noise gradient based on the truncated total loss function value.
For example, take the selected probability of noise gradient 1 in Table 1.
The collaborator may determine total loss function value 1 corresponding to noise gradient 1 based on loss function value 11 sent by data holder 1 and loss function value 21 sent by data holder 2, truncate total loss function value 1, and determine the selected probability 1 of noise gradient 1 based on the truncated total loss function value 1.
Optionally, in this embodiment of the specification, when determining the total loss function value corresponding to a noise gradient, the collaborator may add up the loss function values of that noise gradient sent by the data holders.
When truncating the total loss function values, the collaborator may truncate the total loss function value corresponding to each noise gradient based on the following formula:
Ω_u = { u = −Clip(L, C2) ; L ∈ Ω_L };
where L is a total loss function value, and Ω_L is the set of total loss function values of the noise gradients;
C2 is the preset truncation boundary value, u is a truncated total loss function value, and Ω_u is the set of truncated total loss function values.
In this embodiment, after truncating the total loss function value corresponding to each noise gradient, the collaborator may determine the selected probability of each noise gradient based on the truncated total loss function values.
For example, the collaborator may determine the selected probability of each noise gradient based on the following formula:
Ω_p = { p = exp(ε·u / (2·C2)) / Σ_{u'∈Ω_u} exp(ε·u' / (2·C2)) ; u ∈ Ω_u };
where ε is a preset value, namely the privacy budget ε defined in differential privacy;
C2 is the preset truncation boundary value, u is a truncated total loss function value, and Ω_u is the set of truncated total loss function values;
p is a selected probability, and Ω_p is the set of selected probabilities of the noise gradients.
It should be noted that a temporary model is generated from each noise gradient and the model parameters of the previous iteration, the business-processing effect of each temporary model is evaluated through the loss function, and the selected probability of each noise gradient is determined from the evaluation result (i.e., the total loss function value). The selected probability of each noise gradient is therefore tied to the business-processing effect of the business model, so the second target gradient chosen by the collaborator based on these probabilities drives the business model to iterate in a direction with good business effect.
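The selection step can be sketched in the spirit of the exponential mechanism of differential privacy; the scoring exp(ε·u/(2·C2)) is an assumed instantiation consistent with the symbols above, not a verbatim formula from the original.

    import numpy as np

    def selection_probabilities(total_losses, c2, epsilon):
        """Selected probability Ω_p of each noise gradient from its truncated
        total loss: lower total loss -> larger u -> higher probability."""
        u = -np.clip(np.asarray(total_losses), -c2, c2)  # u = -Clip(L, C2)
        scores = np.exp(epsilon * u / (2.0 * c2))        # exponential-mechanism score
        return scores / scores.sum()                     # normalised to sum to 1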
Step 2123: the collaborator determines the second target gradient according to the selected probabilities of the noise gradients.
In this embodiment, after determining the selected probability of each noise gradient, the collaborator may select the noise gradient with the highest selected probability in the noise gradient set as the second target gradient.
Of course, the collaborator may select the second target gradient in other ways, such as choosing the noise gradient with the second-highest probability; the selection manner is described only by way of example and is not specifically limited.
Step 214: the collaborator sends the second target gradient to each data holder.
Step 216: the data holder determines the model parameters of the current iteration of the business model based on the second target gradient and the model parameters of the previous iteration of the business model.
In this embodiment of the specification, after receiving the second target gradient sent by the collaborator, the data holder may determine the model parameters of the current iteration of the business model based on the second target gradient and the model parameters of the previous iteration.
The data holder may then detect whether the iteration stop condition is satisfied. If it is, the data holder stops iterating and determines that training of the business model is complete; if not, it returns to step 202 for the next iteration.
The iteration stop condition includes: the business model converges; or
the number of iterations exceeds a preset iteration count threshold.
Specifically, after determining the model parameters of the current iteration of the business model, the data holder can detect whether the business model has converged. If it has, the data holder stops iterating and determines that training of the business model is complete; if it has not, the process returns to step 202 for the next iteration.
Likewise, after determining the model parameters of the current iteration of the business model, the data holder may detect whether the number of iterations exceeds a preset iteration count threshold. If it does, the data holder stops iterating and determines that training of the business model is complete; if it does not, the process returns to step 202 for the next iteration.
The iteration stop conditions are described here only by way of example and are not specifically limited.
It can be seen from the above description that, during multi-party joint training of the business model, the data holders and the collaborator do not exchange raw gradients directly. Instead, each data holder truncates its gradient for the current iteration and sends the truncated gradient to the collaborator; the collaborator computes a first target gradient based on the truncated gradients sent by the data holders and adds noise to it to obtain a second target gradient; and each data holder determines the model parameters of the current iteration based on the second target gradient and the model parameters of the previous iteration. Because the data holders truncate the computed gradients and the collaborator adds noise to the aggregated first target gradient, private user data is protected from leakage by means of differential privacy.
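Putting the pieces together, one data holder's side of the protocol might look like the sketch below, reusing the illustrative batch_gradient and clip_gradient helpers from earlier. The cooperate callback standing in for steps 208 to 214, the additive update θ_t = θ_{t-1} + Δ'' (implied by the temporary parameter θ + Δ' above), and the crude convergence test are all assumptions made for the example.

    import numpy as np

    def train_data_holder(theta, X, y, cooperate, batch_size=32,
                          c1=1.0, max_iters=1000, tol=1e-6, rng=None):
        """Loop of one data holder: select target samples, compute and
        truncate the gradient, exchange it via the collaborator, update."""
        rng = rng or np.random.default_rng()
        for _ in range(max_iters):
            idx = rng.choice(len(y), size=batch_size, replace=False)  # step 202
            grad = batch_gradient(theta, X[idx], y[idx])              # step 204
            clipped = clip_gradient(-grad, c1)   # descent step, then truncation
            second_target = cooperate(clipped, batch_size)            # steps 208-214
            theta = theta + second_target        # step 216
            if np.linalg.norm(second_target) < tol:  # crude stop condition
                break
        return theta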
Referring to fig. 3, fig. 3 is a flowchart illustrating a multi-party joint training method for a business model according to an exemplary embodiment of the present disclosure, where the multiple parties include: a plurality of data holders and collaborators, each of the plurality of data holders having configured the business model; the method is applied to any data holder, and the method can comprise the following steps.
Step 302: the data holder selects a target sample from the locally held samples.
Step 304: the data holder determines the gradient of the current iteration of the business model based on the target sample, the model parameters of the previous iteration of the business model, and a preset gradient algorithm.
Step 306: the data holder truncates the gradient to generate a truncated gradient and sends the truncated gradient to the collaborator; the collaborator determines a first target gradient based on the truncated gradients sent by the data holders and adds a preset type of noise to the first target gradient to generate a second target gradient.
Step 308: the data holder receives the second target gradient sent by the collaborator and determines the model parameters of the current iteration of the business model based on the second target gradient and the model parameters of the previous iteration of the business model.
It should be noted that the data holder may loop through steps 302 to 308 until the iteration stop condition is satisfied, where the iteration stop condition may include: the business model converges, or the number of iterations exceeds a preset threshold.
After executing step 308, the data holder may detect whether the iteration stop condition is satisfied; if not, it returns to step 302, and if so, it stops iterating and determines that training of the business model is complete.
In addition, for the specific implementation of steps 302 to 308, refer to steps 202 to 216 above; details are not repeated here.
Referring to fig. 4, fig. 4 is a flowchart illustrating a multi-party joint training method of a business model according to an exemplary embodiment of the present disclosure, where the multiple parties include: a plurality of data holders and a collaborator, each of the data holders being configured with the business model; the samples held by the data holders are not identical, but their feature dimensions are the same. The method is applied to the collaborator and may include the following steps.
Step 402: the collaborator determines a first target gradient based on the truncated gradients sent by the data holders; each truncated gradient is calculated by a data holder based on a target sample selected from its locally held samples, the model parameters of the previous iteration of the business model, and a preset gradient algorithm.
Step 404: the collaborator adds a preset type of noise to the first target gradient to generate a second target gradient.
Step 406: the collaborator sends the second target gradient to each data holder, so that each data holder determines the model parameters of the current iteration of the business model based on the second target gradient and the model parameters of the previous iteration of the business model.
It should be noted that for the implementation of steps 402 to 406, refer to steps 202 to 216 above; details are not repeated here.
Referring to fig. 5, fig. 5 is an interaction diagram of another multi-party joint training method of a business model according to an exemplary embodiment of the present disclosure.
Step 501: the data holder randomly selects at least one sample from the locally held samples as a target sample.
Step 502: the data holder determines the gradient of the current iteration of the business model based on the target sample, the model parameters of the previous iteration of the business model, and a preset gradient algorithm.
The gradient algorithm may be gradient descent or another gradient algorithm; it is not specifically limited here.
When the gradient algorithm is gradient descent, the data holder obtains the gradient formula by taking the partial derivative of the loss function with respect to the model parameters. The data holder can then substitute the model parameters of the previous iteration of the business model and the target sample into the gradient formula to obtain the gradient of the current iteration.
Step 503: the data holder performs gradient truncation on the gradient to generate a truncated gradient.
In implementation, the data holder truncates the gradient based on a preset gradient-truncation boundary value and the L2 norm of the computed gradient to obtain the truncated gradient.
Step 504: the data holder sends the truncated gradient to the collaborator.
Step 505: the collaborator determines a first target gradient based on the sum of the truncated gradients sent by the data holders and the sum of the numbers of target samples reported by the data holders.
When implemented, the collaborator may implement step 505 based on the following formula:
Δ_t = (1 / |B_t|) · Σ_c Δ'_c
where Δ_t is the first target gradient;
|B_t| is the sum of the numbers of target samples of the data holders;
Δ'_c is the truncated gradient sent by the c-th data holder.
Step 506: the collaborator adds the noise corresponding to each preset noise factor to the first target gradient to obtain a noise gradient set.
For example, assume the noise factor set includes three noise factors: noise factor 1, noise factor 2, and noise factor 3.
The collaborator may generate noise 1 corresponding to noise factor 1 and add it to the first target gradient to obtain noise gradient 1.
The collaborator may generate noise 2 corresponding to noise factor 2 and add it to the first target gradient to obtain noise gradient 2.
The collaborator may generate noise 3 corresponding to noise factor 3 and add it to the first target gradient to obtain noise gradient 3.
Specifically, the collaborator may add the noise corresponding to each preset noise factor to the first target gradient to obtain the noise gradient set according to the following formulas:
Ω_σ = { σ = z·C1 / |B_t| ; z ∈ Ω_z };
Ω_Δ = { Δ' = Δ_t + N(0, σ²I) ; σ ∈ Ω_σ };
where Ω_z is the preset noise factor set, and z is a noise factor in that set;
C1 is the preset boundary value of gradient truncation;
σ is an intermediate result, and Ω_σ is the set of intermediate results;
|B_t| is the total number of target samples across the data holders;
N(0, σ²I) is a normal distribution with expectation 0 and variance σ²I, where I is the identity matrix;
Ω_Δ is the set of noise gradients corresponding to the noise factors, and Δ' is a noise gradient.
Step 507: the collaborator sends the noise gradient set to each data holder.
Step 508: the data holder calculates the loss function value corresponding to each noise gradient in the noise gradient set based on the noise gradient set, the model parameters of the previous iteration, and its local target sample and label.
For example: assume that the noise gradient set includes 3 noise gradients, noise gradient 1, noise gradient 2, and noise gradient 3, respectively.
For the noise gradient 1, the data holder may determine a temporary model parameter 1 based on the model parameter of the last iteration of the business model and the noise gradient 1, and input the temporary model parameter 1, the local target sample and the label thereof into the loss function of the business model to obtain a loss function value 1 corresponding to the noise gradient 1.
Similarly, the data holder can also obtain a loss function value 2 corresponding to the noise gradient 2 and a loss function value 3 corresponding to the noise gradient 3.
Step 509: the data holder sends the loss function value of each noise gradient to the collaborator.
Step 510: for each noise gradient, the collaborator accumulates the loss function values sent by the data holders to obtain the total loss function value of that noise gradient, truncates the total loss function value, and determines the selected probability of the noise gradient based on the truncated value.
For example, assume there are two data holders: data holder 1 and data holder 2.
Assume the noise gradient set includes noise gradient 1, noise gradient 2, and noise gradient 3.
Assume the data holders send the loss function values shown in Table 2.
Data holder | Noise gradient 1 | Noise gradient 2 | Noise gradient 3
Data holder 1 | Loss function value 11 | Loss function value 12 | Loss function value 13
Data holder 2 | Loss function value 21 | Loss function value 22 | Loss function value 23
TABLE 2
For noise gradient 1, the collaborator adds the loss function values sent by the data holders (i.e., computes the sum of loss function value 11 and loss function value 21) to obtain total loss function value 1. The collaborator then truncates total loss function value 1 and determines the selected probability 1 of noise gradient 1 based on the truncated value.
Similarly, the collaborator may determine the selected probability 2 of noise gradient 2 and the selected probability 3 of noise gradient 3 in the same manner.
In implementation, the collaborator may determine the selected probability of each noise gradient from the truncated total loss function values according to the following formula:
Figure BDA0002433526230000211
∈ is a preset value, ∈ is ∈ defined in the differential privacy technology;
C2is a preset cutoff boundary value, u is a total loss function value after cutoff, omegauIs a set of truncated total loss function values corresponding to each noise gradient.
P is the selected probability, ΩpSet of selected probabilities for noise gradients。
Step 511: the collaborator selects the noise gradient with the highest selected probability in the noise gradient set as the second target gradient.
Step 512: the collaborator sends the second target gradient to each data holder.
Step 513: the data holder determines the model parameters of the current iteration of the business model based on the second target gradient and the model parameters of the previous iteration of the business model.
Step 514: the data holder detects whether the iteration stop condition is satisfied.
If the iteration stop condition is satisfied, step 515 is executed: the iteration stops and training of the business model is complete.
If the iteration stop condition is not satisfied, the process returns to step 501.
The iteration stop condition may include: the business model converges, or the number of iterations exceeds a preset threshold.
Step 515: stop iterating and confirm that training of the business model is complete.
Referring to fig. 6, fig. 6 is a hardware structure diagram of an electronic device according to an exemplary embodiment of the present disclosure.
the electronic device includes: a communication interface 601, a processor 602, a machine-readable storage medium 603, and a bus 604; wherein the communication interface 601, the processor 602, and the machine-readable storage medium 603 communicate with each other via a bus 604. The processor 602 may perform the above-described multiparty joint training method of the business model by reading and executing machine executable instructions in the machine readable storage medium 603 corresponding to the multiparty joint training control logic of the business model.
The machine-readable storage medium 603 referred to herein may be any electronic, magnetic, optical, or other physical storage device that can contain or store information such as executable instructions and data. For example, the machine-readable storage medium may be volatile memory, non-volatile memory, or a similar storage medium. In particular, the machine-readable storage medium 603 may be a RAM (Random Access Memory), a flash memory, a storage drive (e.g., a hard drive), a solid-state drive, any type of storage disc (e.g., a compact disc or DVD), a similar storage medium, or a combination thereof.
The electronic device may be the data holder, and execute a business model training method performed by the data holder. Of course, the electronic device may also be the above-mentioned collaborator, and execute the business model training method performed by the above-mentioned collaborator.
Referring to fig. 7, fig. 7 is a block diagram of a multi-party joint training apparatus of a business model according to an exemplary embodiment of the present disclosure. The multiple parties include: a plurality of data holders and a cooperator, each of the plurality of data holders having the business model configured. The apparatus is applied to any data holder and includes the following modules, which iteratively update the model parameters of the business model in a loop until an iteration stop condition is met:
a selection module 701, configured to select a target sample from locally held samples;
a determining module 702, configured to determine a gradient of the current iteration of the business model based on the target sample, the model parameters of the previous iteration of the business model, and a preset gradient algorithm;
a truncation module 703, configured to truncate the gradient to generate a truncated gradient and send the truncated gradient to the cooperator, so that the cooperator determines a first target gradient based on the truncated gradients sent by the data holders and adds a preset type of noise to the first target gradient to generate a second target gradient;
an updating module 704, configured to receive the second target gradient sent by the cooperator, and determine the model parameters of the current iteration of the business model based on the second target gradient and the model parameters of the previous iteration of the business model.
Optionally, the truncation module 703, when truncating the gradient to generate a truncated gradient, is configured to determine a norm of the gradient according to a preset norm type, and determine the truncated gradient according to a preset gradient-truncation boundary value and the determined norm.
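For illustration, a minimal sketch of this optional truncation, assuming the L2 norm (as in the optional embodiment below) and the common scale-down rule g·min(1, C1/||g||2). The exact combination of boundary value and norm is not spelled out here, so this is one plausible reading rather than the patent's definitive rule:

```python
import numpy as np

def truncate_gradient(grad, c1):
    """Scale the gradient so its L2 norm does not exceed the boundary value C1."""
    norm = np.linalg.norm(grad, ord=2)
    return grad * min(1.0, c1 / (norm + 1e-12))  # tiny constant guards against division by zero
```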
Optionally, when the selection module 701 selects a target sample from the locally held samples, at least one sample is randomly selected from the locally held samples as the target sample.
Optionally, the apparatus further comprises:
a receiving module 705 (not shown in fig. 7), configured to, when receiving a noise gradient set obtained by the cooperator adding noise corresponding to each preset noise factor to the first target gradient, calculate a loss function value corresponding to each noise gradient in the noise gradient set based on the noise gradient set, the model parameters of the previous iteration, and the local target sample and its label;
a sending module 706 (not shown in fig. 7), configured to send the loss function value corresponding to each noise gradient to the cooperator, so that the cooperator determines the second target gradient based on the loss function values corresponding to the noise gradients and the selected probabilities of the noise gradients.
Optionally, the receiving module 705, when calculating the loss function value corresponding to each noise gradient in the noise gradient set based on the noise gradient set, the model parameters of the previous iteration, and the local target sample and its label, is configured to, for each noise gradient in the noise gradient set, determine temporary model parameters based on the model parameters of the previous iteration and the noise gradient, and substitute the temporary model parameters, the local target sample, and its label into the loss function of the business model to obtain the loss function value corresponding to the noise gradient.
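A sketch of this optional computation, assuming the temporary parameters are formed by a gradient-descent-style step (consistent with the preset gradient algorithm elsewhere in this disclosure) and a caller-supplied loss_fn; the step form, the learning rate lr, and the function names are illustrative:

```python
import numpy as np

def losses_per_noise_gradient(prev_params, noise_gradients, x, y, loss_fn, lr=0.1):
    """Return one loss function value per candidate noise gradient."""
    losses = []
    for ng in noise_gradients:
        temp_params = prev_params - lr * ng         # temporary model parameters
        losses.append(loss_fn(temp_params, x, y))   # evaluate on local target samples and labels
    return np.array(losses)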
Optionally, the iteration stop condition is:
the business model converges; or,
the number of iterations exceeds a preset iteration-count threshold.
Optionally, the preset norm type is the L2 norm.
Optionally, the preset gradient algorithm is a gradient descent method.
Optionally, the sample object is a user, and the sample is user data.
Referring to fig. 8, fig. 8 is a block diagram of another multi-party joint training apparatus of a business model according to an exemplary embodiment of the present disclosure. The multiple parties include: a plurality of data holders and a cooperator, each of the plurality of data holders having the business model configured. The apparatus is applied to the cooperator; the samples held by the data holders are not identical, while the feature dimensions of the samples held by the data holders are identical. The apparatus includes:
a determining module 801, configured to determine a first target gradient based on the truncated gradients sent by the data holders, where each truncated gradient is calculated by a data holder based on a target sample selected from its locally held samples, the model parameters of the previous iteration of the business model, and a preset gradient algorithm;
a generating module 802, configured to add a preset type of noise to the first target gradient to generate a second target gradient;
a sending module 803, configured to send the second target gradient to each data holder, so that each data holder determines the model parameters of the current iteration of the business model based on the second target gradient and the model parameters of the previous iteration of the business model.
Optionally, the determining module 801 is further configured to obtain the number of target samples selected by each data holder;
the determining module 801, when determining the first target gradient based on the truncated gradients sent by the data holders, is configured to calculate the sum of the truncated gradients of the data holders, determine the total number of target samples across the data holders based on the obtained numbers of selected target samples, and determine the first target gradient based on the sum and the total number.
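A minimal sketch of this aggregation, under the assumption that each holder sends the sum of its per-sample truncated gradients and that "based on the sum and the total number" means dividing the summed truncated gradients by the total target-sample count |Bt|; this is a natural reading, but an assumption:

```python
import numpy as np

def first_target_gradient(truncated_grads, sample_counts):
    """Aggregate per-holder truncated gradients into the first target gradient."""
    total = sum(sample_counts)           # |Bt|: total number of target samples
    return sum(truncated_grads) / total  # summed truncated gradients, averaged over |Bt|
```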
Optionally, the generating module 802, when adding a preset type of noise to the first target gradient to generate the second target gradient, is configured to add, to the first target gradient, noise corresponding to each preset noise factor to obtain a noise gradient set, calculate the selected probability of each noise gradient in the noise gradient set, and determine the second target gradient according to the selected probabilities of the noise gradients.
Optionally, the generating module 802 adds the noise corresponding to each preset noise factor to the first target gradient to obtain each noise gradient by using the following formulas:
Ωσ = {σ = z·C1/|Bt|, z ∈ Ωz};
ΩΔ = {Δ' = Δt + N(0, σ²I), σ ∈ Ωσ};
wherein Ωz is the preset noise factor set, and z is a noise factor in the set of noise factors;
C1 is the preset boundary value of gradient truncation;
|Bt| is the total number of target samples across the data holders;
N(0, σ²I) is a normal distribution with mean 0 and covariance σ²I, where I is an identity matrix;
ΩΔ is the set of noise gradients corresponding to the noise factors, and Δ' is a noise gradient.
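For illustration, a sketch that instantiates these formulas, with a hypothetical noise factor set Ωz = {0.5, 1.0, 2.0}; the patent does not give concrete factors:

```python
import numpy as np

def noise_gradient_set(delta_t, c1, total_samples, noise_factors=(0.5, 1.0, 2.0), rng=None):
    """Build the noise gradient set: one Gaussian-perturbed copy of the first target gradient per factor."""
    rng = rng or np.random.default_rng()
    candidates = []
    for z in noise_factors:
        sigma = z * c1 / total_samples                      # sigma = z * C1 / |Bt|
        noise = rng.normal(0.0, sigma, size=delta_t.shape)  # sample from N(0, sigma^2 I)
        candidates.append(delta_t + noise)                  # noise gradient Delta'
    return candidates
```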
Optionally, when calculating the selected probability of each noise gradient in the noise gradient set, the generating module 802 sends the noise gradient set to each data holder, so that each data holder calculates a loss function value corresponding to each noise gradient in the noise gradient set based on the noise gradient set, the model parameter of the last iteration, the local target sample and the label thereof, and sends the loss function value corresponding to each noise gradient to the cooperator; the selected probability of each noise gradient is determined based on the loss function value corresponding to each noise gradient transmitted by each data holder.
Optionally, the generating module 802 is configured to, when determining the selected probability of each noise gradient based on the loss function value corresponding to each noise gradient sent by each data holder, determine a total loss function value corresponding to each noise gradient based on the loss function value of the noise gradient sent by each data holder for each noise gradient in the noise gradient set; truncating the total loss function value; based on the truncated total loss function value, a selected probability of the noise gradient is determined.
Optionally, the generating module 802 is configured to, when determining the second target gradient according to the probability selected by each noise gradient, select the noise gradient with the highest probability selected from the set of noise gradients as the second target gradient.
Optionally, the preset type of noise is gaussian noise.
Optionally, the sample object is a user, and the sample is user data.
Optionally, the business model is a perceptron model.
In addition, the present specification further provides a multi-party joint training system of a business model, wherein the multiple parties include: a plurality of data holders and collaborators, each of the plurality of data holders having configured the business model; the samples held by each data holder are not identical, and the characteristic dimensions of the samples held by each data holder are identical;
the data holder is configured to select a target sample from its locally held samples; determine the gradient of the current iteration of the business model based on the target sample, the model parameters of the previous iteration of the business model, and a preset gradient algorithm; truncate the gradient to generate a truncated gradient; and send the truncated gradient to the cooperator;
the cooperator is configured to determine a first target gradient based on the truncated gradients sent by the data holders, add a preset type of noise to the first target gradient to generate a second target gradient, and send the second target gradient to each data holder;
the data holder is further configured to receive the second target gradient sent by the cooperator, determine the model parameters of the current iteration of the business model based on the second target gradient and the model parameters of the previous iteration, and detect whether an iteration stop condition is met; if not, return to the step of selecting a target sample from the locally held samples; if so, stop iterating the model parameters of the business model and determine that the training of the business model is complete.
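To show how the three roles fit together per iteration, here is a compressed single-process simulation that reuses the sketches above (truncate_gradient, first_target_gradient, noise_gradient_set, selected_probabilities). The holder interface (sample_batch, gradient, loss), the learning rate, and the noise factors are illustrative assumptions; a real system would run the holders and the cooperator as separate communicating parties:

```python
import numpy as np

def train_jointly(holders, w0, c1, c2, epsilon,
                  noise_factors=(0.5, 1.0, 2.0), lr=0.1, max_iters=50):
    """Simulate the data holders and the cooperator in one process."""
    rng = np.random.default_rng(0)
    w = w0
    for _ in range(max_iters):
        # Data holders: select target samples, compute and truncate gradients.
        batches = [h.sample_batch() for h in holders]
        grads = [truncate_gradient(h.gradient(w, x, y), c1)
                 for h, (x, y) in zip(holders, batches)]
        counts = [len(y) for _, y in batches]
        # Cooperator: first target gradient, then the candidate noise gradient set.
        delta_t = first_target_gradient(grads, counts)
        candidates = noise_gradient_set(delta_t, c1, sum(counts), noise_factors, rng)
        # Data holders score each candidate; the cooperator totals the losses.
        totals = np.zeros(len(candidates))
        for h, (x, y) in zip(holders, batches):
            totals += np.array([h.loss(w - lr * ng, x, y) for ng in candidates])
        # Cooperator: selected probabilities, then the second target gradient (step 511).
        probs = selected_probabilities(totals, epsilon, c2)
        w = w - lr * candidates[int(np.argmax(probs))]  # each holder's update (step 513)
    return w
```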
Optionally, the data holder, when truncating the gradient to generate a truncated gradient, is configured to determine a norm of the gradient according to a preset norm type, and determine the truncated gradient according to a preset gradient-truncation boundary value and the determined norm.
Optionally, the data holder, when selecting a target sample from the locally held samples, is configured to randomly select at least one sample from the locally held samples as the target sample.
Optionally, the cooperator is further configured to obtain the number of target samples selected by each data holder;
the cooperator, when determining the first target gradient based on the truncated gradients sent by the data holders, is configured to calculate the sum of the truncated gradients of the data holders, determine the total number of target samples across the data holders based on the obtained numbers of selected target samples, and determine the first target gradient based on the sum and the total number.
Optionally, the cooperator, when adding a preset type of noise to the first target gradient to generate the second target gradient, is configured to add, to the first target gradient, noise corresponding to each preset noise factor to obtain a noise gradient set, calculate the selected probability of each noise gradient in the noise gradient set, and determine the second target gradient according to the selected probabilities of the noise gradients.
Optionally, the cooperator adds the noise corresponding to each preset noise factor to the first target gradient to obtain each noise gradient by using the following formulas:
Ωσ = {σ = z·C1/|Bt|, z ∈ Ωz};
ΩΔ = {Δ' = Δt + N(0, σ²I), σ ∈ Ωσ};
wherein Ωz is the preset noise factor set, and z is a noise factor in the set of noise factors;
C1 is the preset boundary value of gradient truncation;
|Bt| is the total number of target samples across the data holders;
N(0, σ²I) is a normal distribution with mean 0 and covariance σ²I, where I is an identity matrix;
ΩΔ is the set of noise gradients corresponding to the noise factors, and Δ' is a noise gradient.
Optionally, the cooperator is configured to send the noise gradient set to each data holder when calculating the selected probability of each noise gradient in the noise gradient set;
the data holder is used for calculating loss function values corresponding to the noise gradients in the noise gradient set based on the noise gradient set, the model parameters of the last iteration, the local target samples and the labels thereof, and sending the loss function values corresponding to the noise gradients to the cooperator;
and the cooperator is used for determining the selected probability of each noise gradient based on the loss function value which is sent by each data holder and corresponds to each noise gradient.
Optionally, when determining the selected probability of each noise gradient based on the loss function value corresponding to each noise gradient sent by each data holder, the cooperator is configured to determine, for each noise gradient in the noise gradient set, a total loss function value corresponding to the noise gradient based on the loss function value of the noise gradient sent by each data holder; truncating the total loss function value; based on the truncated total loss function value, a selected probability of the noise gradient is determined.
Optionally, the data holder, when calculating the loss function value corresponding to each noise gradient in the noise gradient set based on the noise gradient set, the model parameters of the previous iteration, and the local target sample and its label, is configured to, for each noise gradient in the noise gradient set, determine temporary model parameters based on the model parameters of the previous iteration and the noise gradient, and substitute the temporary model parameters, the local target sample, and its label into the loss function of the business model to obtain the loss function value corresponding to the noise gradient.
Optionally, when determining the second target gradient according to the probability selected by each noise gradient, the cooperator is configured to select, as the second target gradient, the noise gradient with the highest probability selected from the set of noise gradients.
Optionally, the iteration stop condition is:
the business model converges; or,
the number of iterations exceeds a preset iteration-count threshold.
Optionally, the preset norm type is the L2 norm.
Optionally, the preset gradient algorithm is a gradient descent method.
Optionally, the sample object is a user, and the sample is user data.
Optionally, the preset type of noise is gaussian noise.
Optionally, the business model is a perceptron model.
The implementation process of the functions and actions of each unit in the above device is specifically described in the implementation process of the corresponding step in the above method, and is not described herein again.
For the device embodiments, since they substantially correspond to the method embodiments, reference may be made to the partial description of the method embodiments for relevant points. The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution in the specification. One of ordinary skill in the art can understand and implement it without inventive effort.
The above description is only a preferred embodiment of the present disclosure, and should not be taken as limiting the present disclosure, and any modifications, equivalents, improvements, etc. made within the spirit and principle of the present disclosure should be included in the scope of the present disclosure.

Claims (45)

1. A method for joint training of multiple parties to a business model, the multiple parties comprising: a plurality of data holders and collaborators, each of the plurality of data holders having configured the business model; the method is applied to any data holder, and comprises the following steps:
iterating the model parameters of the business model in a loop through the following steps until an iteration stop condition is met:
selecting a target sample from the locally held samples;
determining the gradient of the current iteration of the business model based on the target sample, the model parameters of the previous iteration of the business model, and a preset gradient algorithm;
truncating the gradient to generate a truncated gradient, sending the truncated gradient to the cooperator, determining a first target gradient by the cooperator based on the truncated gradient sent by each data holder, and adding a preset type of noise in the first target gradient to generate a second target gradient;
and receiving a second target gradient sent by the cooperative party, and determining the model parameters of the current iteration of the business model based on the second target gradient and the model parameters of the last iteration of the business model.
2. The method of claim 1, the truncating the gradient generating a truncated gradient, comprising:
determining the norm of the gradient according to a preset norm type;
and determining a truncation gradient according to a preset boundary value of gradient truncation and the determined norm.
3. The method of claim 1, the selecting a target sample from locally held samples comprising:
at least one sample is randomly selected from the locally held samples as a target sample.
4. The method of claim 1, further comprising:
when a noise gradient set obtained by adding noise corresponding to each preset noise factor in a first target gradient by a cooperative party is received, calculating a loss function value corresponding to each noise gradient in the noise gradient set based on the noise gradient set, a model parameter of last iteration, a local target sample and a label thereof;
sending the loss function values corresponding to the noise gradients to a cooperative party; the second target gradient is determined by the cooperator based on the loss function value corresponding to each noise gradient, and the selected probability of each noise gradient.
5. The method of claim 4, wherein calculating a loss function value corresponding to each noise gradient in the set of noise gradients based on the set of noise gradients, the model parameters of the last iteration, and the local target sample and its label comprises:
for each noise gradient in the noise gradient set, determining a temporary model parameter based on the model parameter of the last iteration and the noise gradient;
and substituting the temporary model parameters, the local target sample and the label thereof into the loss function of the business model to obtain a loss function value corresponding to the noise gradient.
6. The method of claim 1, the iteration stop condition being:
the business model converging; or,
the number of iterations exceeding a preset iteration-count threshold.
7. The method of claim 2, wherein the preset norm type is the L2 norm.
8. The method of claim 1, wherein the predetermined gradient algorithm is a gradient descent method.
9. The method of claim 1, the sample object being a user, the sample being user data.
10. A method for joint training of multiple parties to a business model, the multiple parties comprising: a plurality of data holders and collaborators, each of the plurality of data holders having configured the business model; the method is applied to the collaborators, and comprises the following steps:
determining a first target gradient based on the truncation gradients sent by the data holders; the truncation gradient is calculated by each data holder based on a target sample selected from the samples held by the data holder, the model parameter of the last iteration of the business model and a preset gradient algorithm;
adding a preset type of noise in the first target gradient to generate a second target gradient;
and sending the second target gradient to each data holder so that each data holder determines the model parameters of the current iteration of the business model based on the second target gradient and the model parameters of the last iteration of the business model.
11. The method of claim 10, further comprising:
acquiring the number of target samples selected by each data holder;
the determining a first target gradient based on the truncation gradients sent by the data holders comprises:
calculating the sum of the truncation gradients of the data holders;
determining the total number of the target samples of each data holder based on the number of the obtained target samples selected by each data holder;
based on the sum, the total number, a first target gradient is determined.
12. The method of claim 10, wherein adding a predetermined type of noise to the first target gradient to obtain a second target gradient comprises:
respectively adding noise corresponding to each preset noise factor in the first target gradient to obtain a noise gradient set;
calculating the selected probability of each noise gradient in the noise gradient set;
and determining a second target gradient according to the selected probability of each noise gradient.
13. The method of claim 12, wherein the adding of the noise corresponding to each preset noise factor to the first target gradient is implemented by the following formula to obtain each noise gradient:
Ωσ = {σ = z·C1/|Bt|, z ∈ Ωz};
ΩΔ = {Δ' = Δt + N(0, σ²I), σ ∈ Ωσ};
wherein Ωz is the preset noise factor set, and z is a noise factor in the set of noise factors;
C1 is the preset boundary value of gradient truncation;
|Bt| is the total number of target samples across the data holders;
N(0, σ²I) is a normal distribution with mean 0 and covariance σ²I, where I is an identity matrix;
ΩΔ is the set of noise gradients corresponding to the noise factors, and Δ' is a noise gradient.
14. The method of claim 12, the calculating a selected probability for each noise gradient in the set of noise gradients, comprising:
sending the noise gradient set to each data holder, so that each data holder calculates loss function values corresponding to each noise gradient in the noise gradient set based on the noise gradient set, model parameters of last iteration, local target samples and labels thereof, and sends the loss function values corresponding to each noise gradient to a cooperative party;
the selected probability of each noise gradient is determined based on the loss function value corresponding to each noise gradient transmitted by each data holder.
15. The method of claim 14, wherein determining the selected probability for each noise gradient based on the loss function value corresponding to each noise gradient sent by each data holder comprises:
aiming at each noise gradient in the noise gradient set, determining a total loss function value corresponding to the noise gradient based on the loss function value of the noise gradient sent by each data holder;
truncating the total loss function value;
based on the truncated total loss function value, a selected probability of the noise gradient is determined.
16. The method of claim 12, wherein determining a second target gradient based on the selected probability for each noise gradient comprises:
and selecting the noise gradient with the maximum probability as a second target gradient in the noise gradient set.
17. The method of claim 10, wherein the predetermined type of noise is gaussian noise.
18. The method of claim 10, the sample object being a user, the sample being user data.
19. The method of claim 10, the business model being a perceptron model.
20. A multi-party joint training system for business models, the parties comprising: a plurality of data holders and collaborators, each of the plurality of data holders having configured the business model; the samples held by each data holder are not identical, and the characteristic dimensions of the samples held by each data holder are identical;
the data holder is used for selecting a target sample from the locally held samples; determining the gradient of the current iteration of the service model based on the target sample, the model parameters of the previous iteration of the service model and a preset gradient algorithm; truncating the gradient to generate a truncated gradient, and sending the truncated gradient to the cooperator;
the cooperative party is used for determining a first target gradient based on the truncation gradients sent by the data holders; adding a preset type of noise in the first target gradient to generate a second target gradient; sending the second target gradient to each data holder;
and the data holder is used for receiving a second target gradient sent by the cooperative party, determining the model parameters of the current iteration of the business model based on the second target gradient and the model parameters of the last iteration of the business model, detecting whether iteration stopping conditions are met, if not, returning to the step of selecting a target sample from the locally held samples, if so, stopping the iteration of the model parameters of the business model, and determining that the training of the business model is finished.
21. The system of claim 20, wherein the data holder, when truncating the gradient to generate a truncated gradient, is configured to determine a norm of the gradient according to a preset norm type; and determining a truncation gradient according to a preset boundary value of gradient truncation and the determined norm.
22. The system of claim 20, wherein the data holder, in selecting the target sample from the locally held samples, is configured to randomly select at least one sample from the locally held samples as the target sample.
23. The system of claim 20, wherein the collaborator is further configured to obtain the number of target samples selected by each data holder;
the cooperative party is used for calculating the sum of the truncation gradients of all the data holders when determining the first target gradient based on the truncation gradients sent by all the data holders; determining the total number of the target samples of each data holder based on the number of the obtained target samples selected by each data holder; based on the sum, the total number, a first target gradient is determined.
24. The system according to claim 20, wherein the cooperator adds a preset type of noise to the first target gradient to obtain a second target gradient, and is configured to add noise corresponding to each preset noise factor to the first target gradient to obtain a noise gradient set; calculating the selected probability of each noise gradient in the noise gradient set; and determining a second target gradient according to the selected probability of each noise gradient.
25. The system of claim 24, wherein the cooperator adds noise corresponding to each of the predetermined noise factors to the first target gradient to obtain each of the noise gradients by:
Ωσ = {σ = z·C1/|Bt|, z ∈ Ωz};
ΩΔ = {Δ' = Δt + N(0, σ²I), σ ∈ Ωσ};
wherein Ωz is the preset noise factor set, and z is a noise factor in the set of noise factors;
C1 is the preset boundary value of gradient truncation;
|Bt| is the total number of target samples across the data holders;
N(0, σ²I) is a normal distribution with mean 0 and covariance σ²I, where I is an identity matrix;
ΩΔ is the set of noise gradients corresponding to the noise factors, and Δ' is a noise gradient.
26. The system of claim 24, the cooperator, in calculating a selected probability for each noise gradient in the set of noise gradients, to transmit the set of noise gradients to each data holder;
the data holder is used for calculating loss function values corresponding to the noise gradients in the noise gradient set based on the noise gradient set, the model parameters of the last iteration, the local target samples and the labels thereof, and sending the loss function values corresponding to the noise gradients to the cooperator;
and the cooperator is used for determining the selected probability of each noise gradient based on the loss function value which is sent by each data holder and corresponds to each noise gradient.
27. The system of claim 26, wherein the cooperator, in determining the selected probability for each noise gradient based on the loss function value corresponding to each noise gradient sent by each data holder, is configured to determine, for each noise gradient in the set of noise gradients, a total loss function value corresponding to that noise gradient based on the loss function value for that noise gradient sent by each data holder; truncating the total loss function value; based on the truncated total loss function value, a selected probability of the noise gradient is determined.
28. The system of claim 26, wherein the data holder, when calculating the loss function value corresponding to each noise gradient in the set of noise gradients based on the set of noise gradients, the model parameters of the last iteration, and the local target sample and its label, is configured to determine, for each noise gradient in the set of noise gradients, a temporary model parameter based on the model parameters of the last iteration and the noise gradient; and substitute the temporary model parameters, the local target sample and its label into the loss function of the business model to obtain a loss function value corresponding to the noise gradient.
29. The system of claim 24, wherein the cooperator, when determining the second target gradient based on the selected probability for each noise gradient, is configured to select the noise gradient with the highest selected probability as the second target gradient from the set of noise gradients.
30. A multi-party joint training apparatus for a business model, the multiple parties comprising: a plurality of data holders and collaborators, each of the plurality of data holders having configured the business model; the method is applied to any data holder, and comprises the following steps:
and the following modules iterate the model parameters of the business model in a loop until an iteration stop condition is met:
a selection module that selects a target sample from locally held samples;
the determining module is used for determining the gradient of the current iteration of the business model based on the target sample, the model parameters of the previous iteration of the business model and a preset gradient algorithm;
the truncation module is used for truncating the gradient to generate a truncation gradient, sending the truncation gradient to the cooperative party, determining a first target gradient by the cooperative party based on the truncation gradient sent by each data holder, and adding noise of a preset type in the first target gradient to generate a second target gradient;
and the updating module is used for receiving a second target gradient sent by the cooperative party and determining the model parameters of the current iteration of the business model based on the second target gradient and the model parameters of the last iteration of the business model.
31. The apparatus according to claim 30, wherein the truncation module determines a norm of the gradient according to a preset norm type when the gradient is truncated to generate a truncated gradient; and determining a truncation gradient according to a preset boundary value of gradient truncation and the determined norm.
32. The apparatus of claim 30, the selection module, when selecting the target sample from the locally held samples, randomly selects at least one sample from the locally held samples as the target sample.
33. The apparatus of claim 30, further comprising:
the receiving module is used for calculating loss function values corresponding to the noise gradients in the noise gradient set based on the noise gradient set, model parameters of previous iteration, local target samples and labels thereof when receiving a noise gradient set obtained by adding noise corresponding to each preset noise factor in the first target gradient by a cooperative party;
the transmitting module is used for transmitting the loss function values corresponding to the noise gradients to the cooperative party; the second target gradient is determined by the cooperator based on the loss function value corresponding to each noise gradient, and the selected probability of each noise gradient.
34. The apparatus of claim 33, wherein the receiving module, when calculating the loss function value corresponding to each noise gradient in the noise gradient set based on the noise gradient set, the model parameters of the last iteration, and the local target sample and its label, determines, for each noise gradient in the noise gradient set, a temporary model parameter based on the model parameters of the previous iteration and the noise gradient; and substitutes the temporary model parameters, the local target sample and its label into the loss function of the business model to obtain a loss function value corresponding to the noise gradient.
35. A multi-party joint training apparatus for a business model, the multiple parties comprising: a plurality of data holders and collaborators, each of the plurality of data holders having configured the business model; the method is applied to the collaborators, and the samples held by each data holder are not identical and the characteristic dimensions of the samples held by each data holder are identical, and the device comprises:
the determining module is used for determining a first target gradient based on the truncation gradients sent by the data holders; the truncation gradient is calculated by each data holder based on a target sample selected from the samples held by the data holder, the model parameter of the last iteration of the business model and a preset gradient algorithm;
the generation module is used for adding preset type noise in the first target gradient to generate a second target gradient;
and the sending module is used for sending the second target gradient to each data holder so that each data holder determines the model parameters of the current iteration of the business model based on the second target gradient and the model parameters of the last iteration of the business model.
36. The apparatus of claim 35, wherein the determining module is further configured to obtain the number of target samples selected by each data holder;
the determining module is used for calculating the sum of the truncation gradients of all the data holders when determining the first target gradient based on the truncation gradients sent by all the data holders; determining the total number of the target samples of each data holder based on the number of the obtained target samples selected by each data holder; based on the sum, the total number, a first target gradient is determined.
37. The apparatus according to claim 35, wherein the generating module is configured to add a preset type of noise to the first target gradient to obtain a second target gradient, and is configured to add noise corresponding to each preset noise factor to the first target gradient to obtain a noise gradient set; calculating the selected probability of each noise gradient in the noise gradient set; and determining a second target gradient according to the selected probability of each noise gradient.
38. The apparatus of claim 37, wherein the generating module is configured to add noise corresponding to each preset noise factor to the first target gradient to obtain each noise gradient by the following formula:
Ωσ = {σ = z·C1/|Bt|, z ∈ Ωz};
ΩΔ = {Δ' = Δt + N(0, σ²I), σ ∈ Ωσ};
wherein Ωz is the preset noise factor set, and z is a noise factor in the set of noise factors;
C1 is the preset boundary value of gradient truncation;
|Bt| is the total number of target samples across the data holders;
N(0, σ²I) is a normal distribution with mean 0 and covariance σ²I, where I is an identity matrix;
ΩΔ is the set of noise gradients corresponding to the noise factors, and Δ' is a noise gradient.
39. The apparatus of claim 37, the generating module, in calculating the selected probability for each noise gradient in the set of noise gradients, to send the set of noise gradients to each data holder for each data holder to calculate, based on the set of noise gradients, the model parameters of the last iteration, and the local target samples and their labels, loss function values corresponding to each noise gradient in the set of noise gradients, and to send the loss function values corresponding to each noise gradient to the cooperator; the selected probability of each noise gradient is determined based on the loss function value corresponding to each noise gradient transmitted by each data holder.
40. The apparatus of claim 39, wherein the generating module, when determining the selected probability of each noise gradient based on the loss function value corresponding to each noise gradient sent by each data holder, is configured to determine, for each noise gradient in the set of noise gradients, a total loss function value corresponding to the noise gradient based on the loss function value of the noise gradient sent by each data holder; truncating the total loss function value; based on the truncated total loss function value, a selected probability of the noise gradient is determined.
41. The apparatus of claim 37, wherein the generating module, when determining the second target gradient according to the selected probability of each noise gradient, is configured to select the noise gradient with the highest selected probability as the second target gradient in the set of noise gradients.
42. An electronic device, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor implements the method of any one of claims 1-9 by executing the executable instructions.
43. A computer readable storage medium having stored thereon computer instructions which, when executed by a processor, carry out the steps of the method according to any one of claims 1 to 9.
44. An electronic device, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor implements the method of any one of claims 10-19 by executing the executable instructions.
45. A computer readable storage medium having stored thereon computer instructions which, when executed by a processor, carry out the steps of the method according to any one of claims 10-19.
CN202010244168.2A 2020-03-31 2020-03-31 Multi-party combined training method, device, system and equipment of business model Active CN111461215B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010244168.2A CN111461215B (en) 2020-03-31 2020-03-31 Multi-party combined training method, device, system and equipment of business model

Publications (2)

Publication Number Publication Date
CN111461215A 2020-07-28
CN111461215B CN111461215B (en) 2021-06-29

Family

ID=71678872

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010244168.2A Active CN111461215B (en) 2020-03-31 2020-03-31 Multi-party combined training method, device, system and equipment of business model

Country Status (1)

Country Link
CN (1) CN111461215B (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150161988A1 (en) * 2013-12-06 2015-06-11 International Business Machines Corporation Systems and methods for combining stochastic average gradient and hessian-free optimization for sequence training of deep neural networks
CN108694443A (en) * 2017-04-05 2018-10-23 富士通株式会社 Language model training method based on neural network and device
CN109388662A (en) * 2017-08-02 2019-02-26 阿里巴巴集团控股有限公司 A kind of model training method and device based on shared data
CN109002861A (en) * 2018-08-10 2018-12-14 深圳前海微众银行股份有限公司 Federal modeling method, equipment and storage medium
CN110728375A (en) * 2019-10-16 2020-01-24 支付宝(杭州)信息技术有限公司 Method and device for training logistic regression model by combining multiple computing units
CN110782044A (en) * 2019-10-29 2020-02-11 支付宝(杭州)信息技术有限公司 Method and device for multi-party joint training of neural network of graph

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112231746A (en) * 2020-09-10 2021-01-15 杭州锘崴信息科技有限公司 Joint data analysis method, device and system and computer readable storage medium
CN112231746B (en) * 2020-09-10 2024-02-02 杭州锘崴信息科技有限公司 Joint data analysis method, device, system and computer readable storage medium
JP2022054386A (en) * 2020-09-25 2022-04-06 ベイジン バイドゥ ネットコム サイエンス テクノロジー カンパニー リミテッド Model joint training method, apparatus, electronic device, storage medium, and computer program
JP7280303B2 (en) 2020-09-25 2023-05-23 ベイジン バイドゥ ネットコム サイエンス テクノロジー カンパニー リミテッド Model association training method, device, electronic device, storage medium and computer program
CN112068866A (en) * 2020-09-29 2020-12-11 支付宝(杭州)信息技术有限公司 Method and device for updating business model
CN114936650A (en) * 2020-12-06 2022-08-23 支付宝(杭州)信息技术有限公司 Method and device for jointly training business model based on privacy protection
CN114429222A (en) * 2022-01-19 2022-05-03 支付宝(杭州)信息技术有限公司 Model training method, device and equipment
WO2023138419A1 (en) * 2022-01-19 2023-07-27 支付宝(杭州)信息技术有限公司 Model training
CN114662706A (en) * 2022-03-24 2022-06-24 支付宝(杭州)信息技术有限公司 Model training method, device and equipment
CN114461439A (en) * 2022-04-13 2022-05-10 苏州浪潮智能科技有限公司 Fault diagnosis method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN111461215B (en) 2021-06-29


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code (ref country code: HK; ref legal event code: DE; ref document number: 40034047)
GR01 Patent grant