CN114912624A - Longitudinal federated learning method and device for a business model


Info

Publication number
CN114912624A
CN114912624A
Authority
CN
China
Prior art keywords
tensor
noise
model
local
clipping
Prior art date
Legal status
Pending
Application number
CN202210380964.8A
Other languages
Chinese (zh)
Inventor
吴慧雯
王力
Current Assignee
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd filed Critical Alipay Hangzhou Information Technology Co Ltd
Priority to CN202210380964.8A
Publication of CN114912624A

Classifications

    • G06N 20/00 Machine learning
    • G06F 21/6245 Protecting personal data, e.g. for financial or medical purposes
    • G06N 3/045 Neural network architectures; combinations of networks
    • G06N 3/084 Neural network learning methods; backpropagation, e.g. using gradient descent
    • G06Q 10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G06Q 40/02 Banking, e.g. interest calculation or account maintenance
    • G06Q 50/22 Social work or social welfare, e.g. community support activities or counselling services

Abstract

The embodiments of this specification provide a method and device for jointly updating a model, specifically a longitudinal federated learning method for a business model. The business model comprises a global model deployed at a server and local models deployed at the training members. A first training member processes first feature data of a current batch of training samples through its local first local model to obtain a first intermediate tensor, performs a clipping operation on the first intermediate tensor based on a preset first clipping threshold to form a first reduced tensor, and then adds a first target noise satisfying differential privacy to the first reduced tensor to obtain a first release tensor, which is provided to the server. The server processes the first release tensor through the global model, thereby training the global model and the first local model. In this way, data privacy is effectively protected and the privacy loss is kept controllable.

Description

Longitudinal federated learning method and device for a business model
Technical Field
One or more embodiments of the present disclosure relate to the field of secure computing, and in particular to a longitudinal federated learning method and apparatus for business models.
Background
With the rapid development of deep learning, artificial intelligence is showing its advantages in almost every industry. However, big-data-driven artificial intelligence faces many difficulties in practice. For example, data silos are widespread, data utilization is low, and costs remain high. Individual participants in some industries may also hold only limited data, or data of poor quality. In addition, because of industry competition, privacy and security concerns, and complex administrative procedures, even data integration between different departments of the same company may face tremendous resistance, and the cost of such integration is high.
Federated learning was proposed against this background. Federated learning is a distributed machine learning framework whose main idea is to build a machine learning model from data sets distributed across multiple devices while preventing data leakage. Under this framework, clients (for example, mobile devices) cooperatively train a model under the coordination of a central server, while the training data remains local to each client rather than being uploaded to a data center as in traditional machine learning. Even so, a training member may need to upload some processed form of its local data to the server. Therefore, when data derived from the training members is uploaded to the server, how to protect data privacy and prevent the local data from being reconstructed from the uploaded data is of great significance in federated learning.
Disclosure of Invention
One or more embodiments of this specification describe a longitudinal federated learning method and apparatus for business models, to address one or more of the problems mentioned in the background.
According to a first aspect, there is provided a longitudinal federated learning method for a business model, the business model including a global model deployed at a server and a first local model deployed at a first training member, the method being performed by the first training member and comprising: processing first feature data of a current batch of training samples through the local first local model to obtain a first intermediate tensor; performing a clipping operation on the first intermediate tensor based on a preset first clipping threshold to form a first reduced tensor; and adding a first target noise satisfying differential privacy to the first reduced tensor to obtain a first release tensor, which is provided to the server so that the server processes the first release tensor through the global model, thereby training the global model and the first local model.
In one embodiment, the clipping operation based on the preset first clipping threshold includes: if the current norm of the first intermediate tensor exceeds the first clipping threshold, determining the ratio of the first clipping threshold to the current norm, and scaling the first intermediate tensor by that ratio.
In one embodiment, the first clipping threshold is determined by one of: selecting from a candidate threshold set; taking the mean of the historical intermediate tensors determined in a historical period, or of the release tensors for those intermediate tensors; taking the median of the historical intermediate tensors determined in a historical period, or of the release tensors for those intermediate tensors.
In one embodiment, adding the first target noise satisfying differential privacy to the first reduced tensor comprises: acquiring a noise factor determined based on a predetermined privacy budget; determining a first noise parameter using the noise factor and the first clipping threshold; and sampling the first target noise based on a first noise distribution defined by the first noise parameter.
In one embodiment, the first noise distribution is a Gaussian distribution, and the first noise parameter is a first variance of the Gaussian distribution.
In one embodiment, the first variance is positively correlated with the absolute value of the product of the noise factor and the first clipping threshold.
In one embodiment, the first training member further holds first label data of the current batch of training samples, the first label data corresponding to a first label tensor, and the method further comprises: adding a second target noise satisfying differential privacy to the first label tensor to obtain a second release tensor, which is provided to the server so that the server determines a comparison result between the output of the global model and the second release tensor for training the global model and the first local model.
In one embodiment, adding the second target noise satisfying differential privacy to the first label tensor comprises: acquiring a noise factor determined based on a predetermined privacy budget; determining a second noise parameter using the noise factor and a norm of the first label tensor; and sampling the second target noise based on a second noise distribution defined by the second noise parameter.
In one embodiment, the first label tensor is a multi-dimensional tensor for multiple business targets, with a single business target corresponding to a single label vector, and adding the second target noise satisfying differential privacy to the first label tensor to obtain the second release tensor includes: adding, for each label vector corresponding to each business target, a target noise satisfying differential privacy, to obtain respective release vectors; and determining the second release tensor based on the respective release vectors.
In one embodiment, for a single label vector corresponding to a single business target, the corresponding single target noise is applied by: performing a clipping operation on the single label vector based on a preset single clipping threshold to form a single reduced vector; and adding the corresponding single target noise to the single reduced vector to form a single release vector.
In one embodiment, the single clipping threshold is determined by one of: taking it equal to the first clipping threshold; selecting it from a candidate threshold set; taking the mean of the elements of the single label vector; taking the median of the elements of the single label vector.
In one embodiment, the single target noise follows a Gaussian distribution, and the single noise parameter corresponding to the single target noise is a single variance of the Gaussian distribution, determined from the single clipping threshold and a noise factor determined based on a predetermined privacy budget.
According to a second aspect, there is provided a longitudinal federated learning apparatus for a business model, where the business model includes a global model deployed at a server and a first local model deployed at a first training member, and the apparatus is deployed at the first training member and includes:
a processing unit configured to process first feature data of a current batch of training samples through the local first local model to obtain a first intermediate tensor;
a clipping unit configured to perform a clipping operation on the first intermediate tensor based on a preset first clipping threshold to form a first reduced tensor;
a release unit configured to add a first target noise satisfying differential privacy to the first reduced tensor to obtain a first release tensor, and to provide the first release tensor to the server, so that the server processes the first release tensor through the global model, thereby training the global model and the first local model.
According to a third aspect, there is provided a computer readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method of the first aspect.
According to a fourth aspect, there is provided a computing device comprising a memory and a processor, wherein the memory has stored therein executable code, and the processor, when executing the executable code, implements the method of the first aspect.
With the method and apparatus provided by the embodiments of this specification, in the longitudinal federated learning process, the data a single training member releases to the server about the training samples is clipped, and noise satisfying differential privacy is added, before release. Clipping preliminarily blurs the data and protects data privacy, while the noise satisfying differential privacy is controllable under the preset privacy budget, which preserves the accuracy of the result. In this way, data privacy is effectively protected and the privacy loss is controllable, so that both the privacy and the accuracy of centralized longitudinal federated learning are improved, redundant computation is avoided, and the learning efficiency of the business model is increased.
Drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 shows a flow diagram of a longitudinal federated learning method for a business model according to one embodiment;
FIG. 2 is a diagram of a business model architecture for one implementation of longitudinal federated learning;
FIG. 3 shows a schematic block diagram of a longitudinal federated learning apparatus for a business model, in accordance with one embodiment.
Detailed Description
The technical solution provided in the present specification is described below with reference to the accompanying drawings.
Some concepts that may be referred to in this specification are first described.
Federated learning, which may also be referred to as federated machine learning, joint learning, or alliance learning. Federated machine learning is a machine learning framework that can effectively help multiple organizations use data and build machine learning models jointly while meeting the requirements of user privacy protection, data security, and government regulations.
Specifically, suppose enterprise A and enterprise B each want to build a task model, where each task may be classification or prediction, and the necessary user consent has been obtained when the data was collected. However, because the data is incomplete, for example enterprise A lacks label data, enterprise B lacks user feature data, or the data or sample size of either party is insufficient, the model at each end may fail to be built or may perform poorly. The problem federated learning solves is how to build a high-quality model at each of A and B, where the model is trained on the data of both enterprises while neither party's data is revealed to the other; that is, a common model is built without violating data privacy regulations. This common model behaves like the optimal model the parties would obtain by aggregating their data in one place. The resulting model then serves each party's own objective within its own domain.
The organizations participating in federated learning may be referred to as training members. Each training member may hold different business data and may participate in the joint training of the business model through devices, computers, servers, and the like. The business data may be of various kinds, such as text, pictures, speech, animation, and video. Generally, the business data held by the training members is correlated, and the business parties corresponding to the training members may also be related. For example, among several business parties involved in financial services, business party 1 may be a bank that provides savings and loan services and holds data such as a user's age, gender, income and expenditure flows, loan amount, and deposit amount; business party 2 may hold the user's loan records, investment records, and repayment due dates; and business party 3 may be a shopping site holding data such as the user's shopping habits, payment habits, and payment accounts. As another example, among several business parties involved in medical services, each business party may be a hospital, a physical examination institution, and so on: business party 1 may be hospital A, whose local business data consists of diagnosis records covering a user's age, gender, symptoms, diagnosis results, treatment plans, and treatment results; business party 2 may be physical examination institution B, holding examination records covering the user's age, gender, symptoms, and examination conclusions. A single training member may hold the business data of one business party or of multiple business parties. The goal of federated learning is to train a model that better handles such business data, so the federally learned model may also be called a business model.
Federated learning is divided into horizontal federated learning and longitudinal (vertical) federated learning. In horizontal federated learning, the sample sets of different training members have highly overlapping features but different sample sources. For example, multiple sample sets may correspond to the customers of different banks: the data features banks manage are generally similar, but the customers differ, so the model can be trained by horizontal federated learning. In longitudinal federated learning, the sample identifiers of the different data sets overlap heavily (records share user ID numbers, telephone numbers, and the like), but the features differ. For example, a bank and a hospital serving the same user group (say, the residents of a small county) have samples that overlap heavily in population but differ in features: the bank data may cover features such as deposits and loans, while the hospital data may cover features such as physiological indicators, health conditions, and treatment records. Training a model on the joint data sets of the bank and the hospital is an instance of longitudinal federated learning.
In a longitudinal federated learning scenario with vertically partitioned data, the business model is generally split into two parts: local models, each held by a training member and used to process local feature data into an intermediate result, and a global model used to process the intermediate results of the training members into the final output. In such a scenario, the training members may use a centerless multi-party secure computation (MPC) architecture, in which the global model is distributed among the training members and computed jointly. That approach usually has high computational cost, frequent communication, and a large communication volume, and is difficult to scale to large nonlinear operations. Alternatively, a centralized architecture assisted by a third party may be adopted, where the third party acts as the service center and may also be called the server. In that case, the global model resides at the server. Each training member must then provide the server with the intermediate data obtained by processing its feature data, and the training members that hold label data must also provide the label data to the server. The data a training member provides to the server is at risk of privacy leakage.
In view of this, the present specification provides a differential-privacy-based data release scheme built on the centralized architecture of the longitudinal federated learning scenario: a training member uses a differential privacy mechanism to add noise to the intermediate result obtained by processing its local feature data with the local model, and only then sends it to the server. This reduces the strong dependence on computing power and communication in the vertically partitioned data scenario and realizes lightweight, efficient, privacy-preserving split learning in the vertical setting.
The technical idea of the present specification is described below with reference to a specific example shown in fig. 1.
FIG. 1 shows a longitudinal federated learning procedure for a business model. The procedure is carried out jointly by a server and multiple training members. An individual training member, which holds part of the feature data of the training samples, may be any device, platform, or device cluster with computing and processing capability, such as a business party carrying out a business. In alternative implementations, a single training member may also hold some or all of the label data. The server may be a third party or another trusted business party.
FIG. 2 illustrates the model architecture under the centralized implementation of the longitudinal federated learning scenario. In longitudinal federated learning, the business model is split into several local models and a global model; each local model resides at a training member, and the global model resides at the server. The business model learned by longitudinal federated learning may be a multi-layer neural network, a linear regression model, or the like. As shown in FIG. 2, the local models sit in parallel. A single training member processes the local feature data of the current training samples through its local model to obtain the corresponding intermediate result and provides it to the server. The server receives the intermediate results sent by the training members and processes them through the global model to obtain the global output. When the business model is a multi-layer neural network, the intermediate results sent by the training members may be merged at the first layer of the global model and processed by the subsequent layers. A training member holding label data also provides the label data to the server, so that the server can compare the global output with the labels and then back-propagate gradients to train the global model and each local model.
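To make the data flow concrete, the following is a minimal sketch of this split forward pass, assuming toy tanh local models and a single linear global layer; the function and variable names are illustrative assumptions, not taken from the patent.

```python
import numpy as np

def local_forward(W_local, X_batch):
    # Local model of one training member: a single linear layer with tanh
    # activation, producing the intermediate tensor for the batch.
    return np.tanh(X_batch @ W_local)

def global_forward(W_global, intermediate_tensors):
    # Server side: merge the members' intermediate tensors at the first
    # layer of the global model, then apply the (here linear) global model.
    merged = np.concatenate(intermediate_tensors, axis=1)
    return merged @ W_global

# Two members, 4 samples, feature splits of width 3 and 5, hidden size 2 each.
rng = np.random.default_rng(0)
h1 = local_forward(rng.normal(size=(3, 2)), rng.normal(size=(4, 3)))
h2 = local_forward(rng.normal(size=(5, 2)), rng.normal(size=(4, 5)))
output = global_forward(rng.normal(size=(4, 1)), [h1, h2])  # shape (4, 1)
```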
The embodiments of this specification improve on the architecture shown in FIG. 2: when a single training member provides an intermediate result to the server, the intermediate result first undergoes privacy-preserving processing based on differential privacy, so that an intermediate result satisfying differential privacy is provided to the server and the privacy of the local feature data is protected. If the training member also holds label data, the label data can likewise be processed based on differential privacy to protect the privacy of the local labels.
For convenience of description, any one of the training members is called the first training member, and the local model deployed at the first training member is called the first local model. As shown in FIG. 1, the longitudinal federated learning procedure for the business model performed by the first training member may include: step 101, processing first feature data of a current batch of training samples through the local first local model to obtain a first intermediate tensor; step 102, performing a clipping operation on the first intermediate tensor based on a preset first clipping threshold to form a first reduced tensor; step 103, adding a first target noise satisfying differential privacy to the first reduced tensor to obtain a first release tensor and providing it to the server, so that the server processes the first release tensor through the global model, thereby training the global model and the first local model.
First, in step 101, the first feature data of the current batch of training samples is processed through the local first local model to obtain the first intermediate tensor. It will be appreciated that in a longitudinal federated learning scenario, the first training member holds the first feature data of the training samples.
The first feature data may include one or more features, extracted in advance from the local data. For example, if one training sample corresponds to one user and the local data held by the first training member consists of the user's investment, loan, and repayment data, features such as investment type, investment amount, investment income, loan frequency, loan amount, and repayment punctuality can be extracted from the local data as the first feature data of the corresponding training sample. The current batch may include one or more training samples. The training samples of the current batch can be sampled from the local data sets by the training members through a consistent privacy-preserving method, with the sampling results of the training members aligned with one another.
The first training member processes the first feature data through the local first local model, and the processing result is denoted the first intermediate result. The first intermediate result is typically expressed as a tensor and may therefore be called the first intermediate tensor. The first intermediate tensor may be a one-dimensional tensor (a vector), a two-dimensional tensor (a matrix), or another tensor form, which is not limited here.
Then, in step 102, a clipping operation based on the preset first clipping threshold is performed on the first intermediate tensor to form the first reduced tensor. The clipping operation limits the first intermediate tensor to a range, thereby blurring it to some degree.
Here, the first clipping threshold describes the numerical range imposed on the first intermediate tensor. In practice, whether to clip may be decided by comparing the element values of the first intermediate tensor, or a norm of the first intermediate tensor, against the first clipping threshold; when the compared value exceeds the range defined by the first clipping threshold, the element values of the first intermediate tensor are clipped. In one specific example, the first clipping threshold is compared with the 2-norm of the first intermediate tensor. Taking the first intermediate tensor as $h$, the clipping operation on $h$ can be expressed as:

$$\bar{h} = h \cdot \min\left(1, \frac{C}{\|h\|_2}\right)$$

where $h$ denotes the first intermediate tensor, $C$ is the first clipping threshold, and $\|\cdot\|_2$ denotes the 2-norm (second-order norm). This formula says that if the 2-norm $\|h\|_2$ of the first intermediate tensor $h$ is greater than the first clipping threshold $C$, then $h$ is scaled down (i.e., clipped) by the ratio $C/\|h\|_2$ of the first clipping threshold to the current norm. The clipping operation amounts to a normalization of the first intermediate tensor, so the clipped tensor $\bar{h}$ may be called the first reduced tensor.

In practice, the clipping operation may instead be based on the 1-norm of the first intermediate tensor $h$, or on other quantities describing the properties of $h$, compared against a predetermined clipping threshold; this is not limited here.
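As a sketch, the clipping formula above can be implemented in a few lines; the helper name is an illustrative assumption, not from the patent.

```python
import numpy as np

def clip_by_l2_norm(h: np.ndarray, C: float) -> np.ndarray:
    """Scale h down by C / ||h||_2 when its 2-norm exceeds the threshold C."""
    norm = np.linalg.norm(h)
    return h * min(1.0, C / norm) if norm > 0 else h

h = np.array([3.0, 4.0])          # ||h||_2 = 5
h_bar = clip_by_l2_norm(h, 1.0)   # scaled to [0.6, 0.8], norm exactly 1
```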
According to one embodiment, under a differential privacy mechanism, the clipping threshold may have several candidate values, such as 1, 2, or 0.8, depending on the scale of the data. In this case, the first clipping threshold may be selected from a candidate threshold set. For example, if 90% of the element values of the first intermediate tensor are below 1, the first clipping threshold may be chosen as 1.
According to another embodiment, the first clipping threshold is the mean of the norms of the historical intermediate tensors determined by the first training member in the current longitudinal federated learning, or of the historical release tensors for those intermediate tensors. For example, if the current period is the $n$-th period, the first clipping threshold may be the mean over the previous $n$ periods: $C = \mathrm{Mean}(\|h_1\|_2, \ldots, \|h_n\|_2)$.

According to yet another embodiment, the first clipping threshold is the median of the norms of the historical intermediate tensors determined by the first training member in the current longitudinal federated learning, or of the release tensors for those intermediate tensors. For example, if the current period is the $n$-th period, the first clipping threshold may be the median over the previous $n$ periods: $C = \mathrm{Median}(\|h_1\|_2, \ldots, \|h_n\|_2)$.
In other embodiments, the first clipping threshold may be determined in other ways, which are not limited here. Note that the training members may use a uniform first clipping threshold, or each may determine a clipping threshold suited to its own local data characteristics.
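The threshold choices described above, selecting from a candidate set or using the mean or median of historical norms, can be sketched as follows; the helper names and the coverage heuristic are assumptions for illustration.

```python
import numpy as np

def threshold_from_candidates(h, candidates=(0.8, 1.0, 2.0), coverage=0.9):
    # Smallest candidate bounding the given fraction of |element| values.
    q = np.quantile(np.abs(h), coverage)
    fitting = [c for c in candidates if c >= q]
    return min(fitting) if fitting else max(candidates)

def threshold_from_history(historical_tensors, use_median=False):
    # Mean or median of historical 2-norms, per the two embodiments above.
    norms = [np.linalg.norm(h) for h in historical_tensors]
    return float(np.median(norms) if use_median else np.mean(norms))
```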
Next, in step 103, the first target noise satisfying differential privacy is added to the first reduced tensor to obtain the first release tensor, which is provided to the server. In this way, the server performs the training of the global model and the first local model by processing the first release tensor through the global model, rather than the first intermediate tensor itself.
Differential privacy (DP) is a technique in cryptography that aims to maximize the accuracy of queries against a statistical database while minimizing the chance of identifying individual records. Let $M$ be a randomized algorithm and let $P_M$ be the set of all possible outputs of $M$. For any two adjacent data sets $x$ and $x'$ (i.e., $x$ and $x'$ differ in only one record) and any subset $S \subseteq P_M$, if the randomized algorithm $M$ satisfies:

$$\Pr[M(x) \in S] \le e^{\varepsilon} \cdot \Pr[M(x') \in S],$$

then $M$ is said to provide $\varepsilon$-differential privacy, where the parameter $\varepsilon$ is called the privacy budget and balances the degree of privacy protection against accuracy. $\varepsilon$ is generally predetermined. The closer $\varepsilon$ is to 0, the closer $e^{\varepsilon}$ is to 1, the closer the algorithm's output distributions on the two adjacent data sets $x$ and $x'$ are, and the stronger the privacy protection.

In practice, strict $\varepsilon$-differential privacy can be relaxed to $(\varepsilon, \delta)$-differential privacy, as shown below:

$$\Pr[M(x) \in S] \le e^{\varepsilon} \cdot \Pr[M(x') \in S] + \delta,$$

where $\delta$ is a relaxation term, also called the tolerance, which can be understood as the probability that strict differential privacy fails to hold.
Implementations of differential privacy include the noise mechanism, the exponential mechanism, and others. In the noise mechanism, the magnitude of the added noise is typically determined by the sensitivity of the query function: the maximum difference between the query results on a pair of adjacent data sets $x$ and $x'$.
In the embodiment shown in FIG. 1, differential privacy may be implemented with a noise mechanism. In various examples, the first target noise may be Laplace noise satisfying $\varepsilon$-differential privacy, Gaussian noise satisfying $(\varepsilon, \delta)$-differential privacy, or the like. The target noise can be determined and added in several ways. The noise-adding process is described below taking Gaussian noise satisfying $(\varepsilon, \delta)$-differential privacy as an example.
Specifically, a noise factor may be determined based on the preset privacy budget. The noise factor measures the amount of noise currently added. For example, under the $(\varepsilon, \delta)$-differential privacy mechanism:

$$s = \frac{\sqrt{2\ln(1.25/\delta)}}{\varepsilon}.$$

As this expression shows, the noise factor is determined by the privacy budget $\varepsilon$ and the tolerance $\delta$. Within one longitudinal federated learning process, $s$ is fixed once the privacy budget is given. The noise factor can therefore be predetermined, used as a hyperparameter at each training member, and reused in every update period.

The noise factor $s$ describes the amount of noise currently added, which makes the privacy cost easy to control. Meanwhile, the clipping operation scales the intermediate result (the first intermediate tensor) according to the first clipping threshold; accordingly, to keep the privacy cost controlled, the noise parameter based on the noise factor $s$ is also scaled according to the first clipping threshold. The noise parameter of the noise to be added can thus be regarded as determined by the noise factor $s$ together with the first clipping threshold. Taking Gaussian noise as an example, the noise follows a Gaussian distribution $N(\mu, \sigma^2)$, whose noise parameters are the mean $\mu = 0$ and the variance $\sigma^2$.

It will be appreciated that, to limit the magnitude of the noise, the mean of the Gaussian noise is typically 0, and the noise parameter determined by the noise factor $s$ and the first clipping threshold is the variance $\sigma^2$ of the Gaussian distribution. For the first intermediate tensor, once the variance $\sigma^2$ of the noise distribution (the first noise distribution) defined by the corresponding noise parameter (the first noise parameter) is determined, random noise can be sampled from the Gaussian distribution with that variance and superimposed on the first reduced tensor to obtain the first release tensor.

As a specific example, the first variance of the first noise distribution corresponding to the first intermediate tensor may be positively correlated with the absolute value of the product of the noise factor $s$ and the first clipping threshold $C$, e.g., $\sigma_1^2 = s^2 C^2$. The first noise distribution may then be written as $N(0, s^2 C^2)$. Random sampling from the Gaussian distribution $N(0, s^2 C^2)$ yields a noise tensor of the same size as the first intermediate tensor; superimposing it, as the first target noise, on the first reduced tensor gives the first release tensor, a perturbed version of the first intermediate tensor.
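A minimal sketch of this Gaussian mechanism, assuming the noise factor expression above and a tensor already clipped to 2-norm at most $C$; the names are illustrative, not the patent's reference code.

```python
import numpy as np

def noise_factor(epsilon: float, delta: float) -> float:
    # s = sqrt(2 ln(1.25/delta)) / epsilon, fixed for the whole training run.
    return np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon

def release_tensor(h_bar: np.ndarray, C: float, s: float,
                   rng: np.random.Generator) -> np.ndarray:
    # h_bar is the clipped (reduced) tensor with ||h_bar||_2 <= C;
    # noise is drawn from N(0, s^2 C^2), i.e. standard deviation s * C.
    noise = rng.normal(loc=0.0, scale=s * C, size=h_bar.shape)
    return h_bar + noise

rng = np.random.default_rng(42)
s = noise_factor(epsilon=1.0, delta=1e-5)
h_tilde = release_tensor(np.array([0.6, 0.8]), C=1.0, s=s, rng=rng)
```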
The first training member provides the first release tensor to the server. Similarly, the other training members process the local feature data of the current batch of training samples with their local models and provide the server with the release tensors obtained by adding noise to their intermediate tensors. The server processes the release tensors through the global model to obtain the global output. The global output may be a prediction result for the business, a classification result for the business, or the like.
It will be appreciated that in supervised machine learning, the output of the business model is supervised with the sample labels: by comparing the sample labels with the output of the business model, the server adjusts the undetermined parameters of the global model and each training member adjusts the undetermined parameters of its local model, so that the output of the business model approaches the sample labels, thereby training the global model and the local models.
In longitudinal federated learning, the label data is usually held by a subset of the training members. A single sample label may describe a single business target, for example the category the sample is classified into, or multiple business targets, for example the probability assigned to each target category, or the top-ranked target categories predicted for the sample subject (e.g., a user); in a business model predicting user interests, for instance, the output may be the three categories the user is most interested in. Accordingly, when describing a single business target, a single sample label corresponds to a single value, and the labels of multiple training samples form a one-dimensional tensor (a vector); when describing multiple business targets, a single sample label corresponds to a one-dimensional tensor, and the labels of multiple training samples form a two-dimensional tensor (a matrix).
The sample label data may be held by one or more training members. A training member holding sample label data provides the label data to the server. Since providing the labels directly could leak the privacy of local data, the training member may add noise satisfying differential privacy to the label data before providing it, thereby protecting data privacy. The process of adding differentially private noise to label data is described below, taking the first training member as an example.
Assume the first training member also holds the first label data of the current batch of training samples, corresponding to a first label tensor. The first training member adds a second target noise satisfying differential privacy to the first label tensor to obtain a second release tensor, which is provided to the server. The first label tensor may be a one-dimensional or a two-dimensional tensor.
In one possible design, the first training member may determine a second noise parameter using the noise factor $s$ described above and the norm of the first label tensor $y$, and then sample the second target noise from a second noise distribution defined by the second noise parameter. Taking Gaussian differential privacy as an example, the noise parameters include a mean and a variance. To control the size of the noise without shifting the distribution of the first label tensor, the mean is again 0, and the variance is positively correlated with the absolute value of the product of the noise factor $s$ and the norm of the first label tensor $y$, e.g., $\sigma_2^2 = s^2 \|y\|_2^2$, where $\|y\|_2^2$ denotes the squared second-order norm of the first label tensor, i.e., $y^T y$. The first training member may randomly sample from the Gaussian distribution $N(0, s^2 \|y\|_2^2)$ a noise tensor of the same size as the first label tensor and superimpose it on the first label tensor to form the second release tensor.
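A hedged sketch of this label-release variant, where the Gaussian standard deviation is scaled by the 2-norm of the label tensor rather than by a clipping threshold; the names are illustrative.

```python
import numpy as np

def release_labels(y: np.ndarray, s: float,
                   rng: np.random.Generator) -> np.ndarray:
    sigma = s * np.linalg.norm(y)          # sigma^2 = s^2 * ||y||_2^2
    return y + rng.normal(0.0, sigma, size=y.shape)

rng = np.random.default_rng(7)
y_tilde = release_labels(np.array([0.0, 1.0, 1.0, 0.0]), s=1.2, rng=rng)
```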
In another possible design, a clipping operation may also be performed on the first label tensor before adding noise, to blur it preliminarily.
According to one embodiment, the clipping of the first label tensor can follow the same approach as the first intermediate tensor, e.g., uniform clipping of the first label tensor based on a second clipping threshold. The second clipping threshold may be determined in a manner similar to the first clipping threshold, for example: selected from a candidate threshold set; taken as the mean of the elements of the first label tensor; or taken as the median of the elements of the first label tensor. Optionally, the second clipping threshold may coincide with the first clipping threshold.
According to another embodiment, when the first label tensor is a two-dimensional tensor, considering that the data distribution is correlated with the business targets and the data of different business targets may differ, the clipping of the first label tensor can be performed per business target. Specifically, the single label vector of each business target is clipped into a single reduced vector, and a single target noise satisfying differential privacy is added to obtain a single release vector; the release vectors together constitute the second release tensor. For a single business target, the first training member performs a clipping operation based on a preset single clipping threshold to form the single reduced vector, and then adds the corresponding single target noise to the single reduced vector to form the single release vector. The single clipping threshold here may be: equal to the first clipping threshold, selected from the candidate threshold set, the mean of the elements of the single label vector, or the median of the elements of the single label vector.
In other embodiments, the first label tensor can be clipped in other ways, which are not detailed here. After clipping, the noise parameter for differential privacy can be determined from the corresponding clipping threshold, and the corresponding target noise sampled from the noise distribution defined by that parameter. Under the Gaussian mechanism of differential privacy, the noise parameters may include a mean of 0 and a variance determined from the noise factor. When the label tensor is clipped, the variance of the corresponding Gaussian distribution can be defined by the noise factor and the clipping threshold. For example, when the label vectors of the business targets are clipped separately, the variance of the noise parameter for a single label vector is related to the corresponding single clipping threshold $C'$, e.g., $\sigma_0^2 = s^2 C'^2$.
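A sketch of the per-business-target variant, assuming the labels form a (samples x targets) matrix and each column is clipped to its own threshold $C'$ before noise with standard deviation $s \cdot C'$ is added; names are illustrative.

```python
import numpy as np

def release_labels_per_target(Y: np.ndarray, thresholds, s: float,
                              rng: np.random.Generator) -> np.ndarray:
    # Y has shape (num_samples, num_targets); one threshold per target column.
    out = np.empty_like(Y, dtype=float)
    for j, C_prime in enumerate(thresholds):
        y = Y[:, j]
        norm = np.linalg.norm(y)
        y_clipped = y * min(1.0, C_prime / norm) if norm > 0 else y
        out[:, j] = y_clipped + rng.normal(0.0, s * C_prime, size=y.shape)
    return out

rng = np.random.default_rng(3)
Y = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
Y_tilde = release_labels_per_target(Y, thresholds=[1.0, 1.0], s=1.2, rng=rng)
```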
In this way, subject to the total privacy budget $(\varepsilon, \delta)$, the first training member adds noise to both its local first intermediate result and its first label data (if any) before release, so that the privacy loss of the longitudinal federated learning is controlled within $(\varepsilon, \delta)$.
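Putting the pieces together, a member's per-batch release under a total budget $(\varepsilon, \delta)$ might look like the following end-to-end sketch; it assumes the same toy local model as the earlier sketches and is not the patent's reference implementation.

```python
import numpy as np

def member_release(W_local, X_batch, y_batch, C, epsilon, delta, rng):
    s = np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon     # noise factor
    h = np.tanh(X_batch @ W_local)                        # step 101: intermediate tensor
    norm = np.linalg.norm(h)
    h_bar = h * min(1.0, C / norm) if norm > 0 else h     # step 102: reduced tensor
    h_rel = h_bar + rng.normal(0.0, s * C, size=h.shape)  # step 103: release tensor
    y_rel = None
    if y_batch is not None:                               # label release, if labels held
        y_rel = y_batch + rng.normal(0.0, s * np.linalg.norm(y_batch),
                                     size=y_batch.shape)
    return h_rel, y_rel
```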
Reviewing the above process, the technical idea provided by this specification is to clip the data a single training member releases to the server about the training samples, and to add noise satisfying differential privacy, before release in the longitudinal federated learning process. Clipping preliminarily blurs the data and protects data privacy, while the noise satisfying differential privacy is controllable under the preset privacy budget, preserving the accuracy of the result. In this way, data privacy is effectively protected and the privacy loss is controllable, so that both the privacy and the accuracy of centralized longitudinal federated learning are improved, redundant computation is avoided, and the learning efficiency of the business model is increased.
According to an embodiment of another aspect, a longitudinal federated learning apparatus for business models is also provided. The business model to be jointly learned includes a global model deployed at the server and local models deployed at the training members. Any one of the training members is called the first training member, and the longitudinal federated learning apparatus for the business model is deployed at the first training member.
FIG. 3 illustrates a longitudinal federated learning apparatus 300 for business models in one embodiment. As shown in FIG. 3, the apparatus 300 includes:
a processing unit 31 configured to process first feature data of a current batch of training samples through the local first local model to obtain a first intermediate tensor;
a clipping unit 32 configured to perform a clipping operation on the first intermediate tensor based on a preset first clipping threshold to form a first reduced tensor;
a release unit 33 configured to add a first target noise satisfying differential privacy to the first reduced tensor to obtain a first release tensor and provide it to the server, so that the server processes the first release tensor through the global model, thereby training the global model and the first local model.
It should be noted that the apparatus 300 shown in fig. 3 corresponds to the method described in fig. 1, and the corresponding description in the method embodiment of fig. 1 is also applicable to the apparatus 300, which is not repeated herein.
According to an embodiment of another aspect, there is also provided a computer-readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method described in connection with fig. 1 and the like.
According to an embodiment of still another aspect, there is also provided a computing device including a memory and a processor, the memory having stored therein executable code, the processor implementing the method described in connection with fig. 1 and so on when executing the executable code.
Those skilled in the art will recognize that, in one or more of the examples described above, the functions described in the embodiments of this specification may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium.
The above embodiments are only specific implementations of the technical idea of the present disclosure and should not be used to limit its scope; any modification, equivalent replacement, improvement, and the like made on the basis of the technical idea of the embodiments of the present disclosure shall fall within the scope of the technical idea of the present disclosure.

Claims (15)

1. A longitudinal federated learning method for a business model, the business model including a global model deployed at a server and a first local model deployed at a first training member, the method performed by the first training member and comprising:
processing first feature data of a current batch of training samples through the local first local model to obtain a first intermediate tensor;
performing a clipping operation on the first intermediate tensor based on a preset first clipping threshold to form a first reduced tensor;
adding a first target noise satisfying differential privacy to the first reduced tensor to obtain a first release tensor, which is provided to the server so that the server processes the first release tensor through the global model, thereby training the global model and the first local model.
2. The method of claim 1, wherein the clipping operation based on the preset first clipping threshold comprises:
if the current norm of the first intermediate tensor exceeds the first clipping threshold, determining the ratio of the first clipping threshold to the current norm, and clipping the first intermediate tensor according to that ratio.
3. The method of claim 1, wherein the first clipping threshold is determined by one of:
selecting from a candidate threshold set;
taking the mean of the historical intermediate tensors determined in a historical period, or of the historical release tensors for those intermediate tensors;
taking the median of the historical intermediate tensors determined in a historical period, or of the historical release tensors for those intermediate tensors.
4. The method of any of claims 1-3, wherein adding a first target noise satisfying differential privacy to the first reduced tensor comprises:
acquiring a noise factor determined based on a predetermined privacy budget;
determining a first noise parameter using the noise factor and the first clipping threshold;
sampling the first target noise based on a first noise distribution defined by the first noise parameter.
5. The method of claim 4, wherein the first noise distribution is a Gaussian distribution and the first noise parameter comprises a first variance corresponding to the Gaussian distribution.
6. The method of claim 5, wherein the first variance is positively correlated with an absolute value of a product of the noise factor times the first clipping threshold.
7. The method of claim 1, wherein the first training member further holds first label data of the current batch of training samples, the first label data corresponding to a first label tensor, and the method further comprises:
adding a second target noise satisfying differential privacy to the first label tensor to obtain a second release tensor, which is provided to the server so that the server determines a comparison result between the output of the global model and the second release tensor for training the global model and the first local model.
8. The method of claim 7, wherein the adding a second target noise that conforms to differential privacy on the first tag tensor comprises:
acquiring a noise factor determined based on a predetermined privacy budget;
determining a second noise parameter using the noise factor and a norm of the first label tensor;
sampling the second target noise based on a second noise distribution defined by the second noise parameter.
9. The method of claim 7, wherein the first label tensor is a multi-dimensional tensor for multiple business targets, a single business target corresponding to a single label vector, and adding the second target noise satisfying differential privacy to the first label tensor to obtain the second release tensor comprises:
adding, for each label vector corresponding to each business target, a target noise satisfying differential privacy, to obtain respective release vectors;
determining the second release tensor based on the respective release vectors.
10. The method of claim 9, wherein for a single label vector corresponding to a single business target, the corresponding single target noise is applied by:
performing a clipping operation on the single label vector based on a preset single clipping threshold to form a single reduced vector;
adding the corresponding single target noise to the single reduced vector to form a single release vector.
11. The method of claim 10, wherein the single clipping threshold is determined by one of:
taking it equal to the first clipping threshold;
selecting it from a candidate threshold set;
taking the mean of the elements of the single label vector;
taking the median of the elements of the single label vector.
12. The method of claim 9 or 10, wherein the single target noise follows a Gaussian distribution, and the single noise parameter corresponding to the single target noise is a single variance of the Gaussian distribution, the single variance being determined according to the single clipping threshold and a noise factor determined based on a predetermined privacy budget.
13. A longitudinal federated learning apparatus for a business model, the business model including a global model deployed at a server and a first local model deployed at a first training member, the apparatus deployed at the first training member and comprising:
a processing unit configured to process first feature data of a current batch of training samples through the local first local model to obtain a first intermediate tensor;
a clipping unit configured to perform a clipping operation on the first intermediate tensor based on a preset first clipping threshold to form a first reduced tensor;
a release unit configured to add a first target noise satisfying differential privacy to the first reduced tensor to obtain a first release tensor and provide it to the server, so that the server processes the first release tensor through the global model, thereby training the global model and the first local model.
14. A computer-readable storage medium, on which a computer program is stored which, when executed in a computer, causes the computer to carry out the method of any one of claims 1-12.
15. A computing device comprising a memory and a processor, wherein the memory has stored therein executable code that, when executed by the processor, performs the method of any of claims 1-12.
CN202210380964.8A 2022-04-12 2022-04-12 Longitudinal federated learning method and device for a business model (Pending, CN114912624A)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210380964.8A 2022-04-12 2022-04-12 Longitudinal federated learning method and device for a business model

Publications (1)

Publication Number Publication Date
CN114912624A true CN114912624A (en) 2022-08-16

Family

ID=82764617

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210380964.8A Pending CN114912624A (en) 2022-04-12 2022-04-12 Longitudinal federal learning method and device for business model

Country Status (1)

Country Link
CN (1) CN114912624A

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109257385A (en) * 2018-11-16 2019-01-22 重庆邮电大学 A kind of location privacy protection strategy based on difference privacy
CN110443063A (en) * 2019-06-26 2019-11-12 电子科技大学 The method of the federal deep learning of self adaptive protection privacy
CN112199717A (en) * 2020-09-30 2021-01-08 中国科学院信息工程研究所 Privacy model training method and device based on small amount of public data
CN113435583A (en) * 2021-07-05 2021-09-24 平安科技(深圳)有限公司 Countermeasure generation network model training method based on federal learning and related equipment thereof

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
JUN ZHOU: "Vertically Federated Graph Neural Network for Privacy-Preserving Node Classification", 《ARXIV:2005.11903V2》, 13 April 2021 (2021-04-13), pages 1 - 4 *
吴琪瑞: "基于云端智能大数据的隐私保护服务机制研究", 《中国优秀硕士学位论文全文数据库(信息科技辑)》, 15 January 2022 (2022-01-15) *

Similar Documents

Publication Publication Date Title
CN112949837B (en) Target recognition federal deep learning method based on trusted network
CN112364943B (en) Federal prediction method based on federal learning
AU2021218110B2 (en) Learning from distributed data
Elsalamony Bank direct marketing analysis of data mining techniques
Pedrycz et al. Hierarchical granular clustering: an emergence of information granules of higher type and higher order
Danjuma et al. A review on soft set-based parameter reduction and decision making
CN113379042B (en) Business prediction model training method and device for protecting data privacy
CA3114298C (en) Recommendation method and system and method and system for improving a machine learning system
Zhang et al. A fast online learning algorithm for distributed mining of bigdata
Lima et al. Evaluating deep models for absenteeism prediction of public security agents
CN114742239A (en) Financial insurance claim risk model training method and device based on federal learning
Tsai et al. Data mining techniques in customer churn prediction
US20240161117A1 (en) Trigger-Based Electronic Fund Transfers
CN114912624A (en) Longitudinal federal learning method and device for business model
CN111754195B (en) Information processing method and device, electronic equipment and computer readable storage medium
Laha Statistical challenges with big data in management science
Zhang Customer churn prediction based on a novelty hybrid random forest algorithm
CN115345298A (en) Method and device for jointly training models
Elhadad Insurance Business Enterprises' Intelligence in View of Big Data Analytics
CN114386583A (en) Longitudinal federal neural network model learning method for protecting label information
CN113657611A (en) Method and device for jointly updating model
Rao et al. RNN-BD: an approach for fraud visualisation and detection using deep learning
Babu et al. An efficient discrimination prevention and rule protection algorithms avoid direct and indirect data discrimination in web mining
Gardan et al. Performance evaluation with asynchronously decomposable SWN: implementation and case study
Varalakshmi et al. An efficient reliable federated learning technology

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination