CN112052480A - Privacy protection method, system and related equipment in model training process - Google Patents

Privacy protection method, system and related equipment in model training process

Info

Publication number
CN112052480A
CN112052480A
Authority
CN
China
Prior art keywords
target
model
server
clients
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010953756.3A
Other languages
Chinese (zh)
Inventor
刘洋
李泽睿
张伟哲
徐睿峰
王轩
蒋琳
廖清
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Graduate School Harbin Institute of Technology
Original Assignee
Shenzhen Graduate School Harbin Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Graduate School Harbin Institute of Technology filed Critical Shenzhen Graduate School Harbin Institute of Technology
Priority to CN202010953756.3A priority Critical patent/CN112052480A/en
Publication of CN112052480A publication Critical patent/CN112052480A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60Protecting data
    • G06F21/62Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245Protecting personal data, e.g. for financial or medical purposes


Abstract

The embodiment of the invention provides a privacy protection method, system and related equipment for the model training process, used to realize privacy protection during model training. The method provided by the embodiment of the invention comprises the following steps: receiving a target model sent by a server, and receiving a selection instruction sent by the server, wherein the selection instruction is used to indicate a randomly selected subset of clients; selected clients participate in the target model training with probability P, and unselected clients participate with probability (1 − P); all clients that decide to participate in the target model training serve as target clients, each training the target model with its local data and calculating the model parameter update values of each trained target model; and processing each group of model parameter update values according to a preset differential privacy algorithm to generate update data, which is returned to the server, so that the server generates the global model of the current training round from all the update data.

Description

Privacy protection method, system and related equipment in model training process
Technical Field
The invention relates to the technical field of privacy protection in a model training process, in particular to a privacy protection method, a privacy protection system and related equipment in the model training process.
Background
The development and application of machine learning rest on the collection and analysis of big data, and often require fusion analysis of data from multiple data sources. In such fusion analysis scenarios, each participant's data contains a large amount of private information, and simply collecting and analyzing the data centrally leaks that privacy.
The federated learning mechanism proposed by the Google team realizes privacy-preserving joint modeling: in each round of training, the server selects part of the clients to participate, issues the global model, each client trains the model with its locally stored data and returns model parameter update values, and the server generates the final global model from the returned values.
However, the federated learning mechanism relies on a trusted server; otherwise, the original values returned by the clients are exposed to a malicious server. Moreover, the client return values are easily intercepted during transmission, resulting in privacy disclosure.
Disclosure of Invention
The embodiment of the invention provides a privacy protection method, a privacy protection system and related equipment in a model training process, which are used for realizing privacy protection in the model training process.
The first aspect of the embodiments of the present invention provides a privacy protection method in a model training process, which may include:
receiving a target model sent by a server, and receiving a selection instruction sent by the server, wherein the selection instruction is used for indicating a part of randomly selected clients;
the selected clients participate in the target model training with the probability P, and the unselected clients participate in the target model training with the probability (1-P);
all the clients which are determined to participate in the target model training are used as target clients, the target models are trained by adopting local data respectively, and model parameter update values of each trained target model are calculated;
and processing each group of model parameter update values according to a preset differential privacy algorithm to generate update data, and returning the update data to the server, so that the server generates a global model of the current training according to all the update data.
Optionally, as a possible implementation manner, the privacy protection method in the model training process in the embodiment of the present invention may further include:
obtaining a privacy budget parameter ε, and calculating the probability P according to the formula P = e^ε / (1 + e^ε).
Optionally, as a possible implementation manner, in the embodiment of the present invention, the processing the update value of each group of model parameters according to the preset differential privacy algorithm to generate update data includes:
and generating Gaussian noise of each group of model parameter updating values by adopting a Gaussian noise mechanism, and performing superposition processing on the Gaussian noise of each group of model parameter updating values and the corresponding model parameter updating values to generate updating data.
Optionally, as a possible implementation manner, in the embodiment of the present invention, the generating gaussian noise for each set of model parameter update values by using a gaussian noise mechanism includes:
after each round of training of the target client, calculating the two-norm of the matrix corresponding to the model parameter update values, and calculating the average value S of the two-norms over N rounds;
randomly generating Gaussian noise with variance σ²S².
Optionally, as a possible implementation manner, the privacy protection method in the model training process in the embodiment of the present invention may further include:
and training the target model by adopting a batch gradient descent method, and setting the weight of Gaussian noise as 1/B, wherein B is the batch number.
Optionally, as a possible implementation manner, in the embodiment of the present invention, returning the update data to the server includes:
and overlapping the updating data returned by each target client by adopting an independent adder to obtain an overlapping value, and returning the overlapping value to the server.
Optionally, as a possible implementation manner, the privacy protection method in the model training process in the embodiment of the present invention may further include:
the server calculates the expected value M of the number of the target clients according to the probability P;
and the server side calculates the mean value of the model parameter updating values according to the expected value M and the superposition value.
The second aspect of the embodiment of the invention provides a privacy protection system for the model training process, comprising a client and a server, wherein the client includes a receiving module for receiving the target model sent by the server and for receiving a selection instruction sent by the server, the selection instruction being used to indicate a randomly selected subset of clients;
the selected clients participate in the target model training with the probability P, and the unselected clients participate in the target model training with the probability (1-P);
all the clients which are determined to participate in the target model training are used as target clients, the target clients respectively adopt local data to train the target models, and model parameter update values of each trained target model are calculated;
and processing each group of model parameter update values according to a preset differential privacy algorithm to generate update data, and returning the update data to the server, so that the server generates a global model of the current training according to all the update data.
Optionally, as a possible implementation manner, the privacy protection system in the embodiment of the present invention may further include:
a calculating module for obtaining a privacy budget parameter ε and calculating the probability P according to the formula P = e^ε / (1 + e^ε).
Optionally, as a possible implementation manner, the client in the embodiment of the present invention may be further configured to: and generating Gaussian noise of each group of model parameter updating values by adopting a Gaussian noise mechanism, and performing superposition processing on the Gaussian noise of each group of model parameter updating values and the corresponding model parameter updating values to generate updating data.
Optionally, as a possible implementation manner, the client in the embodiment of the present invention may be further configured to:
after each round of training of the target client, calculating the two-norm of the matrix corresponding to the model parameter update values, and calculating the average value S of the two-norms over N rounds;
randomly generating Gaussian noise with variance σ²S².
Optionally, as a possible implementation manner, the client in the embodiment of the present invention may be further configured to:
and training the target model by adopting a batch gradient descent method, and setting the weight of Gaussian noise as 1/B, wherein B is the batch number.
Optionally, as a possible implementation manner, the privacy protection system in the embodiment of the present invention may further include an independent adder, where the independent adder is configured to add up the update data returned by each target client to obtain an added value, and return the added value to the server.
Optionally, as a possible implementation manner, the server in the embodiment of the present invention may be configured to: calculating an expected value M of the number of the target clients according to the probability P; and calculating the mean value of the updated values of the model parameters according to the expected value M and the superposition value.
A third aspect of embodiments of the present invention provides a computer apparatus, which includes a processor, and the processor is configured to implement the steps in any one of the possible implementation manners of the first aspect and the first aspect when executing a computer program stored in a memory.
A fourth aspect of the embodiments of the present invention provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps in any one of the possible implementations of the first aspect and the first aspect.
According to the technical scheme, the embodiment of the invention has the following advantages:
in the embodiment of the invention, the server side can randomly select part of the clients, and all the clients randomly respond to the training request; and processing the model parameter updating value of each trained target model according to a preset differential privacy algorithm to generate updating data, and returning the updating data to the server, so that the server generates the global model of the current training according to all the updating data. Compared with the prior art, all clients randomly respond to training requests, cannot determine client lists participating in training, can effectively prevent private data from being leaked, and the data returned to the server are processed through a differential privacy algorithm, so that the private data leakage caused by intercepting the data can be effectively prevented.
Drawings
FIG. 1 is a schematic diagram of an embodiment of a privacy protection method in a model training process according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of an embodiment of a specific application of a privacy protection method in a model training process according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of another embodiment of the application of the privacy protection method in the model training process according to the embodiment of the present invention;
FIG. 4 is a diagram of a computer device according to an embodiment of the present invention.
Detailed Description
The embodiment of the invention provides a privacy protection method, a privacy protection system and related equipment in a model training process, which are used for realizing privacy protection in the model training process.
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims, as well as in the drawings, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It will be appreciated that the data so used may be interchanged under appropriate circumstances such that the embodiments described herein may be practiced otherwise than as specifically illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
In the existing federated learning mechanism, the model training process is distributed across a plurality of local client devices. In each round of training, the server selects part of the clients to participate and issues the global model; each client trains the model with its locally stored data and returns model update values, and the server generates the final global model from the returned values. If the server is not trustworthy, the original values returned by the clients are exposed to a malicious server; in addition, the client return values are easily intercepted during transmission, resulting in privacy disclosure. The embodiment of the invention perturbs the selection mechanism for participating clients to realize random response by the clients, so that neither the server nor an attacker can determine whether a specific client participates in training; and the parameter update values receive differential privacy protection locally at the client, so that neither the server nor an attacker can deduce the client's original data.
For convenience of understanding, a specific flow in the embodiment of the present invention will be described below from a client side, and referring to fig. 1, an embodiment of a method for privacy protection in a model training process in the embodiment of the present invention may include:
101. receiving a target model sent by a server, and receiving a selection instruction sent by the server, wherein the selection instruction is used for indicating a part of randomly selected clients;
on the basis of the existing federal learning mechanism, in order to prevent the security risk caused by the leakage of the selected clients, the server preliminarily and randomly selects part of the clients to participate in the training of the target model. Optionally, as a possible implementation manner, the server may send a selection instruction to the randomly selected client, where the selection instruction is used to indicate that part of the clients are randomly selected, and the clients that do not receive the selection instruction may default to be not selected.
102. The selected clients participate in the target model training with the probability P, and the unselected clients participate in the target model training with the probability (1-P);
in order to further prevent privacy disclosure, in the embodiment of the invention, selected clients participate in the target model training with the probability P, and unselected clients participate in the target model training with the probability (1-P). Through the operation, the server and the external equipment cannot accurately know which client participates in the current round of training, so that privacy disclosure can be prevented. For example, when P is 0.7, the selected client determines whether to participate in the current round of training of the target model with a probability of 0.7, and the unselected client determines whether to participate in the training of the target model with a probability of 0.3.
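The random-response step above can be sketched in a few lines of Python (a minimal illustration; the function names and the use of Python's `random` module are assumptions, not part of the patent):

```python
import math
import random

def response_probability(epsilon: float) -> float:
    # Randomized-response probability P = e^eps / (1 + e^eps),
    # derived from the privacy budget parameter epsilon.
    return math.exp(epsilon) / (1.0 + math.exp(epsilon))

def decides_to_participate(selected: bool, p: float, rng: random.Random) -> bool:
    # A selected client keeps its state with probability P, so it participates
    # with probability P; an unselected client flips its state with
    # probability (1 - P), so it participates with that probability.
    keep = rng.random() < p
    return keep if selected else not keep
```

For instance, ε = 1 gives P = e/(1 + e) ≈ 0.73, so a selected client participates roughly 73% of the time and an unselected one roughly 27% of the time.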
The probability P may be set according to a user requirement, or may be set randomly, which is not limited herein. Optionally, as a possible implementation manner, the client may obtain a privacy budget parameter ε set by the user and calculate the probability P according to the formula P = e^ε / (1 + e^ε), wherein a larger value of the privacy budget parameter within a set range represents looser privacy protection; for example, within a range of 0–10, ε may take values such as 0.2, 0.4, 0.6, 0.8, 1, 2, 4, 6, 8, 10, etc.
103. All the clients which are determined to participate in the target model training are used as target clients, the target models are trained by adopting local data respectively, and model parameter updating values of each trained target model are calculated;
after the client determines to participate in the current round of training, the client can be used as a target client. The target client may respectively train the target model using the local data, and calculate a model parameter update value of each trained target model. Wherein, the model parameter updating value is often stored in the form of a matrix.
104. And processing each group of model parameter update values according to a preset differential privacy algorithm to generate update data, and returning the update data to the server, so that the server generates the global model of the training round according to all the update data.
In order to prevent privacy leakage in the data transmission process, in the embodiment of the present invention, the client may process each set of model parameter update values according to a preset differential privacy algorithm to generate update data, and return the update data to the server, so that the server generates a global model of the current round of training according to all the update data.
The differential privacy algorithm is a provable privacy protection mechanism: by adding an appropriate amount of noise to aggregated data, an attacker is prevented from deducing any specific record. The specific differential privacy implementation mechanism is not limited herein.
As an exemplary possible implementation manner, processing each set of model parameter update values according to the preset differential privacy algorithm to generate update data may include: and generating Gaussian noise of each group of model parameter updating values by adopting a Gaussian noise mechanism, and performing superposition processing on the Gaussian noise of each group of model parameter updating values and the corresponding model parameter updating values to generate updating data.
In the embodiment of the invention, the server side can randomly select part of the clients, and all the clients randomly respond to the training request; and processing the model parameter updating value of each trained target model according to a preset differential privacy algorithm to generate updating data, and returning the updating data to the server, so that the server generates the global model of the current training according to all the updating data. Compared with the prior art, all clients randomly respond to training requests, cannot determine client lists participating in training, can effectively prevent private data from being leaked, and the data returned to the server are processed through a differential privacy algorithm, so that the private data leakage caused by intercepting the data can be effectively prevented.
On the basis of the above embodiment shown in fig. 1, optionally, as a possible implementation manner, generating Gaussian noise for each set of model parameter update values by using a Gaussian noise mechanism may include: after each round of training of the target client, calculating the two-norm of the matrix corresponding to the model parameter update values, and calculating the average value S of the two-norms over N rounds; and randomly generating Gaussian noise with variance σ²S².
In practical application, the local data of a client may be divided into B batches (a batch gradient descent training mode may be adopted in each round, where B is a positive integer not less than 1) for multi-round training. After each round of training of the target client, the two-norm of the matrix corresponding to the model parameter update values is calculated, and the average value S of the two-norms over N rounds is calculated. Here σ is a preset Gaussian noise parameter that may be set within a preset range; a larger σ means larger noise and stricter privacy protection.
In practical application, in order to retain the noise in the same direction as the gradient direction as much as possible, in the embodiment of the present invention, a batch gradient descent method may be adopted in the client to train the target model, and a weight of 1/B is added to the gaussian noise, where B is the batch number.
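A minimal sketch of this adaptive noising step, assuming NumPy and hypothetical function names (the patent does not prescribe a concrete implementation):

```python
import numpy as np

def clip_scale(update_norms) -> float:
    # S: the mean of the update-matrix two-norms over the previous N rounds.
    return float(np.mean(update_norms))

def perturb_update(update, sigma, s, batch_count, rng):
    # Gaussian mechanism: noise drawn with standard deviation sigma * S
    # (variance sigma^2 * S^2), then weighted by 1/B, where B is the
    # number of batches used in batch gradient descent.
    noise = rng.normal(loc=0.0, scale=sigma * s, size=update.shape)
    return update + noise / batch_count
```

A larger σ (or a larger running scale S) yields larger noise and stricter privacy, matching the trade-off described above.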
On the basis of the above embodiment, in order to prevent the server from obtaining the list of the clients that finally participate in the current round of training through the communication of the clients, optionally, an independent adder is arranged at the server in the embodiment of the present invention to superimpose the update data returned by each target client to obtain a superimposed value, and the superimposed value is returned to the server. After the superposition value is obtained in the above mode, the server cannot acquire the list of the clients participating in training and the number of the clients, so that the leakage of privacy data can be effectively prevented.
After the server side obtains the superimposed value returned by the adder, if it needs the mean of the model parameter update values generated in the current round of training, it can calculate the expected value M of the number of target clients from the probability P, and then calculate the mean of the model parameter update values from the expected value M and the superimposed value. Specifically, the expected value is M = K × [ (K′/K) × P + (1 − K′/K) × (1 − P) ], where K is the total number of clients and K′ is the number of clients randomly selected by the server.
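The server-side estimate M = K × [ (K′/K) × P + (1 − K′/K) × (1 − P) ] can be sketched as follows (hypothetical names; a plain-Python illustration rather than the patent's implementation):

```python
def expected_participants(total_k: int, sampled_k: int, p: float) -> float:
    # Expected number of participating clients: each of the K' sampled
    # clients joins with probability P, each of the K - K' others with 1 - P.
    frac = sampled_k / total_k
    return total_k * (frac * p + (1.0 - frac) * (1.0 - p))

def mean_update(sum_of_updates, total_k: int, sampled_k: int, p: float):
    # The server divides the adder's superimposed value by the expected
    # participant count M instead of the (hidden) true count.
    return sum_of_updates / expected_participants(total_k, sampled_k, p)
```

For example, with K = 100, K′ = 70 and P = 0.7, M = 100 × (0.7 × 0.7 + 0.3 × 0.3) = 58.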
In actual application, model training may be realized over multiple rounds of training. Given a privacy budget parameter ε, a privacy tolerance δ, and a Gaussian noise parameter σ set by the user (ε and δ are differential privacy parameters; σ is the Gaussian noise parameter), the consumed privacy budget can be calculated after each round of training from the fixed δ and σ. When the privacy boundary is reached (the boundary is determined by the total number of clients and the amount of client data; the specific calculation can refer to the Moments Accountant algorithm), training is stopped to guarantee privacy. In this way, the number of training rounds under a fixed privacy budget can reach its maximum, well balancing model utility and privacy.
For convenience of understanding, the privacy protection method in the model training process in the embodiment of the present invention will be described below with reference to a specific application embodiment, please refer to fig. 2, which is a specific embodiment that may specifically include the following steps:
201. the server side issues the model to be trained to each client side;
because the model training needs a plurality of rounds, in each training round, the model needing to be trained is usually the global model obtained in the previous round of training;
202. the server randomly samples the clients, and selects a part of the clients from all the clients to participate in training;
part of client data is enough to support the completion of training, and meanwhile, the random sampling operation can reduce the privacy disclosure risk to a certain extent;
203. the client changes the response state so as to disturb the actual participation situation;
under the disturbance, the client selected by the server may not participate in training, and the client not selected may participate in training.
204. The client side which finally responds trains the issued global model by using local private data, and calculates a model update value;
205. the client adaptively adds noise to the model update value to realize differential privacy protection, and returns the result;
206. the independent adder aggregates return values of the clients and transmits the return values to the server;
An adder is used to receive and aggregate the return values, so that the server cannot know which clients performed training.
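A minimal sketch of the adder's role (assuming a NumPy representation of the update matrices; the patent does not specify the adder's internals):

```python
import numpy as np

def secure_sum(client_updates):
    # Independent adder: accumulates the (already noised) update values so
    # that the server receives only the aggregate, never per-client values
    # or the list of responding clients.
    total = None
    for update in client_updates:
        total = np.array(update, copy=True) if total is None else total + update
    return total
```

Clients whose state parameter is 0 contribute an all-zero update, so their presence in the sum reveals nothing about whether they trained.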
207. And the server estimates the number of the actual participating clients and performs average operation on the return values of the clients so as to complete the update of the global model.
Referring to fig. 3, the second embodiment may specifically include the following steps:
301. client random response and adaptive noise addition;
in each round of training process, the server randomly selects part of the clients to participate in the training. We adjust this server-dominated client selection mechanism. After sampling at the server end, each client end completes state parameter initialization according to sampling results (the state parameter is 1 to indicate that the client end participates in model training, and 0 indicates that training is not performed), the client end is set to be 1 when selected by the server end, and the client end is set to be 0 when not selected. The response probability P (P ═ (e)) is calculated from the privacy budget (the degree of privacy disclosure allowed is generally expressed by the privacy budget, and is a variable that needs to be set up))/(1+e) The random response satisfies differential privacy). Each client keeps the original state parameter unchanged with the probability of P, and simultaneously overturns the state parameter with the probability of (1-P). Client with state parameter 1 utilizes itAnd completing model training by local data, and adding noise to the obtained model update value. And the client in state 0 sets the update value to 0. Finally, each client returns an update value to the adder.
302. Adaptively noise adding;
differential privacy is achieved using a Gaussian mechanism where the noise variance is σ2S2. Where σ is adjusted prior to training. After each round of training of the client, the model parameter update value is calculated and the two norms of the model parameter update value are calculated. At the end of training, S is set to the mean of these two norms. Thereby adjusting the noise addition according to the actual update value of each client. When the update value is too large, the amount of noise becomes large, so as to reduce the correlation between the update value and the original data. Meanwhile, we add a weight 1/B to the gaussian noise, where B is the batch number (B is the batch number of the client data slicing, used to represent the client data amount), to preserve the same noise as the gradient direction as much as possible.
303. The actual number of participating clients is estimated.
Assuming that the total number of clients is K and the server samples K′ clients to participate in training, under response probability P the probability that any given client participates may be expressed as (K′/K) × P + (1 − K′/K) × (1 − P). The number of actually participating clients can therefore be estimated as K × [ (K′/K) × P + (1 − K′/K) × (1 − P) ]. By constructing the maximum likelihood function, this value can be shown to be the maximum likelihood estimate of the actual number of participating clients.
304. Privacy boundary computation
Given a privacy budget ε, a privacy tolerance δ, and a noise parameter σ (ε and δ are differential privacy parameters; σ is the Gaussian noise parameter), the consumed privacy budget is calculated at the end of each round of training from the fixed δ and σ. When the privacy boundary is reached (the boundary is determined by the total number of clients and the amount of client data; the specific calculation can refer to the Moments Accountant algorithm), training is stopped to guarantee privacy. In this way, the number of training rounds under a fixed privacy budget reaches its maximum, well balancing model utility and privacy.
In this embodiment, a randomized response mechanism is adopted to hide each client's participation status from the server, which protects client privacy well (because of the perturbation introduced by the response probability, the number of clients that finally participate in training fluctuates around the server's sampling number, but experiments show this does not affect the convergence of the global model). The local adaptive differential privacy mechanism completes the protection of the model update values locally on the client, which reduces the risk of privacy leakage of the raw update values during transmission and guards against infringement by an untrusted server.
The embodiment of the invention also provides a privacy protection system, which comprises clients and a server, wherein each client comprises a receiving module configured to receive the target model sent by the server and to receive a selection instruction sent by the server, the selection instruction being used for indicating a portion of randomly selected clients;
the selected clients participate in the target model training with the probability P, and the unselected clients participate in the target model training with the probability (1-P);
all the clients which are determined to participate in the target model training are used as target clients, the target clients respectively adopt local data to train the target models, and model parameter update values of each trained target model are calculated;
and processing each group of model parameter update values according to a preset differential privacy algorithm to generate update data, and returning the update data to the server, so that the server generates the global model of the training round according to all the update data.
Optionally, as a possible implementation manner, the privacy protection system in the embodiment of the present invention may further include:
a calculating module, configured to obtain a privacy budget parameter ε and calculate the probability P according to the formula P = e^ε/(1 + e^ε).
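Assuming the garbled formula is the standard randomized-response probability P = e^ε/(1 + e^ε), it can be computed as:

```python
import math

def response_probability(epsilon):
    """Randomized-response probability P = e^eps / (1 + e^eps).

    As epsilon -> 0, P -> 0.5 (a nearly uniform random answer, strongest
    privacy); as epsilon grows, P -> 1 (clients follow the server's
    selection almost deterministically, weakest privacy).
    """
    return math.exp(epsilon) / (1.0 + math.exp(epsilon))
```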
Optionally, as a possible implementation manner, the client in the embodiment of the present invention may be further configured to: and generating Gaussian noise of each group of model parameter updating values by adopting a Gaussian noise mechanism, and performing superposition processing on the Gaussian noise of each group of model parameter updating values and the corresponding model parameter updating values to generate updating data.
Optionally, as a possible implementation manner, the client in the embodiment of the present invention may be further configured to:
after each round of training on the target client, calculating the 2-norm of the matrix corresponding to the model parameter update value, and calculating the average value S of these 2-norms over N rounds;
randomly generating Gaussian noise with variance σ²S².
Optionally, as a possible implementation manner, the client in the embodiment of the present invention may be further configured to:
and training the target model by adopting a batch gradient descent method, and setting the weight of Gaussian noise as 1/B, wherein B is the batch number.
Optionally, as a possible implementation manner, the privacy protection system in the embodiment of the present invention may further include an independent adder, where the independent adder is configured to add the update data returned by each target client to obtain an added value, and return the added value to the server.
Optionally, as a possible implementation manner, the server in the embodiment of the present invention may be configured to: calculate an expected value M of the number of target clients according to the probability P, and calculate the mean value of the model parameter update values according to the expected value M and the superposition value.
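A sketch of this server-side averaging, with assumed names; the expected participant count M is derived from the sampling and response probabilities described earlier:

```python
def server_mean_update(superposed_sum, K, K_sampled, P):
    """Average the aggregated updates using the expected participant count.

    The server only sees the superposition value from the adder, never
    individual updates, and the true participant count is hidden by the
    randomized response, so it divides by the expected count M instead.
    """
    # Expected number of target clients under randomized response:
    # sampled clients respond with probability P, others with 1 - P.
    M = K * (P * K_sampled / K + (1 - K_sampled / K) * (1 - P))
    return superposed_sum / M
```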
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
With reference to fig. 4, a privacy protection system in a model training process in an embodiment of the present invention is described above from the perspective of a modular functional entity, and a computer apparatus in an embodiment of the present invention is described below from the perspective of hardware processing:
the computer device 1 may include a memory 11, a processor 12 and an input output bus 13. The processor 11, when executing the computer program, implements the steps in the embodiment of the privacy protection method in the model training process shown in fig. 1, for example, the steps 101 to 104 shown in fig. 1. Alternatively, the processor, when executing the computer program, implements the functions of each module or unit in the above-described device embodiments.
In some embodiments of the present invention, the processor is specifically configured to implement the following steps:
receiving a target model sent by a server, and receiving a selection instruction sent by the server, wherein the selection instruction is used for indicating a part of randomly selected clients;
the selected clients participate in the target model training with the probability P, and the unselected clients participate in the target model training with the probability (1-P);
all the clients which are determined to participate in the target model training are used as target clients, the target models are trained by adopting local data respectively, and model parameter updating values of each trained target model are calculated;
and processing each group of model parameter update values according to a preset differential privacy algorithm to generate update data, and returning the update data to the server, so that the server generates the global model of the training round according to all the update data.
Optionally, as a possible implementation manner, the processor may be further configured to implement the following steps:
obtaining a privacy budget parameter ε, and calculating the probability P according to the formula P = e^ε/(1 + e^ε).
Optionally, as a possible implementation manner, the processor may be further configured to implement the following steps:
and generating Gaussian noise of each group of model parameter updating values by adopting a Gaussian noise mechanism, and performing superposition processing on the Gaussian noise of each group of model parameter updating values and the corresponding model parameter updating values to generate updating data.
Optionally, as a possible implementation manner, the processor may be further configured to implement the following steps:
after each round of training on the target client, calculating the 2-norm of the matrix corresponding to the model parameter update value, and calculating the average value S of these 2-norms over N rounds;
randomly generating Gaussian noise with variance σ²S².
Optionally, as a possible implementation manner, the processor may be further configured to implement the following steps:
and training the target model by adopting a batch gradient descent method, and setting the weight of Gaussian noise as 1/B, wherein B is the batch number.
Optionally, as a possible implementation manner, the processor may be further configured to implement the following steps:
and overlapping the updating data returned by each target client by adopting an independent adder to obtain an overlapped value, and returning the overlapped value to the server.
Optionally, as a possible implementation manner, the processor may be further configured to implement the following steps:
calculating an expected value M of the number of the target clients according to the probability P;
and calculating the mean value of the updated values of the model parameters according to the expected value M and the superposition value.
The memory 11 includes at least one type of readable storage medium, and the readable storage medium includes a flash memory, a hard disk, a multimedia card, a card type memory (e.g., SD or DX memory, etc.), a magnetic memory, a magnetic disk, an optical disk, and the like. The memory 11 may in some embodiments be an internal storage unit of the computer device 1, for example a hard disk of the computer device 1. The memory 11 may also be an external storage device of the computer apparatus 1 in other embodiments, such as a plug-in hard disk provided on the computer apparatus 1, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like. Further, the memory 11 may also include both an internal storage unit and an external storage device of the computer apparatus 1. The memory 11 may be used not only to store application software installed in the computer apparatus 1 and various types of data, such as codes of the computer program 01, but also to temporarily store data that has been output or is to be output.
The processor 12 may be a Central Processing Unit (CPU), controller, microcontroller, microprocessor or other data Processing chip in some embodiments, and is used for executing program codes stored in the memory 11 or Processing data, such as executing the computer program 01.
The input/output bus 13 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc.
Further, the computer apparatus may further include a wired or wireless network interface 14, and the network interface 14 may optionally include a wired interface and/or a wireless interface (such as a WI-FI interface, a bluetooth interface, etc.), which are generally used for establishing a communication connection between the computer apparatus 1 and other electronic devices.
Optionally, the computer device 1 may further include a user interface, the user interface may include a Display (Display), an input unit such as a Keyboard (Keyboard), and optionally, the user interface may further include a standard wired interface and a wireless interface. Alternatively, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display, which may also be referred to as a display screen or display unit, is suitable for displaying information processed in the computer device 1 and for displaying a visualized user interface.
Fig. 4 shows only the computer arrangement 1 with the components 11-14 and the computer program 01, it being understood by a person skilled in the art that the structure shown in fig. 4 does not constitute a limitation of the computer arrangement 1, but may comprise fewer or more components than shown, or a combination of certain components, or a different arrangement of components.
The present invention also provides a computer-readable storage medium having a computer program stored thereon, which when executed by a processor, performs the steps of:
receiving a target model sent by a server, and receiving a selection instruction sent by the server, wherein the selection instruction is used for indicating a part of randomly selected clients;
the selected clients participate in the target model training with the probability P, and the unselected clients participate in the target model training with the probability (1-P);
all the clients which are determined to participate in the target model training are used as target clients, the target models are trained by adopting local data respectively, and model parameter updating values of each trained target model are calculated;
and processing each group of model parameter update values according to a preset differential privacy algorithm to generate update data, and returning the update data to the server, so that the server generates the global model of the training round according to all the update data.
Optionally, as a possible implementation manner, the processor may be further configured to implement the following steps:
obtaining a privacy budget parameter ε, and calculating the probability P according to the formula P = e^ε/(1 + e^ε).
Optionally, as a possible implementation manner, the processor may be further configured to implement the following steps:
and generating Gaussian noise of each group of model parameter updating values by adopting a Gaussian noise mechanism, and performing superposition processing on the Gaussian noise of each group of model parameter updating values and the corresponding model parameter updating values to generate updating data.
Optionally, as a possible implementation manner, the processor may be further configured to implement the following steps:
after each round of training on the target client, calculating the 2-norm of the matrix corresponding to the model parameter update value, and calculating the average value S of these 2-norms over N rounds;
randomly generating Gaussian noise with variance σ²S².
Optionally, as a possible implementation manner, the processor may be further configured to implement the following steps:
and training the target model by adopting a batch gradient descent method, and setting the weight of Gaussian noise as 1/B, wherein B is the batch number.
Optionally, as a possible implementation manner, the processor may be further configured to implement the following steps:
and overlapping the updating data returned by each target client by adopting an independent adder to obtain an overlapped value, and returning the overlapped value to the server.
Optionally, as a possible implementation manner, the processor may be further configured to implement the following steps:
calculating an expected value M of the number of the target clients according to the probability P;
and calculating the mean value of the updated values of the model parameters according to the expected value M and the superposition value.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A privacy protection method in a model training process is characterized by comprising the following steps:
receiving a target model sent by a server, and receiving a selection instruction sent by the server, wherein the selection instruction is used for indicating a part of randomly selected clients;
the selected clients participate in the target model training with the probability P, and the unselected clients participate in the target model training with the probability (1-P);
all the clients which are determined to participate in the target model training are used as target clients, the target models are trained by adopting local data respectively, and model parameter update values of each trained target model are calculated;
and processing each group of model parameter update values according to a preset differential privacy algorithm to generate update data, and returning the update data to the server, so that the server generates a global model of the current training according to all the update data.
2. The method of claim 1, further comprising:
obtaining a privacy budget parameter ε, and calculating the probability P according to the formula P = e^ε/(1 + e^ε).
3. The method according to claim 1 or 2, wherein the processing each set of model parameter update values according to the preset differential privacy algorithm to generate update data comprises:
and generating Gaussian noise of each group of model parameter updating values by adopting a Gaussian noise mechanism, and performing superposition processing on the Gaussian noise of each group of model parameter updating values and the corresponding model parameter updating values to generate updating data.
4. The method of claim 3, wherein generating the Gaussian noise for each set of model parameter updates using the Gaussian noise mechanism comprises:
after each round of training on the target client, calculating the 2-norm of the matrix corresponding to the model parameter update value, and calculating the average value S of these 2-norms over N rounds;
randomly generating Gaussian noise with variance σ²S².
5. The method of claim 4, further comprising:
and training the target model by adopting a batch gradient descent method, and setting the weight of Gaussian noise as 1/B, wherein B is the batch number.
6. The method of claim 5, wherein returning the update data to the server comprises:
and overlapping the updating data returned by each target client by adopting an independent adder to obtain an overlapping value, and returning the overlapping value to the server.
7. The method of claim 6, further comprising:
the server calculates the expected value M of the number of the target clients according to the probability P;
and the server side calculates the mean value of the model parameter updating values according to the expected value M and the superposition value.
9. A privacy protection system, characterized by comprising a client and a server, wherein the client comprises a receiving module for receiving a target model sent by the server and receiving a selection instruction sent by the server, the selection instruction being used for indicating a portion of randomly selected clients;
the selected clients participate in the target model training with the probability P, and the unselected clients participate in the target model training with the probability (1-P);
all the clients which are determined to participate in the target model training are used as target clients, the target clients respectively adopt local data to train the target models, and model parameter update values of each trained target model are calculated;
and processing each group of model parameter update values according to a preset differential privacy algorithm to generate update data, and returning the update data to the server, so that the server generates a global model of the current training according to all the update data.
9. A computer arrangement, characterized in that the computer arrangement comprises a processor for implementing the steps of the method according to any one of claims 1 to 7 when executing a computer program stored in a memory.
10. A computer-readable storage medium having stored thereon a computer program, characterized in that: the computer program when executed by a processor implementing the steps of the method according to any one of claims 1 to 7.
CN202010953756.3A 2020-09-11 2020-09-11 Privacy protection method, system and related equipment in model training process Pending CN112052480A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010953756.3A CN112052480A (en) 2020-09-11 2020-09-11 Privacy protection method, system and related equipment in model training process


Publications (1)

Publication Number Publication Date
CN112052480A true CN112052480A (en) 2020-12-08

Family

ID=73611201

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010953756.3A Pending CN112052480A (en) 2020-09-11 2020-09-11 Privacy protection method, system and related equipment in model training process

Country Status (1)

Country Link
CN (1) CN112052480A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112580106A (en) * 2021-01-26 2021-03-30 证通股份有限公司 Multi-source data processing system and multi-source data processing method
CN113282960A (en) * 2021-06-11 2021-08-20 北京邮电大学 Privacy calculation method, device, system and equipment based on federal learning
CN113762525A (en) * 2021-09-07 2021-12-07 桂林理工大学 Federal learning model training method with differential privacy protection
CN114239860A (en) * 2021-12-07 2022-03-25 支付宝(杭州)信息技术有限公司 Model training method and device based on privacy protection
CN114662705A (en) * 2022-03-18 2022-06-24 腾讯科技(深圳)有限公司 Federal learning method, device, electronic equipment and computer readable storage medium
CN117557870A (en) * 2024-01-08 2024-02-13 之江实验室 Classification model training method and system based on federal learning client selection

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190147188A1 (en) * 2017-11-16 2019-05-16 Microsoft Technology Licensing, Llc Hardware protection for differential privacy
CN111611610A (en) * 2020-04-12 2020-09-01 西安电子科技大学 Federal learning information processing method, system, storage medium, program, and terminal

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190147188A1 (en) * 2017-11-16 2019-05-16 Microsoft Technology Licensing, Llc Hardware protection for differential privacy
CN111611610A (en) * 2020-04-12 2020-09-01 西安电子科技大学 Federal learning information processing method, system, storage medium, program, and terminal

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112580106A (en) * 2021-01-26 2021-03-30 证通股份有限公司 Multi-source data processing system and multi-source data processing method
CN113282960A (en) * 2021-06-11 2021-08-20 北京邮电大学 Privacy calculation method, device, system and equipment based on federal learning
CN113282960B (en) * 2021-06-11 2023-02-17 北京邮电大学 Privacy calculation method, device, system and equipment based on federal learning
CN113762525A (en) * 2021-09-07 2021-12-07 桂林理工大学 Federal learning model training method with differential privacy protection
CN113762525B (en) * 2021-09-07 2024-04-05 桂林理工大学 Federal learning model training method with differential privacy protection
CN114239860A (en) * 2021-12-07 2022-03-25 支付宝(杭州)信息技术有限公司 Model training method and device based on privacy protection
CN114662705A (en) * 2022-03-18 2022-06-24 腾讯科技(深圳)有限公司 Federal learning method, device, electronic equipment and computer readable storage medium
CN117557870A (en) * 2024-01-08 2024-02-13 之江实验室 Classification model training method and system based on federal learning client selection
CN117557870B (en) * 2024-01-08 2024-04-23 之江实验室 Classification model training method and system based on federal learning client selection


Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination