CN113204787B - Block chain-based federated learning privacy protection method, system, device and medium - Google Patents

Block chain-based federated learning privacy protection method, system, device and medium

Info

Publication number
CN113204787B
CN113204787B (application number CN202110493191.XA)
Authority
CN
China
Prior art keywords
model
node
encryption
block
aggregation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110493191.XA
Other languages
Chinese (zh)
Other versions
CN113204787A (en)
Inventor
殷丽华
孙哲
冯纪元
操志强
李超
李然
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou University
Original Assignee
Guangzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou University
Priority to CN202110493191.XA
Publication of CN113204787A
Application granted
Publication of CN113204787B
Legal status: Active


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 21/00: Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F 21/60: Protecting data
    • G06F 21/62: Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F 21/6218: Protecting access to data via a platform, e.g. using keys or access control rules, to a system of files or objects, e.g. local or distributed file system or database
    • G06F 21/6245: Protecting personal data, e.g. for financial or medical purposes
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20: Information retrieval; Database structures therefor; File system structures therefor, of structured data, e.g. relational data
    • G06F 16/27: Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 21/00: Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F 21/60: Protecting data
    • G06F 21/602: Providing cryptographic facilities or services
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 21/00: Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F 21/60: Protecting data
    • G06F 21/64: Protecting data integrity, e.g. using checksums, certificates or signatures
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00: Machine learning
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04: INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04S: SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S 40/00: Systems for electrical power generation, transmission, distribution or end-user application management characterised by the use of communication or information technologies, or communication or information technology specific aspects supporting them
    • Y04S 40/20: Information technology specific aspects, e.g. CAD, simulation, modelling, system security


Abstract

The invention provides a block chain-based federal learning privacy protection method, system, equipment and medium. A trusted third party generates a main public key, a main private key, a decryption key and an encryption key for each node according to the weight vector of each node, and sends the main public key, the decryption key and the encryption key to each node. After a main node creates an initial block and writes an initial model into it for issuing, each node downloads the initial model for training, encrypts the trained model with its encryption key to obtain an encryption model, and uploads it to the block chain. After all encryption models of the nodes have been uploaded to the block chain, the nodes compete for the right to generate the model aggregation block; the node that obtains the right aggregates the encryption models of all nodes into a global model according to the main public key and the decryption key, and uploads the global model to the block chain. The main node then downloads the global model and performs the ideal-model judgment. The method increases protection of node sources and model content privacy, reduces the service computation cost, and improves learning efficiency and service quality.

Description

Block chain-based federated learning privacy protection method, system, device and medium
Technical Field
The invention relates to the technical field of federated learning, in particular to a block chain-based federated learning privacy protection method and system, computer equipment and a storage medium.
Background
Federated learning is a machine learning architecture that decomposes centralized machine learning into distributed machine learning: machine learning tasks are distributed to terminal device nodes for learning, and the gradient results produced by each node are then aggregated into a final training result. This effectively helps users break data islands and carry out extensive, in-depth machine learning research while satisfying the requirements of user privacy protection, data security and government regulation. Although federated learning has been widely applied in fields such as digital image processing, natural language processing, and text and speech processing, traditional federated learning depends entirely on a central server for computing and updating the model; once the central server is attacked, the entire training process cannot proceed normally. This problem has long troubled users and has become a subject of research for many scholars.
The existing solutions are the P2P federated learning method and the blockchain-based federated learning method. Although the P2P method removes the risk that an attack on the central server halts the whole training process, it increases the communication pressure between the participants (terminal device nodes). Scholars have therefore proposed block chain-based federated learning, which reduces the communication pressure between participants while guaranteeing their credibility and traceability, but it still has the following disadvantages: 1) it can only guarantee that the data in the block chain is not tampered with, and cannot protect the privacy of the data content stored on the chain; 2) it neglects the protection of client weights during federated learning training, so an attacker can, with a certain probability, indirectly deduce the source of the training data from the results of model analysis; 3) the computational cost of the aggregation model is large for the server.
Therefore, it is desirable to provide a federated learning method that, while preserving the existing advantages of blockchain-based federated learning, strengthens the protection of client attribute-source privacy and model data content privacy, reduces the computation cost of the service provider, and improves federated learning efficiency and model service quality.
Disclosure of Invention
The invention aims to provide a block chain-based federated learning privacy protection method that solves the problems that existing block chain-based federated learning can only guarantee that data in the block chain is not tampered with, cannot protect the privacy of the data content on the chain, neglects the protection of client weights during training, and imposes a high computation cost on the server for model aggregation, while simultaneously improving federated learning efficiency and model service quality.
In order to achieve the above object, it is necessary to provide a block chain-based federal learning privacy protection method, system, computer device and storage medium for solving the above technical problems.
In a first aspect, an embodiment of the present invention provides a block chain-based federal learning privacy protection method, where the method includes the following steps:
generating a main public key, a main private key, a decryption key and an encryption key of each node by a trusted third party according to the weight vector of each node in advance, and sending the main public key, the decryption key and the encryption key to the corresponding nodes; the encryption key is private to each node;
creating an initial block by a main node, writing an initial model into the initial block and issuing the initial model; the initial model is obtained based on public data set training;
downloading the initial model by each node, training to obtain a local model, encrypting the local model by adopting the encryption key to obtain an encryption model and uploading the encryption model to a block chain;
responding to the fact that all the encryption models of all the nodes are uploaded to a block chain, enabling all the nodes to compete for the right of generating the aggregation block, enabling the nodes which obtain the right of generating the aggregation block to aggregate the encryption models of all the nodes according to the main public key and the decryption key to generate a global model, and uploading the global model to the block chain;
and downloading the global model by the main node, judging whether the global model is an ideal model, if so, stopping iteration, otherwise, entering the next round of training.
Further, the step of generating, by a trusted third party, a master public key, a master private key, a decryption key, and an encryption key of each node in advance according to the weight vector of each node includes:
sending a preset weight vector to the trusted third party by each node;
generating a weight matrix by the trusted third party according to the weight vector of each node, and generating the master public key and the master private key according to the weight matrix;
and generating the decryption key and the encryption key by the trusted third party according to the weight matrix and the main private key.
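The three key-generation sub-steps above can be made concrete with a minimal Python sketch of the weight-matrix construction (all function names are assumptions; only the combination of the nodes' weight vectors into the matrix is shown):

```python
def make_weight_vector(i, n, weight):
    """Node i's weight vector y_i: an n-dimensional column vector whose
    i-th element is the node's weight in (0, 1); all other elements are 0."""
    assert 0.0 < weight < 1.0
    y = [0.0] * n
    y[i] = weight
    return y

def make_weight_matrix(weight_vectors):
    """Trusted-third-party step: combine the column vectors, in node order,
    into the n x n matrix y = (y_1, ..., y_n); the weights land on the
    diagonal, one per node."""
    return [list(row) for row in zip(*weight_vectors)]

vectors = [make_weight_vector(i, 3, w) for i, w in enumerate([0.5, 0.3, 0.2])]
matrix = make_weight_matrix(vectors)   # matrix[i][i] is node i's weight
```

The master key pair and the per-node keys would then be derived from this matrix by the function encryption algorithm, which this sketch deliberately leaves out.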
Further, the step of downloading the initial model by each node, training to obtain a local model, encrypting the local model by using the encryption key to obtain an encryption model, and uploading the encryption model to the block chain includes:
downloading the initial block by each node and obtaining the initial model;
training the initial model according to local data to obtain the local model;
according to the encryption key, operating an encryption algorithm of multi-input function encryption on the local model to obtain the encryption model;
generating the right of an encryption model block according to the competition of a consensus mechanism of the block chain, writing the encryption model, an aggregation code and the iteration times into the encryption model block, and uploading the encryption model block to the block chain; the aggregation code is a first preset value.
Further, the step of responding to the encryption model of each node being uploaded to the block chain in full, competing the right of generating the aggregation block by each node, and aggregating the encryption models of each node according to the master public key and the decryption key by the node obtaining the right of generating the aggregation block to generate a global model and uploading the global model to the block chain comprises:
each node competes for the right to generate the model aggregation block according to the consensus mechanism of the block chain, and the node that obtains the right downloads the encryption model blocks corresponding to the other nodes to obtain the encryption models corresponding to the other nodes;
according to the main public key and the decryption key, a decryption algorithm of multi-input function encryption is operated on the encryption model of each node to obtain the global model, and an uplink request is sent to the block chain, so that the block chain performs block packing processing and broadcast consensus verification on the global model;
in response to the uplink request, determining a block chain storage address of the global model by a full node of the block chain, generating the model aggregation block, and broadcasting the model aggregation block to other full nodes in the block chain for consensus check; the model aggregation block records the global model and the block chain storage address;
in response to the success of the consensus check, the full node stores the global model, the aggregation code and the iteration times to the block chain storage address, and broadcasts the model aggregation block to each node of the block chain for synchronization; and setting the aggregation code as a second preset value.
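A minimal sketch of the full node's handling of the uplink request in the steps above, with the broadcast consensus verification stubbed out (the data layout and all names are assumptions):

```python
SECOND_PRESET = 1  # assumed aggregation-code value marking a global model

def consensus_check(block):
    """Placeholder for the broadcast consensus verification among full nodes."""
    return True

def handle_uplink_request(chain, global_model, iteration_num):
    """Determine a storage address for the global model, pack the model
    aggregation block recording the model and the address, and append it
    to the chain once the (stubbed) consensus check succeeds."""
    storage_address = len(chain)  # assumed: next index on the chain
    block = {
        "header": {
            "aggregation_flag": SECOND_PRESET,
            "iteration_num": iteration_num,
            "storage_address": storage_address,
        },
        "body": {"global_model": global_model},
    }
    if consensus_check(block):
        chain.append(block)  # full nodes then broadcast the block for synchronization
    return block

chain = []
blk = handle_uplink_request(chain, global_model=[0.1, 0.2], iteration_num=1)
```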
Further, the step of downloading the encryption model block corresponding to the other node by the node which obtains the authority to generate the model aggregation block and acquiring the encryption model corresponding to the other node includes:
judging whether the encryption model block meets the aggregation requirement or not; the aggregation requirement is that the aggregation codes are all first preset values, and the iteration times are all current iteration times;
and if the encryption model block meets the aggregation requirement, acquiring the encryption model corresponding to the encryption model block.
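The aggregation requirement above can be sketched as a simple filter over candidate blocks (assumed block layout, with 0 as the first preset value):

```python
FIRST_PRESET = 0  # assumed aggregation-code value for local encryption models

def meets_aggregation_requirement(block, current_iteration):
    """Check the two conditions above: the block's aggregation code is the
    first preset value and its iteration count matches the current round."""
    header = block["header"]
    return (header["aggregation_flag"] == FIRST_PRESET
            and header["iteration_num"] == current_iteration)

blocks = [
    {"header": {"aggregation_flag": 0, "iteration_num": 2}, "body": "c1"},
    {"header": {"aggregation_flag": 1, "iteration_num": 2}, "body": "g"},   # global block
    {"header": {"aggregation_flag": 0, "iteration_num": 1}, "body": "c0"},  # stale round
]
usable = [b for b in blocks if meets_aggregation_requirement(b, current_iteration=2)]
```

Only blocks passing both checks contribute their encryption models to the aggregation.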
Further, the downloading, by the main node, the global model, and determining whether the global model is an ideal model, if so, stopping iteration, otherwise, entering a next round of training includes:
downloading the model aggregation block corresponding to the current round number by the main node, judging whether the aggregation code corresponding to the model aggregation block is the second preset value or not, and if the aggregation code is the second preset value, acquiring the global model;
testing the accuracy of the global model according to the public data set, and judging whether the global model is an ideal model according to whether the accuracy is converged;
if the global model is not an ideal model, sending a continuous training broadcast message, otherwise, sending a stop training broadcast message;
and responding to the continuous training broadcast message, downloading the global model by each node, and starting the next training round.
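The description only says that the main node checks whether the accuracy converges; a minimal sketch of one possible convergence rule (the tolerance and window size are assumptions, not taken from the patent) is:

```python
def is_ideal_model(accuracy_history, tol=1e-3, window=3):
    """Assumed convergence rule: the test accuracy has changed by less than
    `tol` between every pair of consecutive rounds in the last `window`
    rounds. Returns False while too few rounds have been recorded."""
    if len(accuracy_history) < window + 1:
        return False
    recent = accuracy_history[-(window + 1):]
    return all(abs(b - a) < tol for a, b in zip(recent, recent[1:]))
```

When this returns True the main node would send the stop-training broadcast message; otherwise it sends the continue-training message.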
Further, the downloading, by the nodes, the global model in response to the continuous training broadcast message comprises:
downloading the model aggregation block by each node, and judging whether the aggregation code is a second preset value or not;
if the aggregation code is the second preset value, judging whether the iteration times are the current iteration times;
and if the iteration times are the current iteration times, acquiring the global model.
In a second aspect, an embodiment of the present invention provides a block chain-based federal learning privacy protection system, where the system includes:
the weight encryption module is used for generating a main public key, a main private key, a decryption key and an encryption key of each node by a trusted third party according to the weight vector of each node in advance and sending the main public key, the decryption key and the encryption key to the corresponding nodes; the encryption key is private to each node;
the initial modeling module is used for creating an initial block by the main node, writing an initial model into the initial block and issuing the initial model; the initial model is obtained based on public data set training;
the local training module is used for downloading the initial model by each node, training to obtain a local model, encrypting the local model by adopting the encryption key to obtain an encryption model and uploading the encryption model to a block chain;
the model aggregation module is used for responding to the fact that all the encryption models of all the nodes are uploaded to a block chain, enabling all the nodes to compete for generating the right of the aggregation block, enabling the nodes which obtain the right of the generation of the aggregation block to aggregate the encryption models of all the nodes to generate a global model according to the main public key and the decryption key, and uploading the global model to the block chain;
and the model testing module is used for downloading the global model by the main node and judging whether the global model is an ideal model or not, if so, stopping iteration, otherwise, entering the next round of training.
In a third aspect, an embodiment of the present invention further provides a computer device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of the method when executing the computer program.
In a fourth aspect, the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program is executed by a processor to implement the steps of the above method.
The application provides a block chain-based federal learning privacy protection method, system, computer equipment and storage medium. Through the method, a trusted third party generates, by weight encryption according to the weight vector preset by each node, the main public/private keys, the decryption key and the encryption key of each node, and sends them to the corresponding nodes. After the main node creates an initial block and writes the initial model into it for release, each node downloads the initial model for training, obtains an encryption model through encryption with its encryption key, and uploads the encryption model to the block chain after competition based on the block chain's proof-of-work mechanism. After all encryption models of the nodes have been uploaded to the block chain, each node competes, based on the consensus mechanism of the block chain, for the right to generate the model aggregation block; the node that obtains the right downloads the encryption models of the other nodes, aggregates them into a global model according to the main public key and the decryption key, and uploads the global model to the block chain. The main node then downloads the global model and performs the ideal-model judgment to determine whether to continue the iteration.
Compared with the prior art, the block chain-based federated learning privacy protection method solves the problem that weight protection is neglected in client gradient fusion in the conventional FLchain, achieving protection of client attribute-source privacy and model data content privacy. Model aggregation is transferred to the participants, reducing the computation cost of the service provider. Deploying federated learning on the block chain also encourages the participants to take an active part in federated learning, improving federated learning efficiency and guaranteeing model service quality.
Drawings
FIG. 1 is a schematic diagram of an application scenario of a block chain-based federated learning privacy protection method in an embodiment of the present invention;
FIG. 2 is a block chain-based framework diagram of a federated learning privacy protection method according to an embodiment of the present invention;
FIG. 3 is a block diagram of a prior-art P2P federated learning model;
FIG. 4 is a flow chart of a block chain-based federated learning privacy protection method according to an embodiment of the present invention;
FIG. 5 is a schematic flow chart illustrating the step S11 of FIG. 4, in which the trusted third party generates the primary public/private key, the decryption key and the encryption key of each node based on the encryption of the weight vector of each node;
fig. 6 is a schematic diagram of the block structure designed in the embodiment of the present invention, in which the block header is newly provided with an aggregation code and an iteration count;
FIG. 7 is a schematic flowchart illustrating a process in which each node in step S13 in FIG. 4 downloads an initial model for training to obtain an encryption model and uploads the encryption model to a block chain;
fig. 8 is a schematic flowchart illustrating the process in step S14 in fig. 4 in which each node competes to generate the aggregation model block containing the global model and uploads it to the block chain;
fig. 9 is a schematic flowchart of the main node downloading the global model and performing ideal model judgment in step S15 in fig. 4;
FIG. 10 is a schematic structural diagram of a block chain-based federated learning privacy protection system in an embodiment of the present invention;
fig. 11 is an internal structural diagram of a computer device in the embodiment of the present invention.
Detailed Description
In order to make the purpose, technical solution and advantages of the present invention more clearly apparent, the present invention is further described in detail below with reference to the accompanying drawings and embodiments, and it is obvious that the embodiments described below are part of the embodiments of the present invention, and are used for illustrating the present invention only, but not for limiting the scope of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The block chain-based federal learning privacy protection method can be applied to a terminal or a server shown in fig. 1. The terminal can be, but is not limited to, various personal computers, notebook computers, smart phones, tablet computers and portable wearable devices, and the server can be implemented by an independent server or a server cluster formed by a plurality of servers. As shown in fig. 2, in the block chain-based framework of the federal learning privacy protection method, a server is deployed as the main node of the block chain and a plurality of terminal devices are deployed on demand as participating nodes. The server main node creates an initial block and writes the initial model into it; each terminal device participating in training downloads the model, trains it with local data to obtain a local model, encrypts the local model with its encryption key to obtain an encryption model, and uploads it to the block chain. After all terminal devices finish uploading their encryption models, all nodes compete for the right to aggregate the global model; the winning node device aggregates the encryption models into a global model and uploads it to the block chain. The server main node then downloads the global model and calculates its accuracy to judge whether to carry out the next round of training; iteration stops when the accuracy of the global model converges, and the global model at that point is applied as the ideal model of federated learning.
The block chain-based federated learning privacy protection method effectively solves the problem of communication pressure among all participants in a P2P federated learning model shown in figure 3, and meanwhile, on the basis of ensuring the credibility and traceability of the participants in federated learning based on the characteristics of the block chain, the client attribute source privacy and the model data content privacy are protected, the calculation cost of a service provider is reduced, and the technical effects of improving the federated learning efficiency and guaranteeing the model service quality are achieved.
The block chain comprises a plurality of block chain terminal devices, called nodes for short, and block chains are divided into public chains, alliance chains and private chains according to the admittance form of their constituent nodes. In order to truly use the decentralized characteristic of the block chain, the invention is designed for the application scenario of the alliance chain. The nodes in the block chain are divided into full nodes and light nodes: a full node guarantees the security and accuracy of the data on the block chain through consensus verification of block-packed data, while a light node does not participate in the consensus verification of data and is only responsible for synchronizing the data information after consensus verification. Each light node is required to connect to a full node so as to synchronize the current state of the block chain and participate in the operation and management of the whole block chain.
In one embodiment, as shown in fig. 4, a block chain-based federal learned privacy protection method is provided, which includes the following steps:
s11, generating a main public key, a main private key, a decryption key and an encryption key of each node by a trusted third party according to the weight vector of each node in advance, and sending the main public key, the decryption key and the encryption key to each corresponding node; the encryption key is private to each node;
in this embodiment, based on consideration of protecting a training data source, a multi-input function encryption method is used to protect each weight of participating nodes ignored in the prior art, and a third trusted third party generates a corresponding main public/private key, a corresponding decryption key, and an encryption key of each node according to a weight vector set by each node, as shown in fig. 5, in advance, according to the weight vector of each node, the step S11 of generating the main public key, the main private key, the decryption key, and the encryption key of each node by the trusted third party includes:
s111, each node sends a preset weight vector to the trusted third party;
wherein the weight vector y_i, i = 1, …, n, is an n-dimensional column vector whose i-th element represents the weight value of the i-th node, the weight value lying in the range (0, 1), and whose elements at all other positions are 0. The value of the i-th element of the weight vector y_i can be set according to actual application requirements; for example, each node determines it according to its actual contribution degree or the size of the local data volume participating in training and sends it to the trusted third party, which manages and uses the value, thereby avoiding the potential risk of data-source leakage caused by directly exposing the weight of each node to the model aggregation node.
S112, the trusted third party generates a weight matrix according to the weight vector of each node, and generates the master public key and the master private key according to the weight matrix;
wherein the weight matrix is obtained by the trusted third party directly combining, in node-number order, the weight vectors y_i, i = 1, …, n, sent by the nodes, giving the n × n matrix y = (y_1, …, y_n). After obtaining the weight matrix, the trusted third party randomly generates a main public key mpk and a main private key msk based on a function encryption algorithm, for example using a pseudo-random number generator PRNG (linear congruential, BBS, ANSI X9.17, RC4, and other methods).
S113, the trusted third party generates the decryption key and the encryption key according to the weight matrix and the main private key.
The decryption key skf is used for aggregating the models obtained by subsequent node training. It is generated by running a key generation algorithm that takes the weight matrix y and the main private key as inputs and applies vector and matrix multiplication and linear combination; the specific key generation algorithm can be selected according to actual requirements and is not limited here; for example, the multi-input function encryption key generation algorithm of Table 1 can be used. At the same time, the encryption key msk_i of each node, i = 1, …, n, is a component of the above main private key msk and can be obtained by running the key generation algorithm of multi-input function encryption. After the trusted third party generates the encryption key msk_i of each node, i = 1, …, n, and the decryption key skf used for model aggregation, it sends the main public key mpk, the decryption key skf and the encryption key msk_i to each corresponding node. This effectively avoids the risk that the main node, by perceiving the weights of the nodes participating in training, leaks information such as the training models and data on the main node side, and provides deeper protection for the privacy of federated learning.
Table 1 example multiple input function encryption-key generation algorithm
(Table 1 appears as an image in the original document and is not reproduced here.)
S12, creating an initial block by the master node, writing an initial model into the initial block and issuing the initial model; the initial model is obtained based on public data set training;
the master node can be appointed from the nodes added with the block chain according to actual training requirements, and is mainly used for obtaining an initial model based on public data set training, creating and issuing an initial block for storing the initial model used in the first training round for downloading and using by participating in the training nodes, and subsequently testing the accuracy of the global model obtained by each training round based on public data to obtain an applicable ideal model.
This embodiment does not limit the type of the initial model; the method is suitable for federated learning training of various machine learning and deep learning models, such as linear regression, neural networks, convolutional neural networks, decision trees, support vector machines and Bayesian classifiers. It should be noted that, because many rounds of federated learning training are needed to obtain an ideal model that meets the service requirement, the initial model trained by the master node on the public data set is used only in the first round of training and is not, strictly speaking, a global model aggregated according to preset rules; the training models used in subsequent iterative rounds are all global models obtained by aggregating the models of the nodes that participated in the previous round of training and are published on the block chain.
How the initial block is created, and how the initial model is written into it and then published on the block chain, follow the corresponding methods of existing block chains and are not repeated here. It should be noted that the initial block uses a conventional block header structure, while the subsequently packed encryption model blocks and aggregation model blocks of the present invention both adopt the newly designed block structure shown in fig. 6: each block includes a block header and a block body. The block header extends the conventional structure with an aggregation code AggregationFlag, which takes either a first preset value or a second preset value, and an iteration number IntegerNum, and the block body stores the encryption model or the global model. When the aggregation code in the block header is the first preset value, the block is identified as packing a local encryption model; when it is the second preset value, the block is identified as packing a global model. For example, an aggregation code of 0 indicates that the block contains a local encryption model, and an aggregation code of 1 indicates that it contains a global model. The iteration number indicates which training round the encryption model or global model in the block belongs to; it records the number of iterative training rounds, ensures the correctness of the models downloaded in subsequent iterative training, and thereby further improves the efficiency of federated learning training.
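As an illustration, the extended block structure described above can be sketched as follows. This is a hedged sketch, not the patent's implementation: the Python field and helper names are assumptions, and 0 and 1 are used as the first and second preset values, as in the example in the text.

```python
# Sketch of the fig. 6 block structure: a conventional header extended with
# the aggregation code AggregationFlag and the iteration number IntegerNum.
from dataclasses import dataclass

LOCAL_ENCRYPTION_MODEL = 0   # first preset value of the aggregation code
GLOBAL_MODEL = 1             # second preset value of the aggregation code

@dataclass
class ModelBlockHeader:
    prev_hash: str
    aggregation_flag: int    # AggregationFlag: 0 = local encryption model, 1 = global model
    iteration_num: int       # IntegerNum: which training round the stored model belongs to

@dataclass
class ModelBlock:
    header: ModelBlockHeader
    body: bytes              # serialized encryption model or global model

def holds_local_model_for_round(block: ModelBlock, current_round: int) -> bool:
    """Check applied by the aggregating node before using a block:
    it must pack a local encryption model belonging to the current round."""
    return (block.header.aggregation_flag == LOCAL_ENCRYPTION_MODEL
            and block.header.iteration_num == current_round)

# usage
blk = ModelBlock(ModelBlockHeader("a1b2c3", LOCAL_ENCRYPTION_MODEL, 3), b"model bytes")
assert holds_local_model_for_round(blk, 3)
assert not holds_local_model_for_round(blk, 4)
```

The two extra header fields are what let any node differentiate encryption model blocks from aggregation model blocks and reject stale models from earlier rounds.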
S13, downloading the initial model by each node, training to obtain a local model, encrypting the local model by the encryption key to obtain an encryption model, and uploading the encryption model to a block chain;
wherein the local data refers to the private data of each node participating in training. Each node locally performs one round of iterative training of the initial model using its own local data to obtain a corresponding local model, encrypts it with multi-input function encryption to obtain an encryption model, and uploads the encryption model to the block chain. As shown in fig. 7, the step of downloading the initial model by each node, training to obtain the local model, encrypting the local model with the encryption key to obtain the encryption model, and uploading the encryption model to the block chain includes:
s131, downloading the initial block by each node, and acquiring the initial model;
as described above, the initial block does not carry the improved block header, that is, its block header records neither the aggregation code nor the iteration number; each node only needs to download the corresponding block according to the synchronized information to obtain the initial model stored in the block body.
S132, training the initial model according to local data to obtain the local model;
the training method of the initial model depends on the type of the initial model; each node simply trains according to the training method corresponding to that model type, which is not described in detail here.
S133, according to the encryption key, operating an encryption algorithm encrypted by a multi-input function on the local model to obtain the encryption model;
multi-input function encryption is an encryption scheme in which multiple users holding encryption keys participate, and a user holding the decryption key can obtain a function value of the secret data without learning any other information about the plaintexts. The encryption keys are generated by the trusted third party from the weight vector of each node using a function encryption algorithm; each node locally encrypts its trained local model with its encryption key and uploads the encrypted model to the block chain for publication, which well preserves the privacy of the model when the block is published.
S134, generating the right of an encryption model block according to the competition of a consensus mechanism of the block chain, writing the encryption model, an aggregation code and the iteration times into the encryption model block, and uploading the encryption model block to the block chain; the aggregation code is a first preset value.
The consensus mechanism can be selected according to the actual application requirements; for example, any one of a proof-of-work mechanism, a proof-of-stake mechanism, a delegated proof-of-stake mechanism or a Pool verification pool can be used to manage the right of all participating training nodes to compete to generate the encryption model block. The proof-of-work mechanism is a common consensus mechanism for block chains: generating a new transaction message (that is, a new block) to be added to the block chain requires a proof of the node's workload, block chain nodes compete for the accounting right by computing the numerical solution of a random hash, and the ability to obtain the correct numerical solution and thereby win the right to generate the block is a concrete expression of the node's computing power. In this embodiment, each node participating in training may preferably use the proof-of-work mechanism to compete for the right to generate the encryption model block and upload it to the block chain, which encourages the participating nodes to take an active part in federated learning and further improves the efficiency of federated learning while guaranteeing the quality of service of the model.
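The "numerical solution of a random hash" competition can be sketched as a minimal proof-of-work search. The difficulty parameter and serialization format below are illustrative assumptions, not the patent's parameters.

```python
# Minimal proof-of-work: find a nonce whose SHA-256 digest has at least
# `difficulty_bits` leading zero bits; the first node to find one wins the
# right to pack and upload the block.
import hashlib

def proof_of_work(block_bytes: bytes, difficulty_bits: int) -> int:
    target = 1 << (256 - difficulty_bits)
    nonce = 0
    while True:
        digest = hashlib.sha256(block_bytes + nonce.to_bytes(8, "big")).digest()
        if int.from_bytes(digest, "big") < target:
            return nonce
        nonce += 1

# usage: a low difficulty keeps the search fast for illustration
nonce = proof_of_work(b"encryption model block payload", difficulty_bits=12)
```

Raising `difficulty_bits` doubles the expected search cost per bit, which is how the mechanism ties block generation rights to computing power.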
After a node obtains the right to generate the encryption model block, it generates the block, writes the encryption model, the aggregation code, the iteration number and other information into it, and uploads it: the encryption model is written into the block body, while the aggregation code set to the first preset value and the iteration number recording the current round are written into the corresponding positions of the block header. Only the block fields relevant to the present invention are described here; writing the other information of the block can be implemented with reference to the prior art and is not repeated.
S14, in response to the encryption models of all nodes having been uploaded to the block chain, the nodes compete for the right to generate an aggregation block, and the node that obtains this right aggregates the encryption models of all nodes according to the master public key and the decryption key to generate a global model and uploads the global model to the block chain;
the aggregation of the global model is realized with the decryption algorithm corresponding to the multi-input function encryption: the master public key mpk, the decryption key skf and the encryption model of each node are taken as inputs, and the global model (the federated average model) is computed and output. In this embodiment, considering that existing federated learning aggravates the computational cost of the service provider by placing the task of aggregating the global model on the server (master node), the decentralized feature of the block chain is fully utilized: the aggregation task is not assigned to a fixed master node, but all nodes participating in training compete for the right to aggregate the model and to pack and upload the global model to the block chain. As shown in fig. 8, step S14 includes:
s141, each node competes for generating a model aggregation block right according to a consensus mechanism of the block chain, and the node which obtains the model aggregation block right downloads the encryption model blocks corresponding to other nodes to obtain the encryption models corresponding to other nodes;
the consensus mechanism here may also be selected according to the actual application requirements; for example, any one of a proof-of-work mechanism, a proof-of-stake mechanism, a delegated proof-of-stake mechanism or a Pool verification pool may be used to manage the right of all participating training nodes to compete to generate the model aggregation block. It only needs to be consistent with the mechanism used for uploading the encryption models, and the specific method is not repeated here.
The node that obtains the right to generate the model aggregation block through the competition mechanism downloads the encryption model blocks corresponding to the other nodes according to the synchronized block information and judges whether each block meets the aggregation requirement: only after confirming that the aggregation code in the encryption model block is the first preset value and the iteration number equals the current round does it extract the corresponding encryption model. This guarantees the correctness of the encryption models aggregated in the current round, improving training efficiency while ensuring the accuracy of the aggregated global model.
And S142, running a decryption algorithm of multi-input function encryption on the encryption model of each node according to the main public key and the decryption key to obtain the global model, and initiating an uplink request to the block chain so that the block chain performs block packing processing and broadcast consensus verification on the global model.
The master public key and the decryption key are generated by the key generation algorithm of the multi-input function encryption, and obtaining the global model from the decryption key and the encryption models of the nodes is an inherent property of the function encryption scheme, that is, the aggregated federated average model is obtained directly once decryption completes; the details are not repeated here. The node that obtains the right to generate the model aggregation block aggregates the encryption models of all nodes into the global model using the master public key and the decryption key, then actively calls the uplink interface opened by the block chain and initiates an uplink request, carrying the global model, to all nodes in the block chain. This ensures that the global model can be shared on the block chain, and facilitates the subsequent correctness judgment of the global model by the master node and the control of whether training needs to continue.
S143, responding to the uplink request, determining a block chain storage address of the global model by the whole nodes of the block chain, generating the model aggregation block, and broadcasting the model aggregation block to other whole nodes in the block chain for consensus check;
after receiving the uplink request, the full nodes in the block chain first determine the block chain storage address corresponding to the global model, generate a model aggregation block recording the global model and the corresponding block chain storage address, and then broadcast the model aggregation block to other full nodes of the block chain for corresponding consensus check.
And S144, responding to the success of the consensus check, storing the global model, the aggregation code and the iteration times to the block chain storage address by the full node, and broadcasting the model aggregation block to each node of the block chain for synchronization.
After the consensus check succeeds, the model aggregation block is broadcast to all nodes of the block chain (all full nodes and light nodes) for synchronization, and the full nodes store the global model at the block chain storage address, which serves as the address of the model aggregation block.
By releasing, through the consensus mechanism, the operation of aggregating the encryption models into the global model to the participating training nodes themselves, the computational cost of the service provider is reduced while the federated learning service quality and learning efficiency are guaranteed.
S15, downloading the global model by the main node, judging whether the global model is an ideal model, if so, stopping iteration, otherwise, entering the next round of training;
after each round of training is completed, the master node checks the accuracy of the global model obtained in the current round and judges whether the accuracy has converged, so as to decide whether to notify the participating training nodes to continue iterative training until the ideal model is obtained. As shown in fig. 9, step S15, in which the master node downloads the global model and judges whether it is the ideal model, stopping iteration if it is and entering the next round of training otherwise, includes:
s151, downloading the model aggregation block corresponding to the current round number by the main node, judging whether the aggregation code corresponding to the model aggregation block is the second preset value or not, and if the aggregation code is the second preset value, acquiring the global model;
the master node can download the model aggregation block corresponding to the current round according to the synchronized information, verify that the downloaded block is correct by checking whether the aggregation code in its block header is the second preset value, and, having ensured the correctness of the model aggregation block, obtain the global model stored in the block.
S152, testing the accuracy of the global model according to the public data set, and judging whether the global model is an ideal model according to whether the accuracy is converged;
in this embodiment, the accuracy of the global model obtained in the current iteration is tested on the public data set stored at the master node, and the current accuracy is compared with the previous results: if the difference between accuracies over several consecutive rounds (for example, three) does not exceed a preset threshold (for example, 1%), the accuracy is judged to have converged and the global model with the highest accuracy is taken as the final ideal model; otherwise, iterative training continues. It should be noted that the method for determining the ideal model and the criterion for judging whether the accuracy of the global model has converged are only exemplary here and may be adjusted as required in practical applications.
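The convergence test described above can be sketched as follows; the function name, parameter names and defaults (a window of three comparisons, a 1% threshold) are illustrative assumptions taken from the example values in the text.

```python
# Accuracy is deemed converged when the difference between consecutive
# accuracies stays within `threshold` for `window` consecutive comparisons.
def accuracy_converged(history, window=3, threshold=0.01):
    """history: per-round global-model accuracies, newest last."""
    if len(history) < window + 1:
        return False
    recent = history[-(window + 1):]
    return all(abs(recent[i + 1] - recent[i]) <= threshold
               for i in range(window))

# usage: accuracy plateaus near 0.90 for three consecutive comparisons
assert accuracy_converged([0.62, 0.81, 0.895, 0.900, 0.902, 0.899])
assert not accuracy_converged([0.62, 0.81, 0.90])
```

When convergence is detected, the master node would pick the round with the highest accuracy as the ideal model and send the stop-training broadcast message.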
S153, if the global model is not the ideal model, sending a continuous training broadcast message, otherwise, sending a stop training broadcast message;
the master node sends the test result (such as the accuracy) and a training instruction indicating whether to perform the next round of training to all nodes of the block chain in the form of a broadcast message according to the actual judgment result, and the participating training nodes decide, according to the master node's training instruction, whether to continue downloading the global model for local training. It should be noted that different instructions may be defined as needed: for example, a training instruction of 1 indicates that training should continue, and a training instruction of 0 indicates that training should stop. Any scheme that agrees the communication mechanism between the master node and the participating training nodes in advance and realizes the function of the master node controlling whether the training task of the participating nodes continues falls within the protection scope of the present invention.
And S154, responding to the continuous training broadcast message, downloading the global model by each node, and starting the next round of training.
The continue-training broadcast message is sent after the master node has tested the accuracy of the global model obtained in the current round and found that the accuracy convergence requirement is not yet met. When the participating training nodes receive the broadcast message and recognize that the training instruction is to continue training, each of them downloads the latest global model and starts a new round of iterative training. It should be noted that the difference between a non-initial and the initial download of the training model by a participating node is that the initial download obtains the initial model from the initial block created by the master node, whereas subsequent downloads obtain the latest global model from the most recently published aggregation model block after its check is completed. The step of downloading the global model specifically includes: each node downloads the model aggregation block and judges whether the aggregation code is the second preset value; if it is, the node judges whether the iteration number equals the current round; and if the iteration number equals the current round, the node obtains the global model.
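The two-step download check just described can be sketched as a small predicate; the names and the use of 1 as the second preset value are assumptions matching the earlier example.

```python
# A node accepts a block as the latest global model only if its aggregation
# code is the second preset value and its iteration number matches the round.
GLOBAL_MODEL_FLAG = 1  # second preset value of the aggregation code

def is_latest_global(aggregation_flag: int, iteration_num: int,
                     current_round: int) -> bool:
    return (aggregation_flag == GLOBAL_MODEL_FLAG
            and iteration_num == current_round)

# usage
assert is_latest_global(1, 5, 5)       # global model of the current round
assert not is_latest_global(0, 5, 5)   # local encryption model, not global
assert not is_latest_global(1, 4, 5)   # stale global model from round 4
```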
The embodiment of the application designs a framework for a block chain-based federated learning privacy protection method, making use of the excellent characteristics of the block chain (such as decentralization, tamper resistance and traceability). A trusted third party generates the master public/private keys, encryption keys and decryption key in advance according to the weight vector of each node; the master node creates an initial block and publishes the initial model it has trained on the block chain; each node downloads the model, trains it on local data and uploads the resulting encryption model to the block chain; the participating training nodes aggregate and generate the global model through a consensus mechanism and upload it to the block chain; and the master node then evaluates the accuracy of the global model to control whether the participating nodes need to continue training until an ideal model is obtained. This reduces the computational cost of the service provider in federated learning, effectively guarantees the service quality of the federated learning model and improves federated learning efficiency; at the same time, by introducing a function-hiding multi-input function encryption algorithm, the problem of protecting the weights and data contents of the participants (clients/nodes) is effectively solved, achieving the technical effect of effectively avoiding the risk of privacy leakage.
It should be noted that, although the steps in the above flowcharts are shown in a sequence indicated by arrows, they are not necessarily executed in that sequence. Unless explicitly stated otherwise herein, the steps need not be performed in the exact order shown and may be performed in other orders.
In one embodiment, as shown in fig. 10, there is provided a block chain-based federal learned privacy protection system, the system comprising:
the weight encryption module 1 is used for generating a master public key, a master private key, a decryption key and an encryption key of each node by a trusted third party according to the weight vector of each node in advance, and sending the master public key, the decryption key and the encryption key to each corresponding node; the encryption key is private to each node;
the initial modeling module 2 is used for creating an initial block by the main node, writing an initial model into the initial block and issuing the initial model; the initial model is obtained based on public data set training;
the local training module 3 is used for downloading the initial model by each node, training to obtain a local model, encrypting the local model by adopting the encryption key to obtain an encryption model and uploading the encryption model to a block chain;
the model aggregation module 4 is configured to respond that all the encryption models of the nodes are uploaded to the block chain, compete for the right of generating an aggregation block by the nodes, and aggregate the encryption models of the nodes according to the master public key and the decryption key to generate a global model and upload the global model to the block chain;
and the model testing module 5 is used for downloading the global model by the main node and judging whether the global model is an ideal model or not, if so, stopping iteration, otherwise, entering the next round of training.
For specific limitations of the block chain-based federated learning privacy protection system, reference may be made to the above limitations of the block chain-based federated learning privacy protection method, which are not repeated here. Each module in the block chain-based federated learning privacy protection system can be implemented wholly or partly by software, hardware or a combination thereof. The modules can be embedded in hardware form in, or be independent of, a processor of the computer device, or stored in software form in a memory of the computer device, so that the processor can invoke and execute the operations corresponding to the modules.
It should be noted that the block chain-based federated learning privacy protection method and system in the embodiments of the present invention may be applied to regional power consumption prediction and power supply adjustment planning among several power companies, realizing a power data calculation scenario for reasonable distribution of power supply. Using the method or the system, secure federated learning of a power consumption prediction model trained jointly by the power companies is completed, and the specific application for realizing reasonable distribution of electric power is as follows: each power company participating in training sends a preset weight vector to a trusted third party, such as a Certificate Authority (CA), which generates the corresponding weight matrix, the master public/private keys, the decryption key and the encryption key of each node by performing function encryption on the weight of each power company; the power department creates an initial block, trains an initial model on a public data set, writes the initial model into the initial block and publishes it; each power company downloads the initial block, trains a local model from the initial model recorded in the block and its local power consumption data, runs the encryption algorithm of the multi-input function encryption on the local model with its own encryption key, and, after obtaining the corresponding encryption model, competes through the proof-of-work mechanism (PoW) of the block chain for the right to generate a new block (encryption model block), writes the encryption model, the aggregation code AggregationFlag marking it as an encryption model and the iteration number IntegerNum recording the current round into the new block, and uploads the new block to the chain; after all power companies have uploaded new blocks containing encryption models (encryption model blocks whose aggregation code is the first preset value), the power companies compete through the consensus mechanism of the block chain for the right to generate the model aggregation block; the power company that obtains this right downloads all the encryption model blocks containing the encryption models of the other power companies (aggregation code equal to the first preset value, iteration number equal to the current round), runs the decryption algorithm of the multi-input function encryption with the master public key and the decryption key to obtain the global model (the federated average model), writes the global model, the aggregation code marking it as a global model and the iteration number recording the current round into the model aggregation block, and uploads it to the chain; the power department downloads the model aggregation block containing the global model (aggregation code equal to the second preset value, iteration number equal to the current round), tests the accuracy of the global model on the public data set, judges whether the global model is the ideal model according to whether the accuracy has converged, sends a continue-training broadcast message if it is not the ideal model, and sends a stop-training broadcast message otherwise; after receiving the continue-training broadcast message, all power companies download the latest global model and start the next round of training, whereas after receiving the stop-training broadcast message they stop training; after the power department sends the stop-training broadcast message, the global model (federated average model) obtained in the current round of training can serve as the final power consumption prediction model to provide services, analyzing the power consumption of each region throughout the year and reasonably planning the monthly power resource supply of the power companies and its distribution among the regions.
Fig. 11 shows an internal structure diagram of a computer device in one embodiment, and the computer device may specifically be a terminal or a server. As shown in fig. 11, the computer apparatus includes a processor, a memory, a network interface, a display, and an input device, which are connected through a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a block chain based federated learned privacy preserving method. The display screen of the computer equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer equipment can be a touch layer covered on the display screen, a key, a track ball or a touch pad arranged on the shell of the computer equipment, an external keyboard, a touch pad or a mouse and the like.
It will be appreciated by those of ordinary skill in the art that the architecture shown in FIG. 11 is merely a block diagram of some of the structures associated with the present solution and is not intended to limit the computing devices to which the present solution may be applied, and that a particular computing device may include more or less components than those shown, or may combine certain components, or have a similar arrangement of components.
In one embodiment, a computer device is provided, comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the steps of the above method being performed when the computer program is executed by the processor.
In an embodiment, a computer-readable storage medium is provided, on which a computer program is stored, which computer program, when being executed by a processor, carries out the steps of the above-mentioned method.
To sum up, the embodiments of the present invention provide a block chain-based federated learning privacy protection method, system, computer device and storage medium. In the method, a trusted third party performs weight encryption according to the weight vector preset by each node to generate the master public/private keys, the decryption key and the encryption key of each node, and sends the generated keys to the corresponding nodes. After the master node creates an initial block, writes the initial model into it and publishes it, each node downloads the initial model for training, encrypts the result with its encryption key to obtain an encryption model, and uploads it to the block chain after competition based on the proof-of-work mechanism of the block chain. When the encryption models of all nodes have been uploaded, the nodes compete, based on the consensus mechanism of the block chain, for the right to generate the model aggregation block; the node that obtains this right downloads the encryption models of the other nodes, aggregates them according to the master public key and the decryption key to generate a global model, and uploads the global model to the block chain; the master node then downloads the global model and judges whether it is the ideal model to determine whether to continue the iteration.
Compared with the prior art, the block chain-based federated learning privacy protection method not only solves the problem that the existing FLchain neglects weight protection in the gradient fusion of clients, protecting both the privacy of client attribute sources and the privacy of model data contents, but also transfers the model aggregation operation to the participants, reducing the computational cost of the service provider. Moreover, deploying federated learning on the block chain encourages the participants to take an active part in federated learning, improving federated learning efficiency while guaranteeing the service quality of the model.
The embodiments in this specification are described in a progressive manner; identical or similar parts of the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, since the system embodiment is substantially similar to the method embodiment, its description is brief; for relevant details, reference may be made to the corresponding parts of the method embodiment. It should be noted that the technical features of the embodiments may be combined arbitrarily; for brevity, not all possible combinations are described, but any combination of these technical features that contains no contradiction should be considered within the scope of this specification.
The above-mentioned embodiments express only some preferred embodiments of the present application, and their description is relatively specific and detailed, but they should not be construed as limiting the scope of the invention. It should be noted that those skilled in the art can make various modifications and substitutions without departing from the technical principle of the present invention, and these shall fall within the protection scope of the present application. Therefore, the protection scope of this patent shall be subject to the protection scope of the claims.

Claims (8)

1. A block chain-based federated learning privacy protection method, characterized by comprising the following steps:
generating, in advance by a trusted third party, a master public key, a master private key and a decryption key, together with an encryption key for each node, according to the weight vector of each node, and sending the master public key, the decryption key and the encryption key to each corresponding node; the encryption key is private to each node;
creating an initial block by the master node, writing an initial model into the initial block and issuing it; the initial model is trained on a public data set;
downloading the initial model by each node, training it to obtain a local model, encrypting the local model with the encryption key to obtain an encryption model, and uploading the encryption model to the block chain;
in response to the encryption models of all nodes having been uploaded to the block chain, competing by the nodes for the right to generate the aggregation block, and aggregating, by the node that obtains this right, the encryption models of all nodes according to the master public key and the decryption key to generate a global model, which is uploaded to the block chain;
downloading the global model by the master node and judging whether it is an ideal model; if so, stopping iteration; otherwise, entering the next round of training;
wherein the steps of downloading the initial model by each node, training to obtain a local model, encrypting the local model with the encryption key to obtain an encryption model, and uploading the encryption model to the block chain comprise:
downloading the initial block by each node and acquiring the initial model;
training the initial model on local data to obtain the local model;
running, with the encryption key, the encryption algorithm of multi-input functional encryption on the local model to obtain the encryption model;
competing, under the consensus mechanism of the block chain, for the right to generate an encryption model block, writing the encryption model, an aggregation code and the iteration count into the encryption model block, and uploading the encryption model block to the block chain; the aggregation code is a first preset value;
wherein the step of, in response to the encryption models of all nodes having been uploaded to the block chain, competing by the nodes for the right to generate the aggregation block, and aggregating, by the node that obtains this right, the encryption models of all nodes according to the master public key and the decryption key to generate a global model and uploading the global model to the block chain comprises:
competing by each node, under the consensus mechanism of the block chain, for the right to generate the model aggregation block, and downloading, by the node that obtains this right, the encryption model blocks of the other nodes to acquire their encryption models;
running, with the master public key and the decryption key, the decryption algorithm of multi-input functional encryption on the encryption models of all nodes to obtain the global model, and sending an uplink request to the block chain so that the block chain performs block packing and broadcast consensus verification on the global model;
determining, by a full node of the block chain in response to the uplink request, a block chain storage address of the global model, generating the model aggregation block, and broadcasting it to the other full nodes in the block chain for consensus checking; the model aggregation block records the global model and the block chain storage address;
storing, by the full node in response to success of the consensus check, the global model, the aggregation code and the iteration count at the block chain storage address, broadcasting the model aggregation block to each node of the block chain for synchronization, and setting the aggregation code to a second preset value.
2. The federated learning privacy protection method of claim 1, wherein the step of generating, in advance by a trusted third party, a master public key, a master private key and a decryption key, together with an encryption key for each node, according to the weight vector of each node comprises:
sending a preset weight vector to the trusted third party by each node;
generating, by the trusted third party, a weight matrix from the weight vectors of the nodes, and generating the master public key and the master private key from the weight matrix;
and generating, by the trusted third party, the decryption key and the encryption keys from the weight matrix and the master private key.
3. The federated learning privacy protection method of claim 1, wherein the downloading, by the node that obtains the right to generate the model aggregation block, of the encryption model blocks of the other nodes to acquire their encryption models comprises:
judging whether each encryption model block meets the aggregation requirement; the aggregation requirement is that every aggregation code is the first preset value and every iteration count equals the current iteration count;
and if an encryption model block meets the aggregation requirement, acquiring the encryption model corresponding to that encryption model block.
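The aggregation requirement in claim 3 amounts to a simple filter over the uploaded encryption model blocks. A hypothetical check (field names and the preset value are assumptions, not taken from the patent) might look like:

```python
FIRST_PRESET = 0  # assumed aggregation code of a freshly uploaded encryption model block

def meets_aggregation_requirement(block, current_iter, first_preset=FIRST_PRESET):
    """An encryption model block qualifies for aggregation only if its
    aggregation code is still the first preset value and its iteration
    count equals the current iteration count."""
    return block["agg_code"] == first_preset and block["iter"] == current_iter

blocks = [
    {"agg_code": 0, "iter": 3, "model": "ct-A"},
    {"agg_code": 1, "iter": 3, "model": "ct-B"},  # already aggregated
    {"agg_code": 0, "iter": 2, "model": "ct-C"},  # stale round
]
usable = [b["model"] for b in blocks if meets_aggregation_requirement(b, 3)]
print(usable)  # ['ct-A']
```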
4. The federated learning privacy protection method of claim 3, wherein the downloading of the global model by the master node and the judging of whether the global model is an ideal model, iteration being stopped if so and the next round of training entered otherwise, comprises:
downloading, by the master node, the model aggregation block corresponding to the current round, judging whether the aggregation code of the model aggregation block is the second preset value, and acquiring the global model if it is;
testing the accuracy of the global model on the public data set, and judging whether the global model is an ideal model according to whether the accuracy has converged;
if the global model is not an ideal model, sending a continue-training broadcast message; otherwise, sending a stop-training broadcast message;
and, in response to the continue-training broadcast message, downloading the global model by each node and starting the next round of training.
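Claim 4 stops iterating once the test accuracy converges, without fixing a particular criterion. One plausible rule, offered here only as an assumption, is that the accuracy improvement stays below a small tolerance for a few consecutive rounds:

```python
def is_ideal(accuracy_history, tol=1e-3, patience=2):
    """Treat the global model as ideal once accuracy has changed by less
    than `tol` over `patience` consecutive rounds (an assumed convergence
    rule; the claim only requires that the accuracy converges)."""
    if len(accuracy_history) <= patience:
        return False
    recent = accuracy_history[-(patience + 1):]
    return all(abs(b - a) < tol for a, b in zip(recent, recent[1:]))

hist = [0.61, 0.74, 0.801, 0.8012, 0.8013]
print(is_ideal(hist))      # True: the last two improvements are below 1e-3
print(is_ideal(hist[:3]))  # False: accuracy is still improving
```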
5. The federated learning privacy protection method of claim 4, wherein the downloading of the global model by each node in response to the continue-training broadcast message comprises:
downloading the model aggregation block by each node and judging whether the aggregation code is the second preset value;
if the aggregation code is the second preset value, judging whether the iteration count is the current iteration count;
and if the iteration count is the current iteration count, acquiring the global model.
6. A block chain-based federated learning privacy protection system capable of performing the federated learning privacy protection method of any one of claims 1-5, the system comprising:
a weight encryption module, used for generating, in advance by a trusted third party, a master public key, a master private key and a decryption key, together with an encryption key for each node, according to the weight vector of each node, and sending the master public key, the decryption key and the encryption key to the corresponding nodes; the encryption key is private to each node;
an initial modeling module, used for creating an initial block by the master node, writing an initial model into the initial block and issuing it; the initial model is trained on a public data set;
a local training module, used for downloading the initial model by each node, training it to obtain a local model, encrypting the local model with the encryption key to obtain an encryption model, and uploading the encryption model to the block chain;
a model aggregation module, used for, in response to the encryption models of all nodes having been uploaded to the block chain, having the nodes compete for the right to generate the aggregation block, and having the node that obtains this right aggregate the encryption models of all nodes according to the master public key and the decryption key to generate a global model and upload the global model to the block chain;
and a model testing module, used for downloading the global model by the master node and judging whether it is an ideal model; if so, iteration stops; otherwise, the next round of training begins.
7. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the steps of the method of any one of claims 1 to 5.
8. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the steps of the method of any one of claims 1 to 5.
CN202110493191.XA 2021-05-06 2021-05-06 Block chain-based federated learning privacy protection method, system, device and medium Active CN113204787B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110493191.XA CN113204787B (en) 2021-05-06 2021-05-06 Block chain-based federated learning privacy protection method, system, device and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110493191.XA CN113204787B (en) 2021-05-06 2021-05-06 Block chain-based federated learning privacy protection method, system, device and medium

Publications (2)

Publication Number Publication Date
CN113204787A CN113204787A (en) 2021-08-03
CN113204787B true CN113204787B (en) 2022-05-31

Family

ID=77030261

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110493191.XA Active CN113204787B (en) 2021-05-06 2021-05-06 Block chain-based federated learning privacy protection method, system, device and medium

Country Status (1)

Country Link
CN (1) CN113204787B (en)

Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113806764B (en) * 2021-08-04 2023-11-10 北京工业大学 Distributed support vector machine based on blockchain and privacy protection and optimization method thereof
CN113609508B (en) * 2021-08-24 2023-09-26 上海点融信息科技有限责任公司 Federal learning method, device, equipment and storage medium based on blockchain
CN114154392A (en) * 2021-10-15 2022-03-08 海南火链科技有限公司 Model co-construction method, device and equipment based on block chain and federal learning
CN113935469B (en) * 2021-10-26 2022-06-24 城云科技(中国)有限公司 Model training method based on decentralized federal learning
CN114124522A (en) * 2021-11-22 2022-03-01 北京天融信网络安全技术有限公司 Model training method, device, equipment and storage medium for multi-stage system
CN114254398B (en) * 2021-12-16 2023-03-28 重庆大学 Block chain-based federated learning system and parameter aggregation method
CN114338144A (en) * 2021-12-27 2022-04-12 杭州趣链科技有限公司 Method for preventing data from being leaked, electronic equipment and computer-readable storage medium
CN114800545B (en) * 2022-01-18 2023-10-27 泉州华中科技大学智能制造研究院 Robot control method based on federated learning
CN114580011B (en) * 2022-01-29 2024-06-14 国网青海省电力公司电力科学研究院 Electric power facility security situation sensing method and system based on federated privacy training
CN115065688A (en) * 2022-06-06 2022-09-16 咪咕文化科技有限公司 Data transmission method, device, equipment and computer readable storage medium
CN114912136B (en) * 2022-07-14 2022-10-28 之江实验室 Competition mechanism based cooperative analysis method and system for medical data on block chain
CN114996762A (en) * 2022-07-19 2022-09-02 山东省计算中心(国家超级计算济南中心) Medical data sharing and privacy protection method and system based on federal learning
CN115169594A (en) * 2022-09-09 2022-10-11 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) Debugging method and debugging system for privacy-preserving distributed machine learning
CN115186285B (en) * 2022-09-09 2022-12-02 闪捷信息科技有限公司 Parameter aggregation method and device for federal learning
CN115310137B (en) * 2022-10-11 2023-04-07 深圳市深信信息技术有限公司 Secrecy method and related device of intelligent settlement system
CN116595574B (en) * 2023-04-14 2024-02-20 京信数据科技有限公司 Device and method for safely publishing and quoting privacy computing model

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110572253A (en) * 2019-09-16 2019-12-13 济南大学 Method and system for enhancing privacy of federated learning training data
CN111552986A (en) * 2020-07-10 2020-08-18 鹏城实验室 Block chain-based federal modeling method, device, equipment and storage medium
CN111966698A (en) * 2020-07-03 2020-11-20 华南师范大学 Credible federated learning method, system, device and medium based on block chain
CN112232527A (en) * 2020-09-21 2021-01-15 北京邮电大学 Safe distributed federal deep learning method
CN112348204A (en) * 2020-11-05 2021-02-09 大连理工大学 Safe sharing method for marine Internet of things data under edge computing framework based on federal learning and block chain technology
CN112714106A (en) * 2020-12-17 2021-04-27 杭州趣链科技有限公司 Block chain-based federated learning free-riding attack defense method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200272945A1 (en) * 2019-02-21 2020-08-27 Hewlett Packard Enterprise Development Lp System and method of decentralized model building for machine learning and data privacy preserving using blockchain

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110572253A (en) * 2019-09-16 2019-12-13 济南大学 Method and system for enhancing privacy of federated learning training data
CN111966698A (en) * 2020-07-03 2020-11-20 华南师范大学 Credible federated learning method, system, device and medium based on block chain
CN111552986A (en) * 2020-07-10 2020-08-18 鹏城实验室 Block chain-based federal modeling method, device, equipment and storage medium
CN112232527A (en) * 2020-09-21 2021-01-15 北京邮电大学 Safe distributed federal deep learning method
CN112348204A (en) * 2020-11-05 2021-02-09 大连理工大学 Safe sharing method for marine Internet of things data under edge computing framework based on federal learning and block chain technology
CN112714106A (en) * 2020-12-17 2021-04-27 杭州趣链科技有限公司 Block chain-based federated learning free-riding attack defense method

Also Published As

Publication number Publication date
CN113204787A (en) 2021-08-03

Similar Documents

Publication Publication Date Title
CN113204787B (en) Block chain-based federated learning privacy protection method, system, device and medium
US11902413B2 (en) Secure machine learning analytics using homomorphic encryption
CN108616539B (en) A kind of method and system of block chain transaction record access
CN111090888B (en) Contract verification method and device
KR102145701B1 (en) Prevent false display of input data by participants in secure multi-party calculations
US20220083374A1 (en) Method for processing data, task processing system and electronic equipment
CN110061845A (en) Block chain data ciphering method, device, computer equipment and storage medium
CN112580821A (en) Method, device and equipment for federated learning and storage medium
CN112347500B (en) Machine learning method, device, system, equipment and storage medium of distributed system
CN111163036B (en) Data sharing method, device, client, storage medium and system
CN113127916A (en) Data set processing method, data processing device and storage medium
CN113011598A (en) Financial data information federal migration learning method and device based on block chain
CN114443754A (en) Block chain-based federated learning processing method, device, system and medium
CN112765642A (en) Data processing method, data processing apparatus, electronic device, and medium
CN112949866A (en) Poisson regression model training method and device, electronic equipment and storage medium
CN116502732B (en) Federal learning method and system based on trusted execution environment
CN112100145A (en) Digital model sharing learning system and method
CN115118411A (en) Method, device and equipment for down-link multi-party trusted computing and storage medium
CN111125734B (en) Data processing method and system
CN114418769A (en) Block chain transaction charging method and device and readable storage medium
CN113362168A (en) Risk prediction method and device, storage medium and electronic equipment
CN112822152B (en) Directional information display processing method and related equipment
Wu et al. Smart contract assisted secure aggregation scheme for model update in federated learning
Shi et al. ROMSS: a rational optional multi-secret sharing scheme based on reputation mechanism
CN115473739A (en) Data processing method, device and equipment based on block chain and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant