WO2020224205A1 - Blockchain-based secure collaborative deep learning method and device - Google Patents

Blockchain-based secure collaborative deep learning method and device (基于区块链的安全协作深度学习方法及装置)

Info

Publication number
WO2020224205A1
WO2020224205A1 · PCT/CN2019/114984 · CN2019114984W
Authority
WO
WIPO (PCT)
Prior art keywords
parameter
global model
parameter change
change set
client
Prior art date
Application number
PCT/CN2019/114984
Other languages
English (en)
French (fr)
Inventor
徐恪
张智超
吴波
李琦
徐松松
Original Assignee
Tsinghua University (清华大学)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University (清华大学)
Priority to US 17/012,494 (US 11954592 B2)
Publication of WO2020224205A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60Protecting data
    • G06F21/62Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245Protecting personal data, e.g. for financial or medical purposes
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/32Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system or for message authentication, e.g. authorization, entity authentication, data integrity or data verification, non-repudiation, key authentication or verification of credentials
    • H04L9/3236Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system or for message authentication, e.g. authorization, entity authentication, data integrity or data verification, non-repudiation, key authentication or verification of credentials using cryptographic hash functions
    • H04L9/3239Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system or for message authentication, e.g. authorization, entity authentication, data integrity or data verification, non-repudiation, key authentication or verification of credentials using cryptographic hash functions involving non-keyed hash functions, e.g. modification detection codes [MDCs], MD5, SHA or RIPEMD
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60Protecting data
    • G06F21/64Protecting data integrity, e.g. using checksums, certificates or signatures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/06Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00Network architectures or network communication protocols for network security
    • H04L63/12Applying verification of the received information
    • H04L63/126Applying verification of the received information the source of the received data
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/06Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols the encryption apparatus using shift registers or memories for block-wise or stream coding, e.g. DES systems or RC4; Hash functions; Pseudorandom sequence generators
    • H04L9/0618Block ciphers, i.e. encrypting groups of characters of a plain text message using fixed encryption transformation
    • H04L9/0637Modes of operation, e.g. cipher block chaining [CBC], electronic codebook [ECB] or Galois/counter mode [GCM]
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/50Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols using hash chains, e.g. blockchains or hash trees

Definitions

  • The present application belongs to the field of distributed machine learning technology, and particularly relates to a blockchain-based secure collaborative deep learning method and device.
  • The blockchain can be regarded as an open and trustworthy distributed ledger (or database): many blocks are connected in sequence to form a chain-like storage structure, and a consensus mechanism guarantees the consistency and tamper-resistance of the data records.
  • A smart contract on the blockchain is a piece of automatically executed contract code stored on the blockchain; its programming language is Turing-complete, so contract code implementing the required functions can be written as needed.
  • External applications can interact with blockchain data by calling the interface functions of the contract code, realizing functions such as consensus on the optimal parameters in collaborative deep learning.
  • The present application aims to solve, at least to some extent, one of the technical problems in the related art.
  • One purpose of the present application is to propose a blockchain-based secure collaborative deep learning method that, during collaborative training, ensures the privacy of all users' data sources, the security of the training process, and the high generalization ability and high accuracy of the final trained model.
  • Another purpose of the present application is to propose a blockchain-based secure collaborative deep learning device.
  • To this end, an embodiment of one aspect of the present application proposes a blockchain-based secure collaborative deep learning method, including:
  • S1 Obtain a global model, an optimal parameter change set and an evaluation matrix, and initialize the parameters of the global model, the optimal parameter change set, and the evaluation matrix;
  • The blockchain-based secure collaborative deep learning method of the embodiments of the present application uses the blockchain as a trusted infrastructure to store key data, and uses the trusted execution results of smart contracts to achieve consensus election of the optimal parameters. Participating users exchange model parameter changes through a global parameter server acting as an intermediate bridge. Users train collaboratively through parameter sharing, which lets the model learn the characteristics of all data sources while protecting data privacy, and they elect the optimal parameters through the consensus contract to ensure smooth convergence of the model.
  • In addition, the blockchain-based secure collaborative deep learning method according to the foregoing embodiment of the present application may also have the following additional technical features:
  • Further, each client training the global model on a training data set to generate a parameter change set, and screening the parameter change set according to a preset method, includes:
  • Each client trains the global model on its local training data set and computes the parameter change set as Δθ_i = θ′_i − θ_i, where θ′_i is the parameter value after training of the global model, θ_i is the parameter value before training, and Δθ_i is the change of the global model parameter.
  • The parameter changes are sorted in descending order and the group with the largest changes is selected to form the screened parameter change set Δθ^upload, with the screening parameter ratio γ = |Δθ^upload| / |θ_g|, where θ_g is the parameter set of the global model.
  • the S2 further includes:
  • Each user terminal adds a time stamp to the filtered parameter change set and signs it.
  • Further, S3 also includes:
  • Each client verifies the received screened parameter change set;
  • According to the received storage transaction number, it verifies whether the hash value of the screened parameter change set stored on the blockchain is consistent with the screened parameter change set it received.
  • S4 further includes: updating the evaluation matrix M according to the evaluation values between clients; selecting the client set C* with the optimal parameters according to M and the preset blockchain consensus contract; obtaining the optimal parameter change set corresponding to C*; and updating the global model according to the optimal parameter change set.
  • Specifically, let M_i,: denote all evaluation values given by the i-th client to the other clients, and sort M_i,: in descending order, denoted M̂_i,:. According to its position in M̂_i,:, the score of the j-th client is s(j; u_i) = m − p_j, where m is the total number of participating clients and p_j is the position of the j-th client in M̂_i,:.
  • The total score of the j-th client is S(j) = Σ_{u_i ∈ U} s(j; u_i), where u_i is the i-th client, U is the set of all clients, and s(j; u_i) is the score obtained by the j-th client under the evaluation of u_i. Based on the total scores, the client set with the optimal parameters consists of the n clients with the highest total scores.
  • The initialization module is used to obtain the global model, the optimal parameter change set and the evaluation matrix, and to initialize the parameters of the global model, the optimal parameter change set and the evaluation matrix;
  • The training module is used to obtain the download instruction of the global model and send it to multiple clients so that they download the global model; each client trains the global model on its training data set to generate a parameter change set and screens the parameter change set according to a preset method;
  • The evaluation module is used to store the hash value of each client's screened parameter change set in the blockchain, generate the corresponding storage transaction number, and send the screened parameter change set and the corresponding storage transaction number to each client, so that each client verifies and evaluates them against its verification data set, generating multiple evaluation values between clients, which are stored in the blockchain;
  • The update module is used to update the evaluation matrix according to the multiple evaluation values, select the optimal parameter change set according to the updated evaluation matrix and the preset blockchain consensus contract, and update the global model according to the optimal parameter change set;
  • The iteration module is used to iterate until the global model meets the preset condition.
  • The blockchain-based secure collaborative deep learning device of the embodiments of the present application uses the blockchain as a trusted infrastructure to store key data, and uses the trusted execution results of smart contracts to achieve consensus election of the optimal parameters. Participating users exchange model parameter changes through a global parameter server acting as an intermediate bridge. Users train collaboratively through parameter sharing, which lets the model learn the characteristics of all data sources while protecting data privacy, and they elect the optimal parameters through the consensus contract to ensure smooth convergence of the model.
  • In addition, the blockchain-based secure collaborative deep learning device according to the foregoing embodiment of the present application may also have the following additional technical features:
  • Further, each client training the global model on a training data set to generate a parameter change set, and screening the parameter change set according to a preset method, includes:
  • Each client trains the global model on its local training data set and computes the parameter change set as Δθ_i = θ′_i − θ_i, where θ′_i is the parameter value after training of the global model, θ_i is the parameter value before training, and Δθ_i is the change of the global model parameter; the parameter changes are sorted in descending order and the group with the largest changes forms the screened parameter change set Δθ^upload, with γ = |Δθ^upload| / |θ_g| the screening parameter ratio and θ_g the parameter set of the global model.
  • training module is also used for:
  • Each user terminal adds a time stamp to the filtered parameter change set and signs it.
  • the evaluation module includes: a verification unit,
  • the verification unit is configured to verify the received parameter change set after screening by each user terminal;
  • The verification unit is specifically configured to verify, according to the received storage transaction number, whether the hash value of the screened parameter change set stored on the blockchain is consistent with the received screened parameter change set.
  • The update module is specifically used to: update the evaluation matrix M according to the evaluation values between clients; select the client set C* with the optimal parameters according to M and the preset blockchain consensus contract; obtain the optimal parameter change set corresponding to C*; and update the global model according to the optimal parameter change set.
  • Specifically, let M_i,: denote all evaluation values given by the i-th client to the other clients, sorted in descending order as M̂_i,:. According to its position in M̂_i,:, the score of the j-th client is s(j; u_i) = m − p_j, where m is the total number of participating clients and p_j is the position of the j-th client in M̂_i,:. The total score of the j-th client is S(j) = Σ_{u_i ∈ U} s(j; u_i), where u_i is the i-th client, U is the set of all clients, and s(j; u_i) is the score obtained by the j-th client under the evaluation of u_i; the client set with the optimal parameters consists of the n clients with the highest total scores.
  • FIG. 1 is a flowchart of a blockchain-based secure collaborative deep learning method according to an embodiment of the present application;
  • FIG. 2 is a flowchart of a blockchain-based secure collaborative deep learning method according to another embodiment of the present application;
  • FIG. 3 is a schematic diagram of the collaborative learning entities and the connections between them according to an embodiment of the present application;
  • FIG. 4 is a sequence diagram of the operations of a user participating in collaboration according to an embodiment of the present application;
  • FIG. 5 is a schematic diagram of the interaction between the parameter server and the blockchain according to an embodiment of the present application;
  • FIG. 6 is a logic flow chart of the content of a smart contract according to an embodiment of the present application;
  • FIG. 7 is a schematic structural diagram of a blockchain-based secure collaborative deep learning device according to an embodiment of the present application.
  • The present invention proposes a blockchain-based collaborative deep learning method. For various practical business scenarios, it exploits the training goals of different users and the correlations between their training data sets to design a technical mechanism that allows the same deep model to be trained collaboratively without any party having to disclose its data set.
  • This mechanism not only lets users with the same training goal jointly train a deep model with strong generalization ability and high accuracy, but also spares users from disclosing their data sets, protecting data-set privacy.
  • The consensus mechanism of the on-chain contract is used to reach a consensus on the optimal parameters among the training users.
  • Fig. 1 is a flowchart of a secure collaborative deep learning method based on blockchain according to an embodiment of the present application.
  • the blockchain-based secure collaborative deep learning method includes the following steps:
  • Step S1 Obtain the global model, the optimal parameter change set and the evaluation matrix, and initialize the parameters of the global model, the optimal parameter change set and the evaluation matrix.
  • the parameters of the global model are first initialized, and at the same time, the optimal parameter change set for subsequent use is initialized.
  • a global parameter server is used as an intermediate bridge to realize the information interaction between the client and the blockchain, and to realize the training and update of the global model in a collaborative learning manner.
  • Specifically, the global parameter server initializes the global model according to the settings of the participating users and aggregates the optimal parameter changes into the global model. Suppose there are m users; the parameters of n (n < m) of them are elected as the optimal set of each training round and used to update the global model.
  • the global model has a total of k parameters.
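For concreteness, a minimal sketch of this initialization in Python (NumPy), assuming m participating users, n elected per round and k model parameters; the variable names are illustrative, not taken from the patent:

```python
import numpy as np

m, n, k = 10, 3, 1_000_000        # participants, users elected per round, model parameters
assert n < m

theta_g = np.random.randn(k).astype(np.float32)   # randomly initialized global model parameters
optimal_users = set()                             # C*: optimal-parameter user set, initially empty
M = np.zeros((m, m), dtype=np.float32)            # evaluation matrix, M[i, j] = score of user j by user i
```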
  • Step S2: Obtain the download instruction of the global model and send it to multiple clients so that they download the global model. Each client trains the global model on its training data set to generate a parameter change set, and screens the parameter change set according to the preset method.
  • Specifically, a download instruction is issued to all clients through the global parameter server; all clients download the initialized global model and start the first round of collaborative training.
  • Each client uses its locally stored training data set to train the global model and generate a parameter change set; the global model contains multiple parameters, and these parameters change after training.
  • Each client also screens the generated parameter change set according to a preset method.
  • The global model may be trained with stochastic gradient descent (SGD) or with other methods; the training method can be chosen according to actual needs.
  • Further, each client training the global model on the training data set to generate a parameter change set and screening the parameter change set according to the preset method includes:
  • Each client trains the global model on its local training data set and computes the parameter change set as Δθ_i = θ′_i − θ_i, where θ′_i is the parameter value after training the global model, θ_i is the parameter value before training, and Δθ_i is the change of the global model parameter.
  • The parameter changes are sorted in descending order and the group with the largest changes is selected to form the screened parameter change set Δθ^upload, with the screening parameter ratio γ = |Δθ^upload| / |θ_g|, where θ_g is the parameter set of the global model (a code sketch of this screening follows below).
  • Step S3: Store the hash value of each client's screened parameter change set in the blockchain, generate the corresponding storage transaction number, and send the screened parameter change set and the corresponding storage transaction number to each client, so that each client verifies and evaluates the received screened parameter change set and the corresponding storage transaction number against its verification data set, generating multiple evaluation values between clients.
  • The evaluation values are stored in the blockchain.
  • S3 also includes: each client verifies the screened parameter change set it receives;
  • According to the received storage transaction number, the client verifies whether the hash value of the screened parameter change set stored on the blockchain is consistent with the screened parameter change set it received.
  • The global parameter server receives the trained parameter change set uploaded by each user, stores it locally, and records its hash value Hash_para = H(Δθ^upload) on the blockchain to obtain the corresponding storage transaction number, recorded as Tx-ID, where H is a hash function (a sketch of the hashing and consistency check follows below).
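A minimal sketch of how such a hash and the later consistency check might look, assuming SHA-256 as the hash function H and a simple byte serialization of the screened changes (both are illustrative choices, not specified by the patent):

```python
import hashlib
import numpy as np

def hash_parameter_changes(idx, delta_upload):
    """Hash_para = H(Δθ^upload): digest over the serialized (index, value) pairs."""
    h = hashlib.sha256()
    h.update(np.asarray(idx, dtype=np.int64).tobytes())
    h.update(np.asarray(delta_upload, dtype=np.float32).tobytes())
    return h.hexdigest()

def verify_consistency(received_idx, received_delta, onchain_hash):
    """Client-side check: recompute the hash and compare it with the value stored under Tx-ID."""
    return hash_parameter_changes(received_idx, received_delta) == onchain_hash
```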
  • After determining that the screened parameter change sets of all clients have been recorded on the blockchain and the corresponding storage transaction numbers generated, the global parameter server sends the parameter change sets uploaded by all clients, together with the corresponding storage transaction numbers, to all clients. Each client uses its locally stored verification data set to verify and score the received parameter change sets and the corresponding storage transaction numbers, obtaining evaluation values; that is, each client scores the other clients, producing one evaluation value per pair, which is stored in the blockchain.
  • Specifically, the local verification data set is used to compute the F1-score corresponding to each parameter update; the smart contract function is then called to record it on the blockchain as the basis for selecting the best parameters (sketched below).
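A sketch of this evaluation step: apply a received parameter change to a copy of the global model and score it with the F1 metric on the local verification set. The helper predict_fn and the use of scikit-learn's f1_score are assumptions for illustration; the patent only requires that all clients use the same evaluation method.

```python
import numpy as np
from sklearn.metrics import f1_score

def evaluate_update(theta_g, idx, delta_upload, predict_fn, X_val, y_val):
    """Apply another client's screened change to a copy of the global model and compute F1."""
    theta_candidate = theta_g.copy()
    theta_candidate[idx] += delta_upload              # apply Δθ^upload to the affected parameters
    y_pred = predict_fn(theta_candidate, X_val)       # predict_fn: assumed model-inference helper
    return f1_score(y_val, y_pred, average="macro")   # evaluation value reported to the contract
```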
  • Step S4: Update the evaluation matrix according to the multiple evaluation values, select the optimal parameter change set according to the updated evaluation matrix and the preset blockchain consensus contract, and update the global model according to the optimal parameter change set.
  • S4 further includes: updating the evaluation matrix M according to the evaluation values between clients; selecting the client set C* with the optimal parameters according to M and the preset blockchain consensus contract; obtaining the optimal parameter change set corresponding to C*; and updating the global model with it.
  • Based on a multi-winner election rule and the F1-scores uploaded by the users, each user's parameter change is voted on and scored; the n users with the highest scores are recorded as the optimal parameter set, and the server then updates the model with the parameter changes they uploaded.
  • M_ij denotes the evaluation value given by the i-th user to the j-th user.
  • Let M_i,: be all evaluation values given by the i-th user to the other users, and let M̂_i,: be M_i,: sorted in descending order. According to the position of the j-th user in M̂_i,:, its score is s(j; u_i) = m − p_j, where m is the total number of participating users and p_j is the position of the j-th user in M̂_i,:.
  • The total score of the j-th user is S(j) = Σ_{u_i ∈ U} s(j; u_i), where u_i is the i-th user, U is the set of all users, and s(j; u_i) is the score of the j-th user under the evaluation of u_i. Based on the total scores, the user set with the optimal parameters C* consists of the n users with the highest total scores S(j) (see the election sketch below).
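A sketch of this election, assuming the rank-based (Borda-style) score s(j; u_i) = m − p_j reconstructed from the quantities the text defines (m participants, p_j the rank position); the exact scoring constant is not given explicitly in this extract:

```python
import numpy as np

def elect_optimal_users(M, n):
    """Rank-based election: row i of M holds user i's evaluation values of all users."""
    m = M.shape[0]
    total = np.zeros(m)
    for i in range(m):
        order = np.argsort(-M[i])           # descending sort of M_i,:  ->  M̂_i,:
        ranks = np.empty(m, dtype=int)
        ranks[order] = np.arange(1, m + 1)  # p_j: position of user j in M̂_i,:
        total += m - ranks                  # s(j; u_i) = m − p_j, accumulated into S(j)
    return set(np.argsort(-total)[:n])      # C*: the n users with the highest total score
```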
  • When the optimal parameter change set is to be obtained, the user set with the optimal parameters is determined first; the optimal parameter change set is then obtained from the parameter change sets corresponding to that user set, and the global model is updated with it.
  • After the consensus result is received, the user set with the optimal parameters is updated and the parameter changes of all corresponding users are aggregated.
  • For each parameter θ_i, the global parameter server first averages the changes Δθ_i uploaded by the users in the optimal set, adds the resulting value to the corresponding parameter, and repeats this operation over all model parameters, finally updating the model: θ_i ← θ_i + (1/n) Σ_{u ∈ C*} Δθ_i^u (sketched below).
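A sketch of this aggregation, under the assumption (labelled here, not stated by the patent) that a parameter a user did not upload simply contributes zero to that parameter's sum:

```python
import numpy as np

def aggregate_optimal_updates(theta_g, optimal_uploads):
    """optimal_uploads: list of (idx, delta) pairs from the elected users in C*."""
    n = len(optimal_uploads)
    accum = np.zeros_like(theta_g)
    for idx, delta in optimal_uploads:
        accum[idx] += delta               # Δθ_i^u, zero for parameters a user did not upload
    return theta_g + accum / n            # θ_i ← θ_i + (1/n) Σ_{u∈C*} Δθ_i^u
```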
  • Step S5: Iterate S2, S3 and S4 to update the global model until the global model meets the preset condition, then end the iterative process.
  • Specifically, after each round of training and updating of the global model, the global parameter server sends the clients an instruction to download the latest global model, and training and updating are performed again.
  • The iteration ends when the global model meets the preset condition, for example when the model accuracy reaches the users' expected value or when enough training rounds have been completed; the criterion for ending collaborative training can be set according to actual needs.
  • The collaborative learning method of the present application establishes a privacy-friendly distributed collaborative learning mechanism: participants keep their own data sets locally and collaborate with other participants through parameter sharing, without disclosing their data sets during the collaboration. At the same time, uneven data-set quality is common in practical application scenarios.
  • The method of the present application not only ensures the consistency of parameter interaction and the privacy of the data sets, but also reaches consensus on the optimal parameters through the election of the consensus contract, ensuring smooth convergence of the global model.
  • As shown in FIG. 2, the flow of the initial deployment of a system instance mainly includes 5 steps:
  • Step 1: The m users participating in collaborative learning negotiate a common deep model structure.
  • The model is maintained globally by the parameter server.
  • Step 2: The parameter server is initialized. This mainly consists of two parts: first, initialize the optimal-parameter user list; then randomly initialize the deep model and notify all participating users to download it.
  • Step 3: Deploy the consensus contract.
  • The contract first initializes the evaluation matrix M; in addition, some important parameters need to be set in the consensus contract, such as the number n of optimal-parameter users selected in each round of training.
  • Step 4: All users participating in collaborative training download the initialized deep model from the parameter server. Note that all users must start training from the same model structure; this is why a global parameter server randomly initializes the model, and all users train on the basis of the same random initial model.
  • Step 5: Each user prepares a training data set and a verification data set, and uses the training data set to train the initialized deep model, starting the collaborative learning process.
  • The method of the present application lets users with the same training goal jointly train the target model without sacrificing the protection of private data. It allows a global parameter server to collect the model parameters submitted by users in each round of training and to maintain the global model, while each user evaluates the uploaded parameters with its own verification data set and reaches consensus on the optimal parameters through the smart contract; finally, the global parameter server aggregates the optimal parameters of each round to obtain the collaboratively trained global model.
  • the first part is the user group (multiple clients).
  • Each user has its own training data set and verification data set, and conducts local training through stochastic gradient descent.
  • the user selects the corresponding parameter change list, attaches a timestamp and signs and uploads it to the parameter server to prevent others from copying (or replaying) the corresponding parameters.
  • Whenever a new parameter change list is uploaded, all users should download the latest parameter changes, use their own verification data sets to compute the evaluation value (for example an F1-score, or the value produced by another agreed verification method), and then synchronize the corresponding evaluation results to the blockchain smart contract.
  • each user should have the same training goal, such as the same depth model.
  • the second part is the parameter server.
  • the parameter server conducts data interaction with users and the blockchain, such as uploading and downloading model parameters, and transaction broadcasting of parameters corresponding to hash values.
  • the parameter server also maintains the global model, and uses the parameter changes uploaded by the user in the optimal parameter set to update the global model.
  • the parameter server should store each user's public key to verify the user's signature data.
  • To prevent the parameter server from being attacked and parameters from being maliciously tampered with, the hash of every parameter change uploaded by a user needs to be appended to the data field of a blockchain transaction, and the transaction number of the corresponding parameters, Tx-ID, is returned to each user; it is used to verify the consistency of the parameters and prevent such malicious behavior.
  • the third part is the blockchain and smart contracts on the chain.
  • the parameter change hash value uploaded by each user needs to be attached to the transaction data segment and broadcast to the blockchain network to ensure that the recorded hash value cannot be tampered with by the server.
  • Since existing public blockchain networks have limited performance and high cost, this implementation recommends using a consortium chain with better performance, for example an open-source consortium chain project such as Hyperledger Fabric.
  • The blockchain that is built must support the execution of smart contracts.
  • For example, Ethereum supports Solidity contracts, and Fabric supports high-level programming languages such as Go and Java.
  • A smart contract is a computer protocol intended to digitally facilitate, verify or enforce the negotiation or performance of a contract; smart contracts allow trusted transactions without a third party, and these transactions are traceable and irreversible.
  • The smart contract in this scheme must run on the blockchain, which provides a trusted execution environment for it. The consensus contract in the method of the present application relies on these properties to ensure that the user group can form a consensus on the optimal parameters, thereby ensuring smooth convergence of the global model and avoiding the influence of malicious users or low-quality parameters.
  • FIG. 4 shows the sequence of operations required of each user in the user group in this embodiment; it mainly includes two phases, each of which includes 6 steps. Training phase:
  • Step 1: At the beginning of each round of training, the user downloads the latest global model from the server for this round of training.
  • Step 2: The user performs local training with its own training data set. The training method of all users needs to be consistent, for example stochastic gradient descent. After training the model on the local data set, the change Δθ of each parameter is computed.
  • Step 3: Sort Δθ in descending order and select the group of parameters with the largest changes, Δθ^upload, for upload. Note that the proportion selected here affects the efficiency of the system; the selected proportion is denoted γ, i.e. γ = |Δθ^upload| / |θ_g|. The larger γ is, the larger the update carried by the upload, which can slightly increase the convergence rate of the global model, but the required communication bandwidth also grows, because each client must exchange more parameters with the server. It is therefore recommended that γ lie in the interval [0.01, 0.2], adjusting the upload proportion to the overall size of the actual model parameters and weighing the two important factors of communication efficiency and convergence rate.
  • Step 4: Attach a timestamp to Δθ^upload, sign it, and upload it to the server to prevent malicious actions such as simple replay attacks (a signing sketch follows below).
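A sketch of the timestamp-and-sign step, using an Ed25519 signature from the cryptography package as one possible concrete choice (the patent does not prescribe a signature scheme); the server would verify with the user's stored public key:

```python
import time
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
from cryptography.exceptions import InvalidSignature

private_key = Ed25519PrivateKey.generate()        # in practice, the user's long-term key
public_key = private_key.public_key()             # registered with the parameter server

def sign_upload(payload: bytes):
    """Attach a timestamp and sign (timestamp || payload) to deter replay of old uploads."""
    timestamp = str(int(time.time())).encode()
    signature = private_key.sign(timestamp + b"|" + payload)
    return timestamp, signature

def server_verify(payload: bytes, timestamp: bytes, signature: bytes) -> bool:
    try:
        public_key.verify(signature, timestamp + b"|" + payload)
        return True
    except InvalidSignature:
        return False
```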
  • Step 5: The server returns the Tx-ID of the transaction recorded on the chain. After receiving the Tx-ID, the user verifies that the value stored on the chain is consistent with the uploaded parameters, which prevents the server from tampering with parameters before distributing them to other users.
  • Step 6: The server notifies the other users to download the latest parameter update.
  • Verification phase:
  • Step 1: When other users upload new parameter updates, the server notifies the current user to download and evaluate the uploaded parameters.
  • Step 2: The user downloads the parameter update and the corresponding blockchain transaction Tx-ID.
  • Step 3: Query the parameter hash Hash_para stored on the chain and compare it with the hash of the downloaded parameter update, ensuring that the downloaded parameters have not been maliciously tampered with by the server.
  • Step 4: Evaluate the parameters with the local verification data set. Note that the evaluation methods need to be consistent, for example using common metrics such as F1 or accuracy to assess the quality of the parameters uploaded by other users.
  • Step 5: The user synchronizes the corresponding evaluation value to the blockchain consensus contract.
  • The consensus contract needs to expose a corresponding contract interface for users to call.
  • Step 6: A contract event is triggered and the server is notified of the corresponding evaluation result.
  • The server needs to listen for contract events on the blockchain; once an event is triggered, the server captures the event type and executes the corresponding callback according to the event classification. For example, once a user submits the latest evaluation value, the server must capture the corresponding contract event to keep its local state consistent with the data on the chain.
  • As shown in FIG. 5, the interaction flow between the parameter server and the blockchain mainly consists of 4 steps:
  • Step 1: When a user uploads parameters, the server appends the hash value Hash_para of the corresponding parameters to the transaction data field and records it on the blockchain.
  • Step 2: The blockchain returns the corresponding transaction Tx-ID.
  • Tx-ID is a hash value that uniquely identifies a transaction.
  • The server returns this Tx-ID to the user, who uses it to verify the consistency of the downloaded parameters.
  • Step 3: The server registers for the event-listening service.
  • The consensus contract needs to define the corresponding contract events; the server listens for the corresponding event callbacks and handles each event accordingly.
  • Step 4: A contract function is called by a user and throws the corresponding event, which is captured by the server. After capturing the event, the server responds according to the event type. For example, when a user evaluates parameters and obtains an evaluation result, the contract triggers the corresponding event, and the server must capture it and synchronize the data in time.
  • As shown in FIG. 6, the data-processing flow of the consensus contract mainly consists of 5 steps:
  • Step 1: Write and deploy the contract content.
  • The function interfaces and the corresponding event types need to be defined in the contract.
  • Step 2: Initialize the contract's internal parameters, including but not limited to the number n of optimal-parameter users and the evaluation matrix M.
  • The size n of the optimal set affects the convergence rate of the model. For example, a larger n means that in each round the server aggregates more parameters into the global model; if there are many low-quality data sets or malicious users, this may introduce negative effects into the global model. The value of n should therefore be adapted to the number of users actually participating in collaborative training and to the differences between their data sets.
  • Step 3: Wait to receive the users' evaluation values.
  • Step 4: After receiving a user's evaluation value, update the corresponding element of the evaluation matrix M and throw the corresponding event to notify the server of the latest user evaluation. Then determine whether evaluations have been received from all users: if some users have not yet uploaded in the current round, return to step 3; otherwise, when the round times out or all users have trained and evaluated the model, go to step 5.
  • Step 5: Elect the optimal-parameter user set C* of the current round and notify the server. After receiving the optimal parameter set of the latest round, the server aggregates the parameter changes Δθ^upload uploaded by each user in C* into the global model, notifies all users to download the latest global model, and then starts the next round of training (a sketch of this contract logic follows below).
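The contract itself would be written in a chain-supported language such as Solidity or Go chaincode; the following Python sketch only models the contract's state machine described above, with illustrative names, and reuses the elect_optimal_users function from the earlier election sketch:

```python
import numpy as np

class ConsensusContractModel:
    """Illustrative model of the on-chain consensus logic (not actual chaincode)."""

    def __init__(self, m, n):
        self.m, self.n = m, n
        self.M = np.zeros((m, m))          # evaluation matrix, initialized in step 2
        self.received = set()              # (evaluator, evaluated) pairs seen this round
        self.events = []                   # stand-in for contract events the server listens to

    def submit_evaluation(self, i, j, value):
        self.M[i, j] = value               # step 4: update M and emit an event
        self.received.add((i, j))
        self.events.append(("EvaluationSubmitted", i, j, value))
        if len(self.received) == self.m * (self.m - 1):    # all pairwise evaluations are in
            winners = elect_optimal_users(self.M, self.n)  # step 5: rank-based election (see earlier sketch)
            self.events.append(("RoundElected", winners))
            self.received.clear()
            return winners
        return None
```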
  • The criterion for ending the collaborative training process can be agreed by the user group; for example, when the model accuracy reaches the users' expected value, or when enough training rounds have been completed, the collaborative training process stops and each user can download the latest deep model from the parameter server. Depending on the actual model size, the parameter server needs to allocate suitable bandwidth to each user to ensure the continuity of the collaborative training process.
  • the storage of key data is realized by using the blockchain as a trusted infrastructure, and the trusted execution results of smart contracts are used to achieve consensus election of optimal parameters.
  • All participating users exchange model parameter changes through a global parameter server that serves as an intermediate bridge.
  • Users conduct collaborative training through parameter sharing, which not only allows the model to learn the characteristics of all data sources, but also protects data privacy.
  • Users elect the optimal parameters through the consensus contract to ensure the smooth convergence of the model.
  • Fig. 7 is a schematic structural diagram of a secure collaborative deep learning device based on blockchain according to an embodiment of the present application.
  • the blockchain-based secure collaborative deep learning device includes: an initialization module 100, a training module 200, an evaluation module 300, an update module 400, and an iteration module 500.
  • the initialization module 100 is used to obtain the global model, the optimal parameter change set and the evaluation matrix, and initialize the parameters of the global model, the optimal parameter change set and the evaluation matrix.
  • The training module 200 is used to obtain the download instruction of the global model and send it to multiple clients so that they download the global model; each client trains the global model on its training data set to generate a parameter change set and screens the parameter change set according to the preset method.
  • The evaluation module 300 is used to store the hash value of each client's screened parameter change set in the blockchain, generate the corresponding storage transaction number, and send the screened parameter change set and the corresponding storage transaction number to each client, so that each client verifies and evaluates them against its verification data set, generating multiple evaluation values between clients, which are stored in the blockchain.
  • The update module 400 is used to update the evaluation matrix according to the multiple evaluation values, select the optimal parameter change set according to the updated evaluation matrix and the preset blockchain consensus contract, and update the global model according to the optimal parameter change set.
  • the iteration module 500 is used to iterate until the global model meets the preset condition.
  • This device enables users to train collaboratively without disclosing their private data sets, which protects the privacy of each other's data while letting the global model learn the characteristics of all data sources through parameter sharing, improving the accuracy and generalization ability of the global model.
  • Further, each client training the global model on a training data set to generate a parameter change set, and screening the parameter change set according to a preset method, includes:
  • Each client trains the global model on its local training data set and computes the parameter change set as Δθ_i = θ′_i − θ_i, where θ′_i is the parameter value after training of the global model, θ_i is the parameter value before training, and Δθ_i is the change of the global model parameter; the parameter changes are sorted in descending order and the group with the largest changes forms the screened parameter change set Δθ^upload, with γ = |Δθ^upload| / |θ_g| the screening parameter ratio and θ_g the parameter set of the global model.
  • training module is also used for
  • Each client adds a time stamp to the filtered parameter change set and signs it.
  • the evaluation module includes: a verification unit,
  • the verification unit is used for each client to verify the received parameter change set after screening
  • the verification unit is specifically configured to verify whether the hash value of the filtered parameter change set correspondingly stored on the blockchain is consistent with the received filtered parameter change set according to the received storage transaction number.
  • The update module is specifically used to: update the evaluation matrix M according to the evaluation values between clients; select the client set C* with the optimal parameters according to M and the preset blockchain consensus contract; obtain the optimal parameter change set corresponding to C*; and update the global model according to the optimal parameter change set.
  • Specifically, let M_i,: denote all evaluation values given by the i-th client to the other clients, sorted in descending order as M̂_i,:. According to its position in M̂_i,:, the score of the j-th client is s(j; u_i) = m − p_j, where m is the total number of participating clients and p_j is the position of the j-th client in M̂_i,:. The total score of the j-th client is S(j) = Σ_{u_i ∈ U} s(j; u_i), where u_i is the i-th client, U is the set of all clients, and s(j; u_i) is the score of the j-th client under the evaluation of u_i; based on the total scores, the client set with the optimal parameters consists of the n clients with the highest total scores.
  • the secure collaborative deep learning device based on blockchain is used to realize the storage of key data by using the blockchain as a trusted infrastructure, and the trusted execution results of smart contracts are used to achieve consensus election of optimal parameters .
  • All participating users exchange model parameter changes through a global parameter server that serves as an intermediate bridge.
  • Users conduct collaborative training through parameter sharing, which not only allows the model to learn the characteristics of all data sources, but also protects data privacy.
  • Users elect the optimal parameters through the consensus contract to ensure smooth convergence of the model.
  • first and second are only used for descriptive purposes, and cannot be understood as indicating or implying relative importance or implicitly indicating the number of indicated technical features. Therefore, the features defined with “first” and “second” may explicitly or implicitly include at least one of the features.
  • a plurality of means at least two, such as two, three, etc., unless otherwise specifically defined.
  • Unless otherwise explicitly specified and limited, terms such as "installed", "connected", "coupled" and "fixed" should be understood broadly: a connection may be fixed, detachable or integral; it may be mechanical or electrical; it may be direct, indirect through an intermediary, an internal communication between two elements, or an interaction between two elements.
  • The specific meanings of the above terms in the present application can be understood according to the specific circumstances.
  • A first feature being "on" or "under" a second feature may mean that the first and second features are in direct contact, or that they are in indirect contact through an intermediary.
  • A first feature being "on", "above" or "over" a second feature may mean that the first feature is directly above or obliquely above the second feature, or simply that the first feature is at a higher level than the second feature.
  • A first feature being "under", "below" or "beneath" a second feature may mean that the first feature is directly below or obliquely below the second feature, or simply that the first feature is at a lower level than the second feature.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Security & Cryptography (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Medical Informatics (AREA)
  • Bioethics (AREA)
  • Computer Hardware Design (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Neurology (AREA)
  • Databases & Information Systems (AREA)
  • Information Transfer Between Computers (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A blockchain-based secure collaborative deep learning method and device. The method allows a global parameter server to collect the model parameters submitted by users in each round of training and to maintain the global model, while each user evaluates the uploaded parameters with its own verification data set and reaches consensus on the optimal parameters through a smart contract; finally, the global parameter server aggregates the optimal parameters of each round to obtain the collaboratively trained global model. Users can thus train collaboratively without disclosing their private data sets, protecting the privacy of each other's data, while parameter sharing lets the global model learn the characteristics of all data sources, improving the accuracy and generalization ability of the global model.

Description

Blockchain-based secure collaborative deep learning method and device
CROSS-REFERENCE TO RELATED APPLICATIONS
This application claims priority to Chinese Patent Application No. 201910375181.9, entitled "Blockchain-based secure collaborative deep learning method and device" (基于区块链的安全协作深度学习方法及装置), filed by Tsinghua University (清华大学) on May 7, 2019.
TECHNICAL FIELD
The present application belongs to the field of distributed machine learning technology, and in particular relates to a blockchain-based secure collaborative deep learning method and device.
BACKGROUND
Deep learning, as an important branch of machine learning, has developed rapidly over the past decade. Although deep learning has made great breakthroughs in research fields such as image recognition, speech recognition and recommender systems, it is limited by the computing power of computers and by problems inherent to the algorithms themselves, such as vanishing gradients, and its training process usually requires large amounts of training data to obtain satisfactory results. Small research institutions or individual researchers often possess only limited training data for a specific research topic, and the data-set problem is frequently one of the major obstacles to their algorithm research and model design. Research shows that in the training of a deep model, the characteristics of the data are reflected in the changes of the model's parameters; collaborative deep learning therefore lets different users share the characteristics of their data sources through parameter sharing, without directly disclosing the data sets. However, collaborative deep learning needs an appropriate mechanism to guarantee that parameter updates are optimal and to prevent malicious users or low-quality data sources from negatively affecting the global model.
The blockchain can be regarded as an open and trustworthy distributed ledger (or database): many blocks are connected in sequence to form a chain-like storage structure, and a consensus mechanism guarantees the consistency and tamper-resistance of the data records. A smart contract on the blockchain is a piece of automatically executed contract code stored on the blockchain; its programming language is Turing-complete, so contract code implementing the required functions can be written as needed. External applications can interact with blockchain data by calling the interface functions of the contract code, realizing functions such as consensus on the optimal parameters in collaborative deep learning.
SUMMARY
The present application aims to solve, at least to some extent, one of the technical problems in the related art.
To this end, one purpose of the present application is to propose a blockchain-based secure collaborative deep learning method that, during the collaborative training process, guarantees the privacy of all users' data sources, the security of the training process, and the high generalization ability and high accuracy of the finally trained model.
Another purpose of the present application is to propose a blockchain-based secure collaborative deep learning device.
To achieve the above purposes, an embodiment of one aspect of the present application proposes a blockchain-based secure collaborative deep learning method, including:
S1: obtaining a global model, an optimal parameter change set and an evaluation matrix, and initializing the parameters of the global model, the optimal parameter change set and the evaluation matrix;
S2: obtaining a download instruction for the global model and sending the download instruction to multiple clients so that the multiple clients download the global model, where each client trains the global model on a training data set to generate a parameter change set and screens the parameter change set according to a preset method;
S3: storing the hash value of each client's screened parameter change set in the blockchain, generating a corresponding storage transaction number, and sending the screened parameter change set and the corresponding storage transaction number to each client, so that each client verifies and evaluates the received screened parameter change set and the corresponding storage transaction number against a verification data set and generates multiple evaluation values between clients, which are stored in the blockchain;
S4: updating the evaluation matrix according to the multiple evaluation values, selecting the optimal parameter change set according to the updated evaluation matrix and a preset blockchain consensus contract, and updating the global model according to the optimal parameter change set;
S5: iterating S2, S3 and S4 to update the global model until the global model meets a preset condition, then ending the iterative process.
The blockchain-based secure collaborative deep learning method of the embodiments of the present application uses the blockchain as a trusted infrastructure to store key data, and uses the trusted execution results of smart contracts to achieve consensus election of the optimal parameters. Participating users exchange model parameter changes through a global parameter server acting as an intermediate bridge. Users train collaboratively through parameter sharing, which lets the model learn the characteristics of all data sources while protecting data privacy, and they elect the optimal parameters through the consensus contract to ensure smooth convergence of the model.
In addition, the blockchain-based secure collaborative deep learning method according to the above embodiment of the present application may also have the following additional technical features:
Further, each client training the global model on the training data set to generate a parameter change set and screening the parameter change set according to the preset method includes:
each client training the global model on its local training data set and computing the parameter change set by the formula
Δθ_i = θ′_i − θ_i,
where θ′_i is the parameter value after training of the global model, θ_i is the parameter value before training of the global model, and Δθ_i is the change of the global model parameter;
sorting the parameter changes in descending order and selecting the group of parameters with the largest changes to generate the screened parameter change set Δθ^upload, where γ = |Δθ^upload| / |θ_g| is the screening parameter ratio and θ_g is the parameter set of the global model.
Further, S2 also includes: each client attaching a timestamp to the screened parameter change set and signing it.
Further, S3 also includes: each client verifying the received screened parameter change set, namely verifying, according to the received storage transaction number, whether the hash value of the screened parameter change set stored on the blockchain is consistent with the received screened parameter change set.
Further, S4 further includes:
updating the evaluation matrix M according to the multiple evaluation values between clients;
selecting the client set C* with the optimal parameters according to the evaluation matrix M and the preset blockchain consensus contract;
obtaining the optimal parameter change set corresponding to the client set C* with the optimal parameters, and updating the global model according to the optimal parameter change set.
The specific steps are: let M_i,: denote all evaluation values given by the i-th client to the other clients, sort M_i,: in descending order and denote the result M̂_i,:. According to the position in M̂_i,:, the score of the j-th client is
s(j; u_i) = m − p_j,
where m is the total number of participating clients and p_j is the position of the j-th client in M̂_i,:. The total score of the j-th client is
S(j) = Σ_{u_i ∈ U} s(j; u_i),
where u_i is the i-th client, U is the set of all clients, and s(j; u_i) is the score obtained by the j-th client under the evaluation of u_i. Based on the total scores, the client set with the optimal parameters C* consists of the n clients with the highest total scores S(j).
To achieve the above purposes, an embodiment of another aspect of the present application proposes a blockchain-based secure collaborative deep learning device, including:
an initialization module, configured to obtain a global model, an optimal parameter change set and an evaluation matrix, and to initialize the parameters of the global model, the optimal parameter change set and the evaluation matrix;
a training module, configured to obtain a download instruction for the global model and send the download instruction to multiple clients so that the multiple clients download the global model, where each client trains the global model on a training data set to generate a parameter change set and screens the parameter change set according to a preset method;
an evaluation module, configured to store the hash value of each client's screened parameter change set in the blockchain, generate a corresponding storage transaction number, and send the screened parameter change set and the corresponding storage transaction number to each client, so that each client verifies and evaluates the received screened parameter change set and the corresponding storage transaction number against a verification data set and generates multiple evaluation values between clients, which are stored in the blockchain;
an update module, configured to update the evaluation matrix according to the multiple evaluation values, select the optimal parameter change set according to the updated evaluation matrix and a preset blockchain consensus contract, and update the global model according to the optimal parameter change set;
an iteration module, configured to iterate until the global model meets a preset condition.
The blockchain-based secure collaborative deep learning device of the embodiments of the present application uses the blockchain as a trusted infrastructure to store key data, and uses the trusted execution results of smart contracts to achieve consensus election of the optimal parameters. Participating users exchange model parameter changes through a global parameter server acting as an intermediate bridge. Users train collaboratively through parameter sharing, which lets the model learn the characteristics of all data sources while protecting data privacy, and they elect the optimal parameters through the consensus contract to ensure smooth convergence of the model.
In addition, the blockchain-based secure collaborative deep learning device according to the above embodiment of the present application may also have the following additional technical features:
Further, each client training the global model on the training data set to generate a parameter change set and screening the parameter change set according to the preset method includes: each client training the global model on its local training data set and computing the parameter change set by the formula Δθ_i = θ′_i − θ_i, where θ′_i is the parameter value after training of the global model, θ_i is the parameter value before training of the global model, and Δθ_i is the change of the global model parameter; and sorting the parameter changes in descending order and selecting the group of parameters with the largest changes to generate the screened parameter change set Δθ^upload, where γ = |Δθ^upload| / |θ_g| is the screening parameter ratio and θ_g is the parameter set of the global model.
Further, the training module is also configured such that each client attaches a timestamp to the screened parameter change set and signs it.
Further, the evaluation module includes a verification unit, configured for each client to verify the received screened parameter change set; specifically, the verification unit is configured to verify, according to the received storage transaction number, whether the hash value of the screened parameter change set stored on the blockchain is consistent with the received screened parameter change set.
Further, the update module is specifically configured to:
update the evaluation matrix M according to the multiple evaluation values between clients;
select the client set C* with the optimal parameters according to the evaluation matrix M and the preset blockchain consensus contract;
obtain the optimal parameter change set corresponding to the client set C* with the optimal parameters, and update the global model according to the optimal parameter change set.
The specific steps are: let M_i,: denote all evaluation values given by the i-th client to the other clients, sort M_i,: in descending order and denote the result M̂_i,:. According to the position in M̂_i,:, the score of the j-th client is s(j; u_i) = m − p_j, where m is the total number of participating clients and p_j is the position of the j-th client in M̂_i,:. The total score of the j-th client is S(j) = Σ_{u_i ∈ U} s(j; u_i), where u_i is the i-th client, U is the set of all clients, and s(j; u_i) is the score obtained by the j-th client under the evaluation of u_i. Based on the total scores, the client set with the optimal parameters C* consists of the n clients with the highest total scores S(j).
Additional aspects and advantages of the present application will be given in part in the following description, and in part will become apparent from the following description or be learned through practice of the present application.
BRIEF DESCRIPTION OF THE DRAWINGS
The above and/or additional aspects and advantages of the present application will become apparent and easy to understand from the following description of the embodiments in conjunction with the accompanying drawings, in which:
FIG. 1 is a flowchart of a blockchain-based secure collaborative deep learning method according to an embodiment of the present application;
FIG. 2 is a flowchart of a blockchain-based secure collaborative deep learning method according to another embodiment of the present application;
FIG. 3 is a schematic diagram of the collaborative learning entities and the connections between them according to an embodiment of the present application;
FIG. 4 is a sequence diagram of the operations of a user participating in collaboration according to an embodiment of the present application;
FIG. 5 is a schematic diagram of the interaction between the parameter server and the blockchain according to an embodiment of the present application;
FIG. 6 is a logic flow chart of the content of a smart contract according to an embodiment of the present application;
FIG. 7 is a schematic structural diagram of a blockchain-based secure collaborative deep learning device according to an embodiment of the present application.
DETAILED DESCRIPTION
The embodiments of the present application are described in detail below. Examples of the embodiments are shown in the accompanying drawings, in which identical or similar reference numerals throughout denote identical or similar elements or elements having identical or similar functions. The embodiments described below with reference to the drawings are exemplary and intended to explain the present application, and should not be construed as limiting the present application.
Against the background of the current development of deep learning, the structure and scale of deep models keep growing, and the amount of data required by the corresponding training process grows with them. For the same training goal, training on a single data source often cannot yield a deep model with strong generalization ability and high accuracy. However, because of issues such as the privacy of data-source data, the exchange of private data sets is subject to legal penalties. The blockchain-based collaborative deep learning method proposed by the present invention targets various practical business scenarios and exploits the training goals of different users and the correlations between their training data sets to design a technical mechanism that allows the same deep model to be trained collaboratively without any party having to disclose its data set. This mechanism not only lets users with the same training goal jointly train a deep model with strong generalization ability and high accuracy, but also spares users from disclosing their data sets, protecting data-set privacy. At the same time, based on the trusted storage property of the blockchain, no participating entity can maliciously tamper with the intermediate parameters of the collaboration process; the consensus mechanism of the on-chain contract is used to reach a consensus choice of the optimal parameters among the training users, guaranteeing that only the optimal parameters agreed upon by all users can update the global model. The technical solution can therefore tolerate the presence of some participants with low-quality data sets and guarantees the stable convergence of the global deep model.
The blockchain-based secure collaborative deep learning method and device proposed according to the embodiments of the present application are described below with reference to the accompanying drawings.
First, the blockchain-based secure collaborative deep learning method proposed according to the embodiments of the present application is described with reference to the accompanying drawings.
FIG. 1 is a flowchart of a blockchain-based secure collaborative deep learning method according to an embodiment of the present application.
As shown in FIG. 1, the blockchain-based secure collaborative deep learning method includes the following steps.
Step S1: obtain the global model, the optimal parameter change set and the evaluation matrix, and initialize the parameters of the global model, the optimal parameter change set and the evaluation matrix.
Specifically, to train a global model by collaborative learning, the parameters of the global model are first initialized, and the optimal parameter change set used later is initialized at the same time.
It can be understood that a global parameter server serves as an intermediate bridge for the information exchange between the clients and the blockchain, and the training and updating of the global model are realized by collaborative learning.
Specifically, the global parameter server initializes the global model according to the settings of the participating users and aggregates the optimal parameter changes into the global model. Suppose there are m users, and the parameters of n (n < m) of them are elected as the optimal set of each training round and used to update the global model, where the global model has k parameters in total.
The optimal-parameter user list is initialized (it may be empty or an arbitrary set), and the global model parameters are initialized randomly:
C* ← ∅ (or an arbitrary set), θ_g ← random initialization.
Step S2: obtain the download instruction for the global model and send it to multiple clients so that the multiple clients download the global model; each client trains the global model on its training data set to generate a parameter change set and screens the parameter change set according to the preset method.
Specifically, the global parameter server issues a download instruction to all clients; all clients download the initialized global model and start the first round of the collaborative training process.
After downloading the global model, each client uses its locally stored training data set to train the global model and generate a parameter change set, where the global model contains multiple parameters and these parameters change after training. Each client also screens the generated parameter change set according to the preset method.
It should be noted that the global model may be trained with stochastic gradient descent (SGD) or with other methods; the training method may be chosen according to actual needs.
Further, each client training the global model on the training data set to generate a parameter change set and screening the parameter change set according to the preset method includes:
each client training the global model on its local training data set and computing the parameter change set by the formula
Δθ_i = θ′_i − θ_i,
where θ′_i is the parameter value after training of the global model, θ_i is the parameter value before training of the global model, and Δθ_i is the change of the global model parameter;
sorting the parameter changes in descending order and selecting the group of parameters with the largest changes to generate the screened parameter change set Δθ^upload, where γ = |Δθ^upload| / |θ_g| is the screening parameter ratio and θ_g is the parameter set of the global model.
As can be seen from the above, the parameter changes are screened by sorting them in descending order and selecting the group of parameters with the largest changes; other screening methods may also be chosen according to actual needs.
Step S3: store the hash value of each client's screened parameter change set in the blockchain, generate the corresponding storage transaction number, and send the screened parameter change set and the corresponding storage transaction number to each client, so that each client verifies and evaluates the received screened parameter change set and the corresponding storage transaction number against its verification data set and generates multiple evaluation values between clients, which are stored in the blockchain.
Further, S3 also includes: each client verifying the received screened parameter change set, namely verifying, according to the received storage transaction number, whether the hash value of the screened parameter change set stored on the blockchain is consistent with the received screened parameter change set.
It can be understood that, after screening out the parameter change set, each client uploads it to the global parameter server with a timestamped signature; the global parameter server stores the hash value of the uploaded parameter change set in the blockchain and obtains a corresponding storage transaction number. With the storage transaction number one can check whether the data stored on the blockchain is consistent with the data uploaded to the global parameter server, guaranteeing that the global parameter server has faithfully recorded the parameter change set screened out by each user.
The global parameter server receives the trained parameter change set uploaded by each user, stores it locally, and records the hash value Hash_para of the parameter change set on the blockchain, obtaining the corresponding storage transaction number, denoted Tx-ID, where H is a hash function, i.e. Hash_para = H(Δθ^upload).
Further, after determining that all clients' screened parameter change sets have been recorded on the blockchain and the corresponding storage transaction numbers generated, the global parameter server sends the parameter change sets uploaded by all clients and the corresponding storage transaction numbers to all clients. All clients use their locally stored verification data sets to verify and score the received parameter change sets and the corresponding storage transaction numbers, obtaining evaluation values; each client scores the other clients, producing one-to-one corresponding evaluation values, which are stored in the blockchain.
It should be noted that if some client has not yet uploaded its parameter change set, the server waits for the remaining clients to finish uploading before sending the parameter change sets uploaded by all clients and the corresponding storage transaction numbers to all clients.
Specifically, the local verification data set is used to compute the F1-score corresponding to each parameter update, and the smart contract function is called to record it on the blockchain as the basis for selecting the best parameters.
Step S4: update the evaluation matrix according to the multiple evaluation values, select the optimal parameter change set according to the updated evaluation matrix and the preset blockchain consensus contract, and update the global model according to the optimal parameter change set.
Further, S4 further includes:
updating the evaluation matrix M according to the multiple evaluation values between clients;
selecting the client set C* with the optimal parameters according to the evaluation matrix M and the preset blockchain consensus contract;
obtaining the optimal parameter change set corresponding to the client set C* with the optimal parameters, and updating the global model according to the optimal parameter change set.
Based on a multi-winner election rule and the F1-scores uploaded by the users, each user's parameter change is voted on and scored; the n users with the highest scores are recorded as the optimal parameter set, and the server then updates the model with the parameter changes they uploaded.
Specifically, when the collaborative learning process starts, the evaluation matrix is initialized as M = {0}^{m×m}.
Evaluation values between users are received and the evaluation matrix M is updated; M_ij denotes the evaluation value given by the i-th user to the j-th user.
After all users have evaluated each other's parameter changes, the user set C* with the optimal parameters is selected according to M and the parameter server is notified to update the model. The specific selection method is as follows:
Let M_i,: denote all evaluation values given by the i-th user to the other users. First sort M_i,: in descending order and denote the result M̂_i,:. According to the position in M̂_i,:, the score of the j-th user is
s(j; u_i) = m − p_j,
where m is the total number of participating users and p_j is the position of the j-th user in M̂_i,:. According to the above formula, the total score of the j-th user is
S(j) = Σ_{u_i ∈ U} s(j; u_i),
where u_i is the i-th user, U is the set of all users, and s(j; u_i) is the score obtained by the j-th user under the evaluation of u_i. Based on the total scores, the user set with the optimal parameters is
C*, i.e. the set of the n users with the highest total scores S(j) among all users in U.
It can be understood that, when obtaining the optimal parameter change set, the user set with the optimal parameters is obtained first, then the optimal parameter change set is obtained from the parameter change sets corresponding to that user set, and the global model is updated with the optimal parameter change set.
After the final consensus result C* of the smart contract is received, the user set with the optimal parameters is updated and the parameter changes of all corresponding users are aggregated:
θ_i ← θ_i + (1/n) Σ_{u ∈ C*} Δθ_i^u,
where θ_i is a parameter of the model and Δθ_i^u is the change of the corresponding parameter uploaded by each optimal user. The global parameter server first averages the changes of θ_i over all these users' models, adds the resulting value to the corresponding parameter, and repeats this operation over all model parameters, finally updating the model.
Step S5: iterate S2, S3 and S4 to update the global model until the global model meets the preset condition, then end the iterative process.
Specifically, after one round of training and updating of the global model is completed, the global parameter server sends the clients an instruction to download the latest global model, and training and updating are performed again; the process is iterated until the global model meets the preset condition, for example when the model accuracy reaches the users' expected value or when enough training rounds have been completed. The criterion for ending collaborative training can be set according to actual needs.
The collaborative learning method of the present application establishes a privacy-friendly distributed collaborative learning mechanism: participants keep their data sets locally and collaborate with other participants through parameter sharing, without disclosing their data sets during the collaboration. At the same time, uneven data-set quality is common in practical application scenarios; the method of the present application not only guarantees the consistency of parameter interaction and the privacy of the data sets, but also reaches consensus on the optimal parameters through the election of the consensus contract, ensuring smooth convergence of the global model.
As shown in FIG. 2, the flow of the initial deployment of a system instance mainly includes 5 steps:
Step 1: The m users participating in collaborative learning negotiate a common deep model structure. This model is maintained globally by the parameter server.
Step 2: The parameter server is initialized. This mainly consists of two parts: first initialize the optimal-parameter user list, then randomly initialize the deep model and notify all participating users to download it.
Step 3: Deploy the consensus contract. The contract first initializes the evaluation matrix M; in addition, some important parameters need to be set in the consensus contract, such as the number n of optimal-parameter users selected in each training round.
Step 4: All users participating in collaborative training download the initialized deep model from the parameter server. Note that all users must start training from the same model structure; this is why a global parameter server randomly initializes the model, and all users train on the basis of the same random initial model.
Step 5: Each user prepares a training data set and a verification data set, and uses the training data set to train the initialized deep model, starting the collaborative learning process.
The method of the present application lets users with the same training goal jointly train the target model without sacrificing the protection of private data: it allows a global parameter server to collect the model parameters submitted by users in each round of training and to maintain the global model, while each user evaluates the uploaded parameters with its own verification data set and reaches consensus on the optimal parameters through the smart contract; finally, the global parameter server aggregates the optimal parameters of each round to obtain the collaboratively trained global model.
如图3所示,展示了涉及的3个实体以及在协作学习进程中各自的交互流程,其中每个实体的具体实施职能概要如下:
The first part is the user group (a plurality of clients). Each user owns its own training dataset and validation dataset and performs local training, for example by stochastic gradient descent. When local training finishes, the user selects the corresponding parameter change list, attaches a timestamp, signs it, and uploads it to the parameter server, preventing others from copying (or replaying) the corresponding parameters. Meanwhile, whenever a new parameter change list is uploaded, every user should download the latest parameter changes, compute an evaluation value such as the F1-score (or an evaluation value obtained by another validation method) on its own validation dataset, and synchronize the evaluation result to the blockchain smart contract. It should be noted that in the collaborative learning method disclosed in the present application, every user should have the same training objective, for example the same deep model.
The second part is the parameter server. The parameter server exchanges data with the users and with the blockchain, such as uploading and downloading model parameters and broadcasting transactions carrying the hash values of the parameters. In addition, the parameter server maintains the global model and updates it with the parameter changes uploaded by the users in the optimal-parameter set. The parameter server should also store each user's public key in order to verify the users' signed data. To prevent the parameter server from being attacked and the parameters from being maliciously tampered with, the hash of every parameter change uploaded by a user must be attached to the data field of a blockchain transaction, and the transaction number of the corresponding uploaded parameters, i.e. the Tx-ID, is returned to each user for verifying the consistency of the parameters and preventing such malicious behavior.
The third part is the blockchain and the on-chain smart contract. The hash value of the parameter changes uploaded by each user must be attached to the data field of a transaction and broadcast to the blockchain network, so that the recorded hash cannot be tampered with by the server. Because existing public-chain networks are limited in performance and costly, this embodiment recommends a better-performing consortium chain, for example an open-source consortium-chain project such as Hyperledger Fabric. The blockchain used must also support the execution of smart contracts; for instance, Ethereum supports Solidity contracts, while Fabric supports high-level programming languages such as Golang and Java. A smart contract is a computer protocol intended to digitally facilitate, verify or enforce the negotiation or performance of a contract; smart contracts allow trusted transactions to be carried out without a third party, and these transactions are traceable and irreversible. The smart contract in this scheme must run on top of the blockchain, which provides a trusted execution environment for it. Based on these properties, the consensus contract in the method disclosed in the present application guarantees that the user group can reach consensus on the optimal parameters, so that the global model converges smoothly and is protected from the influence of malicious users or low-quality parameters.
As shown in Fig. 4, the operation sequence required of each user in the user group in this embodiment mainly includes two phases, each comprising six steps:
Training phase:
Step 1: at the beginning of each training round, the user downloads the latest global model from the server for this round of training.
Step 2: the user performs local training on its own training dataset. All users must use the same training method, for example stochastic gradient descent. After training the model on the local training dataset, the user computes every parameter change Δθ_i.
Step 3: Δθ_i is sorted in descending order and the group of parameters with the largest changes, Δθ̂_i, is selected for upload. Note that the selection ratio affects the operating efficiency of the system; it is denoted γ, i.e. |Δθ̂_i| = γ · |θ_g|. The larger the ratio γ, the larger the fraction of the model updated by the upload, which can slightly increase the convergence rate of the global model; however, the required communication bandwidth also grows, because each client must exchange more parameters with the server. It is therefore suggested that γ lie in the interval [0.01, 0.2], with the upload ratio adjusted according to the overall size of the actual model parameters, balancing the two important factors of communication efficiency and convergence rate.
Step 4: Δθ̂_i, together with the corresponding timestamp, is signed and uploaded to the server, preventing malicious behavior such as simple replay attacks (see the signing sketch after this list).
Step 5: the server returns the Tx-ID of the on-chain record; after receiving the Tx-ID, the user verifies that the on-chain record is consistent with the uploaded parameters, which prevents the server from tampering with the parameters before distributing them to other users.
Step 6: the server notifies the other users to download the latest parameter update.
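The signing sketch referenced in step 4, assuming an Ed25519 key pair from the `cryptography` package; the disclosure does not prescribe a particular signature scheme, only that the upload is timestamped and signed and that the server holds the client's public key:

```python
import json
import time
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

def sign_upload(delta_selected, private_key: Ed25519PrivateKey):
    """Attach a timestamp and sign the filtered change set before upload.

    The timestamp makes a simple replay of an old upload detectable; the
    server verifies the signature with the client's registered public key.
    """
    message = {"params": delta_selected, "timestamp": int(time.time())}
    payload = json.dumps(message, sort_keys=True).encode("utf-8")
    signature = private_key.sign(payload)
    return payload, signature

# Server side: client_public_key.verify(signature, payload) raises an
# exception if the payload was altered or signed by someone else.
```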
Verification phase:
Step 1: when another user uploads a new parameter update, the server notifies the current user to download and evaluate the uploaded parameters.
Step 2: the user downloads the parameter update and the corresponding blockchain transaction Tx-ID.
Step 3: the user queries the parameter hash Hash_para stored on the chain and compares it with the hash value of the downloaded parameter update, ensuring that the downloaded parameters have not been maliciously tampered with by the server.
Step 4: the parameters are evaluated on the local validation dataset. Note that the evaluation method must be consistent across users; for example, common metrics such as the F1-score or accuracy are used to judge the quality of the parameters uploaded by other users.
Step 5: the user synchronizes the corresponding evaluation value to the blockchain consensus contract, which must expose a corresponding contract interface for users to call (steps 3 to 5 are illustrated in the sketch after this list).
Step 6: a contract event is triggered to notify the server of the corresponding evaluation result. The server must listen for contract events on the blockchain; once an event fires, the server captures the corresponding event type and executes the corresponding callback according to the event classification. For example, once a user submits a new evaluation value, the server must capture the corresponding contract event to keep its content consistent with the on-chain data.
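A sketch of verification-phase steps 3 to 5, reusing the `hash_param_changes` and `evaluate_update` helpers from the earlier sketches; `chain.get_data` is a placeholder for whatever query the chosen blockchain SDK offers for reading a transaction's data field:

```python
def verify_and_score(chain, tx_id, downloaded_params, global_model, X_val, y_val):
    """Check a downloaded update against its on-chain hash, then score it."""
    onchain_hash = chain.get_data(tx_id)             # placeholder SDK call
    if onchain_hash != hash_param_changes(downloaded_params):
        # A mismatch means the parameters were altered after the hash was recorded.
        raise ValueError("on-chain hash does not match the downloaded parameters")
    return evaluate_update(global_model, downloaded_params, X_val, y_val)
```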
As shown in Fig. 5, the interaction flow between the parameter server and the blockchain mainly comprises four steps:
Step 1: when a user uploads parameters, the server attaches the hash value Hash_para of the corresponding parameters to the data field of a transaction and records it on the blockchain.
Step 2: the blockchain returns the corresponding transaction Tx-ID. The Tx-ID is a hash value that uniquely identifies a transaction; the server returns this Tx-ID to the user for verifying the consistency of the downloaded parameters.
Step 3: the server registers a listening service. The consensus contract must define the corresponding contract events; the server listens for the corresponding event callbacks and handles the events accordingly.
Step 4: a contract function is called by a user, the corresponding event is emitted and captured by the server, which then responds according to the event type (a listener sketch is given after this list). For example, when a user evaluates parameters and obtains an evaluation result, the contract triggers the corresponding event, and the server must capture it and synchronize the data in time.
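The listener sketch referenced in step 4; `contract.poll_events` stands in for the event subscription mechanism of the chosen platform (for example event hubs in Fabric or log filters in Ethereum clients), and the event names are illustrative only:

```python
def run_event_loop(contract, server_state):
    """Dispatch consensus-contract events to the parameter server's handlers."""
    for event in contract.poll_events():             # placeholder subscription
        if event.type == "EvaluationSubmitted":
            # Keep the server's copy of M consistent with the on-chain matrix.
            server_state.record_evaluation(event.rater, event.rated, event.score)
        elif event.type == "OptimalSetElected":
            # Aggregate the elected clients' changes and publish the new model.
            server_state.aggregate_and_publish(event.optimal_clients)
```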
As shown in Fig. 6, the data processing flow of the consensus contract mainly comprises five steps:
Step 1: the contract content is written and deployed. Function interfaces and the corresponding event types must be defined inside the contract.
Step 2: the internal parameters of the contract are initialized, including but not limited to the number n of optimal-parameter users and the evaluation matrix M. The size n of the optimal set affects the convergence rate of the model: a larger n means that in every round the server aggregates more parameters into the global model, so if there are many low-quality datasets or malicious users, negative influences may be introduced into the global model. The value of n should therefore be adapted to the number of users actually participating in the collaborative training and to the differences between their datasets.
Step 3: the contract waits to receive the users' evaluation values.
Step 4: when a user's evaluation value is received, the corresponding element of the evaluation matrix M is updated and the corresponding event is emitted to notify the server of the latest user evaluation. The contract then checks whether the evaluations between all users have been received; if some users have not yet uploaded in the current round, it returns to step 3, otherwise, when the round times out or all users have trained and evaluated the model, it proceeds to step 5.
Step 5: according to the evaluation matrix M, the optimal-parameter user set Û of the current round is elected and the server is notified (a contract sketch is given after this list). After receiving the optimal-parameter set of the latest round, the server aggregates the parameter changes Δθ̂ uploaded by each user in Û into the global model, notifies all users to download the latest global model, and then starts the next training round.
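The contract sketch referenced in step 5. A real deployment would implement this bookkeeping as Solidity or Fabric chaincode; the Python class below only illustrates, under that assumption, the state the contract keeps and when the election is triggered, reusing `elect_optimal_clients` from the earlier sketch:

```python
import numpy as np

class ConsensusContract:
    """Chain-agnostic sketch of the consensus contract's bookkeeping."""

    def __init__(self, m, n):
        self.M = np.zeros((m, m))    # evaluation matrix, initialised to zero
        self.n = n                   # number of optimal-parameter clients per round
        self.received = 0

    def submit_evaluation(self, rater, rated, score):
        """Contract interface called by clients to record M[rater][rated]."""
        self.M[rater, rated] = score
        self.received += 1
        # Every client evaluates every other client: m * (m - 1) submissions.
        # A real contract would also handle the per-round timeout mentioned above.
        if self.received == self.M.shape[0] * (self.M.shape[0] - 1):
            return elect_optimal_clients(self.M, self.n)   # reported via an event
        return None
```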
The end of the collaborative training process can be agreed upon by the user group; for example, the process is stopped when the model accuracy reaches the value expected by the users or when a sufficient number of training rounds have been performed, after which every user can download the latest deep model from the parameter server. Depending on the actual size of the model, the parameter server must allocate suitable bandwidth to each user to guarantee the continuity of the collaborative training process.
According to the blockchain-based secure collaborative deep learning method proposed in the embodiments of the present application, users can train collaboratively without disclosing their private datasets, which not only protects the privacy of each other's data but also, through parameter sharing, allows the global model to learn the features of all data sources, improving the accuracy and generalization ability of the global model and ensuring its smooth convergence.
Next, the blockchain-based secure collaborative deep learning apparatus proposed in the embodiments of the present application is described with reference to the accompanying drawings.
Fig. 7 is a schematic structural diagram of a blockchain-based secure collaborative deep learning apparatus according to an embodiment of the present application.
As shown in Fig. 7, the blockchain-based secure collaborative deep learning apparatus includes an initialization module 100, a training module 200, an evaluation module 300, an updating module 400 and an iteration module 500.
The initialization module 100 is configured to obtain a global model, an optimal parameter change set and an evaluation matrix, and to initialize the parameters of the global model, the optimal parameter change set and the evaluation matrix.
The training module 200 is configured to obtain a download instruction for the global model and send the download instruction to a plurality of clients so that the plurality of clients download the global model; each client trains the global model according to a training dataset to generate a parameter change set and filters the parameter change set according to a preset method.
The evaluation module 300 is configured to store the hash value of each client's filtered parameter change set in the blockchain, generate a corresponding storage transaction number, and send the filtered parameter change sets and the corresponding storage transaction numbers to every client, so that each client verifies and evaluates the received filtered parameter change sets and the corresponding storage transaction numbers against its validation dataset, generates a plurality of evaluation values between clients, and stores these evaluation values in the blockchain.
The updating module 400 is configured to update the evaluation matrix according to the plurality of evaluation values, select the optimal parameter change set according to the updated evaluation matrix and a preset blockchain consensus contract, and update the global model according to the optimal parameter change set.
The iteration module 500 is configured to iterate until the global model satisfies a preset condition.
The apparatus enables users to train collaboratively without disclosing their private datasets, which not only protects the privacy of each other's data but also, through parameter sharing, allows the global model to learn the features of all data sources, improving the accuracy and generalization ability of the global model.
Further, each client training the global model according to the training dataset to generate a parameter change set, and filtering the parameter change set according to a preset method, includes:
each client training the global model on its local training dataset and computing the parameter change set as:
Δθ_i = θ′_i − θ_i
where θ′_i is the value of a parameter of the global model after training, θ_i is its value before training, and Δθ_i is the corresponding parameter change of the global model;
sorting the parameter changes in descending order and selecting the group of parameters with the largest changes to generate the filtered parameter change set Δθ̂_i, which satisfies |Δθ̂_i| = γ · |θ_g|, where Δθ̂_i is the filtered parameter change set, γ is the filtering ratio, and θ_g is the parameter set of the global model.
Further, the training module is also configured so that each client attaches a timestamp to the filtered parameter change set and signs it.
Further, the evaluation module includes a verification unit. The verification unit is configured for each client to verify the received filtered parameter change sets; specifically, it verifies, according to the received storage transaction number, whether the hash value of the filtered parameter change set stored on the blockchain is consistent with the received filtered parameter change set.
Further, the updating module is specifically configured to:
update the evaluation matrix M according to the evaluation values between clients;
select the client set Û with the optimal parameters according to the evaluation matrix M and the preset blockchain consensus contract;
obtain the optimal parameter change set corresponding to the optimal-parameter client set Û, and update the global model according to the optimal parameter change set.
The specific steps are: let M_i,: be all evaluation values given by the i-th client to the other clients, and sort M_i,: in descending order, denoted M̂_i,:. According to its position in M̂_i,:, the score of the j-th client is:
s(j; u_i) = m − p_j
where m is the total number of participating clients and p_j is the position of the j-th client in M̂_i,:. The total score of the j-th client is:
S(j) = Σ_{u_i ∈ U} s(j; u_i)
where u_i is the i-th client, U is the set of all clients, and s(j; u_i) is the score obtained by the j-th client under u_i's evaluation. Based on the total scores, the optimal-parameter client set is:
Û = {u_j ∈ U | S(j) is among the n highest total scores}.
It should be noted that the foregoing explanation of the embodiments of the blockchain-based secure collaborative deep learning method also applies to the apparatus of this embodiment and is not repeated here.
According to the blockchain-based secure collaborative deep learning apparatus proposed in the embodiments of the present application, users can train collaboratively without disclosing their private datasets, which not only protects the privacy of each other's data but also, through parameter sharing, allows the global model to learn the features of all data sources, improving the accuracy and generalization ability of the global model and ensuring its smooth convergence.
In the description of the present application, it should be understood that the orientations or positional relationships indicated by terms such as "center", "longitudinal", "transverse", "length", "width", "thickness", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", "clockwise", "counterclockwise", "axial", "radial" and "circumferential" are based on the orientations or positional relationships shown in the drawings. They are intended only to facilitate and simplify the description of the present application and do not indicate or imply that the device or element referred to must have a specific orientation or be constructed and operated in a specific orientation, and therefore shall not be construed as limiting the present application.
In addition, the terms "first" and "second" are used for descriptive purposes only and shall not be understood as indicating or implying relative importance or implicitly indicating the number of the technical features referred to. Thus, a feature defined with "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present application, "a plurality of" means at least two, for example two or three, unless otherwise expressly and specifically defined.
In the present application, unless otherwise expressly specified and defined, terms such as "mounted", "connected to", "connected" and "fixed" shall be understood in a broad sense; for example, a connection may be a fixed connection, a detachable connection or an integral connection; it may be a mechanical connection or an electrical connection; it may be a direct connection or an indirect connection through an intermediate medium; and it may be an internal communication between two elements or an interaction between two elements, unless otherwise expressly defined. For those of ordinary skill in the art, the specific meanings of the above terms in the present application can be understood according to the specific circumstances.
In the present application, unless otherwise expressly specified and defined, a first feature being "on" or "under" a second feature may mean that the first and second features are in direct contact, or that they are in indirect contact through an intermediate medium. Moreover, a first feature being "above", "over" or "on top of" a second feature may mean that the first feature is directly above or obliquely above the second feature, or merely that the first feature is at a higher level than the second feature. A first feature being "below", "under" or "beneath" a second feature may mean that the first feature is directly below or obliquely below the second feature, or merely that the first feature is at a lower level than the second feature.
In the description of this specification, reference to terms such as "one embodiment", "some embodiments", "example", "specific example" or "some examples" means that a specific feature, structure, material or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present application. In this specification, schematic expressions of the above terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials or characteristics described may be combined in a suitable manner in any one or more embodiments or examples. In addition, those skilled in the art may combine different embodiments or examples described in this specification, as well as the features of different embodiments or examples, provided that they do not contradict each other.
Although the embodiments of the present application have been shown and described above, it can be understood that the above embodiments are exemplary and shall not be construed as limiting the present application; those of ordinary skill in the art may make changes, modifications, substitutions and variations to the above embodiments within the scope of the present application.

Claims (10)

  1. A blockchain-based secure collaborative deep learning method, comprising the following steps:
    S1, obtaining a global model, an optimal parameter change set and an evaluation matrix, and initializing parameters of the global model, the optimal parameter change set and the evaluation matrix;
    S2, obtaining a download instruction for the global model and sending the download instruction to a plurality of clients so that the plurality of clients download the global model, each client training the global model according to a training dataset to generate a parameter change set and filtering the parameter change set according to a preset method;
    S3, storing a hash value of each client's filtered parameter change set in a blockchain, generating a corresponding storage transaction number, and sending the filtered parameter change sets and the corresponding storage transaction numbers to every client, so that each client verifies and evaluates the received filtered parameter change sets and the corresponding storage transaction numbers according to a validation dataset, generates a plurality of evaluation values between clients, and stores the plurality of evaluation values in the blockchain;
    S4, updating the evaluation matrix according to the plurality of evaluation values, selecting the optimal parameter change set according to the updated evaluation matrix and a preset blockchain consensus contract, and updating the global model according to the optimal parameter change set;
    S5, iterating S2, S3 and S4 to update the global model until the global model satisfies a preset condition, and ending the iteration process.
  2. The method according to claim 1, wherein each client training the global model according to the training dataset to generate the parameter change set and filtering the parameter change set according to the preset method comprises:
    each client training the global model on its local training dataset and computing the parameter change set as:
    Δθ_i = θ′_i − θ_i
    wherein θ′_i is the value of a parameter of the global model after training, θ_i is its value before training, and Δθ_i is the corresponding parameter change of the global model;
    sorting the parameter changes in descending order and selecting a group of parameters with the largest changes to generate the filtered parameter change set Δθ̂_i, which satisfies |Δθ̂_i| = γ · |θ_g|, wherein Δθ̂_i is the filtered parameter change set, γ is the filtering ratio, and θ_g is the parameter set of the global model.
  3. The method according to claim 1, wherein S2 further comprises:
    each client attaching a timestamp to the filtered parameter change set and signing it.
  4. The method according to claim 1, wherein S3 further comprises:
    each client verifying the received filtered parameter change set;
    verifying, according to the received storage transaction number, whether the hash value of the filtered parameter change set correspondingly stored on the blockchain is consistent with the received filtered parameter change set.
  5. The method according to claim 1, wherein S4 further comprises:
    updating the evaluation matrix M according to the plurality of evaluation values between clients;
    selecting a client set Û with optimal parameters according to the evaluation matrix M and the preset blockchain consensus contract;
    obtaining the optimal parameter change set corresponding to the optimal-parameter client set Û, and updating the global model according to the optimal parameter change set,
    the specific steps being: letting M_i,: be all evaluation values given by the i-th client to the other clients, and sorting M_i,: in descending order, denoted M̂_i,:; according to its position in M̂_i,:, the score of the j-th client being:
    s(j; u_i) = m − p_j
    wherein m is the total number of participating clients and p_j is the position of the j-th client in M̂_i,:, the total score of the j-th client being:
    S(j) = Σ_{u_i ∈ U} s(j; u_i)
    wherein u_i is the i-th client, U is the set of all clients, and s(j; u_i) is the score obtained by the j-th client under u_i's evaluation; based on the total scores, the optimal-parameter client set being:
    Û = {u_j ∈ U | S(j) is among the n highest total scores}.
  6. A blockchain-based secure collaborative deep learning apparatus, comprising:
    an initialization module configured to obtain a global model, an optimal parameter change set and an evaluation matrix, and to initialize parameters of the global model, the optimal parameter change set and the evaluation matrix;
    a training module configured to obtain a download instruction for the global model and send the download instruction to a plurality of clients so that the plurality of clients download the global model, each client training the global model according to a training dataset to generate a parameter change set and filtering the parameter change set according to a preset method;
    an evaluation module configured to store a hash value of each client's filtered parameter change set in a blockchain, generate a corresponding storage transaction number, and send the filtered parameter change sets and the corresponding storage transaction numbers to every client, so that each client verifies and evaluates the received filtered parameter change sets and the corresponding storage transaction numbers according to a validation dataset, generates a plurality of evaluation values between clients, and stores the plurality of evaluation values in the blockchain;
    an updating module configured to update the evaluation matrix according to the plurality of evaluation values, select the optimal parameter change set according to the updated evaluation matrix and a preset blockchain consensus contract, and update the global model according to the optimal parameter change set;
    an iteration module configured to iterate until the global model satisfies a preset condition.
  7. The apparatus according to claim 6, wherein each client training the global model according to the training dataset to generate the parameter change set and filtering the parameter change set according to the preset method comprises:
    each client training the global model on its local training dataset and computing the parameter change set as:
    Δθ_i = θ′_i − θ_i
    wherein θ′_i is the value of a parameter of the global model after training, θ_i is its value before training, and Δθ_i is the corresponding parameter change of the global model;
    sorting the parameter changes in descending order and selecting a group of parameters with the largest changes to generate the filtered parameter change set Δθ̂_i, which satisfies |Δθ̂_i| = γ · |θ_g|, wherein Δθ̂_i is the filtered parameter change set, γ is the filtering ratio, and θ_g is the parameter set of the global model.
  8. The apparatus according to claim 6, wherein the training module is further configured so that each client attaches a timestamp to the filtered parameter change set and signs it.
  9. The apparatus according to claim 6, wherein the evaluation module comprises a verification unit,
    the verification unit being configured for each client to verify the received filtered parameter change set;
    the verification unit being specifically configured to verify, according to the received storage transaction number, whether the hash value of the filtered parameter change set correspondingly stored on the blockchain is consistent with the received filtered parameter change set.
  10. The apparatus according to claim 6, wherein the updating module is specifically configured to:
    update the evaluation matrix M according to the plurality of evaluation values between clients;
    select a client set Û with optimal parameters according to the evaluation matrix M and the preset blockchain consensus contract;
    obtain the optimal parameter change set corresponding to the optimal-parameter client set Û, and update the global model according to the optimal parameter change set,
    the specific steps being: letting M_i,: be all evaluation values given by the i-th client to the other clients, and sorting M_i,: in descending order, denoted M̂_i,:; according to its position in M̂_i,:, the score of the j-th client being:
    s(j; u_i) = m − p_j
    wherein m is the total number of participating clients and p_j is the position of the j-th client in M̂_i,:, the total score of the j-th client being:
    S(j) = Σ_{u_i ∈ U} s(j; u_i)
    wherein u_i is the i-th client, U is the set of all clients, and s(j; u_i) is the score obtained by the j-th client under u_i's evaluation; based on the total scores, the optimal-parameter client set being:
    Û = {u_j ∈ U | S(j) is among the n highest total scores}.
PCT/CN2019/114984 2019-05-07 2019-11-01 Blockchain-based secure collaborative deep learning method and apparatus WO2020224205A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/012,494 US11954592B2 (en) 2019-05-07 2020-09-04 Collaborative deep learning methods and collaborative deep learning apparatuses

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910375181.9A CN110197285B (zh) 2019-05-07 2019-05-07 基于区块链的安全协作深度学习方法及装置
CN201910375181.9 2019-05-07

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/012,494 Continuation US11954592B2 (en) 2019-05-07 2020-09-04 Collaborative deep learning methods and collaborative deep learning apparatuses

Publications (1)

Publication Number Publication Date
WO2020224205A1 true WO2020224205A1 (zh) 2020-11-12

Family

ID=67752417

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/114984 WO2020224205A1 (zh) 2019-05-07 2019-11-01 基于区块链的安全协作深度学习方法及装置

Country Status (3)

Country Link
US (1) US11954592B2 (zh)
CN (1) CN110197285B (zh)
WO (1) WO2020224205A1 (zh)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190012595A1 (en) * 2017-07-07 2019-01-10 Pointr Data, Inc. Neural network consensus using blockchain
CN109214641A (zh) * 2018-07-05 2019-01-15 广州链基智能科技有限公司 Blockchain-based method and system for digital control of enterprise department resources
CN109558950A (zh) * 2018-11-06 2019-04-02 联动优势科技有限公司 Method and apparatus for determining model parameters
CN109685501A (zh) * 2018-12-04 2019-04-26 暨南大学 Method for building an auditable privacy-preserving deep learning platform based on a blockchain incentive mechanism
CN109698822A (zh) * 2018-11-28 2019-04-30 众安信息技术服务有限公司 Federated learning method and system based on a public blockchain and encrypted neural networks
CN110197285A (zh) * 2019-05-07 2019-09-03 清华大学 Blockchain-based secure collaborative deep learning method and apparatus

Also Published As

Publication number Publication date
CN110197285A (zh) 2019-09-03
CN110197285B (zh) 2021-03-23
US11954592B2 (en) 2024-04-09
US20200401890A1 (en) 2020-12-24


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 19927989; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 19927989; Country of ref document: EP; Kind code of ref document: A1)