CN111275202B

CN111275202B - Machine learning prediction method and system for data privacy protection

Info

Publication number: CN111275202B
Application number: CN202010105981.1A
Authority: CN
Inventors: 赵川; 赵埼; 荆山; 张波; 陈贞翔; 王吉伟
Original assignee: University of Jinan
Current assignee: University of Jinan
Priority date: 2020-02-20
Filing date: 2020-02-20
Publication date: 2023-08-11
Anticipated expiration: 2040-02-20
Also published as: CN111275202A

Abstract

The disclosure provides a machine learning prediction method and system for data privacy protection. The method comprises the following steps: a main server acquires encrypted data to be predicted and an encrypted prediction model; the main server creates a trusted zone and decrypts the acquired data to be predicted and the prediction model inside it; the main server performs secret sharing on the decrypted data to be predicted and the prediction model to obtain data secret shares and model shares, and distributes them between a non-colluding auxiliary server and the main server itself; the auxiliary server and the main server each perform the prediction calculation to obtain a predicted result share; and the main server performs secret reconstruction on all the predicted result shares, forwards the reconstructed result to the trusted zone for integration and encryption, and sends it to the terminal that provided the data to be predicted, which decrypts it to obtain the result predicted by the model. By combining secure multiparty computation with SGX technology, the method protects the privacy of both parties and solves the security problem that arises when providing a prediction service.

Description

Machine learning prediction method and system for data privacy protection

Technical Field

The disclosure relates to the technical field of machine learning, in particular to a machine learning prediction method and a system for protecting data privacy.

Background

The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.

In recent years, artificial intelligence techniques such as machine learning have been widely used in fields such as image recognition and text processing. Training a model requires a large amount of data, substantial computing resources and relevant expertise, which is difficult for ordinary individuals and businesses. To solve this problem, large companies have begun to offer machine learning as a service (MLaaS): users obtain prediction results by uploading data and selecting an appropriate model, without having to learn complicated machine learning algorithms. Amazon's machine learning service platform, for example, helps generate billions of real-time predictions every day. The inventors have found that while prediction services offer convenience to users, they also pose a threat to personal privacy. On the one hand, the data of the user who supplies the data to be predicted is at risk of leakage: when the prediction involves medical pathology data or other sensitive personal information, the service platform can read the user's private information directly, and the information is uploaded to and stored on a server, so private data can be revealed if it is maliciously collected or the server is attacked from outside. On the other hand, the data used to train the model provider's prediction model is also at risk of leakage: in recent years a growing number of attacks on machine learning have been proposed, such as the model inversion attack and the membership inference attack, in which an attacker infers attributes of the original sensitive data by attacking the model alone, without ever touching the original data. If the model was trained on private data, an adversary can pose as an honest user and mount the attack through malicious queries, which is a hidden danger for the use of machine learning as a service.
In summary, a machine learning prediction service based on private data can leak privacy in two directions: data uploaded by the user may be stolen by the service provider, and the prediction model supplied by an institution may be attacked by a malicious user. How to implement a safe and reliable prediction service is therefore of significant practical value.

Disclosure of Invention

In order to solve the above problems, the disclosure provides a machine learning prediction method and system for data privacy protection, which combine secure multiparty computation with SGX technology to protect the privacy of both parties and solve the security problem that arises when providing a prediction service.

In order to achieve the above purpose, the present disclosure adopts the following technical scheme:

One or more embodiments provide a machine learning prediction method for data privacy protection, including the following steps:

acquiring data: a main server obtains encrypted data to be predicted and an encrypted prediction model;

the main server creates a trusted zone and decrypts the acquired data to be predicted and the prediction model in the trusted zone; the main server performs secret sharing on the decrypted data to be predicted and the prediction model to obtain data secret shares and model shares respectively, and distributes them between a non-colluding auxiliary server and the main server;

the auxiliary server and the main server each perform the prediction calculation on the data secret share and model share they hold to obtain a predicted result share, and the auxiliary server encrypts its predicted result share and sends it to the main server;

the main server obtains the encrypted predicted result share sent by the auxiliary server, performs secret reconstruction on all the predicted result shares, and forwards the reconstructed result to the trusted zone for integration and encryption; the encrypted result is sent to the terminal providing the data to be predicted, which decrypts it to obtain the result predicted by the model.

the main server obtains the encrypted predicted result share sent by the auxiliary server, performs secret reconstruction on all the predicted result shares, forwards the reconstructed result to the trusted zone for integration and encryption, and sends the encrypted result to the terminal providing the data to be predicted.

the auxiliary server acquires its data secret share and model share;

the auxiliary server keeps its own share of the prediction model: it decrypts with its local private key $sk_s$ to obtain the main server's key $k_e$, and then decrypts with $k_e$ to obtain the parameters of the prediction model and the data to be predicted, respectively;

the auxiliary server performs the prediction on the data secret share and the model share, using a Chebyshev polynomial approximation of the activation function to evaluate the nonlinear activation, and obtains a predicted result share;

the predicted result share is encrypted with a homomorphic encryption algorithm: each auxiliary server encrypts its predicted result share with the homomorphic encryption public key $pk_{ep}$ distributed by the Enclave and sends it to the main server.

One or more embodiments provide a machine learning prediction system for data privacy protection, including a model providing terminal, a terminal providing the data to be predicted, and a main server and an auxiliary server that do not collude;

the model providing terminal: provides the trained machine learning model;

the terminal providing the data to be predicted: provides the data to be predicted by the trained model;

the main server: executes the above machine learning prediction method for data privacy protection;

the auxiliary server: executes the above machine learning prediction method for data privacy protection.

Compared with the prior art, the beneficial effects of the present disclosure are:

(1) The machine learning prediction method of the present disclosure provides reliable two-way security: the user's private data and the prediction result cannot be stolen by the model provider or the servers, and the model details uploaded by the prediction service organization are not revealed to the main server or the user. Throughout the computation, the private data of the user (the terminal providing the data to be predicted) and the prediction model of the model provider are uploaded in encrypted form; only the trusted Enclave can operate on them in plaintext, and the processed data is stored on the non-colluding servers as shared values, which prevents the main server from stealing it.

The security of the prediction result is achieved through homomorphic encryption, which prevents privacy leakage when the result is reconstructed. In existing general-purpose cloud environments, keys are usually stored in plaintext on untrusted nodes, which makes it difficult to guarantee the security of an application; in the present disclosure the keys are stored in the trusted Enclave, which prevents leakage through access by internal administrators or privileged software.

(2) The technical scheme of the present disclosure reduces user overhead: in the traditional protection scheme based on secret sharing, the secret sharing must be performed on the user side before the shares are distributed to the servers, and the prediction result must be reconstructed locally on the user side, which increases the user's computation cost. In the present disclosure, secret sharing and result reconstruction are carried out on the main server with the help of the trusted Enclave, so the user only encrypts the input and decrypts the result.

Drawings

The accompanying drawings, which are included to provide a further understanding of the disclosure, illustrate and explain the exemplary embodiments of the disclosure and together with the description serve to explain and do not limit the disclosure.

FIG. 1 is a diagram of the overall architecture of a system of embodiment 4 of the present disclosure;

FIG. 2 is a flow chart of a method of embodiment 1 of the present disclosure;

FIG. 3 is a schematic diagram of shared value addition computation of embodiment 1 of the present disclosure;

FIG. 4 is a schematic diagram of shared value multiplication computation of embodiment 1 of the present disclosure;

FIG. 5 is a schematic diagram of embodiment 1 of the present disclosure using a Chebyshev polynomial to approximate a first activation function of a neural network;

FIG. 6 is a schematic diagram of a second activation function of a neural network approximated using a Chebyshev polynomial in embodiment 1 of the present disclosure;

FIG. 7 is a remote authentication flow chart in embodiment 1 of the present disclosure;

FIG. 8 is a homomorphic encryption flow chart of a primary server and a secondary server in embodiment 1 of the present disclosure;

FIG. 9 is a bidirectional encryption flow chart of the main server and the user or model providing terminal in embodiment 1 of the present disclosure.

Detailed Description

The disclosure is further described below with reference to the drawings and embodiments.

It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the present disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs.

It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments in accordance with the present disclosure. As used herein, the singular is also intended to include the plural unless the context clearly indicates otherwise, and furthermore, it is to be understood that the terms "comprises" and/or "comprising" when used in this specification are taken to specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof. It should be noted that, without conflict, the various embodiments and features of the embodiments in the present disclosure may be combined with each other. The embodiments will be described in detail below with reference to the accompanying drawings.

SGX (Intel Software Guard Extensions) is a new processor technology developed by Intel, which can provide a trusted space on a computing platform, and ensure confidentiality and integrity of key codes and data of users. The data to be protected may be securely packaged in an environment called an Enclave environment, which may be protected from attacks from external malware or privileged software (e.g., an operating system).

Non-colluding servers are independent cloud servers: the two cloud servers cannot collude with each other.

The encryption algorithm appearing in the formulas of this embodiment is denoted En(·), and the decryption algorithm is denoted Dec(·).

Homomorphic encryption: homomorphic encryption is a special form of encryption that allows ciphertexts to be processed so that the result is still encrypted and decrypts to the result of the corresponding operation on the plaintexts. Homomorphic encryption is divided into fully homomorphic encryption and semi-homomorphic (partially homomorphic) encryption. Fully homomorphic encryption refers to an encryption function that satisfies both the additive and the multiplicative homomorphic property and can perform any number of additions and multiplications. Semi-homomorphic encryption satisfies only the additive or only the multiplicative homomorphic property.
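The additive homomorphic property can be illustrated with a toy version of the Paillier scheme that is mentioned later in this description. The following is a minimal sketch under stated assumptions, not the implementation of this disclosure: the function names, the tiny primes 293 and 433, and the choice of generator g = n + 1 are all illustrative.

```python
import math
import secrets


def paillier_keygen(p: int, q: int):
    """Toy key generation from two small primes (far too small for real use)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # valid because the generator is g = n + 1
    return n, (lam, mu)


def paillier_encrypt(n: int, m: int) -> int:
    """Encrypt m as g^m * r^n mod n^2 with g = n + 1 and random r."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1  # r in 1..n-1
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def paillier_decrypt(n: int, sk, c: int) -> int:
    """Decrypt with m = L(c^lam mod n^2) * mu mod n, where L(x) = (x-1)/n."""
    lam, mu = sk
    x = pow(c, lam, n * n)
    return ((x - 1) // n * mu) % n


def add_encrypted(n: int, c1: int, c2: int) -> int:
    """Multiplying ciphertexts adds the underlying plaintexts modulo n."""
    return (c1 * c2) % (n * n)


n, sk = paillier_keygen(293, 433)
c_sum = add_encrypted(n, paillier_encrypt(n, 17), paillier_encrypt(n, 25))
total = paillier_decrypt(n, sk, c_sum)  # 17 + 25, computed under encryption
```

Because only addition is supported on ciphertexts, Paillier is a semi-homomorphic scheme in the sense defined above.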

In the technical solution disclosed in one or more embodiments, as shown in FIG. 1 and FIG. 2, a machine learning prediction method for data privacy protection inputs the user's data to be predicted into the model supplied by the model providing terminal, obtains the prediction result directly and sends it to the user. In this process the user cannot obtain the model of the model providing terminal and the model providing terminal cannot obtain the user's data, so both the data to be predicted and the training data of the model are protected. The method comprises the following steps:

Step 1, acquiring data: the main server obtains encrypted data to be predicted and an encrypted prediction model;

Step 2, the main server creates a trusted zone and decrypts the acquired data to be predicted and the prediction model in the trusted zone; the main server performs secret sharing on the decrypted data to be predicted and the prediction model to obtain data secret shares and model shares respectively, and distributes them between a non-colluding auxiliary server and the main server;

Step 3, the auxiliary server and the main server each perform the prediction calculation on the data secret share and model share they hold to obtain a predicted result share, and the auxiliary server encrypts its predicted result share and sends it to the main server;

Step 4, the main server obtains the encrypted predicted result share sent by the auxiliary server, performs secret reconstruction on all the predicted result shares, and forwards the reconstructed result to the trusted zone for integration and encryption; the encrypted result is sent to the terminal providing the data to be predicted, which decrypts it to obtain the result predicted by the model.

In the above steps, the main server acts as a relay that distributes shares of the received data and model to the auxiliary server while keeping one share of each for itself. The main server and the auxiliary server are two non-colluding servers; each feeds its data share into its model share to perform the prediction calculation and obtains a predicted result share. The main server integrates the predicted result shares and sends the integrated result to the provider of the data to be predicted, which thereby obtains the prediction for its data.

In this embodiment, two servers, a main server and an auxiliary server, provide two-way protection of the prediction data and the prediction model, and solve the above problems without increasing the computation cost. Multi-party computation is realized by the main server and the auxiliary server. With this method the data to be predicted is never handed to the model provider; the auxiliary server and the main server each hold only a part of the data to be predicted, never the complete data, so the complete data cannot be recovered even if the auxiliary server's data is leaked. This improves the confidentiality of the data to be predicted. Data is also encrypted in transit, which improves the security of data transmission.

The number of servers given here is only an example. The number of participants in the multi-party computation can be set according to the specific situation and may be more than two, although this increases the cost and the computation load of the system.

The following is a specific description:

In step 1, the model providing terminal provides the machine learning model, and the terminal providing the data to be predicted supplies the user's data to be predicted. A suitable model is selected according to the data submitted by the user and the prediction requirement, and is made available for the user's prediction. The model may be uploaded to the server and stored in advance. The model and the data to be predicted do not have to be provided at the same time, and their order is not restricted: the model can be prepared and transmitted to the main server first, and the data to be predicted accepted when the user side needs a prediction.

In step 2, the main server may dynamically apply to build a trusted zone, the Enclave, in Intel SGX trusted mode, thereby creating a trusted execution environment. The Enclave is a protected container that stores sensitive data and code during the computation and protects them from access and attack by external malware. The user, the model provider and the auxiliary server need to perform remote authentication with the Enclave to ensure that the main server really is running a protected Enclave.

SGX: Intel SGX is a new extension of the Intel architecture that adds a new set of instructions and memory access mechanisms to the original architecture. These extensions allow an application to implement a container called an Enclave, which sets aside a protected region in the application's address space and provides confidentiality and integrity protection for the code and data inside the container against malware with privileged access. SGX does not identify and isolate all malware on the platform; instead it encapsulates the security-sensitive operations of legitimate software in the Enclave and protects them from attack by malware.

The keys used for encryption by the main server, the terminal providing the data to be predicted, the model providing terminal and the auxiliary server during the above steps are shared among these parties.

Further, to ensure that the server really is running the required components, including the Enclave, before step 1 the terminal providing the data to be predicted and the model providing terminal perform remote authentication with the server, and key sharing is established between the trusted zone Enclave of the main server and the terminal providing the data to be predicted, the model providing terminal and the auxiliary server.

The principle of the remote authentication (remote attestation) process is shown in FIG. 7 and comprises the following steps:

1) A communication channel is first established between the challenger and the platform application, and the challenger issues a challenge to the application.

2) The application passes the identity information of the Quoting Enclave, together with the challenge, to the application's Enclave.

3) The Enclave generates a manifest that includes the response to the challenge and an ephemeral public key that the challenger will use later; it then computes a hash digest of the manifest, generates a REPORT bound to the manifest, and sends it to the application.

4) The application sends the REPORT to the Quoting Enclave for signing.

5) The Quoting Enclave performs intra-platform authentication of the REPORT. If verification succeeds, it replaces the MAC value in the REPORT with a signature made with its private key to generate the QUOTE, and returns the QUOTE to the application.

6) The application sends the QUOTE and the associated manifest of supporting data to the challenger.

7) The challenger uses the EPID public key certificate and the authentication verification service to verify the signature on the QUOTE, and then verifies the integrity of the manifest.
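The message flow of steps 1) to 7) can be sketched as a small simulation. This is only an illustrative model under loud assumptions: real SGX derives the report key in hardware and signs the QUOTE with an EPID group signature that the challenger checks through a verification service, whereas here HMAC with two hypothetical in-memory keys (`REPORT_KEY`, `QE_KEY`) stands in for both, and all function names are invented for the example.

```python
import hashlib
import hmac
import secrets

# Hypothetical stand-ins for hardware-protected keys (assumptions).
REPORT_KEY = secrets.token_bytes(32)  # shared by enclaves on one platform
QE_KEY = secrets.token_bytes(32)      # Quoting Enclave signing key


def enclave_report(challenge: bytes, ephemeral_pub: bytes):
    """Step 3: build the manifest, hash it, and MAC the digest into a REPORT."""
    manifest = challenge + ephemeral_pub
    digest = hashlib.sha256(manifest).digest()
    mac = hmac.new(REPORT_KEY, digest, hashlib.sha256).digest()
    return manifest, {"digest": digest, "mac": mac}


def quoting_enclave_quote(report: dict) -> dict:
    """Steps 4-5: check the REPORT's MAC, then replace it with a signature."""
    expected = hmac.new(REPORT_KEY, report["digest"], hashlib.sha256).digest()
    if not hmac.compare_digest(expected, report["mac"]):
        raise ValueError("REPORT was not produced on this platform")
    signature = hmac.new(QE_KEY, report["digest"], hashlib.sha256).digest()
    return {"digest": report["digest"], "signature": signature}


def verification_service(quote: dict) -> bool:
    """Stand-in for the service that checks the QUOTE signature."""
    expected = hmac.new(QE_KEY, quote["digest"], hashlib.sha256).digest()
    return hmac.compare_digest(expected, quote["signature"])


def challenger_verify(challenge: bytes, manifest: bytes, quote: dict) -> bool:
    """Step 7: verify the QUOTE signature, then the manifest integrity."""
    return (
        verification_service(quote)
        and hashlib.sha256(manifest).digest() == quote["digest"]
        and manifest.startswith(challenge)
    )


# Steps 1-2: the challenger issues a fresh challenge to the application.
challenge = secrets.token_bytes(16)
manifest, report = enclave_report(challenge, b"ephemeral-public-key")
quote = quoting_enclave_quote(report)                # steps 4-5
ok = challenger_verify(challenge, manifest, quote)   # steps 6-7
```

In this sketch a manifest altered in transit no longer matches the digest inside the QUOTE, so `challenger_verify` rejects it.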

The terminal providing the data to be predicted and the model providing terminal perform remote authentication with the main server, and key sharing is established between the trusted zone Enclave of the main server and the terminal providing the data to be predicted, the model providing terminal and the auxiliary server: the main server runs the Enclave, and each terminal (the terminal providing the data to be predicted and the model providing terminal) acts as a challenger and performs remote authentication with the main server, which ensures that the main server is actually running a trusted Enclave.

Optionally, the trusted zone Enclave of the main server exchanges data with the terminal providing the data to be predicted and with the model providing terminal using hybrid encryption that combines RSA encryption and AES encryption. Data transmitted between the trusted zone Enclave of the main server and the auxiliary server is encrypted and decrypted with the Paillier homomorphic encryption algorithm.

RSA was the first relatively complete public key cryptographic algorithm; its security rests on the difficulty of factoring large integers. The RSA cryptosystem is as follows:

1) Select two large prime numbers $p$ and $q$;

2) Compute $n = p \times q$ and $\varphi(n) = (p-1)(q-1)$;

3) Randomly select $e$ with $1 < e < \varphi(n)$ such that $e$ and $\varphi(n)$ are coprime;

4) Compute $d = e^{-1} \bmod \varphi(n)$; the public key is $pk = (n, e)$ and the private key is $sk = (p, q, d)$;

5) Encryption: $c = m^{e} \bmod n$;

6) Decryption: $m = c^{d} \bmod n$.
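The six steps above can be traced with very small numbers. The sketch below is textbook RSA without padding, with illustrative primes p = 61 and q = 53 and exponent e = 17 chosen only for readability; the function names are assumptions, and nothing here is suitable for real use.

```python
import math


def rsa_keygen(p: int, q: int, e: int):
    """Steps 1-4: derive pk = (n, e) and sk = (p, q, d) from two primes."""
    n = p * q
    phi = (p - 1) * (q - 1)
    if not (1 < e < phi) or math.gcd(e, phi) != 1:
        raise ValueError("e must satisfy 1 < e < phi(n) and gcd(e, phi(n)) = 1")
    d = pow(e, -1, phi)  # modular inverse of e (Python 3.8+)
    return (n, e), (p, q, d)


def rsa_encrypt(pk, m: int) -> int:
    """Step 5: c = m^e mod n."""
    n, e = pk
    return pow(m, e, n)


def rsa_decrypt(pk, sk, c: int) -> int:
    """Step 6: m = c^d mod n."""
    n, _ = pk
    _, _, d = sk
    return pow(c, d, n)


pk, sk = rsa_keygen(61, 53, 17)
ciphertext = rsa_encrypt(pk, 65)
recovered = rsa_decrypt(pk, sk, ciphertext)
```

Note that decryption uses the private exponent d, not the public exponent e.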

The Advanced Encryption Standard (AES) is the most common symmetric encryption algorithm, that is, encryption and decryption use the same key. The AES encryption process involves four operations: byte substitution (SubBytes), row shifting (ShiftRows), column mixing (MixColumns) and round key addition (AddRoundKey). Since each operation is invertible, decryption applies the corresponding inverse operations in reverse order to recover the plaintext.

The trusted zone Enclave of the main server, the terminal providing the data to be predicted (i.e., the user), the model provider and the auxiliary server generate the following public-private key pairs: $(sk_e, pk_e)_{RSA}$, $(sk_{ep}, pk_{ep})_{Paillier}$, $(sk_u, pk_u)_{RSA}$ and $(sk_s, pk_s)_{RSA}$, together with the AES keys $k_e$, $k_u$ and $k_{MP}$. The user, the auxiliary server and the Enclave share their RSA public keys, and the Enclave issues its Paillier public key to both servers.

Wherein, $(sk_e, pk_e)_{RSA}$: generated by the Enclave. The public key $pk_e$ is sent to the model providing terminal and to the terminal providing the data to be predicted, and the private key $sk_e$ is kept locally. The pair is used to encrypt and decrypt the AES keys $k_u$ and $k_{MP}$ coming from the terminal providing the data to be predicted and from the model providing terminal.

$(sk_{ep}, pk_{ep})_{Paillier}$: generated by the Enclave, which sends the public key $pk_{ep}$ to the auxiliary server and the main server. The pair is used to encrypt and decrypt the predicted result shares computed by the servers.

$(sk_u, pk_u)_{RSA}$: generated by the user. The public key $pk_u$ is sent to the Enclave of the main server and the private key $sk_u$ is stored locally. The pair is used to encrypt and decrypt the prediction result reconstructed by the Enclave: the result is encrypted before being sent to the user, who decrypts it after receipt.

$(sk_s, pk_s)_{RSA}$: generated by the auxiliary server. The public key $pk_s$ is sent to the Enclave of the main server and the private key $sk_s$ is kept locally. The pair is used to encrypt and decrypt the AES key $k_e$ generated by the Enclave.

$k_u$: an AES key generated locally by the terminal providing the data to be predicted, used to encrypt the data to be predicted uploaded by the user.

$k_{MP}$: an AES key generated locally by the model providing terminal, used to encrypt the uploaded model.

$k_e$: an AES key generated by the Enclave, used to encrypt and decrypt data sent by the main server to the auxiliary server and the user.

The prediction model in step 1 may be a prediction model that the model providing terminal trains locally in advance based on local data, and may be established by using any machine learning method.

As shown in FIG. 9, data is transmitted between the trusted zone Enclave of the main server and the model providing terminal using hybrid encryption that combines RSA encryption and AES encryption. This covers the model encryption in step 1 and the decryption in step 2, specifically:

Encryption at the model providing terminal: the prediction model in step 1 may be encrypted as follows. The model providing terminal locally generates its AES key $k_{MP}$ and encrypts the prediction model parameters $\omega_I$ to obtain the encrypted model parameter ciphertext, i.e. $\mathrm{En}_{k_{MP}}(\omega_I)$, where $I$ is the number of the participant.

The AES key $k_{MP}$ of the model providing terminal is then encrypted with the RSA public key $pk_e$ shared by the Enclave of the main server, i.e. $\mathrm{En}_{pk_e}(k_{MP})$. The encrypted prediction model parameters and the ciphertext of the AES key are sent to the main server together as a hybrid ciphertext, and the main server forwards it to the Enclave;

Decryption of the prediction model in the trusted zone Enclave of the main server: after receiving the hybrid ciphertext, the Enclave uses its local RSA private key $sk_e$ to decrypt the ciphertext of the AES key and obtain the model providing terminal's AES key $k_{MP}$, i.e. $k_{MP} = \mathrm{Dec}_{sk_e}(\mathrm{En}_{pk_e}(k_{MP}))$. It then uses $k_{MP}$ to decrypt the model parameter ciphertext and obtain the model parameters $\omega_I$, i.e. $\omega_I = \mathrm{Dec}_{k_{MP}}(\mathrm{En}_{k_{MP}}(\omega_I))$.

As shown in FIG. 9, data is transmitted between the trusted zone Enclave of the main server and the terminal providing the data to be predicted (e.g., the user) using the same hybrid encryption combining RSA encryption and AES encryption. This covers the encryption of the data to be predicted in step 1 and the decryption in step 2, specifically:

(1) The terminal providing the data to be predicted (e.g., the user) locally generates its AES key $k_u$ and encrypts the prediction data $x$ to obtain the prediction data ciphertext $C_x$, i.e. $C_x = \mathrm{En}_{k_u}(x)$. It encrypts $k_u$ with the RSA public key $pk_e$ of the trusted zone Enclave of the main server to obtain the key ciphertext $c_u$, i.e. $c_u = \mathrm{En}_{pk_e}(k_u)$, and sends $C_x$ and $c_u$ to the main server.

(2) The main server forwards the ciphertext to the trusted zone Enclave. After receiving the hybrid ciphertext, the Enclave first uses its RSA private key $sk_e$ to decrypt $c_u$ and obtain the AES key $k_u$, i.e. $k_u = \mathrm{Dec}_{sk_e}(c_u)$, and then uses $k_u$ to decrypt $C_x$ and obtain the prediction data $x$, i.e. $x = \mathrm{Dec}_{k_u}(C_x)$.
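The hybrid exchange in (1) and (2) can be sketched end to end with the standard library alone. This is an illustration under clearly stated assumptions: a SHA-256 counter keystream stands in for AES (which is not in the Python standard library), RSA is the unpadded textbook form over a toy modulus built from two Mersenne primes, and every function name is hypothetical. The variable names `k_u`, `C_x` and `c_u` follow the text.

```python
import hashlib
import secrets

# Toy RSA modulus from two Mersenne primes (illustration only, not secure).
P, Q = 2**89 - 1, 2**107 - 1
E = 65537


def enclave_keygen():
    """Return (pk_e, sk_e) as ((n, e), (n, d))."""
    n = P * Q
    d = pow(E, -1, (P - 1) * (Q - 1))
    return (n, E), (n, d)


def stream_xor(key: bytes, data: bytes) -> bytes:
    """Stand-in for AES: XOR with a SHA-256 counter keystream (symmetric)."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


def terminal_encrypt(pk_e, x: bytes):
    """Step (1): C_x = En_{k_u}(x) and c_u = En_{pk_e}(k_u)."""
    n, e = pk_e
    k_u = secrets.token_bytes(16)                 # fresh 128-bit session key
    C_x = stream_xor(k_u, x)
    c_u = pow(int.from_bytes(k_u, "big"), e, n)   # wrap k_u under pk_e
    return C_x, c_u


def enclave_decrypt(sk_e, C_x: bytes, c_u: int) -> bytes:
    """Step (2): k_u = Dec_{sk_e}(c_u), then x = Dec_{k_u}(C_x)."""
    n, d = sk_e
    k_u = pow(c_u, d, n).to_bytes(16, "big")
    return stream_xor(k_u, C_x)


pk_e, sk_e = enclave_keygen()
message = b"blood pressure: 120/80; age: 47"
C_x, c_u = terminal_encrypt(pk_e, message)
plaintext = enclave_decrypt(sk_e, C_x, c_u)
```

The design point is that the slow public key operation touches only the short key $k_u$, while the bulk data is handled by the fast symmetric cipher.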

In step 2, the main server performs secret sharing on the decrypted prediction data and model, obtains several data secret shares and model shares, and distributes them to the non-colluding auxiliary server. This may include the following steps:

Step 21, the prediction model is decrypted in the trusted zone Enclave of the main server, and additive secret sharing is applied to the model parameters. One model share is sent to the main server; the other model shares are encrypted and sent to the auxiliary server. The trusted zone Enclave then deletes the original model, specifically the original model parameter data, so that the model exists only in share form. The specific steps are as follows:

The trusted zone Enclave of the main server protects the model parameters $\omega_I$ by additive secret sharing, splitting $\omega_I$ into two shares $share_i(\omega_I)$, $i = 0, 1$, such that $\omega_I = (share_0(\omega_I) + share_1(\omega_I)) \bmod Q$. Each model share is a shared value, and both the shared values and $Q$ belong to a finite field;

One of the two model shares is encrypted with the main server's encryption key $k_e$, i.e. $\mathrm{En}_{k_e}(share_1(\omega_I))$, and $k_e$ itself is encrypted with the auxiliary server's RSA public key $pk_s$ held by the trusted zone Enclave, i.e. $\mathrm{En}_{pk_s}(k_e)$. After encryption, both ciphertexts are handed to the main server, which forwards the encrypted model share to the auxiliary server; the other share is kept at the main server. After the secret distribution is complete, the original model parameter data $\omega_I$ is deleted by the trusted zone Enclave of the main server.

Through additive secret sharing, the Enclave splits the private data into two secret shares, $share_0(\omega_I)$ and $share_1(\omega_I)$. One secret share is encrypted, forwarded to the auxiliary server through the main server, and decrypted and stored at the auxiliary server. The other secret share is stored directly at the main server in plaintext. For example, the main server ends up holding $share_0(\omega_I)$ and the auxiliary server $share_1(\omega_I)$, so the original private data is stored as plaintext secret shares on two non-colluding servers.

One of the secret shares is encrypted so that the main server cannot steal the secret: if it were not encrypted, the main server would see both plaintext secret shares and could recover the user data or the model, revealing private information. The unencrypted plaintext share is stored directly at the main server and takes part in the subsequent prediction computation. The encrypted secret share is forwarded to the auxiliary server, where it is decrypted and stored. Each server then holds one secret share, and because the two servers do not collude, neither can recover the secret.
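The additive sharing rule $\omega_I = (share_0(\omega_I) + share_1(\omega_I)) \bmod Q$ can be sketched in a few lines. The modulus `Q` below (the Mersenne prime $2^{61}-1$) and the function names are assumptions chosen for illustration; the disclosure does not fix a particular $Q$.

```python
import secrets

Q = 2**61 - 1  # assumed modulus for the sketch


def share(secret: int, n_parties: int = 2) -> list:
    """Split a secret into n additive shares that sum to it modulo Q."""
    shares = [secrets.randbelow(Q) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % Q)
    return shares


def reconstruct(shares: list) -> int:
    """Recover the secret by adding all shares modulo Q."""
    return sum(shares) % Q


def add_shared(a: list, b: list) -> list:
    """Each server adds its own two shares locally; no communication needed."""
    return [(x + y) % Q for x, y in zip(a, b)]


omega = 123456                      # a model parameter encoded as an integer
share_0, share_1 = share(omega)     # main server keeps share_0, auxiliary share_1
restored = reconstruct([share_0, share_1])
summed = reconstruct(add_shared(share(40), share(2)))
```

A single share is a uniformly random value modulo `Q`, which is why one server alone learns nothing about the secret.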

Step 22, after the data to be predicted is received, it is decrypted in the trusted zone Enclave of the main server, and additive secret sharing is applied to the decrypted data to obtain data secret shares. One data secret share is sent to the main server, and the other data secret shares are encrypted and sent to the auxiliary server.

The Enclave protects the private input data $x$ to be predicted by additive secret sharing, following the same steps as for the model. $x$ is split into two data secret shares $share_i(x)$, $i = 0, 1$, and one of the data shares is encrypted, i.e. $\mathrm{En}_{k_e}(share_1(x))$. The main server forwards the encrypted data share to the auxiliary server and retains the plaintext data share. After the secret sharing operation is complete, the original data is destroyed by the Enclave.

The number of the auxiliary servers can be set as required, and one auxiliary server is set as an example in the embodiment for explanation.

In step 3, prediction is computed according to the selected model. Model prediction involves multiplication, and multiplying shared values directly is difficult. As a further improvement that reduces the computation cost of prediction on the servers, this embodiment completes the multiplication with the help of multiplication triples, and step 2 further includes the following step: Beaver triples (u, v, z) are generated in the trusted zone Enclave of the main server, and their shares are distributed to and stored at the auxiliary server and the main server. In the prediction step of step 3, the Beaver triples (u, v, z) are used directly to complete the multiplications involved, which reduces the computation overhead of the auxiliary server and improves data processing efficiency.

In step 3, the auxiliary server and the main server each perform prediction computation on the data secret share and the model share they hold to obtain predicted result shares, and the auxiliary server encrypts its predicted result share and sends it to the main server, specifically as follows:

(3-1) Decrypting the prediction model and the data to be predicted: the auxiliary server keeps its own prediction share of the model. Specifically, the auxiliary server decrypts with its local private key sk_s to obtain the main server key k_e, and then decrypts with the key k_e to obtain the original parameters of the prediction model and the data to be predicted, respectively.

(3-2) Prediction computation: the main server and the auxiliary server each perform prediction computation on their data secret share and model share, and the nonlinear activation function is computed using a Chebyshev polynomial approximation of the activation function, yielding the predicted result shares.

The two servers perform prediction on the data secret shares and model shares they respectively hold. The prediction computation mainly involves addition and multiplication of shared values. For the nonlinear activation function, this method uses polynomial approximation: the activation function is fitted with a higher-order Chebyshev polynomial. Compared with an ordinary polynomial, the Chebyshev polynomial gives a better fit and higher accuracy while keeping the computation cost within an acceptable range. As shown in FIG. 5 and FIG. 6, fitting the activation function of the neural network with a polynomial converts the nonlinear activation function into a form that uses only additions and multiplications, so that it can be computed on shared values.
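As an illustration of the polynomial fitting step, the sketch below builds a Chebyshev interpolant of the sigmoid function. The choice of sigmoid, the interval [-4, 4], the degree 9 and all function names are assumptions made for the example; the source does not specify them. The evaluation is done in plaintext floating point, whereas the servers would evaluate the same polynomial on shared values.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def chebyshev_coefficients(f, n, lo, hi):
    """Coefficients c_0..c_{n-1} of the degree n-1 Chebyshev interpolant of f on [lo, hi]."""
    # Chebyshev nodes on [-1, 1], mapped to [lo, hi] before sampling f.
    nodes = [math.cos(math.pi * (j + 0.5) / n) for j in range(n)]
    values = [f(0.5 * (hi - lo) * t + 0.5 * (hi + lo)) for t in nodes]
    return [
        2.0 / n * sum(values[j] * math.cos(math.pi * k * (j + 0.5) / n) for j in range(n))
        for k in range(n)
    ]

def chebyshev_eval(coeffs, x, lo, hi):
    """Evaluate c_0/2 + sum_{k>=1} c_k T_k(t) with Clenshaw's recurrence."""
    t = (2.0 * x - lo - hi) / (hi - lo)   # map x back to [-1, 1]
    b1 = b2 = 0.0
    for c in reversed(coeffs[1:]):
        b1, b2 = 2.0 * t * b1 - b2 + c, b1
    return t * b1 - b2 + 0.5 * coeffs[0]

LO, HI = -4.0, 4.0                                   # assumed input range
coeffs = chebyshev_coefficients(sigmoid, 10, LO, HI)  # 10 nodes, degree 9
grid = [LO + 0.05 * i for i in range(161)]
max_error = max(abs(chebyshev_eval(coeffs, x, LO, HI) - sigmoid(x)) for x in grid)
```

The resulting polynomial uses only additions and multiplications, which is what allows it to be evaluated with the shared value operations described next. Outside the fitted interval the approximation degrades, so inputs would need to be kept within the assumed range.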

Shared value addition, as shown in FIG. 3, works as follows. Given two secrets a and b, the two servers S_i (i = 0, 1) each hold shared values a_i, b_i of the two numbers, with a_i, b_i ∈ F, where F is a finite field, a = (a_0 + a_1) mod Q and b = (b_0 + b_1) mod Q, Q ∈ F. The two servers compute the sum of the two secrets c = a + b from the shared values: S_i directly computes the sum of the two shared values it holds, c_i = (a_i + b_i) mod Q, and sends it to S_{1-i}. Both servers then run the reconstruction algorithm to reconstruct the secret, i.e. c = Rec(c_0, c_1) = (c_0 + c_1) mod Q.
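A small worked example of shared value addition follows. The modulus Q = 2^61 - 1, the helper `share` and the sample secrets are assumptions chosen for illustration.

```python
import secrets

Q = 2**61 - 1  # assumed modulus for illustration

def share(x):
    """Additively split x into two shares mod Q."""
    x0 = secrets.randbelow(Q)
    return x0, (x - x0) % Q

a, b = 1234, 5678
a0, a1 = share(a)
b0, b1 = share(b)

# Each server S_i adds the two shared values it holds; this step needs no communication.
c0 = (a0 + b0) % Q   # computed by S_0
c1 = (a1 + b1) % Q   # computed by S_1

# Rec(c_0, c_1): the servers exchange c_i and add them mod Q.
c = (c0 + c1) % Q
```

Here `c` equals `(a + b) % Q` even though neither server ever saw `a` or `b`.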

Shared value multiplication: multiplying shared values is more complex and is assisted by multiplication triples, i.e. values u, v, z satisfying z = u·v mod Q. The triples are generated by the trusted Enclave and distributed to the two servers, so that each S_i holds its own u_i, v_i, z_i, i = 0, 1.

Given two secrets a and b, the two servers S_i (i = 0, 1) each hold shared values a_i, b_i ∈ F, where a = (a_0 + a_1) mod Q and b = (b_0 + b_1) mod Q. The two servers compute the product of the two secrets c = a × b from the shared values, as shown in FIG. 4. Each server S_i first computes e_i = a_i - u_i and f_i = b_i - v_i to hide its local shared values, and the servers then exchange the hidden values e_i, f_i. After obtaining the hidden values, S_i locally reconstructs e = Rec(e_0, e_1) and f = Rec(f_0, f_1), computes c_i = -i·e·f + f·a_i + e·b_i + z_i, and sends the result to S_{1-i}. Both servers reconstruct c = Rec(c_0, c_1) = c_0 + c_1.
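The multiplication protocol can be traced with the following sketch, which simulates both servers in one process. The modulus Q = 2^61 - 1 and the names `share`, `rec`, `beaver_triple` and `multiply` are assumptions for illustration; in the scheme the triple would be generated inside the Enclave and the exchange of e_i, f_i would go over the network.

```python
import secrets

Q = 2**61 - 1  # assumed modulus for illustration

def share(x):
    """Additively split x into [x_0, x_1] mod Q."""
    x0 = secrets.randbelow(Q)
    return [x0, (x - x0) % Q]

def rec(shares):
    """Rec(s_0, s_1): add the two shares mod Q."""
    return (shares[0] + shares[1]) % Q

def beaver_triple():
    """Stand-in for the Enclave: random u, v and z = u*v mod Q, each split into shares."""
    u, v = secrets.randbelow(Q), secrets.randbelow(Q)
    return share(u), share(v), share(u * v % Q)

def multiply(a_sh, b_sh):
    """Return shares of a*b mod Q using one Beaver triple."""
    u_sh, v_sh, z_sh = beaver_triple()
    # Each S_i masks its local shares: e_i = a_i - u_i, f_i = b_i - v_i.
    e_sh = [(a_sh[i] - u_sh[i]) % Q for i in (0, 1)]
    f_sh = [(b_sh[i] - v_sh[i]) % Q for i in (0, 1)]
    # After exchanging the masked values, both servers know e and f.
    e, f = rec(e_sh), rec(f_sh)
    # c_i = -i*e*f + f*a_i + e*b_i + z_i
    return [(-i * e * f + f * a_sh[i] + e * b_sh[i] + z_sh[i]) % Q for i in (0, 1)]

a, b = 123456, 789
c_sh = multiply(share(a), share(b))
c = rec(c_sh)
```

Summing the two c_i gives -e·f + f·a + e·b + u·v, which equals a·b once a = e + u and b = f + v are substituted; the revealed values e and f are masked by the random u and v.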

(3-3) Encrypting the predicted result shares with a homomorphic encryption algorithm: the auxiliary server encrypts its predicted result share with the homomorphic encryption public key pk_ep distributed by the Enclave. The homomorphic encryption flow between the main server and the auxiliary server is shown in FIG. 8.

This embodiment adopts Paillier homomorphic encryption. The algorithm is additively homomorphic, and its security is based on the decisional composite residuosity problem. The Paillier algorithm proceeds as follows:

1) Select two large prime numbers p and q;

2) Calculate N = p × q and select g ∈ Z*_{N²} such that gcd(L(g^λ mod N²), N) = 1, where L(x) = (x - 1)/N;

3) The public key is pk = (N, g) and the private key is sk = λ = lcm(p - 1, q - 1), where λ is the least common multiple of p - 1 and q - 1;

4) Randomly select a number r with r < N; encryption is c = E_pk(m) = g^m · r^N mod N²;

5) Decryption: m = L(c^λ mod N²) / L(g^λ mod N²) mod N.
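The steps above can be sketched as follows. This is a toy illustration, not a secure implementation: it assumes two small Mersenne primes instead of large random primes, and it assumes the common choice g = N + 1, which satisfies the gcd condition of step 2 for these primes. The function names are hypothetical, and the modular inverse via `pow(x, -1, n)` requires Python 3.8 or later.

```python
import math
import secrets

def L(x, n):
    """L(x) = (x - 1) / N from step 2."""
    return (x - 1) // n

def keygen(p, q):
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)   # lambda = lcm(p-1, q-1)
    g = n + 1                                           # assumed choice of g
    mu = pow(L(pow(g, lam, n * n), n), -1, n)           # inverse of L(g^lambda mod N^2) mod N
    return (n, g), (lam, mu)

def encrypt(pk, m):
    n, g = pk
    while True:
        r = secrets.randbelow(n - 1) + 1                # random r with 0 < r < N
        if math.gcd(r, n) == 1:
            break
    return pow(g, m, n * n) * pow(r, n, n * n) % (n * n)   # c = g^m * r^N mod N^2

def decrypt(pk, sk, c):
    n, _ = pk
    lam, mu = sk
    return L(pow(c, lam, n * n), n) * mu % n            # m = L(c^lambda mod N^2) * mu mod N

# Demonstration primes only; real keys need large random primes.
pk, sk = keygen(2**31 - 1, 2**61 - 1)
ciphertext = encrypt(pk, 42)
plaintext = decrypt(pk, sk, ciphertext)
```

Because a fresh r is drawn for every call, encrypting the same message twice gives different ciphertexts that decrypt to the same value.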

In step 4, the main server obtains the encrypted predicted result shares sent by the auxiliary server and performs secret reconstruction on all predicted result shares; specifically, it reconstructs the predicted result under ciphertext according to the additive homomorphism. Additive homomorphism means that an encryption algorithm f satisfies f(A) + f(B) = f(A + B). In this embodiment, the private data (user data and model) is divided into two secret shares that are stored on the two servers for computation, and the two resulting prediction shares are recombined by the main server into a complete ciphertext prediction result.
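For the Paillier encryption c = g^m · r^N mod N² used in this embodiment, the ciphertext operation that realizes the addition is multiplication modulo N². A short derivation, in which y_0 and y_1 are hypothetical names for the two predicted result shares and r_0, r_1 for the random numbers used to encrypt them:

```latex
E_{pk}(y_0)\cdot E_{pk}(y_1) \bmod N^2
  = \left(g^{y_0} r_0^{N}\right)\left(g^{y_1} r_1^{N}\right) \bmod N^2
  = g^{\,y_0 + y_1}\,(r_0 r_1)^{N} \bmod N^2
  = E_{pk}\!\left((y_0 + y_1) \bmod N\right)
```

The product is therefore a valid encryption of the sum with randomness r_0·r_1, so the main server can combine the shares without the private key. Assuming the share modulus Q is much smaller than N, the sum does not wrap around N, and the reduction modulo Q can be applied after decryption.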

Because the main server does not hold the Enclave private key, it cannot decrypt the predicted results, but it can still reconstruct the encrypted prediction shares. Homomorphic encryption is used on the server for two reasons: to avoid leaking the predicted result while the server reconstructs it, and because the actual memory of the Enclave is small and cannot support a large amount of computation. The predicted result is therefore reconstructed on the server under encryption and then forwarded to the Enclave of the main server, instead of being reconstructed directly inside the Enclave, which reduces the computation cost of the Enclave and improves overall efficiency.

In step 4, the reconstructed predicted result is forwarded to the trusted zone, where it is integrated and encrypted, which may include the following steps:

4-1, Decryption step: the main server sends the reconstructed encrypted prediction result to the Enclave of the main server for decryption, yielding the prediction result in plaintext. The decryption key of the Enclave of the main server is sk_ep.

4-2, Integrating the prediction results: the predicted result with the largest number of votes is selected as the final predicted result by voting.

The Enclave integrates the predicted results by voting to obtain the final predicted result y_vote(x); that is, for a classification problem, the class with the most votes among the predicted results is selected as the final predicted class. First, the number of identical predicted results, i.e. the number of votes, is counted; then the predicted result with the largest number of votes is selected as the final predicted result.
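The voting step can be sketched in a few lines. The function name `vote` and the sample labels are hypothetical; the source does not say how ties are broken, so this sketch simply assumes the label seen first among tied labels wins.

```python
from collections import Counter

def vote(predictions):
    """Return the label predicted by the largest number of models."""
    counts = Counter(predictions)             # number of votes per predicted class
    label, _ = counts.most_common(1)[0]       # class with the most votes
    return label

# Hypothetical plaintext predictions of three models for one input x.
model_predictions = ["benign", "malignant", "benign"]
y_vote = vote(model_predictions)
```

Only `y_vote` would leave the Enclave (in encrypted form); the per-model predictions stay inside.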

Voting over the prediction results of multiple models and outputting the voting result as the final result avoids overfitting on the one hand. On the other hand, the class predicted by a single model may leak private information contained in the data to be predicted, so the prediction of a single model cannot be released on its own. Combining the predicted results of multiple models prevents the final result from depending too heavily on a single model, which would make it vulnerable to attacks such as membership inference attacks.

4-3, Encrypting the final predicted result y_vote(x): the final predicted result is encrypted with the AES key k_e of the trusted zone Enclave to obtain the final prediction result ciphertext C_vote.

The AES key k_e is encrypted with the RSA public key pk_u. The final prediction result ciphertext and the ciphertext of the encrypted AES key k_e are sent to the main server, and the final predicted result y_vote(x) is deleted.

Decryption step at the data providing terminal to be predicted, i.e. the user: the main server sends the final prediction result ciphertext and the ciphertext of the encrypted AES key k_e to the user. The user decrypts the ciphertext of the AES key with the local private key sk_u to obtain the AES key k_e of the trusted zone Enclave, and then decrypts the final prediction result ciphertext with k_e to obtain Dec_e(C_vote) → y_vote(x).

The prediction method has the following advantages:

(1) The machine learning prediction method of the present disclosure provides reliable two-way security: the user's private data and the predicted result cannot be stolen by the model provider or the main server, and the model details uploaded by the prediction service organization are not revealed to the main server or the user. Throughout the computation, the private data of the user (the data providing terminal to be predicted) and the prediction model of the model provider are uploaded in encrypted form; only the trusted Enclave can operate on the data in plaintext, and the processed data is stored as shared values on non-colluding servers, which prevents the data from being stolen.

(3) Secret sharing is adopted. Secret sharing splits a secret in a suitable way, and each resulting share is managed by a different participant; a single participant cannot recover the secret information, which can be recovered only when several participants cooperate. More importantly, the secret can still be fully recovered when participants within a certain range fail. Because this scheme involves addition and multiplication of shared values, computation on shared values differs from direct computation on plaintext.

Embodiment 2

The embodiment provides a machine learning prediction method for protecting data privacy, which is implemented in a main server and comprises the following steps:

Embodiment 3

The embodiment provides a machine learning prediction method for protecting data privacy, which is implemented in an auxiliary server and comprises the following steps:

Embodiment 4

This embodiment provides a machine learning prediction system for data privacy protection, comprising a model providing terminal, a data providing terminal to be predicted, an auxiliary server and a main server, wherein the auxiliary server and the main server do not collude;

model providing terminal: for providing a machine learning prediction model;

data providing terminal to be predicted: for providing the data to be predicted for the prediction model;

main server: for executing the machine learning prediction method for data privacy protection described in Embodiment 2;

auxiliary server: for executing the machine learning prediction method for data privacy protection described in Embodiment 3.

The foregoing is merely a description of preferred embodiments of the present disclosure and is not intended to limit the disclosure; those skilled in the art may make various modifications and changes to the present disclosure. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present disclosure should be included in the protection scope of the present disclosure.

While the specific embodiments of the present disclosure have been described above with reference to the drawings, it should be understood that the present disclosure is not limited to the embodiments, and that various modifications and changes can be made by one skilled in the art without inventive effort on the basis of the technical solutions of the present disclosure while remaining within the scope of the present disclosure.

Claims

1. A machine learning prediction method for data privacy protection, characterized by comprising the following steps:

the main server obtains the encrypted predicted result shares sent by the auxiliary server, performs secret reconstruction on all the predicted result shares, and forwards the reconstructed predicted result to the trusted zone for integration and encryption; the result is sent to the data providing terminal to be predicted, which obtains, after decryption, the predicted result predicted according to the model;

before the step of acquiring data, the data providing terminal to be predicted and the model providing terminal are remotely authenticated with a server, a trusted zone Enclave of a main server is established, key sharing is carried out among the data providing terminal to be predicted, the model providing terminal and an auxiliary server, and data is transmitted between the trusted zone Enclave of the main server, the data providing terminal to be predicted and the model providing terminal by using a mixed encryption mode combining RSA encryption and AES encryption respectively;

the transmitting of data between the trusted zone Enclave of the main server and the model providing terminal using a hybrid encryption mode combining RSA encryption and AES encryption specifically comprises the following steps:

encryption step of the model providing terminal: the model providing terminal encrypts the training model parameters with the AES key of the local model providing terminal to obtain the encrypted model parameter ciphertext;

the AES key of the model providing terminal is encrypted with the RSA public key shared by the Enclave of the main server; the encrypted training model parameters and the encrypted AES key of the model providing terminal are sent to the main server as a hybrid ciphertext, and the main server forwards the ciphertext to the Enclave;

decryption of the training model by the trusted zone Enclave of the main server: after receiving the hybrid ciphertext, the Enclave decrypts with its local RSA private key to obtain the AES key of the model providing terminal, and decrypts the encrypted training model parameter ciphertext with the AES key of the model providing terminal to obtain the model parameters.

2. The machine learning prediction method for protecting data privacy according to claim 1, characterized in that: the main server creates the trusted zone specifically by dynamically applying for and constructing a trusted zone Enclave in Intel SGX trusted mode.

3. The machine learning prediction method for protecting data privacy according to claim 1, characterized in that: data transmitted between the trusted zone Enclave of the main server and the auxiliary server is encrypted and decrypted using the Paillier homomorphic encryption algorithm.

4. The machine learning prediction method for protecting data privacy according to claim 3, wherein:

the transmitting of data between the trusted zone Enclave of the main server and the data providing terminal to be predicted using a hybrid encryption mode combining RSA encryption and AES encryption specifically comprises the following steps:

the data providing terminal to be predicted encrypts the data to be predicted with its AES key to obtain the ciphertext of the data to be predicted; encrypts its AES key with the RSA public key of the trusted zone Enclave of the main server to obtain the ciphertext of the AES key of the data providing terminal to be predicted; and sends the ciphertext of the data to be predicted and the ciphertext of the AES key of the data providing terminal to be predicted to the main server;

the main server forwards the ciphertext to the trusted zone Enclave; the trusted zone Enclave decrypts with its locally stored RSA private key sk_e to obtain the AES key of the data providing terminal to be predicted, and decrypts with that AES key to obtain the data to be predicted.

5. The machine learning prediction method for protecting data privacy according to claim 1, characterized in that:

the main server performing secret sharing on the decrypted data to be predicted and the prediction model to obtain data secret shares and model shares respectively, and distributing them to the auxiliary server and the main server, which do not collude, comprises the following steps:

decrypting the obtained prediction model in the trusted zone Enclave of the main server, performing additive secret sharing on the model parameters, sending one model share to the main server, and encrypting the other model shares and sending them to the auxiliary server;

after the data to be predicted is received, decrypting it in the trusted zone Enclave of the main server, obtaining data secret shares by additive secret sharing of the decrypted data to be predicted, sending one data secret share to the main server, and encrypting the other data secret shares and sending them to the auxiliary server.

6. The machine learning prediction method for protecting data privacy according to claim 1, characterized in that: before the auxiliary server performs prediction computation, the method further comprises the following step: generating a Beaver triple in the trusted zone Enclave of the main server and distributing it to the auxiliary server and the main server.

7. The machine learning prediction method for protecting data privacy according to claim 1, characterized in that: the auxiliary server and the main server each performing prediction computation on the obtained data secret share and model share to obtain predicted result shares comprises the following steps:

decrypting the training model and the data to be predicted: the auxiliary server keeps its own prediction share of the model, decrypts with its local private key sk_s to obtain the main server key k_s, and decrypts with the key k_s to obtain the original parameters of the prediction model and the data to be predicted;

the auxiliary server carries out training prediction on the data secret share and the model share, and carries out nonlinear activation function calculation by adopting a Chebyshev polynomial approximation activation function to obtain a predicted result share;

encrypting the predicted result share with a homomorphic encryption algorithm: the auxiliary server encrypts the predicted result share using the homomorphic encryption public key distributed by the Enclave.

8. The machine learning prediction method for protecting data privacy according to claim 1, characterized in that: the main server obtaining the encrypted predicted result shares sent by the auxiliary server and performing secret reconstruction on all the predicted result shares is specifically: reconstructing the predicted result under ciphertext according to the additive homomorphism.

9. The machine learning prediction method for protecting data privacy according to claim 1, characterized in that: forwarding the reconstructed predicted result to the trusted zone and integrating and encrypting it in the trusted zone comprises the following steps:

decryption: the main server sends the reconstructed encrypted prediction result to the Enclave of the main server for decryption to obtain the prediction result in plaintext;

integrating the prediction results: selecting the predicted result with the largest number of votes as the final predicted result by voting;

encrypting the final prediction result with the AES private key k_e of the trusted zone Enclave to obtain the final prediction result ciphertext.

10. The machine learning prediction method for protecting data privacy according to claim 1, characterized in that: the decryption step of the data providing terminal to be predicted comprises:

the main server sends the final prediction result ciphertext and the ciphertext of the encrypted AES private key k_e to the data providing terminal to be predicted;

the data providing terminal to be predicted decrypts the ciphertext of the encrypted AES private key k_e with its local private key sk_u to obtain the AES private key k_e of the trusted zone Enclave, and decrypts the final prediction result ciphertext with the AES private key k_e of the trusted zone Enclave to obtain the prediction result.

11. A machine learning prediction method for data privacy protection, characterized by comprising the following steps:

acquiring data: the method comprises the steps that a main server obtains encrypted data to be predicted and an encrypted training model;

the main server creates a trusted zone and decrypts the acquired data to be predicted and the training model in the trusted zone; the main server performs secret sharing on the decrypted data to be predicted and the training model to obtain a plurality of data secret shares and model shares respectively, and distributes them to a plurality of non-colluding auxiliary servers;

the main server obtains the encrypted predicted result shares sent by the plurality of auxiliary servers, performs secret reconstruction on the encrypted predicted result shares respectively, and forwards the reconstructed predicted result to the trusted zone for integration and encryption; the result is sent to the data providing terminal to be predicted, which obtains, after decryption, the predicted result predicted according to the model;

before the step of acquiring data, the data providing terminal to be predicted and the model providing terminal perform remote authentication with the server, and a hybrid encryption mode combining RSA encryption and AES encryption is used between the trusted zone Enclave of the main server and the data providing terminal to be predicted, and between the trusted zone Enclave and the model providing terminal, respectively;

12. A machine learning prediction system for data privacy protection, characterized in that: the system comprises a model providing terminal, a data providing terminal to be predicted, an auxiliary server and a main server, wherein the auxiliary server and the main server do not collude;

model providing terminal: for providing a machine learning training model;

the main server: for executing the machine learning prediction method for data privacy protection as defined in claim 11;

the auxiliary server: for executing a machine learning prediction method for data privacy protection comprising the following steps: