CN113065145A - Privacy protection linear regression method based on secret sharing and random disturbance - Google Patents

Privacy protection linear regression method based on secret sharing and random disturbance

Info

Publication number
CN113065145A
CN113065145A (application CN202110322472.9A)
Authority
CN
China
Prior art keywords
random
data
values
secret
vector
Prior art date
Legal status: Granted
Application number
CN202110322472.9A
Other languages
Chinese (zh)
Other versions
CN113065145B (en)
Inventor
魏立斐
丁悦
李梦思
张蕾
Current Assignee
Shanghai Ocean University
Original Assignee
Shanghai Ocean University
Priority date
Filing date
Publication date
Application filed by Shanghai Ocean University filed Critical Shanghai Ocean University
Priority to CN202110322472.9A priority Critical patent/CN113065145B/en
Publication of CN113065145A publication Critical patent/CN113065145A/en
Application granted granted Critical
Publication of CN113065145B publication Critical patent/CN113065145B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/602 Providing cryptographic facilities or services
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/18 Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2221/00 Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/21 Indexing scheme relating to G06F21/00 and subgroups addressing additional information or applications relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/2107 File encryption

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Mathematical Physics (AREA)
  • Databases & Information Systems (AREA)
  • Pure & Applied Mathematics (AREA)
  • Computer Security & Cryptography (AREA)
  • Mathematical Optimization (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Operations Research (AREA)
  • Probability & Statistics with Applications (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Algebra (AREA)
  • Medical Informatics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a privacy-preserving linear regression method based on secret sharing and random perturbation, comprising the following steps: S1: multiplication of secret-shared values; S2: training data preprocessing; S3: parameter initialization; S4: model parameter update; S5: model parameter reconstruction; S6: prediction data preprocessing; S7: calculation of prediction shares; S8: reconstruction of the prediction result. The data provider obtains the model parameters simply by adding the two parties' parameter values, so both the privacy of the model parameters and the privacy of the original data are protected. With this technique, an enterprise or organization can distribute its data to two cloud servers by secret sharing, use the cloud servers to store the data, and use the two cloud servers to compute the linear regression model; throughout this process the cloud servers compute efficiently and the original data is never leaked.

Description

Privacy protection linear regression method based on secret sharing and random disturbance
Technical Field
The invention relates to the field of privacy protection for machine learning data, in particular to a two-party privacy-preserving linear regression method based on secret sharing and random perturbation techniques.
Background
Gradient descent is one of the most commonly used methods for solving the model parameters of machine learning algorithms, i.e., unconstrained optimization problems. To minimize a loss function, gradient descent solves iteratively, step by step, yielding the minimized loss function and the model parameter values. However, this process inevitably iterates over a large amount of data, and leakage of the original data can endanger users' sensitive data or cause huge economic losses to service providers. This is also a major obstacle to the development of cloud computing. Therefore, machine learning service systems based on cloud computing should pay more attention to the privacy problem and continuously improve their privacy protection capability.
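As a concrete illustration of the iterative solution described above (not part of the patent; the toy data, quadratic loss, and learning rate are chosen arbitrarily), a one-parameter least-squares problem can be solved by gradient descent as follows:

    import numpy as np

    # Toy data: y = 3x plus noise; gradient descent recovers the slope.
    rng = np.random.default_rng(0)
    x = rng.normal(size=100)
    y = 3.0 * x + 0.1 * rng.normal(size=100)

    theta, alpha = 0.0, 0.1                    # initial parameter and learning rate
    for _ in range(200):
        grad = np.mean((theta * x - y) * x)    # derivative of the mean squared error / 2
        theta -= alpha * grad                  # one gradient descent step
    print(theta)                               # converges to approximately 3.0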
Secure multi-party computation (SMC) originates from Yao's millionaires' problem and is mainly used to solve collaborative computation among a group of mutually distrustful parties while preserving privacy. A data owner may not want to hand all the training data to a single server to train the model; instead, it distributes the data set to multiple servers that train the model jointly, with no server learning the training data of the others. SecureML is a privacy-preserving two-server protocol: the data owner distributes private data to two non-colluding servers, which train on the joint data using secure two-party computation, employing oblivious transfer, garbled circuits, and an MPC-friendly activation function. Using secure multi-party computation to address data privacy protection is the mainstream direction of current research; its main challenge is how to build secure and efficient computation protocols across multiple parties.
Bose et al, in Vaidya-based thought and Clifton's research, propose an algebraic solution that hides the true values by placing them in equations masked with the random values by means of stochastic data perturbation, proposing computation protocols for one-way and two-way protocols of scalar product notation. Both sides obtain the sign of the scalar product through the same parallel action, and the complexity is reduced in the parallel computing scene. Without third party involvement, until the last step, one person must rely on the output of the other.
The present scheme is based on secure two-party gradient descent iteration combined with cryptographic techniques such as secret sharing and random data perturbation. Its advantages are: 1) compared with homomorphic-encryption schemes, the expensive computation overhead is replaced by interaction between the two parties, while the privacy requirements of the original data and the model parameters are still met; 2) batch gradient descent must use all training samples in every iteration, which makes convergence slow, so the scheme uses a mini-batch gradient descent algorithm that randomly selects a fixed, small subset of samples for each update; 3) randomization and algebraic methods are adopted, further reducing the encryption overhead of secure multi-party computation.
Disclosure of Invention
The invention provides a privacy-preserving linear regression method based on secret sharing and random perturbation. On the basis of secure two-party computation, it adopts mini-batch gradient descent iteration, uses secret-sharing addition for additions and random data perturbation for multiplications, and extends the data operations to matrix operations. Compared with the original two-party computation protocols based on oblivious transfer, encryption overhead and communication complexity are greatly reduced; at the same time, the privacy of the original data is protected in both the training and the prediction stage, and since the final model parameters are reconstructed by the user, the privacy of the model parameters is also well protected. The scheme can be widely applied in the field of machine learning privacy protection.
A privacy protection linear regression method based on secret sharing and random disturbance comprises the following steps:
S1: Multiplication of secret-shared values
In this step, without revealing their own shares, the two parties hide their true values in equations containing random values and send these to each other; one party relies on the other to carry out the computation, and finally one party obtains the product of the shared values. For a given matrix M ∈ R^{s×t} and vector v ∈ R^t, M_i and v_i (i = 0, 1) are their secret shares, held by computing party P_i (i = 0, 1), where M = M_0 + M_1 and v = v_0 + v_1; that is, computing party P_0 holds a private matrix M_0 and a private vector v_0, and the other party P_1 holds a private matrix M_1 and a private vector v_1. After solving the problem by random data perturbation, P_i (i = 0, 1) obtains a secret share p_i = RPM(M_0, M_1, v_0, v_1) of the product Mv;
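The identity underlying step S1 is that with additive shares, Mv = (M_0 + M_1)(v_0 + v_1) = M_0v_0 + M_0v_1 + M_1v_0 + M_1v_1: each party can compute its local term M_iv_i on its own, and only the cross terms M_0v_1 and M_1v_0 require the perturbation protocol. A minimal numpy sketch of this decomposition (the cross terms are computed in the clear here purely for illustration; in the scheme they are produced by the RPM protocol of steps S1a/S1b):

    import numpy as np

    rng = np.random.default_rng(1)
    s, t = 4, 8
    M = rng.normal(size=(s, t))
    v = rng.normal(size=t)

    # Additive secret sharing: each share alone reveals nothing about the secret.
    M0 = rng.normal(size=(s, t)); M1 = M - M0
    v0 = rng.normal(size=t);      v1 = v - v0

    # Local terms plus cross terms reconstruct the product exactly.
    cross = M0 @ v1 + M1 @ v0     # in the scheme these come from the RPM protocol
    p0 = M0 @ v0 + cross          # P0's share (illustrative split of the shares)
    p1 = M1 @ v1                  # P1's share
    assert np.allclose(p0 + p1, M @ v)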
S2: training data preprocessing
The method splits the original data into two parts and sends them to two servers that are non-colluding and semi-honest, i.e. they do not leak their private data but complete their respective computations by communicating with the other server. The data preprocessing splits the secret between the two computing parties by secret sharing, so that the data can finally be reconstructed simply by adding the two parties' computation results. Here the data provider divides the required training data (X, y) into two parts of equal dimensions, (X_0, y_0) and (X_1, y_1), satisfying X = X_0 + X_1 and y = y_0 + y_1, and sends them through secure channels to the computing parties S_0 and S_1. X ∈ R^{s×t} denotes a matrix of dimension s × t, where s is the number of samples and t is the number of features per sample, and y ∈ R^s denotes an s-dimensional column vector, called the target values of the samples.
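A short sketch of this preprocessing under the stated sharing scheme (the secure channels themselves are outside the snippet's scope):

    import numpy as np

    def share(data, rng):
        # Split data into two additive shares of equal dimensions.
        r = rng.normal(size=data.shape)    # random mask drawn by the data provider
        return r, data - r                 # share 0 goes to S0, share 1 to S1

    rng = np.random.default_rng(2)
    X = rng.normal(size=(100, 5))
    y = rng.normal(size=100)
    X0, X1 = share(X, rng)
    y0, y1 = share(y, rng)
    assert np.allclose(X0 + X1, X) and np.allclose(y0 + y1, y)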
S3: parameter initialization
The linear regression model can be trained by two methods, the least-squares method and gradient descent; the invention adopts gradient descent. Gradient descent divides into batch gradient descent, stochastic gradient descent, and mini-batch gradient descent; to make the model parameters converge faster and more efficiently, and to respect the constraints of the secret-shared multiplication, mini-batch gradient descent is selected for the iterations.
S_i (i = 0, 1) jointly preset the learning rate α, the mini-batch size |B|, the maximum number of iterations T, and the loss threshold e, and each initializes its own model parameter θ_i, with the iteration count initially set to 1, where θ_i ∈ R^t is a t-dimensional column vector. Note that the number of samples |B| selected here needs to be larger than the number t of features of the samples, since the applicability condition of step S1a must be satisfied.
S4: model parameter updating
The parameter update of the mini-batch gradient descent algorithm on the training data set (X, y) is:

    θ^(m+1) = θ^(m) - (α/|B|) · X_B^T (X_B θ^(m) - y_B)    (1)

where m denotes the current iteration number, and X_B and y_B respectively denote the feature values and target values of the mini-batch sample set;
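A plaintext numpy rendering of update (1) as reconstructed above (the secure version in steps S41-S47 runs the same update on shares):

    import numpy as np

    def mbgd_step(theta, X_B, y_B, alpha):
        # One mini-batch gradient descent update for linear regression.
        error = X_B @ theta - y_B     # batch prediction error
        grad = X_B.T @ error          # gradient of the squared loss on the batch
        return theta - (alpha / len(y_B)) * grad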
S5: Model parameter reconstruction
After receiving the parameter values θ_i sent by the two servers, the user adds the parameters θ_0 and θ_1 to reconstruct the model parameter θ;
S6: Prediction data preprocessing
Through the above steps, the cloud servers S_0 and S_1 now hold the secret shares θ_0 and θ_1 of the model parameters, respectively. The user can distribute prediction data to the cloud servers for prediction and finally add the two cloud servers' predicted values to obtain the final prediction result; therefore, to prevent the information in the prediction data set X_P from being leaked to the cloud servers, X_P must be preprocessed;
S7: Calculating the prediction shares
Since it cannot be guaranteed at this point that the prediction data set X_P has fewer samples than features, S_i (i = 0, 1) needs case two of step S1 and invokes step S1b to compute its secret share ŷ_P^i of the prediction, i.e. shares satisfying ŷ_P = ŷ_P^0 + ŷ_P^1 = X_P θ;
S8: reconstructing a prediction result
S_i (i = 0, 1) each send their secret share ŷ_P^i to the user, and the user adds the secret shares to reconstruct the true prediction result y_P = ŷ_P^0 + ŷ_P^1.
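Under the sharing conventions above, steps S6-S8 amount to the following sketch (the per-server products involving the other party's share again stand in for the S1b protocol outputs):

    import numpy as np

    rng = np.random.default_rng(3)
    t = 5
    theta0, theta1 = rng.normal(size=t), rng.normal(size=t)    # servers' parameter shares
    XP = rng.normal(size=(20, t))                              # user's prediction data
    XP0 = rng.normal(size=XP.shape); XP1 = XP - XP0            # S6: split and distribute

    # S7: each server derives a share of XP @ theta; the cross terms would come
    # from the S1b protocol rather than being computed locally as done here.
    yP0 = XP0 @ theta0 + XP0 @ theta1
    yP1 = XP1 @ theta0 + XP1 @ theta1

    # S8: the user adds the shares to reconstruct the prediction.
    assert np.allclose(yP0 + yP1, XP @ (theta0 + theta1))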
As a preferred embodiment of the present invention, the step S1 includes the following steps:
S1a: P_0 can obtain M_0 v_1; the constraint required for this case is M_0 ∈ R^{s×t} with s < t, i.e. the number of rows is less than the number of columns.
S1a1: P_0 and P_1 jointly negotiate a set of random matrices {C^(k)}, where the vectors in each matrix C^(k) are linearly independent. P_0 selects a set of random vectors {r_x^(k)} and a set of random numbers {p^(k)}; P_1 selects a random vector r_y = (r_y1, r_y2, ..., r_yt) and a set of random numbers {q^(k)}. The random vectors r_x^(k) selected by P_0 must satisfy a constraint: the t random values of each r_x^(k) must consist of u distinct random values (z_1, z_2, ..., z_u), with u = ⌈t/w⌉, where ⌈·⌉ denotes rounding up; the exact relationship between the values (z_1, z_2, ..., z_u) and the vector r_x^(k) is given by a formula preserved only as an image in the source;
S1a2: P_1 computes B = (b^(1), b^(2), ..., b^(s)), where b^(k) = q^(k) v + C^(k) r_y (the remaining definitions are preserved only as images in the source), and sends B to P_0;
S1a3: after receiving B from P_1, P_0 computes h = (h^(1), h^(2), ..., h^(s)) and sends it to P_1, where h^(k) is derived from M_0[k], the k-th row vector of M_0 (formula preserved only as an image in the source);
S1a4: P_0 computes the values a^(k) (formulas preserved only as images in the source) and sends A = (a^(1), a^(2), ..., a^(s)) to P_1;
S1a5: P_1 computes the quantities g^(k) = r_y a^(k);
S1a6: P_1 successively selects w random values at a time from r_y and sums them, obtaining (s_1, s_2, ..., s_u), which it sends to P_0 (the defining formula is preserved only as an image in the source);
S1a7: P_0 computes a quantity from these sums (formula preserved only as an image in the source) and sends the result to P_1;
S1a8: P_1 computes an intermediate quantity from the received result and, according to q^(k), computes the value it sends to P_0 (the formulas are preserved only as images in the source);
S1a9: P_0 computes M_0 v_1 from the received value (the formulas are preserved only as images in the source).
Similarly, by the above steps P_1 can obtain M_1 v_0.
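The recoverable skeleton of S1a is a mask-and-unmask exchange: P_1 hides its private vector inside b^(k) = q^(k) v + C^(k) r_y so that P_0 can compute with b^(k) without learning v. The sketch below only demonstrates this hiding step, under assumed shapes (C^(k) ∈ R^{t×t}, scalar q^(k)); the unmasking rounds S1a3-S1a9 survive only as formula images in the source and are not reproduced:

    import numpy as np

    rng = np.random.default_rng(4)
    t = 8
    v = rng.normal(size=t)           # P1's private vector
    C = rng.normal(size=(t, t))      # negotiated random matrix (independent columns)
    q = rng.uniform(1.0, 2.0)        # P1's private random scalar
    r_y = rng.normal(size=t)         # P1's private random vector

    b = q * v + C @ r_y              # the masked message sent to P0 (step S1a2)

    # Hiding property: without q and r_y, b is consistent with any candidate v',
    # because a matching r_y' always exists (C is invertible with high probability).
    v_alt = rng.normal(size=t)
    r_alt = np.linalg.solve(C, b - q * v_alt)
    assert np.allclose(q * v_alt + C @ r_alt, b)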
As a preferred embodiment of the present invention,
S1b: P_1 can obtain M_0 v_1.
S1b1: P_0 and P_1 jointly negotiate a set of random matrices {C^(k)}, where the vectors in each matrix C^(k) are linearly independent. P_0 selects a random vector r_x = (r_x1, r_x2, ..., r_xt) ∈ R^t and a set of random numbers {p^(k)}; P_1 selects a random vector r_y = (r_y1, r_y2, ..., r_yt) and a set of random numbers {q^(k)}. The random vector r_y selected by P_1 must satisfy a constraint: its t random values r_y1, r_y2, ..., r_yt must consist of u distinct random values (z_1, z_2, ..., z_u), with u = ⌈t/w⌉, where ⌈·⌉ denotes rounding up; the exact relationship between (z_1, z_2, ..., z_u) and r_y is given by a formula preserved only as an image in the source;
S1b2: P_0 computes A = (a^(1), a^(2), ..., a^(s)), where a^(k) = p^(k)(M_0[k])^T + C^(k)(r_x)^T and M_0[k] denotes the k-th row vector of M_0, and sends A to P_1;
S1b3: after receiving A from P_0, P_1 computes h = (h^(1), h^(2), ..., h^(s)) and sends it to P_0 (the formula for h^(k) is preserved only as an image in the source);
S1b4: P_1 computes B = (b^(1), b^(2), ..., b^(s)), where b^(k) = q^(k) C^(k) v_1 + r_y, and sends B to P_0;
S1b5: P_0 computes an intermediate quantity from B (the formulas are preserved only as images in the source);
S1b6: P_0 successively selects w random values at a time from r_x and sums them, obtaining (s_1, s_2, ..., s_u), which it sends to P_1 (the defining formula is preserved only as an image in the source);
S1b7: P_1 computes a quantity from the received sums (formula preserved only as an image in the source) and sends the result to P_0;
S1b8: P_0 computes an intermediate quantity, obtains a value from it, and sends the corresponding quantity to P_1 (the formulas are preserved only as images in the source);
S1b9: P_1 computes M_0 v_1 according to q^(k) (formula preserved only as an image in the source).
Similarly, by the above steps P_0 can obtain M_1 v_0.
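Steps S1a6/S1b6 exploit the structure imposed in S1a1/S1b1: because the t entries of the structured random vector repeat only u = ⌈t/w⌉ distinct values, a dot product against it collapses to u scalar products with group sums of w entries each, which plausibly lets the receiving party evaluate the masked dot product from the sums alone. A small sketch of that grouping (the exact use of the sums inside the protocol is not recoverable from the source):

    import math
    import numpy as np

    t, w = 10, 3
    u = math.ceil(t / w)                  # number of distinct random values
    rng = np.random.default_rng(5)
    z = rng.normal(size=u)                # the u distinct random values z_1..z_u
    r = np.repeat(z, w)[:t]               # a t-vector built from the u repeated values

    g = rng.normal(size=t)                # the other party's vector
    # Group sums of w consecutive entries of g (the last group may be shorter).
    s = np.array([g[j * w:(j + 1) * w].sum() for j in range(u)])
    # The dot product r . g collapses to u products with the group sums.
    assert np.isclose(r @ g, z @ s)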
As a preferred embodiment of the present invention, the step S4 includes the steps of:
S41: S_i (i = 0, 1) randomly selects |B| samples (X_B^i, y_B^i) from its data;
S42: S_i (i = 0, 1) invokes step S1a1 and, from its mini-batch sample data (X_B^i, y_B^i) and the current model parameter θ_i, derives its secret share ŷ_B^i of the batch prediction, i.e. shares satisfying ŷ_B = ŷ_B^0 + ŷ_B^1 = X_B θ;
S43: S_i (i = 0, 1) obtains the error e_i = ŷ_B^i - y_B^i between its derived prediction share ŷ_B^i and its share y_B^i of the true target values;
S44: S_i (i = 0, 1) invokes the computation of step S1a to obtain its secret share of the product X_B^T e, where e = e_0 + e_1 is the batch error from S43;
S45: S_i (i = 0, 1) updates its model parameter θ_i according to the MBGD parameter update formula (1), applied to its own shares;
S46: S_i (i = 0, 1) computes the current loss function value loss_i = X_i × θ_i - y_i and increases the iteration count t by 1;
S47: S_i (i = 0, 1) examines the current loss function value loss_i: if it is smaller than the loss threshold e, it records the current θ_i as the secret share of the model parameters and training ends; otherwise it checks whether the current iteration count t is smaller than T; if so, iteration continues from S4, and if not, training ends.
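Putting S41-S47 together, the following sketch simulates the two servers' training loop in one process. The cross-term products that the patent obtains through the random-perturbation protocol are computed directly here (marked in the comments), so the snippet shows the data flow and share arithmetic, not the cryptographic protection:

    import numpy as np

    rng = np.random.default_rng(6)
    n, t, B, alpha, T = 200, 5, 32, 0.05, 500

    # Ground-truth data, then additive shares for the two servers (step S2).
    X = rng.normal(size=(n, t))
    y = X @ rng.normal(size=t)
    X0 = rng.normal(size=X.shape); X1 = X - X0
    y0 = rng.normal(size=n);       y1 = y - y0

    theta0, theta1 = np.zeros(t), np.zeros(t)          # step S3

    for _ in range(T):                                 # steps S41-S47
        idx = rng.choice(n, size=B, replace=False)     # S41: pick a mini-batch
        XB0, XB1, yB0, yB1 = X0[idx], X1[idx], y0[idx], y1[idx]
        # S42: shares of X_B @ theta; cross terms stand in for the RPM protocol.
        yhat0 = XB0 @ theta0 + XB0 @ theta1
        yhat1 = XB1 @ theta0 + XB1 @ theta1
        e0, e1 = yhat0 - yB0, yhat1 - yB1              # S43: error shares
        # S44: shares of X_B^T @ e; cross terms again stand in for the protocol.
        g0 = XB0.T @ e0 + XB0.T @ e1
        g1 = XB1.T @ e0 + XB1.T @ e1
        theta0 -= (alpha / B) * g0                     # S45: local update at S0
        theta1 -= (alpha / B) * g1                     # S45: local update at S1

    theta = theta0 + theta1                            # step S5 (user side)
    print(np.linalg.norm(X @ theta - y))               # residual should be small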
In step S6, the user splits the prediction data set X_P in the manner of step S2 into two sub data sets X_P^0 and X_P^1, which are sent to the cloud servers S_0 and S_1, respectively.
The principle of the invention is as follows: the data provider only needs to split the data into two parts and distribute them to the two cloud servers; using secret sharing, random data perturbation, and similar methods, the two parties compute their respective parameter values by exchanging messages without revealing the secret shares they hold; the data provider then only needs to add the two parties' parameter values to obtain the model parameters, so both the privacy of the model parameters and the privacy of the original data are protected.
Beneficial effects: at present, data leakage incidents emerge one after another. Taking the medical industry as an example, the 2019 medical industry data security report issued by Protenus shows that hacker attacks on the medical industry in 2019 increased dramatically, by 48% over 2018, and that since 2016 the medical industry has suffered on average at least one patient-data leakage incident per day, leaving medical data effectively unprotected. Such incidents seriously jeopardize the personal privacy and security of a large number of users. With the present technique, an enterprise or organization can distribute data to two cloud servers by secret sharing, use the cloud servers to store the data, and use the two cloud servers to compute the linear regression model; in this process the cloud servers compute efficiently and the original data is never leaked.
Drawings
FIG. 1 is a general design of the present invention.
FIG. 2 is a diagram showing the relationship between steps and sequence in the operation of the present invention.
Fig. 3 is a flowchart of calculating a secret sharing value by using random number perturbation.
Detailed Description
The embodiments of the present invention are described in detail below with reference to the accompanying drawings. The embodiments are implemented on the premise of the technical solution of the present invention, and detailed implementations and specific operating procedures are given, but the protection scope of the present invention is not limited to the following embodiments.
Referring to fig. 1, the general design diagram of the present invention: a user splits the training data between two cloud servers (step S2). The two servers communicate with each other according to the protocol and compute their respective parameter values (steps S3 and S4). If the loss value computed by each server is smaller than the predetermined loss threshold, iterative training is complete and each server sends its parameters to the user, who reconstructs the final model parameters (step S5). If the user needs predictions, the prediction data is preprocessed in the same way and sent to the two servers; the servers compute their respective predicted values and return them to the user, who adds the two values to obtain the prediction result (steps S6-S8).
As shown in fig. 2, the sequence diagram of the steps in the operation of the invention: the two servers in the prediction stage need the parameter values computed in the training stage, and both the model parameter update in the training stage and the private prediction computation in the prediction stage use the multiplication with random-number perturbation, i.e. step S1.
Fig. 3 shows a flowchart for calculating the product of the secret sharing values by means of random number perturbation, which is exemplified by the case of step S1.
S1: multiplication of secret shared values
Case one:
S1a1: P_0 and P_1 jointly negotiate a set of random matrices {C^(k)}, where the vectors in each matrix C^(k) are linearly independent. P_0 selects a set of random vectors {r_x^(k)} and a set of random numbers {p^(k)}; P_1 selects a random vector r_y = (r_y1, r_y2, ..., r_yt) and a set of random numbers {q^(k)}. The random vectors r_x^(k) selected by P_0 must satisfy a constraint: the t random values of each r_x^(k) must consist of u distinct random values.
S1a2: P_1 computes B = (b^(1), b^(2), ..., b^(s)), where b^(k) = q^(k) v + C^(k) r_y (the remaining definitions are preserved only as images in the source), and sends B to P_0.
S1a3: after receiving B from P_1, P_0 computes h = (h^(1), h^(2), ..., h^(s)) and sends it to P_1, where h^(k) is derived from M_0[k], the k-th row vector of M_0 (formula preserved only as an image in the source).
S1a4: P_0 computes the values a^(k) (formulas preserved only as images in the source) and sends A = (a^(1), a^(2), ..., a^(s)) to P_1.
S1a5: P_1 computes the quantities g^(k) = r_y a^(k).
S1a6: P_1 successively selects w random values at a time from r_y and sums them, obtaining (s_1, s_2, ..., s_u), which it sends to P_0 (the defining formula is preserved only as an image in the source).
S1a7: P_0 computes a quantity from these sums (formula preserved only as an image in the source) and sends the result to P_1.
S1a8: P_1 computes an intermediate quantity from the received result and, according to q^(k), computes the value it sends to P_0 (the formulas are preserved only as images in the source).
Case two:
S1b1: P_0 and P_1 jointly negotiate a set of random matrices {C^(k)}, where the vectors in each matrix C^(k) are linearly independent. P_0 selects a random vector r_x = (r_x1, r_x2, ..., r_xt) ∈ R^t and a set of random numbers {p^(k)}; P_1 selects a random vector r_y = (r_y1, r_y2, ..., r_yt) and a set of random numbers {q^(k)}. The random vector r_y selected by P_1 must satisfy a constraint: its t random values r_y1, r_y2, ..., r_yt must consist of u distinct random values.
S1b2: P_0 computes A = (a^(1), a^(2), ..., a^(s)), where a^(k) = p^(k)(M_0[k])^T + C^(k)(r_x)^T and M_0[k] denotes the k-th row vector of M_0, and sends A to P_1.
S1b3: after receiving A from P_0, P_1 computes h = (h^(1), h^(2), ..., h^(s)) and sends it to P_0 (the formula for h^(k) is preserved only as an image in the source).
S1b4: P_1 computes B = (b^(1), b^(2), ..., b^(s)), where b^(k) = q^(k) C^(k) v_1 + r_y, and sends B to P_0.
S1b5: P_0 computes an intermediate quantity from B (the formulas are preserved only as images in the source).
S1b6: P_0 successively selects w random values at a time from r_x and sums them, obtaining (s_1, s_2, ..., s_u), which it sends to P_1 (the defining formula is preserved only as an image in the source).
S1b7: P_1 computes a quantity from the received sums (formula preserved only as an image in the source) and sends the result to P_0.
S1b8: P_0 computes an intermediate quantity, obtains a value from it, and sends the corresponding quantity to P_1 (the formulas are preserved only as images in the source).
S1b9: P_1 computes M_0 v_1 according to q^(k) (formula preserved only as an image in the source).
S2: training data preprocessing
The data provider divides the required training data (X, y) into (X_0, y_0) and (X_1, y_1) of the same dimensions, satisfying X = X_0 + X_1 and y = y_0 + y_1, and sends them through secure channels to the computing parties S_0 and S_1, respectively.
S3: parameter initialization
S_i (i = 0, 1) jointly preset the learning rate α, the mini-batch size |B|, the maximum number of iterations T, and the loss threshold e, and each initializes its own model parameter θ_i, with the iteration count initially set to 1, where θ_i ∈ R^t is a t-dimensional column vector. Note that the number of samples |B| selected here needs to be larger than the number t of features of the samples, since the applicability condition of step S1a must be satisfied.
S4: model parameter updating
The parameter update of the mini-batch gradient descent algorithm on the training data set (X, y) is:

    θ^(m+1) = θ^(m) - (α/|B|) · X_B^T (X_B θ^(m) - y_B)    (1)

where m denotes the current iteration number, and X_B and y_B respectively denote the feature values and target values of the mini-batch sample set.
S41: S_i (i = 0, 1) randomly selects |B| samples (X_B^i, y_B^i) from its data;
S42: S_i (i = 0, 1) invokes step S1a1 and, from its mini-batch sample data (X_B^i, y_B^i) and the current model parameter θ_i, derives its secret share ŷ_B^i of the batch prediction, i.e. shares satisfying ŷ_B = ŷ_B^0 + ŷ_B^1 = X_B θ;
S43: S_i (i = 0, 1) obtains the error e_i = ŷ_B^i - y_B^i between its derived prediction share ŷ_B^i and its share y_B^i of the true target values;
S44: S_i (i = 0, 1) invokes the computation of step S1a to obtain its secret share of the product X_B^T e, where e = e_0 + e_1 is the batch error from S43;
S45: S_i (i = 0, 1) updates its model parameter θ_i according to the MBGD parameter update formula (1), applied to its own shares;
S46: S_i (i = 0, 1) computes the current loss function value loss_i = X_i × θ_i - y_i, and the iteration count t becomes t + 1;
S47: S_i (i = 0, 1) judges whether the current loss function value satisfies loss_i < e; if so, it records the current θ_i as the secret share of the model parameters and training ends; otherwise it judges whether the current iteration count satisfies t < T; if so, iteration continues from S4, otherwise training ends.
S5: model parameter reconstruction
After the user receives the parameter values θ_i sent by the two servers, the user computes θ_0 + θ_1 and reconstructs the parameters of the model.
S6: predictive data pre-processing
Through the above steps, the cloud servers S_0 and S_1 now hold the secret shares θ_0 and θ_1 of the model parameters, respectively. The user splits the prediction data set X_P in the manner of step S2 into two sub data sets X_P^0 and X_P^1 and sends them to the cloud servers S_0 and S_1, respectively.
S7: calculating a predicted shared value
Since it cannot be guaranteed at this point that the prediction data set X_P satisfies the condition that the number of samples is smaller than the number of features, S_i (i = 0, 1) needs case two of step S1 and invokes step S1b to compute its secret share ŷ_P^i of the prediction, i.e. shares satisfying ŷ_P = ŷ_P^0 + ŷ_P^1 = X_P θ.
S8: reconstructing a prediction result
S_i (i = 0, 1) each send their secret share ŷ_P^i to the user, and the user reconstructs the true prediction result y_P = ŷ_P^0 + ŷ_P^1.
The foregoing shows and describes the general principles, principal features, and advantages of the present invention. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above; the embodiments and the description only illustrate the principle of the invention, and various changes and modifications may be made without departing from the spirit and scope of the invention, all of which fall within the scope of the claimed invention. The scope of the invention is defined by the appended claims and their equivalents.

Claims (5)

1. A privacy protection linear regression method based on secret sharing and random disturbance is characterized by comprising the following steps:
S1: multiplication of secret-shared values
In this step, without revealing their own shares, the two parties hide their true values in equations containing random values and send these to each other; one party relies on the other to carry out the computation, and finally one party obtains the product of the shared values. For a given matrix M ∈ R^{s×t} and vector v ∈ R^t, M_i and v_i (i = 0, 1) are their secret shares, held by computing party P_i (i = 0, 1), where M = M_0 + M_1 and v = v_0 + v_1; that is, computing party P_0 holds a private matrix M_0 and a private vector v_0, and the other party P_1 holds a private matrix M_1 and a private vector v_1. After solving the problem by random data perturbation, P_i (i = 0, 1) obtains a secret share p_i = RPM(M_0, M_1, v_0, v_1) of the product Mv;
S2: training data preprocessing
The method splits the original data into two parts and sends them to two servers that are non-colluding and semi-honest, i.e. they do not leak their private data but complete their respective computations by communicating with the other server. The data preprocessing splits the secret between the two computing parties by secret sharing, so that the data can finally be reconstructed simply by adding the two parties' computation results. Here the data provider divides the required training data (X, y) into two parts of equal dimensions, (X_0, y_0) and (X_1, y_1), satisfying X = X_0 + X_1 and y = y_0 + y_1, and sends them through secure channels to the computing parties S_0 and S_1. X ∈ R^{s×t} denotes a matrix of dimension s × t, where s is the number of samples and t is the number of features per sample, and y ∈ R^s denotes an s-dimensional column vector, called the target values of the samples.
S3: parameter initialization
The linear regression model can be trained by two methods, the least-squares method and gradient descent; the invention adopts gradient descent. Gradient descent divides into batch gradient descent, stochastic gradient descent, and mini-batch gradient descent; to make the model parameters converge faster and more efficiently, and to respect the constraints of the secret-shared multiplication, mini-batch gradient descent is selected for the iterations.
S_i (i = 0, 1) jointly preset the learning rate α, the mini-batch size |B|, the maximum number of iterations T, and the loss threshold e, and each initializes its own model parameter θ_i, with the iteration count initially set to 1, where θ_i ∈ R^t is a t-dimensional column vector. Note that the number of samples |B| selected here needs to be larger than the number t of features of the samples, since the applicability condition of step S1a must be satisfied.
S4: model parameter updating
The parameter update of the mini-batch gradient descent algorithm on the training data set (X, y) is:

    θ^(m+1) = θ^(m) - (α/|B|) · X_B^T (X_B θ^(m) - y_B)    (1)

where m denotes the current iteration number, and X_B and y_B respectively denote the feature values and target values of the mini-batch sample set;
S5: model parameter reconstruction
After receiving the parameter values θ_i sent by the two servers, the user adds the parameters θ_0 and θ_1 to reconstruct the model parameter θ;
S6: prediction data preprocessing
Through the above steps, the cloud servers S_0 and S_1 now hold the secret shares θ_0 and θ_1 of the model parameters, respectively. The user can distribute prediction data to the cloud servers for prediction and finally add the two cloud servers' predicted values to obtain the final prediction result; therefore, to prevent the information in the prediction data set X_P from being leaked to the cloud servers, X_P must be preprocessed;
S7: calculating the prediction shares
Since it cannot be guaranteed at this point that the prediction data set X_P has fewer samples than features, S_i (i = 0, 1) needs case two of step S1 and invokes step S1b to compute its secret share ŷ_P^i of the prediction, i.e. shares satisfying ŷ_P = ŷ_P^0 + ŷ_P^1 = X_P θ;
S8: reconstructing a prediction result
S_i (i = 0, 1) each send their secret share ŷ_P^i to the user, and the user adds the secret shares to reconstruct the true prediction result y_P = ŷ_P^0 + ŷ_P^1.
2. The privacy-preserving linear regression method based on secret sharing and random perturbation as claimed in claim 1, wherein the step S1 comprises the following steps:
S1a: P_0 can obtain M_0 v_1; the constraint required for this case is M_0 ∈ R^{s×t} with s < t, i.e. the number of rows is less than the number of columns,
S1a1: P_0 and P_1 jointly negotiate a set of random matrices {C^(k)}, where the vectors in each matrix C^(k) are linearly independent. P_0 selects a set of random vectors {r_x^(k)} and a set of random numbers {p^(k)}; P_1 selects a random vector r_y = (r_y1, r_y2, ..., r_yt) and a set of random numbers {q^(k)}. The random vectors r_x^(k) selected by P_0 must satisfy a constraint: the t random values of each r_x^(k) must consist of u distinct random values (z_1, z_2, ..., z_u), with u = ⌈t/w⌉, where ⌈·⌉ denotes rounding up; the exact relationship between the values (z_1, z_2, ..., z_u) and the vector r_x^(k) is given by a formula preserved only as an image in the source;
S1a2: P_1 computes B = (b^(1), b^(2), ..., b^(s)), where b^(k) = q^(k) v + C^(k) r_y (the remaining definitions are preserved only as images in the source), and sends B to P_0;
S1a3: after receiving B from P_1, P_0 computes h = (h^(1), h^(2), ..., h^(s)) and sends it to P_1, where h^(k) is derived from M_0[k], the k-th row vector of M_0 (formula preserved only as an image in the source);
S1a4: P_0 computes the values a^(k) (formulas preserved only as images in the source) and sends A = (a^(1), a^(2), ..., a^(s)) to P_1;
S1a5: P_1 computes the quantities g^(k) = r_y a^(k);
S1a6: P_1 successively selects w random values at a time from r_y and sums them, obtaining (s_1, s_2, ..., s_u), which it sends to P_0 (the defining formula is preserved only as an image in the source);
S1a7: P_0 computes a quantity from these sums (formula preserved only as an image in the source) and sends the result to P_1;
S1a8: P_1 computes an intermediate quantity from the received result and, according to q^(k), computes the value it sends to P_0 (the formulas are preserved only as images in the source);
S1a9: P_0 computes M_0 v_1 from the received value (the formulas are preserved only as images in the source).
Similarly, by the above steps P_1 can obtain M_1 v_0.
3. The privacy-preserving linear regression method based on secret sharing and random perturbation as claimed in claim 1, wherein the step S1 comprises the following steps:
S1b: P_1 can obtain M_0 v_1;
S1b1: P_0 and P_1 jointly negotiate a set of random matrices {C^(k)}, where the vectors in each matrix C^(k) are linearly independent. P_0 selects a random vector r_x = (r_x1, r_x2, ..., r_xt) ∈ R^t and a set of random numbers {p^(k)}; P_1 selects a random vector r_y = (r_y1, r_y2, ..., r_yt) and a set of random numbers {q^(k)}. The random vector r_y selected by P_1 must satisfy a constraint: its t random values r_y1, r_y2, ..., r_yt must consist of u distinct random values (z_1, z_2, ..., z_u), with u = ⌈t/w⌉, where ⌈·⌉ denotes rounding up; the exact relationship between (z_1, z_2, ..., z_u) and r_y is given by a formula preserved only as an image in the source;
S1b2: P_0 computes A = (a^(1), a^(2), ..., a^(s)), where a^(k) = p^(k)(M_0[k])^T + C^(k)(r_x)^T and M_0[k] denotes the k-th row vector of M_0, and sends A to P_1;
S1b3: after receiving A from P_0, P_1 computes h = (h^(1), h^(2), ..., h^(s)) and sends it to P_0 (the formula for h^(k) is preserved only as an image in the source);
S1b4: P_1 computes B = (b^(1), b^(2), ..., b^(s)), where b^(k) = q^(k) C^(k) v_1 + r_y, and sends B to P_0;
S1b5: P_0 computes an intermediate quantity from B (the formulas are preserved only as images in the source);
S1b6: P_0 successively selects w random values at a time from r_x and sums them, obtaining (s_1, s_2, ..., s_u), which it sends to P_1 (the defining formula is preserved only as an image in the source);
S1b7: P_1 computes a quantity from the received sums (formula preserved only as an image in the source) and sends the result to P_0;
S1b8: P_0 computes an intermediate quantity, obtains a value from it, and sends the corresponding quantity to P_1 (the formulas are preserved only as images in the source);
S1b9: P_1 computes M_0 v_1 according to q^(k) (formula preserved only as an image in the source).
Similarly, by the above steps P_0 can obtain M_1 v_0.
4. The privacy-preserving linear regression method based on secret sharing and random perturbation as claimed in claim 1, wherein the step S4 comprises the following steps:
S41: S_i (i = 0, 1) randomly selects |B| samples (X_B^i, y_B^i) from its data;
S42: S_i (i = 0, 1) invokes step S1a1 and, from its mini-batch sample data (X_B^i, y_B^i) and the current model parameter θ_i, derives its secret share ŷ_B^i of the batch prediction, i.e. shares satisfying ŷ_B = ŷ_B^0 + ŷ_B^1 = X_B θ;
S43: S_i (i = 0, 1) obtains the error e_i = ŷ_B^i - y_B^i between its derived prediction share ŷ_B^i and its share y_B^i of the true target values;
S44: S_i (i = 0, 1) invokes the computation of step S1a to obtain its secret share of the product X_B^T e, where e = e_0 + e_1 is the batch error from S43;
S45: S_i (i = 0, 1) updates its model parameter θ_i according to the MBGD parameter update formula (1), applied to its own shares;
S46: S_i (i = 0, 1) computes the current loss function value loss_i = X_i × θ_i - y_i and increases the iteration count t by 1;
S47: S_i (i = 0, 1) judges the current loss function value loss_i: if it is smaller than the loss threshold e, it records the current θ_i as the secret share of the model parameters and training ends; otherwise it judges whether the current iteration count t is smaller than T; if so, iteration continues from S4, otherwise training ends.
5. The privacy-preserving linear regression method based on secret sharing and random disturbance as claimed in claim 1, wherein in step S6 the user splits the prediction data set X_P in the manner of step S2 into two sub data sets X_P^0 and X_P^1, which are sent to the cloud servers S_0 and S_1, respectively.
CN202110322472.9A (priority date 2021-03-25; filing date 2021-03-25) Privacy protection linear regression method based on secret sharing and random disturbance. Status: Active. Granted publication: CN113065145B (en).

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110322472.9A CN113065145B (en) 2021-03-25 2021-03-25 Privacy protection linear regression method based on secret sharing and random disturbance

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110322472.9A CN113065145B (en) 2021-03-25 2021-03-25 Privacy protection linear regression method based on secret sharing and random disturbance

Publications (2)

Publication Number Publication Date
CN113065145A true CN113065145A (en) 2021-07-02
CN113065145B CN113065145B (en) 2023-11-24

Family

ID=76563761

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110322472.9A Active CN113065145B (en) 2021-03-25 2021-03-25 Privacy protection linear regression method based on secret sharing and random disturbance

Country Status (1)

Country Link
CN (1) CN113065145B (en)



Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110537191A (en) * 2017-03-22 2019-12-03 维萨国际服务协会 Secret protection machine learning
CN108712260A (en) * 2018-05-09 2018-10-26 曲阜师范大学 The multi-party deep learning of privacy is protected to calculate Proxy Method under cloud environment
US10600006B1 (en) * 2019-01-11 2020-03-24 Alibaba Group Holding Limited Logistic regression modeling scheme using secrete sharing
CN110998579A (en) * 2019-01-11 2020-04-10 阿里巴巴集团控股有限公司 Privacy-preserving distributed multi-party security model training framework
CN112182649A (en) * 2020-09-22 2021-01-05 上海海洋大学 Data privacy protection system based on safe two-party calculation linear regression algorithm

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
董业; 侯炜; 陈小军; 曾帅: "Efficient and Secure Federated Learning Based on Secret Sharing and Gradient Selection" (基于秘密分享和梯度选择的高效安全联邦学习), Journal of Computer Research and Development (计算机研究与发展), no. 10 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023093278A1 (en) * 2021-11-24 2023-06-01 华为技术有限公司 Digital signature thresholding method and apparatus
CN114679316A (en) * 2022-03-25 2022-06-28 中国人民解放军国防科技大学 Safety prediction method and system for personnel mobility, client device and server
CN114650134A (en) * 2022-03-31 2022-06-21 深圳前海环融联易信息科技服务有限公司 Longitudinal privacy protection logistic regression method based on secret sharing
WO2023213190A1 (en) * 2022-05-06 2023-11-09 华为技术有限公司 Model security aggregation method and device
CN114817997A (en) * 2022-06-24 2022-07-29 蓝象智联(杭州)科技有限公司 Shared data random ordering method based on secret sharing
CN114817997B (en) * 2022-06-24 2022-09-23 蓝象智联(杭州)科技有限公司 Shared data random ordering method based on secret sharing
CN115632761A (en) * 2022-08-29 2023-01-20 哈尔滨工业大学(深圳) Multi-user distributed privacy protection regression method and device based on secret sharing

Also Published As

Publication number Publication date
CN113065145B (en) 2023-11-24

Similar Documents

Publication Publication Date Title
CN112182649B (en) Data privacy protection system based on safe two-party calculation linear regression algorithm
CN113065145A (en) Privacy protection linear regression method based on secret sharing and random disturbance
CN111324870B (en) Outsourcing convolutional neural network privacy protection system based on safe two-party calculation
Wagh et al. SecureNN: 3-party secure computation for neural network training
Han et al. Logistic regression on homomorphic encrypted data at scale
CN112183730B (en) Neural network model training method based on shared learning
Dong et al. Eastfly: Efficient and secure ternary federated learning
Zhang et al. GELU-Net: A Globally Encrypted, Locally Unencrypted Deep Neural Network for Privacy-Preserved Learning.
CN110750801B (en) Data processing method, data processing device, computer equipment and storage medium
Li et al. Optimizing privacy-preserving outsourced convolutional neural network predictions
CN114696990B (en) Multi-party computing method, system and related equipment based on fully homomorphic encryption
CN113435592A (en) Privacy-protecting neural network multi-party cooperative lossless training method and system
CN113420232A (en) Privacy protection-oriented graph neural network federal recommendation method
CN112883387A (en) Privacy protection method for machine-learning-oriented whole process
CN115994559A (en) Efficient method for converting unintentional neural network
Shen et al. ABNN2: secure two-party arbitrary-bitwidth quantized neural network predictions
Miyajima et al. Machine Learning with Distributed Processing using Secure Divided Data: Towards Privacy-Preserving Advanced AI Processing in a Super-Smart Society
Aharoni et al. He-pex: Efficient machine learning under homomorphic encryption using pruning, permutation and expansion
CN116595589B (en) Secret sharing mechanism-based distributed support vector machine training method and system
Dong et al. Meteor: improved secure 3-party neural network inference with reducing online communication costs
CN117291258A (en) Neural network training reasoning method and system based on function secret sharing
CN111859440A (en) Sample classification method of distributed privacy protection logistic regression model based on mixed protocol
CN113098682B (en) Multi-party security computing method and device based on block chain platform and electronic equipment
CN114358323A (en) Third-party-based efficient Pearson coefficient calculation method in federated learning environment
CN111614456B (en) Multi-party collaborative encryption method for SM4 algorithm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant