CN111737743A - Deep learning differential privacy protection method - Google Patents
Deep learning differential privacy protection method
- Publication number
- CN111737743A (application CN202010572297.4A)
- Authority
- CN
- China
- Prior art keywords
- privacy
- differential privacy
- privacy protection
- acc
- gradient
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
- 238000000034 method Methods 0.000 title claims abstract description 49
- 238000013135 deep learning Methods 0.000 title claims abstract description 34
- 238000012545 processing Methods 0.000 claims abstract description 13
- 238000012549 training Methods 0.000 claims description 19
- 238000004364 calculation method Methods 0.000 claims description 7
- 238000009826 distribution Methods 0.000 claims description 7
- 238000013527 convolutional neural network Methods 0.000 claims description 6
- 238000005457 optimization Methods 0.000 claims description 6
- 230000006870 function Effects 0.000 claims description 5
- 238000005520 cutting process Methods 0.000 claims description 4
- 238000011156 evaluation Methods 0.000 claims description 4
- 230000035945 sensitivity Effects 0.000 claims description 4
- 238000011478 gradient descent method Methods 0.000 claims description 3
- 230000000007 visual effect Effects 0.000 claims description 3
- 230000001737 promoting effect Effects 0.000 abstract description 2
- 230000000694 effects Effects 0.000 description 11
- 238000005516 engineering process Methods 0.000 description 8
- 238000012360 testing method Methods 0.000 description 3
- 238000004458 analytical method Methods 0.000 description 2
- 238000013136 deep learning model Methods 0.000 description 2
- 238000010586 diagram Methods 0.000 description 2
- 238000002474 experimental method Methods 0.000 description 2
- 238000011160 research Methods 0.000 description 2
- 230000003044 adaptive effect Effects 0.000 description 1
- 230000009286 beneficial effect Effects 0.000 description 1
- 230000015556 catabolic process Effects 0.000 description 1
- 230000007547 defect Effects 0.000 description 1
- 230000007812 deficiency Effects 0.000 description 1
- 238000006731 degradation reaction Methods 0.000 description 1
- 238000013461 design Methods 0.000 description 1
- 230000008034 disappearance Effects 0.000 description 1
- 238000004519 manufacturing process Methods 0.000 description 1
- 238000010606 normalization Methods 0.000 description 1
- 238000004451 qualitative analysis Methods 0.000 description 1
- 238000011158 quantitative evaluation Methods 0.000 description 1
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06F21/6218—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
- G06F21/6245—Protecting personal data, e.g. for financial or medical purposes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- Software Systems (AREA)
- Evolutionary Computation (AREA)
- Biomedical Technology (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Molecular Biology (AREA)
- Bioethics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Medical Informatics (AREA)
- Databases & Information Systems (AREA)
- Computer Hardware Design (AREA)
- Computer Security & Cryptography (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a deep learning differential privacy protection method, belonging to the technical field of information system security. The invention provides a novel deep learning differential privacy protection model: WGAN is adopted to generate image results from data that has undergone the model's privacy protection processing; the result closest to a real image is selected from the generated images; the similarity between the generated result and the original image is compared and a difference value is calculated for threshold comparison; and the privacy parameters in the model's gradient are adjusted through feedback under the constraint of the similarity threshold, thereby promoting, to a certain extent, the application of differential privacy in fields such as deep learning.
Description
Technical Field
The invention belongs to the technical field of information system security, and particularly relates to a deep learning differential privacy protection method.
Background
Existing privacy protection methods for common data sets, such as anonymizing data with k-anonymity, struggle to provide strict privacy guarantees. Differential privacy (DP), a novel and highly advantageous privacy protection technology, is a data-distortion-based privacy protection method designed against attackers with strong background knowledge: by adding noise, it ensures that inserting or deleting any single record in a data set does not influence the query output, thereby protecting data privacy. The technique is built on a rigorous mathematical foundation and provides a quantitative evaluation method, making it one of the most effective and widely applicable approaches in current privacy protection technology. Many researchers have extended differential privacy, various algorithmic models continue to emerge, and the technology plays an important role in daily life, industry, production, medical treatment, and other fields.
As one class of deep learning models, Generative Adversarial Networks (GAN) can generate image results very close to the original images, to the point of being indistinguishable from real ones. However, the conventional GAN suffers from unstable training, mode collapse, vanishing gradients, and similar problems, so the actual training process often struggles to produce the desired images. The proposal of the Wasserstein GAN (WGAN) solved these problems well: WGAN replaces the asymmetric JS divergence used by the traditional GAN with the smooth and symmetric Wasserstein distance, and its training shows strong stability and high image generation quality. Therefore, from the viewpoint of training stability and image generation quality, generating deep learning image data sets with WGAN is becoming an important subject in fields such as image processing and computer vision.
The architecture of a deep learning differential privacy protection algorithm adjusted by a WGAN-based feedback method usually involves many parameters, and the setting of these parameters is generally considered the key factor in balancing the privacy protection degree against data availability. However, common privacy-parameter grouping methods are often driven solely by the user's own requirements and do not quantitatively analyze the privacy protection degree of the model, which hinders the balance between the privacy protection degree and data availability.
Through searching, an application with application number 201811540698.0, filed on December 17, 2018, and entitled "A combined deep learning training method based on privacy protection technology" was found. In that application, a Homomorphic Encryption (HE) method is used to send encrypted data to a cloud server, and the user obtains the data by decrypting the ciphertext. However, homomorphic encryption only supports addition and multiplication and is difficult to adapt to the complex computation requirements of deep learning; moreover, its operation consumes a large amount of computing resources, which may degrade the performance of the deep learning network.
As another example, an application with application number 201710611972.8, filed on July 25, 2017, and entitled "A deep differential privacy protection method based on a generative adversarial network" applies the concept of the Deep Convolutional Generative Adversarial Network (DCGAN) to privacy protection of deep learning image data sets. However, the DCGAN used in that method still lacks training stability: as the number of training iterations increases, some parameters (such as the filters) oscillate because of mode collapse, and the DCGAN generative model is limited by batch normalization. In addition, that method's grouping of the privacy parameters mainly depends on the user's individual requirements, and no analysis of privacy-loss minimization is performed on the feedback-adjusted privacy parameter settings.
Based on the above analysis, there is a need in the art for a deep learning data set privacy protection method that can better balance the privacy protection degree and the data availability.
Disclosure of Invention
1. Technical problem to be solved by the invention
The invention aims to overcome the difficulty of balancing the privacy protection degree and data availability when privacy protection is applied to deep learning data sets, and provides a deep learning differential privacy protection method. The invention provides a novel deep learning differential privacy protection model: WGAN is adopted to generate image results from data processed by the model's privacy protection; the result closest to a real image is selected from the generated images; the similarity between the generated result and the original image is compared and a difference value is calculated for threshold comparison; and the privacy parameters in the model's gradient are adjusted through feedback under the constraint of the similarity threshold, thereby promoting, to a certain extent, the application of differential privacy in fields such as deep learning.
2. Technical scheme
In order to achieve the purpose, the technical scheme provided by the invention is as follows:
the invention discloses a deep learning differential privacy protection method, which comprises the following steps:
step 1, constructing a deep learning network that introduces a differential privacy mechanism;
step 2, adding Gaussian noise to the gradient in the parameter optimization stage of the deep learning network, in accordance with differential privacy theory;
step 3, calculating the privacy loss of step 2 using the composability property of differential privacy;
step 4, generating, with the WGAN generative model, images from the data processed by the differential privacy parameters, and likewise generating images from the raw, unprocessed data;
step 5, selecting an optimal image result from the images generated after differential-privacy-parameter processing, comparing it against the image generated from the original data, and calculating the similarity difference;
and step 6, under the limit of the similarity threshold, adjusting the related privacy parameters of the model through feedback so that the privacy loss of step 3 reaches its minimum value, realizing the balance between privacy protection and data availability.
Furthermore, a convolutional neural network with two convolutional layers and three fully-connected layers is established in step 1, and the introduced differential privacy is (ε, δ)-differential privacy, as shown in formula (1):
Pr[M(D) ∈ S_M] ≤ e^ε × Pr[M(D′) ∈ S_M] + δ (1)
wherein M is a given random algorithm; D and D′ are neighbor datasets that differ by at most one record; S_M is the set of all possible outputs of the random algorithm M on the data sets D and D′; ε and δ represent the privacy budget and the privacy error value, respectively.
Furthermore, the process of adding Gaussian noise in the deep network parameter optimization stage of step 2 is as follows:
from the training data set X = {x_1, x_2, ..., x_n}, a small batch of training data of size m is randomly selected as input, and the gradient value corresponding to each training example is calculated; the L_2 norm of each gradient is clipped within the threshold C and the average is taken, obtaining a new gradient value; Gaussian noise V ~ N(0, σ²) is then added to the new gradient value to output the perturbed gradient, σ being the noise-addition scale; finally, following the gradient descent method, one step is taken in the direction opposite to the new gradient and the gradient parameter θ_t is updated.
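The noisy-gradient step above can be sketched in a few lines of numpy. This is an illustrative sketch under stated assumptions (the function name and interface are our own, not the patent's implementation):

```python
import numpy as np

def dp_sgd_step(per_sample_grads, clip_norm, sigma, lr, theta):
    """One differentially private gradient step: clip each per-sample
    gradient to L2 norm clip_norm, sum, add Gaussian noise
    N(0, (sigma*clip_norm)^2), average over the batch, then descend."""
    # Scale each gradient down only if its L2 norm exceeds the threshold.
    clipped = [g / max(1.0, np.linalg.norm(g) / clip_norm)
               for g in per_sample_grads]
    m = len(per_sample_grads)
    noisy_mean = (np.sum(clipped, axis=0)
                  + np.random.normal(0.0, sigma * clip_norm,
                                     size=theta.shape)) / m
    # Move one step in the opposite direction of the perturbed gradient.
    return theta - lr * noisy_mean
```

With `sigma=0` the step reduces to ordinary clipped mini-batch SGD, which makes the clipping behavior easy to check in isolation.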
Further, the calculation of the privacy loss in step 3 is represented by formula (2):
c(s; M, aux, D, D′) = ln( Pr[M(aux, D) = s] / Pr[M(aux, D′) = s] ) (2)
wherein M is a given random algorithm; D and D′ are two adjacent data sets; aux is the input auxiliary parameter; s represents the output and s ∈ R;
the added Gaussian noise has the Markov property, and combining it with the definition of differential privacy gives formula (3):
c(s) = ln( Pr[V] / Pr[V′] ) (3)
wherein the Gaussian noise values V and V′ added to the neighboring data sets D and D′ satisfy formula (4):
V′ = V + Dd (4)
wherein Dd is the differential privacy sensitivity;
finally, combining the characteristics of the Gaussian mechanism, V ~ N(0, σ²), the privacy loss in the process of adding Gaussian noise is obtained by simplifying formulas (2), (3) and (4) as formula (5):
c(s) = (2V·Dd + Dd²) / (2σ²) (5)
further, the loss function of WGAN in step 4 is shown in equation (6):
wherein, PdRepresenting the true data distribution, PgRepresenting a generated data distribution; when updating the weights, it is necessary to maintain the network parameters within a range that satisfies the Lipschitz condition.
Furthermore, the optimal selection among the generated images after privacy-parameter processing in step 5 mainly depends on classification accuracy and visual evaluation, and the similarity difference is calculated by subtracting the classification accuracy of the optimal selection from that of the image generated without any processing, as shown in formula (7):
C_acc = acc_r − acc_p (7)
wherein acc_r represents the classification accuracy of the image generated without any processing and acc_p represents the classification accuracy of the optimally selected generated image.
Furthermore, the similarity threshold C in step 6 is generally set to 10%; the C_acc obtained in step 5 is compared with the set threshold C. If C_acc is greater than C, a suitable generated image is selected again and step 5 is repeated until C_acc is less than C; when C_acc is less than C, the magnitude of the privacy loss is evaluated through step 3, and appropriate values of ε and δ are selected so that the privacy loss is minimized while the Gaussian-noise condition is satisfied, finally realizing the balance between the privacy protection degree and data availability.
3. Advantageous effects
Compared with the prior art, the technical scheme provided by the invention has the following remarkable effects:
(1) According to the deep learning differential privacy protection method, a deep learning model introducing the (ε, δ)-differential privacy mechanism is constructed, in which the degree of privacy protection is influenced by the privacy parameters ε, δ and σ. By setting multiple pairs of privacy-parameter groups and carrying out classification-accuracy experiments with single-parameter and multi-parameter variation, the most suitable privacy-parameter combination is selected, which effectively improves control over the model's privacy protection effect.
(2) According to the deep learning differential privacy protection method, the WGAN is used to generate both the privacy-parameter-processed images and the unprocessed images; the privacy parameters are adjusted through feedback in combination with the evaluation criteria of privacy loss and classification accuracy, finally yielding a suitable group of privacy parameters, so that a good privacy protection effect and high data availability can both be guaranteed. The WGAN resolves the problems of the traditional GAN well and shows better training stability and generated-image quality than other GAN derivative variants (such as DCGAN), which helps improve the usability of the generated results and the practical significance of the research.
(3) According to the deep learning differential privacy protection method, a method for quantitatively calculating the privacy loss is designed, by which one can verify whether the feedback-adjusted privacy-parameter setting drives the privacy loss to a smaller value through privacy-loss minimization, thereby obtaining a better privacy protection effect.
Drawings
FIG. 1 is a diagram of the overall architecture of the model of the present invention;
FIG. 2 is a flow chart of the WGAN-based privacy feedback portion of the invention;
FIG. 3 is a schematic diagram of a convolutional neural network model of the present invention.
Detailed Description
For a further understanding of the invention, reference should be made to the following detailed description taken in conjunction with the accompanying drawings and examples.
Example 1
With reference to fig. 1, a deep learning differential privacy protection method according to this embodiment includes the steps of:
Step 1, constructing a convolutional neural network with two convolutional layers and three fully-connected layers, introducing differential privacy theory in the parameter optimization of the network, and adding Gaussian noise satisfying the Gaussian mechanism; the specific process is as follows:
Initialization: a convolutional neural network with two convolutional layers and three fully-connected layers is established and its model parameters are initialized, as shown in fig. 3. The introduced (ε, δ)-differential privacy is shown in formula (1):
Pr[M(D) ∈ S_M] ≤ e^ε × Pr[M(D′) ∈ S_M] + δ (1)
wherein M is a given random algorithm; D and D′ are neighbor datasets that differ by at most one record; S_M is the set of all possible outputs of the random algorithm M on the data sets D and D′. The degree of privacy protection is governed by the privacy parameters: the privacy budget ε, the privacy disclosure error value δ, and the noise-addition scale σ of the Gaussian noise V ~ N(0, σ²) satisfying the Gaussian mechanism. Excessive noise reduces the usability of the data, while too little noise reduces the degree of privacy protection.
Step 2, Gaussian noise is added to the gradient in the parameter optimization stage of the deep learning network in accordance with differential privacy theory, and a privacy-parameter grouping method is set; the specific process is as follows:
From the training data set X = {x_1, x_2, ..., x_n}, a small batch (batch size m) of training data is randomly selected as input, and the gradient value corresponding to each training example (x_i ∈ m_t) is calculated; the L_2 norm of each gradient is clipped within the gradient-clipping threshold C and the average is taken, obtaining a new gradient value; Gaussian noise V ~ N(0, σ²) is then added to the new gradient value to output the perturbed gradient; finally, following the gradient descent method, one step is taken in the direction opposite to the new gradient and the gradient parameter θ_t is updated.
For the privacy parameters that influence the differential privacy protection effect, classification-accuracy experiments are carried out by varying a single parameter while fixing the other two. The ideal value range of ε obtained by analyzing the variation trend is [0.5, 1, 2, 4], that of σ is [2, 4, 6, 8], and that of δ is [1e-5, 1e-4, 1e-3, 1e-2]. Combining these values with each other then yields 64 privacy-parameter groups.
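For instance, the 64 parameter groups can be enumerated directly from the three candidate ranges above (the variable names here are illustrative, not from the patent):

```python
from itertools import product

eps_grid = [0.5, 1, 2, 4]              # privacy budget epsilon candidates
sigma_grid = [2, 4, 6, 8]              # Gaussian noise scale sigma candidates
delta_grid = [1e-5, 1e-4, 1e-3, 1e-2]  # privacy error delta candidates

# Cartesian product of the three ranges: 4 * 4 * 4 = 64 groups.
param_groups = list(product(eps_grid, sigma_grid, delta_grid))
print(len(param_groups))  # 64
```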
Step 3, calculating the privacy loss of step 2, exploiting the property that differential privacy admits quantitative evaluation of privacy protection; the specific process is as follows:
The privacy loss is treated as a random variable whose value directly reflects the privacy protection effect and depends mainly on the added Gaussian noise. Because the Gaussian noise V ~ N(0, σ²) has the Markov property, the privacy loss of the process can be tracked with the Moments Accountant (MA) calculation method; the privacy loss is calculated as in formula (2):
c(s; M, aux, D, D′) = ln( Pr[M(aux, D) = s] / Pr[M(aux, D′) = s] ) (2)
wherein M is a random algorithm; D and D′ are two adjacent data sets; aux is an input auxiliary parameter; s represents the output and s ∈ R.
To better quantify the privacy loss and analyze the influence of the privacy-parameter settings on it, combining the characteristics of Gaussian noise with the definition of differential privacy yields formula (3):
c(s) = ln( Pr[V] / Pr[V′] ) (3)
By the global sensitivity property of differential privacy, the Gaussian noise values V and V′ added to the adjacent data sets D and D′ satisfy formula (4):
V′ = V + Dd (4)
where Dd is the differential privacy sensitivity.
In the Gaussian mechanism, the Gaussian noise V ~ N(0, σ²) satisfies σ ≥ cΔf/ε with c² > 2 ln(1.25/δ); simplifying formulas (2), (3) and (4), the privacy loss in the process of adding Gaussian noise is obtained as formula (5):
c(s) = (2V·Dd + Dd²) / (2σ²) (5)
step 6 setting of privacy parameters for feedback adjustment requires minimizing the value of equation (5) as much as possible and needs to be satisfiedTo obtain a better privacy protection effect.
Step 4, using the WGAN generative model to generate images from the data processed by the differential privacy parameters of step 2, and likewise generating images from the raw, unprocessed data, then calculating the classification accuracy; the specific process is as follows:
First, the WGAN class is built and its basic information is defined: the size of the input picture (28, 28, 1); the input latent-code dimension (100); the generator and discriminator functions; and the loss function of the WGAN. Then the generator and discriminator are set up and the weight-clipping value (0.01) is set. Finally, the parameters of the generator are optimized (with the RMSProp method). During training, the WGAN is used to train generative models under different privacy-parameter settings and produce image results; the resulting images train a classifier, whose classification accuracy is measured on the test set. A classifier is likewise trained on the images generated by the WGAN without any privacy protection, and its classification accuracy on the test set is measured. The loss function of WGAN is shown in formula (6):
min_G max_{f: ||f||_L ≤ 1} E_{x~P_d}[f(x)] − E_{x~P_g}[f(x)] (6)
wherein P_d represents the true data distribution and P_g represents the generated data distribution; when updating the weights, the network parameters must be kept within a range that satisfies the Lipschitz condition.
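A minimal numpy sketch of the two WGAN-specific ingredients used here, the Wasserstein critic objective and the weight clipping that crudely enforces the Lipschitz condition (the 0.01 clipping value matches the one stated above; function names are illustrative):

```python
import numpy as np

def clip_weights(weights, c=0.01):
    """WGAN weight clipping: keep every critic parameter in [-c, c]
    after each update, as a crude Lipschitz constraint."""
    return [np.clip(w, -c, c) for w in weights]

def critic_wasserstein_loss(scores_real, scores_fake):
    """The critic maximizes E[f(x_real)] - E[f(x_fake)]; written as a
    loss to minimize, that objective is negated."""
    return -(np.mean(scores_real) - np.mean(scores_fake))
```

In a full training loop, `clip_weights` would be applied to the critic's parameter list after every optimizer step (RMSProp, per the setup above), while the generator minimizes `-E[f(x_fake)]`.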
The invention uses WGAN, an excellent derivative variant of GAN, to generate both the privacy-parameter-processed images and the unprocessed images, which well avoids the traditional GAN's problems of unstable training, mode collapse, and so on. The WGAN generative model is continuously trained toward the real data distribution; the closer the accuracy of the generated result is to the real result, the better the data availability. Combined with the privacy-loss calculation and analysis of step 3, the objective of the invention is to ensure a good privacy protection effect while retaining high data availability.
Step 5, selecting an optimal image result from the images generated after differential-privacy-parameter processing, comparing it against the image generated from the original data, and calculating the similarity difference; the specific process is as follows:
The most suitable generated image result is selected according to test accuracy and visual evaluation, and the difference between its accuracy and that of the image generated without privacy protection is calculated and regarded as the similarity difference, as shown in formula (7):
C_acc = acc_r − acc_p (7)
wherein acc_r represents the classification accuracy of the image generated without any processing and acc_p represents the classification accuracy of the optimally selected generated image.
Step 6, under the limit of the similarity threshold, the related privacy parameters of the model are adjusted through feedback, as shown in fig. 2, so that the privacy loss of step 3 reaches its minimum value and the balance between the privacy protection degree and data availability is realized; the specific process is as follows:
The C_acc obtained in step 5 is compared with the set similarity threshold C (10%). If C_acc is greater than C, a suitable generated image is selected again and step 5 is repeated until C_acc is less than C; when C_acc is less than C, the magnitude of the privacy loss is evaluated through step 3, finally realizing the balance between the privacy protection degree and data availability.
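The feedback selection described in this step can be sketched as a small search routine; the dictionary layout and the function name are our own assumptions, not the patent's code:

```python
def feedback_adjust(candidates, acc_r, threshold=0.10):
    """Keep the candidate parameter settings whose similarity gap
    C_acc = acc_r - acc_p stays below the threshold, then return the
    feasible one with the smallest estimated privacy loss."""
    feasible = [c for c in candidates if acc_r - c["acc_p"] < threshold]
    if not feasible:
        return None  # no candidate passes: regenerate images (back to step 5)
    return min(feasible, key=lambda c: c["loss"])
```

Each candidate would carry the accuracy `acc_p` of its generated images and the privacy loss estimated in step 3; returning `None` models the loop back to step 5.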
The present invention and its embodiments have been described above schematically and without limitation; what is shown in the drawings is only one embodiment of the invention, and the actual structure is not limited thereto. Therefore, structural modes and embodiments similar to this technical solution, designed by a person skilled in the art in light of this teaching without departing from the spirit of the invention, shall fall within the scope of protection of the invention.
Claims (7)
1. A deep learning differential privacy protection method, characterized by comprising the following steps:
step 1, constructing a deep learning network that introduces a differential privacy mechanism;
step 2, adding Gaussian noise to the gradient in the parameter optimization stage of the deep learning network, in accordance with differential privacy theory;
step 3, calculating the privacy loss of step 2 using the composability property of differential privacy;
step 4, generating, with the WGAN generative model, images from the data processed by the differential privacy parameters, and likewise generating images from the raw, unprocessed data;
step 5, selecting an optimal image result from the images generated after differential-privacy-parameter processing, comparing it against the image generated from the original data, and calculating the similarity difference;
and step 6, under the limit of the similarity threshold, adjusting the related privacy parameters of the model through feedback so that the privacy loss of step 3 reaches its minimum value, realizing the balance between privacy protection and data availability.
2. The deep-learning differential privacy protection method according to claim 1, wherein: in step 1, a convolutional neural network with two convolutional layers and three fully-connected layers is established, and the introduced differential privacy is (ε, δ)-differential privacy, as shown in formula (1):
Pr[M(D) ∈ S_M] ≤ e^ε × Pr[M(D′) ∈ S_M] + δ (1)
wherein M is a given random algorithm; D and D′ are neighbor datasets that differ by at most one record; S_M is the set of all possible outputs of the random algorithm M on the data sets D and D′; ε and δ represent the privacy budget and the privacy error value, respectively.
3. The deep-learning differential privacy protection method according to claim 2, wherein: the process of adding Gaussian noise in the deep network parameter optimization stage of step 2 is as follows:
from the training data set X = {x_1, x_2, ..., x_n}, a small batch of training data of size m is randomly selected as input, and the gradient value corresponding to each training example is calculated; the L_2 norm of each gradient is clipped within the threshold C and the average is taken, obtaining a new gradient value; Gaussian noise V ~ N(0, σ²) is then added to the new gradient value to output the perturbed gradient, σ being the noise-addition scale; finally, following the gradient descent method, one step is taken in the direction opposite to the new gradient and the gradient parameter θ_t is updated.
4. The deep-learning differential privacy protection method of claim 3, wherein: the privacy loss calculation in step 3 is represented by formula (2):
c(s; M, aux, D, D′) = ln( Pr[M(aux, D) = s] / Pr[M(aux, D′) = s] ) (2)
wherein M is a given random algorithm; D and D′ are two adjacent data sets; aux is the input auxiliary parameter; s represents the output and s ∈ R;
the added Gaussian noise has the Markov property, and combining it with the definition of differential privacy gives formula (3):
c(s) = ln( Pr[V] / Pr[V′] ) (3)
wherein the Gaussian noise values V and V′ added to the neighboring data sets D and D′ satisfy formula (4):
V′ = V + Dd (4)
wherein Dd is the differential privacy sensitivity;
finally, combining the characteristics of the Gaussian mechanism, V ~ N(0, σ²), the privacy loss in the process of adding Gaussian noise is obtained by simplifying formulas (2), (3) and (4) as formula (5):
c(s) = (2V·Dd + Dd²) / (2σ²) (5)
5. The deep-learning differential privacy protection method according to claim 4, wherein: the loss function of WGAN in step 4 is shown in formula (6):
min_G max_{f: ‖f‖_L ≤ 1} E_{x~Pd}[f(x)] − E_{x~Pg}[f(x)] (6)
wherein Pd represents the true data distribution and Pg represents the generated data distribution; when updating the weights, the network parameters must be kept within a range that satisfies the Lipschitz condition.
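A minimal sketch of the WGAN critic objective with weight clipping (the linear critic f_w, the 0.01 clipping constant, and the toy Gaussian distributions are illustrative assumptions, not taken from the patent):

```python
import numpy as np

def critic(x, w):
    """Toy linear critic f_w(x) = w . x, a stand-in for the WGAN critic."""
    return x @ w

def wgan_critic_loss(w, real, fake):
    # Critic maximises E_{Pd}[f_w(x)] - E_{Pg}[f_w(x)]; loss is the negative.
    return -(critic(real, w).mean() - critic(fake, w).mean())

def clip_weights(w, c=0.01):
    # Weight clipping keeps f_w (crudely) within a Lipschitz ball,
    # as in the original WGAN weight-constraint scheme.
    return np.clip(w, -c, c)

rng = np.random.default_rng(1)
real = rng.normal(2.0, 1.0, size=(64, 2))   # samples standing in for Pd
fake = rng.normal(0.0, 1.0, size=(64, 2))   # samples standing in for Pg
w = clip_weights(rng.normal(size=2))
print(wgan_critic_loss(w, real, fake))
```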
6. The deep-learning differential privacy protection method of claim 5, wherein: in step 5, the optimal selection of the generated image after privacy parameter processing depends mainly on the classification accuracy and on visual evaluation; the similarity difference is calculated by subtracting the classification accuracy of the optimally selected generated image from the classification accuracy of the generated image without any processing, as shown in formula (7):
C_acc = acc_r − acc_p (7)
wherein acc_r represents the classification accuracy of the generated image without any processing, and acc_p represents the classification accuracy of the optimally selected generated image.
7. The deep-learning differential privacy protection method of claim 6, wherein: the similarity threshold C in step 6 is generally set to 10%; the C_acc obtained in step 5 is compared with the set similarity threshold C; if the value of C_acc is greater than C, an appropriate generated image is selected again and step 5 is repeated until the value of C_acc is less than C; when the value of C_acc is less than C, the magnitude of the privacy loss is evaluated through step 3, and appropriate values of ε and δ are selected so that the privacy loss is minimized while the Gaussian noise condition is satisfied, finally achieving a balance between the degree of privacy protection and data availability.
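The feedback loop of claims 6 and 7 can be sketched as follows; the candidate list of (setting, acc_p) pairs and the setting labels are hypothetical:

```python
def select_generated_images(candidates, acc_r, C=0.10):
    """Sketch of the selection loop in claims 6-7: try candidate privacy
    settings until the accuracy gap C_acc = acc_r - acc_p falls below the
    similarity threshold C (10% by default).

    candidates: iterable of (label, acc_p) pairs -- hypothetical inputs.
    acc_r:      accuracy of the generated image without any processing.
    """
    for label, acc_p in candidates:
        c_acc = acc_r - acc_p          # formula (7)
        if c_acc < C:
            return label, c_acc        # acceptable utility; privacy loss is
                                       # evaluated next (step 3 of the method)
    return None, None                  # no candidate met the threshold

label, gap = select_generated_images(
    [("sigma=8", 0.62), ("sigma=4", 0.78), ("sigma=2", 0.87)], acc_r=0.91)
print(label, gap)
```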
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010572297.4A CN111737743A (en) | 2020-06-22 | 2020-06-22 | Deep learning differential privacy protection method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111737743A true CN111737743A (en) | 2020-10-02 |
Family
ID=72650251
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111737743A (en) |
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107368752A (en) * | 2017-07-25 | 2017-11-21 | 北京工商大学 | A kind of depth difference method for secret protection based on production confrontation network |
US20200082259A1 (en) * | 2018-09-10 | 2020-03-12 | International Business Machines Corporation | System for Measuring Information Leakage of Deep Learning Models |
Non-Patent Citations (1)
Title |
---|
Tao Tao et al.: "Deep learning differential privacy protection method based on WGAN feedback" (基于WGAN反馈的深度学习差分隐私保护方法), Electronic Technology & Software Engineering, vol. 2020, no. 02, pp. 244-245 * |
Cited By (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112307514A (en) * | 2020-11-26 | 2021-02-02 | 哈尔滨工程大学 | Difference privacy greedy grouping method adopting Wasserstein distance |
CN112307514B (en) * | 2020-11-26 | 2023-08-01 | 哈尔滨工程大学 | Differential privacy greedy grouping method adopting Wasserstein distance |
CN112487479A (en) * | 2020-12-10 | 2021-03-12 | 支付宝(杭州)信息技术有限公司 | Method for training privacy protection model, privacy protection method and device |
CN112487479B (en) * | 2020-12-10 | 2023-10-13 | 支付宝(杭州)信息技术有限公司 | Method for training privacy protection model, privacy protection method and device |
CN113254927B (en) * | 2021-05-28 | 2022-05-17 | 浙江工业大学 | Model processing method and device based on network defense and storage medium |
CN113254927A (en) * | 2021-05-28 | 2021-08-13 | 浙江工业大学 | Model processing method and device based on network defense and storage medium |
CN113434873A (en) * | 2021-06-01 | 2021-09-24 | 内蒙古大学 | Federal learning privacy protection method based on homomorphic encryption |
CN113282961A (en) * | 2021-07-22 | 2021-08-20 | 武汉中原电子信息有限公司 | Data desensitization method and system based on power grid data acquisition |
CN113642715A (en) * | 2021-08-31 | 2021-11-12 | 西安理工大学 | Differential privacy protection deep learning algorithm for self-adaptive distribution of dynamic privacy budget |
CN113869384A (en) * | 2021-09-17 | 2021-12-31 | 大连理工大学 | Privacy protection image classification method based on domain self-adaption |
CN113869384B (en) * | 2021-09-17 | 2024-05-10 | 大连理工大学 | Privacy protection image classification method based on field self-adaption |
WO2023109246A1 (en) * | 2021-12-17 | 2023-06-22 | 新智我来网络科技有限公司 | Method and apparatus for breakpoint privacy protection, and device and medium |
CN114638013A (en) * | 2022-02-15 | 2022-06-17 | 西安电子科技大学 | Method, system, medium and terminal for measuring and protecting image privacy information |
CN114882216A (en) * | 2022-04-18 | 2022-08-09 | 华南理工大学 | Garment button quality detection method, system and medium based on deep learning |
CN114882216B (en) * | 2022-04-18 | 2024-04-30 | 华南理工大学 | Garment button attaching quality detection method, system and medium based on deep learning |
CN117808694A (en) * | 2023-12-28 | 2024-04-02 | 中国人民解放军总医院第六医学中心 | Painless gastroscope image enhancement method and painless gastroscope image enhancement system under deep neural network |
CN117808694B (en) * | 2023-12-28 | 2024-05-24 | 中国人民解放军总医院第六医学中心 | Painless gastroscope image enhancement method and painless gastroscope image enhancement system under deep neural network |
CN117936011A (en) * | 2024-03-19 | 2024-04-26 | 泰山学院 | Intelligent medical service management system based on big data |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111737743A (en) | Deep learning differential privacy protection method | |
Wu et al. | An adaptive federated learning scheme with differential privacy preserving | |
Lei et al. | GCN-GAN: A non-linear temporal link prediction model for weighted dynamic networks | |
CN110334742B (en) | Graph confrontation sample generation method based on reinforcement learning and used for document classification and adding false nodes | |
Prakash et al. | IoT device friendly and communication-efficient federated learning via joint model pruning and quantization | |
CN109165735B (en) | Method for generating sample picture based on generation of confrontation network and adaptive proportion | |
CN114841364B (en) | Federal learning method for meeting personalized local differential privacy requirements | |
Kang et al. | Privacy-preserving federated adversarial domain adaptation over feature groups for interpretability | |
Lam | High‐dimensional covariance matrix estimation | |
CN113642715B (en) | Differential privacy protection deep learning algorithm capable of adaptively distributing dynamic privacy budget | |
CN109949200B (en) | Filter subset selection and CNN-based steganalysis framework construction method | |
Ma et al. | RDP-GAN: A rényi-differential privacy based generative adversarial network | |
Unceta et al. | Copying machine learning classifiers | |
He et al. | Uniform-pac bounds for reinforcement learning with linear function approximation | |
CN117290721A (en) | Digital twin modeling method, device, equipment and medium | |
Knežević et al. | Neurosca: Evolving activation functions for side-channel analysis | |
US20210326757A1 (en) | Federated Learning with Only Positive Labels | |
Zhang et al. | Sequential outlier criterion for sparsification of online adaptive filtering | |
Wang et al. | Logit calibration for non-iid and long-tailed data in federated learning | |
CN114282650A (en) | Federal learning acceleration system and synchronous hidden and sparse convolution layer construction and learning method | |
Xin et al. | A compound decision approach to covariance matrix estimation | |
CN113744175A (en) | Image generation method and system for generating countermeasure network based on bidirectional constraint | |
Khorramshahi et al. | Gans with variational entropy regularizers: Applications in mitigating the mode-collapse issue | |
Yang et al. | Multi-distribution mixture generative adversarial networks for fitting diverse data sets | |
CN112488238A (en) | Hybrid anomaly detection method based on countermeasure self-encoder |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||