CN111523668B

CN111523668B - Training method and device of data generation system based on differential privacy

Info

Publication number: CN111523668B
Application number: CN202010373419.7A
Authority: CN
Inventors: 熊涛
Original assignee: Alipay Hangzhou Information Technology Co Ltd
Current assignee: Alipay Hangzhou Information Technology Co Ltd
Priority date: 2020-05-06
Filing date: 2020-05-06
Publication date: 2021-08-20
Anticipated expiration: 2040-05-06
Also published as: TW202143118A; WO2021223663A1; CN111523668A; TWI761151B; CN113642731A

Abstract

The embodiment of the specification provides a training method and a training device for a data generation system based on differential privacy, wherein the data generation system comprises a self-coding network and a discriminator; and determining the reconstruction loss of the sample according to the comparison of the real sample and the recovered sample. Further, the synthesized samples are generated by a self-encoding network. And respectively inputting the real sample and the synthesized sample into a discriminator to obtain a first probability and a second probability of the real sample. Aiming at the discriminator, noise is added on the gradient in a differential privacy mode by taking the first loss as a target to be reduced so as to adjust the parameter of the discriminator, wherein the first loss is negatively related to the first probability and positively related to the second probability. And aiming at the self-coding network, adding noise on the gradient with the aim of reducing the second loss to adjust the self-coding network parameters, wherein the second loss is positively correlated with the sample reconstruction loss and is negatively correlated with the first loss.

Description

Training method and device of data generation system based on differential privacy

Technical Field

One or more embodiments of the present specification relate to the field of computer technology, and more particularly, to a training method and apparatus for a differential privacy-based data generation system executed by a computer.

Background

With the development of computer technology, there is a great demand for automatic data synthesis. For example, in the scene of picture recognition, a large number of pictures need to be automatically generated or synthesized for machine learning; in a scenario such as intelligent customer service, it is necessary to automatically generate a dialog text. In one case, when a research result based on user sample data is displayed, some simulated user sample data needs to be synthesized to replace real user data for displaying for the purpose of protecting user privacy. In other cases, it may also be desirable to automatically generate synthesized data in other formats, such as audio.

For this reason, it is attempted to train some generative models to automatically perform data generation by way of machine learning. For example, in one approach, a Generative Adaptive Networks (GAN) is trained, in which Generative models are used to synthesize data. However, in the conventional GAN training mode, on one hand, the generation effect of the generative model needs to be further improved, and on the other hand, the conventional GAN training mode is vulnerable to attack and difficult to ensure the privacy and security of data.

Accordingly, improved solutions are desired that result in a more secure and efficient data generation system.

Disclosure of Invention

One or more embodiments of the present specification describe a training method for a data generation system based on differential privacy, so as to obtain a data generation system which protects privacy and is more effective.

According to a first aspect, there is provided a method of training a differential privacy based data generation system, the data generation system comprising a self-encoding network and a discriminator, the method comprising:

inputting a first real sample into the self-coding network to obtain a first recovered sample;

determining sample reconstruction loss according to the comparison of the first real sample and the first recovery sample;

generating a first synthesized sample through the self-encoding network;

inputting the first real sample into the discriminator to obtain a first probability of the first real sample; inputting the first synthesized sample into the discriminator to obtain a second probability that the first synthesized sample belongs to a real sample;

aiming at a first parameter corresponding to the discriminator, noise is added to a gradient obtained by taking reduction of first prediction loss as a target in a mode of differential privacy, and the first parameter is adjusted according to the obtained first noise gradient, wherein the first prediction loss is negatively related to the first probability and positively related to the second probability;

and aiming at a second parameter corresponding to the self-coding network, adding noise on a gradient obtained by taking reduction of a second prediction loss as a target in a mode of differential privacy, and adjusting the second parameter according to the obtained second noise gradient, wherein the second prediction loss is positively correlated with the sample reconstruction loss, is positively correlated with the first probability, and is negatively correlated with the second probability.

According to one embodiment, a self-encoding network includes an encoder, a generator, and a decoder; in such a case, inputting the first true sample into the self-coding network to obtain a first recovered sample, specifically including: inputting a first original vector corresponding to the first real sample into the encoder to obtain a first feature vector reduced to a first representation space; inputting the first feature vector into the decoder, resulting in the first recovered sample; generating a first synthesized sample through the self-coding network, specifically comprising: generating, by the generator, a second feature vector in the first token space; and inputting the second feature vector into the decoder to obtain the first synthetic data.

Further, in one embodiment, the encoder may be implemented as a first multi-layered perceptron, with the number of neurons in each layer decreasing layer by layer; the decoder may be implemented as a second multi-layered perceptron, with the number of neurons in each layer increasing layer by layer.

According to one embodiment, the sample reconstruction loss is determined by: determining a vector distance between a first original vector corresponding to a first real sample and a first recovery vector corresponding to the first recovery sample; determining the sample reconstruction loss as being positively correlated to the vector distance.

In one embodiment, adding noise to a gradient obtained with a goal of reducing the first prediction loss, and adjusting the first parameter according to the obtained first noise gradient specifically includes: determining, for the first parameter, a first original gradient that reduces the first prediction loss; based on a preset first cutting threshold value, cutting the first original gradient to obtain a first cutting gradient; determining a first Gaussian noise for implementing differential privacy using a first Gaussian distribution determined based on the first clipping threshold; and superposing the first Gaussian noise and the first cutting gradient to obtain the first noise gradient.

In one embodiment, adding noise to a gradient obtained with a goal of reducing the second prediction loss, and adjusting the second parameter according to the obtained second noise gradient specifically includes: determining, for the second parameter, a second original gradient that reduces the second prediction loss; based on a preset second cutting threshold value, cutting the second original gradient to obtain a second cutting gradient; determining a second Gaussian noise for implementing differential privacy using a second Gaussian distribution determined based on the second clipping threshold; and superposing the second Gaussian noise and the second cutting gradient to obtain the second noise gradient.

Further, the second parameter can be divided into an encoder parameter, a generator parameter and a decoder parameter; in one embodiment, a third original gradient corresponding to the decoder parameters, a fourth original gradient corresponding to the encoder parameters, and a fifth original gradient corresponding to the generator parameters, respectively, may be determined by gradient backpropagation; respectively adding noise in the third original gradient, the fourth original gradient and the fifth original gradient by using a differential privacy mode to obtain a corresponding third noise gradient, a corresponding fourth noise gradient and a corresponding fifth noise gradient; adjusting the decoder parameters using the third noise gradient; adjusting the encoder parameters using the fourth noise gradient; adjusting the generator parameter using the fifth noise gradient.

In another embodiment, after determining a third original gradient corresponding to the decoder parameter, a fourth original gradient corresponding to the encoder parameter, and a fifth original gradient corresponding to the generator parameter through gradient back propagation, respectively, noise is added to the third original gradient in a differential privacy manner to obtain a corresponding third noise gradient; adjusting the decoder parameters using the third noise gradient; adjusting the encoder parameters using the fourth original gradient; adjusting the generator parameter using the fifth raw gradient.

In various embodiments, the first authentic sample may be a picture sample, an audio sample, a text sample, or a business object sample.

According to a second aspect, there is provided a training apparatus for a differential privacy based data generation system, the data generation system comprising a self-encoding network and a discriminator, the apparatus comprising:

a recovery sample obtaining unit configured to input a first true sample into the self-coding network to obtain a first recovery sample;

a reconstruction loss determination unit configured to determine a sample reconstruction loss according to a comparison of the first real sample and the first restored sample;

a synthesized sample obtaining unit configured to generate a first synthesized sample through the self-encoding network;

the probability obtaining unit is configured to input the first real sample into the discriminator to obtain a first probability of the first real sample; inputting the first synthesized sample into the discriminator to obtain a second probability that the first synthesized sample belongs to a real sample;

a first parameter adjusting unit configured to add noise to a gradient obtained with a goal of reducing a first prediction loss in a manner of differential privacy with respect to a first parameter corresponding to the discriminator, and adjust the first parameter according to the obtained first noise gradient, wherein the first prediction loss is negatively related to the first probability and positively related to the second probability;

and a second parameter adjusting unit configured to add noise to a gradient obtained with a goal of reducing a second prediction loss positively correlated with the sample reconstruction loss, positively correlated with the first probability, and negatively correlated with the second probability, for a second parameter corresponding to the self-coding network, by using a differential privacy method, and adjust the second parameter according to a second noise gradient obtained.

According to a third aspect, there is provided a computer readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method of the first aspect.

According to a fourth aspect, there is provided a computing device comprising a memory and a processor, wherein the memory has stored therein executable code, and wherein the processor, when executing the executable code, implements the method of the first aspect.

By the method and the device provided by the embodiment of the specification, the generation model in the conventional GAN is realized through the self-coding network, and the self-coding network can perform auxiliary training by means of the coding process for restoring the real sample, so that the synthetic data highly simulating the real sample is obtained. In the training process, differential privacy is introduced into the self-coding network and the discriminator respectively through a gradient descending mode of the differential privacy, and the data generation system with the differential privacy characteristic is obtained. Due to the introduction of differential privacy, the information of the training samples is difficult to reverse-deduce or identify based on the disclosed model, and privacy protection is provided for the model. In this way, a more efficient and secure data generation system is obtained.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.

FIG. 1 illustrates an architectural diagram of a data generation system according to the present concepts;

FIG. 2 illustrates a flow diagram of a training method of a differential privacy based data generation system, according to one embodiment;

FIG. 3 shows a schematic block diagram of an encoder and decoder according to one embodiment;

FIG. 4 shows a schematic block diagram of a training apparatus of a data generation system according to one embodiment.

Detailed Description

The scheme provided by the specification is described below with reference to the accompanying drawings.

FIG. 1 illustrates an architectural diagram of a data generation system in accordance with the present concepts. As shown in fig. 1, the data generation system as a whole includes a self-encoding network 100 and an arbiter 200. The self-encoding network 100 may include an encoder 110, a generator 120, and a decoder 130. The encoder 110 is configured to encode a high-dimensional feature vector of input real sample data x into a sample vector e (x) in a low-dimensional token space, and the generator 120 is configured to generate a noise vector g (z) in the low-dimensional token space based on noise z. The decoder 130 is configured to decode corresponding sample data based on the vector in the low-dimensional token space. When the low-dimensional sample vector e (x) corresponding to the real sample data x is input into the decoder 130, the decoder outputs the restored sample data x'; when the noise vector g (z) is input to the decoder 130, the decoder outputs synthesized sample data s.

The discriminator 200 is used to discriminate whether the input sample data is real sample data or synthesized sample data. When the above-mentioned real sample data x is input into the discriminator 200, the discriminator may output a probability P1 that it is real data; when the above-mentioned composite data s is input to the discriminator 200, the discriminator may output a probability P2 that it is real data.

The generator 120, the decoder 130 and the discriminator 200 described above together form a generation countermeasure network GAN. Specifically, the training of the discriminator is aimed at distinguishing the true sample from the synthesized sample as much as possible, that is, it is desirable that the probability P1 be as large as possible and the probability P2 be as small as possible. The training goal of the generator together with the decoder is to generate synthetic sample data that is as spurious as possible, making it difficult for the discriminator to distinguish. Thus, the training goals of the generator and decoder are such that the reconstructed sample data x' is as close as possible to the true sample data x, while the probability P1 above is as small as possible and the probability P2 is as large as possible. Thus, the capability of the decoder to generate the synthetic data is gradually improved through the countertraining of the decoder and the discriminator.

Further, to enhance the privacy security of the model, differential privacy may be introduced in the above GAN network, in particular in the decoder 130 and the arbiter 200. Specifically, the decoder based on the differential privacy and the discriminator based on the differential privacy can be obtained by adopting gradient descent based on the differential privacy during the countermeasure training process and adding noise in the gradient. Therefore, the training sample can be prevented from being reversely deduced from the model obtained by training when the model is attacked, and the safety of private data is protected.

The following describes a specific implementation of the above concept.

FIG. 2 illustrates a flow diagram of a training method of a differential privacy based data generation system, according to one embodiment. It is to be appreciated that the method can be performed by any apparatus, device, platform, cluster of devices having computing and processing capabilities. The following describes a training process of the data generation system based on differential privacy, with reference to the architecture of the data generation system shown in fig. 1 and the method flow shown in fig. 2.

First, in step 21, a first true sample x is input into the coding network, resulting in a first recovered sample x'.

In different embodiments, the first real sample x may be sample data in various different forms. For example, in a picture composition scenario, the first real sample may be a picture; in a text question-answering scene, the first real sample can be a text; in a speech synthesis scenario, the first real sample may be a piece of audio. In other examples, the first reality sample may also be some business object sample, such as a user sample, a merchant sample, an interaction event sample, and so on.

Typically, the first real sample x may be represented by a vector f (x), which is referred to as the first original vector. For example, when the first real sample x is a picture, the first original vector f (x) corresponds to a vector formed by pixel features in the picture; when the first real sample x is audio, the first original vector f (x) corresponds to a vector formed by audio frequency spectrum features; in other examples, the first true sample may be represented in response to obtaining the first original vector.

When the first original vector corresponding to the first real sample is input into the self-coding network, the self-coding network may perform coding and decoding processing on the first original vector, and output a first recovered sample.

Specifically, in one embodiment, the self-encoder network employs the architecture shown in fig. 1, which includes an encoder 110, a generator 120, and a decoder 130. In such a case, in step 21, a first original vector f (x) corresponding to the first real sample x is input into the encoder 110, and the encoder 110 performs a dimension reduction process on the first original vector f (x) to obtain a first feature vector e (x) in the token space K after the dimension reduction. The first feature vector e (x) is further input to the decoder 130. The decoder 130 structure is symmetrical to the encoder 110, and its algorithm and model parameters are associated with (e.g., inverse to) the corresponding ones in the encoder 130. Therefore, the decoder 130 may restore the first true sample x according to the first feature vector e (x), and output a first restored sample x'.

Fig. 3 shows a schematic structural diagram of an encoder and a decoder according to an embodiment. As shown in fig. 3, the encoder 110 and the decoder 130 may each be implemented as a multi-layer perceptron, which includes a plurality of neural network layers. In contrast, in the encoder 110, the number of neurons in each layer decreases gradually layer by layer, that is, each layer dimension decreases gradually layer by layer, so that the dimension is compressed layer by layer for the input first original vector f (x), and a first feature vector e (x), also called a characterization vector, in the characterization space K is output to the output layer. The dimension D of the characterization space K is far smaller than the dimension D of the first original vector of the input, so that the dimension reduction of the input original vector is realized. For example, a first original vector of several hundred dimensions may be compressed into a coded vector of several tens of dimensions, or even several dimensions.

In the decoder 130, the number of neurons in each layer is increased layer by layer, that is, the dimension of each layer is increased layer by layer, so as to restore the dimension of the first feature vector e (x) in the low dimension layer by layer, and a vector having the same dimension as the first original vector f (x) is obtained in the output layer and is used as the restoration vector of the first restoration sample x'.

It can be understood that the token vector (e.g. the first token vector e (x)) in the token space K performs dimensionality reduction on the input original vector (e.g. the first original vector f (x)), and the lower the information loss of the dimensionality reduction operation, or the higher the information content of the token vector in the token space K, the easier the decoder can restore the input real sample, that is, the higher the similarity between the restored sample and the real sample. This property can be used later to assist in training the self-encoding network.

It is to be understood that although the above describes exemplary structures of the encoder and decoder, the specific implementations thereof may be varied. For example, when processing picture sample data, the encoder may further include a number of convolutional layers, the decoder includes a number of anti-convolutional layers, and so on. The specific design of the encoder and decoder may have many variations depending on the form of the sample data, and is not limited herein.

Through the method, the self-coding network restores the input first real sample to obtain a first restored sample. Next, at step 22, a sample reconstruction loss Lr is determined based on the comparison of the first true sample and the first recovered sample.

In one embodiment, the first original vector f (x) corresponding to the first real sample x and the first restored vector corresponding to the first restored sample x may be compared to obtain a vector distance between the two vectors, such as a euclidean distance, a cosine distance, and the like. The sample reconstruction penalty Lr can then be determined as positively correlating to the vector distance. That is, the smaller the vector distance between the first original vector and the first restored vector, the smaller the data difference, and the smaller the sample reconstruction loss.

In another embodiment, the first real sample and the first restored sample may be compared to obtain the similarity therebetween. For example, the similarity may be determined from a dot product between the first original vector and the first restored vector. From this, the sample reconstruction loss Lr can also be determined so as to be inversely correlated with the above-described similarity. That is, the greater the similarity, the less the sample reconstruction loss.

The above determined sample reconstruction loss Lr can be used to measure the reconstruction capability of the self-coding network, particularly the decoder therein, for the samples, and thus used to train the self-coding network.

On the other hand, in step 23, a first synthesized sample is generated by the self-encoding network.

In one embodiment, the self-encoding network employs the architecture shown in fig. 1, which includes an encoder 110, a generator 120, and a decoder 130. In such a case, in step 23, a second eigenvector g (z) simulating the real eigenvector is generated in the aforementioned token space K by the generator 120; then, the second feature vector g (z) is input to the decoder 130 to obtain the first synthesized data s.

In one embodiment, the generator 120 obtains a data distribution of the characterization vectors of the plurality of real samples output by the encoder 110, and samples the data distribution with a certain probability in the data distribution space, thereby generating the second feature vector g (z). In another embodiment, a noise signal is input to the generator 120, and the generator 120 generates the second feature vector g (z) in the above-mentioned characterization space K based on the noise signal.

The second feature vector g (z) generated in the above manner can be used to simulate the feature vector of the real sample in the feature space K. Therefore, when the second eigenvector g (z) is input into the decoder 130, the decoder 130 may decode it as the true eigenvector e (x) is processed, so as to obtain the synthesized sample s with the same form as the true sample data.

It is to be understood that the above-mentioned step 23, and the aforementioned steps 21-22, may be performed in any reasonable relative order, e.g. in parallel, before or after.

Then, in step 24, the first true sample x and the first synthetic sample s are respectively input into the discriminator, so as to respectively obtain a first probability P1 that the first true sample belongs to the true sample, and a second probability P2 that the first synthetic sample s belongs to the true sample.

It is to be understood that the discriminator is used to distinguish whether the input sample data is a real sample or a synthesized sample. Specifically, the discriminator outputs the prediction probability to give the discrimination result. Typically, the arbiter outputs a probability that the sample data is a true sample. In this case, the first probability P1 is an output probability of the discriminator after the first true sample x is input to the discriminator; the second probability P2 is the output probability of the discriminator after the first synthesized sample s is input to the discriminator.

In another example, the discriminator may also output the probability that the sample data is a synthesized sample. In such a case, the first probability P1 can be understood as 1-P1 ', where P1' is the output probability of the discriminator for the first true sample x; the second probability P2 can be understood as 1-P2 ', where P2' is the output probability of the discriminator for the first synthetic sample s.

Based on the sample reconstruction loss Lr obtained in step 22, and the first probability P1 and the second probability P2 obtained in step 24, a first prediction loss L1 for training the discriminant and a second prediction loss L2 for training the self-coding network can be determined, respectively.

It is understood that the training goal of the discriminator is to discriminate the real samples and the synthesized samples as much as possible, and therefore, it is desirable for the discriminator that the above first probability P1 be as large as possible and the second probability P2 be as small as possible. Therefore, the first prediction loss L1 can be set to be negatively correlated with the first probability P1 and positively correlated with the second probability P2. In this way, the direction in which the first prediction loss L1 decreases is the direction in which the second probability P2 decreases to increase the first probability P1.

More specifically, in one embodiment, the first predicted loss may be set to:

L1＝-∑_ilog(P1)-∑_jlog(1-P2) (1)

wherein i is a real sample, P1 is a first probability corresponding to each real sample, j is a synthesized sample, and P2 is a second probability corresponding to each synthesized sample.

On the other hand, the training target of the self-coding network is to reconstruct a more closely restored sample for a true sample and to make the discriminator unable to distinguish between the true sample and the synthesized sample generated by the decoder, and therefore, it is desirable for the self-coding network that the aforementioned sample reconstruction loss Lr be as small as possible, and that the above first probability P1 be as small as possible and the second probability P2 be as large as possible. Therefore, the second prediction loss L2 can be set to be positively correlated with the sample reconstruction loss and the first probability P1 and negatively correlated with the second probability P2. In this way, the direction in which the second prediction loss L2 decreases, that is, the direction in which the first probability P1 decreases and the second probability P2 increases to decrease the sample reconstruction loss.

More specifically, in one embodiment, the second predicted loss may be set to:

L2＝Lr-∑_ilog(1-P1)-∑_jlog(P2) (2)

in this way, in the above manner, the first prediction loss for the discriminator and the second prediction loss for the self-coding network are obtained. As can be seen from the above definition of the first prediction loss L1 and the second prediction loss L2, the training targets of the self-coding network and the discriminator form a countermeasure. Next, based on the first and second predicted losses, a parameter gradient may be determined that reduces the losses, thereby training the discriminator and the self-encoding network, respectively.

Innovatively, in the embodiments of the present description, noise is added to the gradient in a differential privacy manner during the training process, and the data generation system is trained according to the gradient containing the noise. That is, in step 25, noise is added to the gradient obtained with the goal of reducing the first prediction loss L1, using the difference privacy method, for the first parameter corresponding to the discriminator, and the first parameter is adjusted based on the obtained first noise gradient; in step 26, noise is added to the gradient obtained with the goal of reducing the second prediction loss L2 in a differential privacy manner for the second parameter corresponding to the self-encoding network, and the second parameter is adjusted based on the obtained second noise gradient. In this way, the property of differential privacy is introduced in the discriminator and the self-encoding network, respectively.

Differential privacy (differential privacy) is a means in cryptography that aims to provide a way to maximize the accuracy of data queries while minimizing the chances of identifying their records when querying from a statistical database. A random algorithm M is provided, and PM is a set formed by all possible outputs of M. For any two adjacent data sets D and D' and any subset SM of PMs, if the random algorithm M satisfies: pr [ M (D) ∈ SM ] < ∈ × Pr [ M (D') ∈ SM ], then algorithm M is said to provide epsilon-differential privacy protection, where the parameter epsilon is called the privacy protection budget, for balancing the degree and accuracy of privacy protection. ε may be generally predetermined. The closer epsilon is to 0 and the closer e epsilon is to 1, the closer the processing results of the random algorithm on the two neighboring data sets D and D', the stronger the degree of privacy protection.

Implementations of differential privacy include noise mechanisms, exponential mechanisms, and the like. To introduce differential privacy in a data generation system, differential privacy is here implemented by adding noise in the parameter gradient, using a noise mechanism, according to embodiments of the present description. Depending on the noise scheme, the noise may be embodied as laplacian noise, gaussian noise, or the like. According to one embodiment, in this step 25, differential privacy is introduced in the discriminator by adding gaussian noise in the gradient determined based on the first prediction loss. The specific process may include the following steps.

First, for the first parameter corresponding to the discriminator, a first original gradient for reducing the first prediction loss L1 may be determined according to the first prediction loss L1; then, based on a preset cutting threshold value, cutting the first original gradient to obtain a first cutting gradient; then, a first Gaussian noise for realizing the difference privacy is determined by utilizing the Gaussian distribution determined based on the first clipping threshold, wherein the variance of the Gaussian distribution is positively correlated with the square of the first clipping threshold; then, the first gaussian noise thus obtained is superimposed on the first clipping gradient to obtain a first noise gradient, which is used to update the first parameter of the discriminator.

More specifically, as an example, assume that for a training set X consisting of a first real sample X and a first synthetic sample s, a first raw gradient obtained for the discriminator is:

wherein, L1(θ)_DX) represents the first prediction loss, θ_DIs the parameter in the discriminator, i.e. the first parameter.

As mentioned above, the addition of noise for implementing differential privacy to the original gradient may be implemented by means such as laplacian noise, gaussian noise, or the like. In an embodiment, for example, gaussian noise is taken as an example, gradient clipping may be performed on an original gradient based on a preset clipping threshold to obtain a clipping gradient, gaussian noise for implementing differential privacy is determined based on the clipping threshold and a predetermined noise scaling coefficient (a preset super parameter), and then the clipping gradient and the gaussian noise are fused (e.g., summed) to obtain a gradient including noise. It can be understood that this way, on one hand, performs clipping on the original gradient, and on the other hand, superimposes the clipped gradients, thereby performing differential privacy processing satisfying gaussian noise on the gradient.

For example, the first raw gradient is gradient clipped to:

wherein,

representing the clipped gradient, i.e. the first clipping gradient, C1 representing a first clipping threshold, | g_D(X)‖₂Denotes g_D(X) second order norm. That is, in the case where the original gradient is less than or equal to the clipping threshold C1, the original gradient is retained, and in the case where the original gradient is greater than the clipping threshold C1, the original gradient is clipped to a corresponding size in a proportion greater than the clipping threshold C1.

Adding a first gaussian noise to the first clipping gradient to obtain a first noise gradient containing noise, for example:

wherein,

representing a first noise gradient;

representing the probability density coincidence with 0 as mean, σ²C1²I is a first gaussian noise of a gaussian distribution of variances; sigma represents the noise scaling coefficient, is a preset hyper parameter and can be set as required; c1 isThe first clipping threshold; indicating function, can take 0 or 1, for example, it can be set that even rounds in a plurality of rounds of training take 1, and odd rounds take 0.

Then, the first parameter θ of the discriminator may be set to minimize the prediction loss L1 using the first noise gradient to which the gaussian noise is added_DThe adjustment is as follows:

where η represents a learning step length or a learning rate, and is a predetermined hyper-parameter, for example, 0.5 or 0.3. And under the condition that the difference privacy is satisfied by gradient-added Gaussian noise, the adjustment of the model parameters of the discriminator satisfies the difference privacy.

On the other hand, for the self-encoding network, the parameters of the self-encoding network can be adjusted in a differential privacy manner by adding noise in the gradient in a similar manner at step 26. In particular, in one embodiment, the second parameter θ for the self-encoding network_ADetermining a second original gradient g which reduces the aforementioned second prediction loss L2_A(X), for example:

then, based on a preset second clipping threshold value C2, clipping is performed on the second original gradient to obtain a second clipping gradient

The clipping manner is similar to the above equation (4), wherein the second clipping threshold C2 is set independently from the first clipping threshold C1, and may be the same or different. Next, a second Gaussian noise for implementing differential privacy is determined using a second Gaussian distribution determined based on a second clipping threshold

Making a second Gaussian noiseThe sound and the second cutting gradient are superposed to obtain a second noise gradient

Accordingly, the corresponding second parameter of the self-encoding network may be adjusted according to the second noise gradient.

The above describes the way of adding gaussian noise in the second original gradient for the self-encoding network, thereby adjusting the second parameter. Further, in one embodiment, as shown in fig. 1, the self-coding network further includes an encoder 110, a generator 120 and a decoder 130, and accordingly, the second parameter can be divided into an encoder parameter, a generator parameter and a decoder parameter, and each part of the parameters corresponds to each part of the original parameter gradient. The noise may be added to the second original gradient, the noise may be added to the original parameter gradients of each portion, or the noise may be added to only some of the original parameter gradients, for example, the original parameter gradients corresponding to the decoder.

Specifically, in one embodiment, each original gradient of the parameters for each parameter portion in the self-encoding network may be determined by gradient back propagation in step 26, including a third original gradient corresponding to the decoder parameters, a fourth original gradient corresponding to the encoder parameters, and a fifth original gradient corresponding to the generator parameters, respectively.

Then, noise is added to the third original gradient, the fourth original gradient and the fifth original gradient respectively in a differential privacy mode, and a corresponding third noise gradient, a fourth noise gradient and a fifth noise gradient are obtained. In the manner of adding noise, the process of adding gaussian noise described above may be referred to. Then, the decoder parameters may be adjusted using the third noise gradient; adjusting the encoder parameters by using the fourth noise gradient; using the fifth noise gradient, the generator parameters are adjusted. In this way, differential privacy features are introduced in the self-encoding network.

According to another embodiment, in step 26, after determining a third original gradient corresponding to the decoder parameters, a fourth original gradient corresponding to the encoder parameters, and a fifth original gradient corresponding to the generator parameters, respectively, by gradient backpropagation, noise is added thereto in a differential privacy manner only for the third original gradient, resulting in a corresponding third noise gradient. Then, using the third noise gradient, decoder parameters are adjusted, thereby introducing differential privacy properties in the decoder. For the encoder and the generator, the corresponding original parameter gradient can be used for updating, namely, the fourth original gradient is used for adjusting the encoder parameter; using the fifth raw gradient, generator parameters are adjusted.

It is to be understood that the decoder is a core module in the self-coding network, through which the real samples are restored and through which the synthesized samples are generated. Therefore, the differential privacy is introduced into the decoder, so that the whole self-coding network has the differential privacy characteristic, and the effect of enabling the whole data generation system to have the differential privacy characteristic can also be achieved.

It should be noted that, in actual operation, the training of the discriminator in step 25 and the training of the self-coding network in step 26 may be performed in alternate iterations. For example, a sample set including real samples and generated samples is used, m times of iterative updating are performed on the discriminator, then n times of iterative updating are performed on the self-coding network, and the steps are repeated. The updating sequence and the iteration mode of the discriminator and the self-coding network are not limited herein.

After the arbiter and the self-coding network are repeatedly updated in the above manner until a predetermined end condition is reached (e.g., a predetermined number of iterations, convergence of parameters, etc.), a trained data generation system can be obtained. When the data generation system is used for generating the sample data, the noise vector is generated only by using the generator in the sample data, and the decoder is used for decoding, so that the synthetic sample data simulating the real sample can be obtained.

Reviewing the above process, the generative model in conventional GAN is implemented by a self-encoding network that can be trained with the aid of an encoding process that restores real samples, resulting in synthetic data that highly simulates real samples. In the training process, differential privacy is introduced into the self-coding network and the discriminator respectively through a gradient descending mode of the differential privacy, and the data generation system with the differential privacy characteristic is obtained. Due to the introduction of differential privacy, the information of the training samples is difficult to reverse-deduce or identify based on the disclosed model, and privacy protection is provided for the model. In this way, a more efficient and secure data generation system is obtained.

According to an embodiment of another aspect, there is also provided a training apparatus of a data generation system based on differential privacy, the data generation system including a self-coding network and a discriminator, the training apparatus may be deployed in any apparatus, device, platform, device cluster having computing and processing capabilities. FIG. 4 shows a schematic block diagram of a training apparatus of a data generation system according to one embodiment. As shown in fig. 4, the training apparatus 400 includes:

a restored sample obtaining unit 41 configured to input the first true sample into the self-coding network, so as to obtain a first restored sample;

a reconstruction loss determining unit 42 configured to determine a sample reconstruction loss according to a comparison between the first real sample and the first restored sample;

a synthesized sample obtaining unit 43 configured to generate a first synthesized sample through the self-encoding network;

a probability obtaining unit 44 configured to input the first real sample into the discriminator to obtain a first probability that the first real sample belongs to the real sample; inputting the first synthesized sample into the discriminator to obtain a second probability that the first synthesized sample belongs to a real sample;

a first parameter adjusting unit 45, configured to add noise to a gradient obtained with a goal of reducing a first prediction loss in a manner of differential privacy with respect to a first parameter corresponding to the discriminator, and adjust the first parameter according to the obtained first noise gradient, wherein the first prediction loss is negatively related to the first probability and positively related to the second probability;

a second parameter adjusting unit 46 configured to add noise to a gradient obtained for reducing a second prediction loss positively correlated with the sample reconstruction loss, positively correlated with the first probability, and negatively correlated with the second probability, in a manner of differential privacy, with respect to a second parameter corresponding to the self-coding network, and adjust the second parameter according to a second noise gradient obtained.

According to one embodiment, the self-encoding network includes an encoder, a generator, and a decoder. In such a case, the recovery sample acquiring unit 41 may be configured to: inputting a first original vector corresponding to the first real sample into the encoder to obtain a first feature vector reduced to a first representation space; inputting the first feature vector into the decoder, resulting in the first recovered sample; the synthetic sample acquiring unit 43 may be configured to: generating, by the generator, a second feature vector in the first token space; and inputting the second feature vector into the decoder to obtain the first synthetic data.

According to an embodiment, the reconstruction loss determination unit 42 is specifically configured to: determining a vector distance between a first original vector corresponding to the first real sample and a first restored vector corresponding to the first restored sample; determining the sample reconstruction loss as being positively correlated to the vector distance.

In an embodiment, the first parameter adjusting unit 45 is specifically configured to: determining, for the first parameter, a first original gradient that reduces the first prediction loss; based on a preset first cutting threshold value, cutting the first original gradient to obtain a first cutting gradient; determining a first Gaussian noise for implementing differential privacy using a first Gaussian distribution determined based on the first clipping threshold; and superposing the first Gaussian noise and the first cutting gradient to obtain the first noise gradient.

Similarly, the second parameter adjusting unit 46 may be specifically configured to: determining, for the second parameter, a second original gradient that reduces the second prediction loss; based on a preset second cutting threshold value, cutting the second original gradient to obtain a second cutting gradient; determining a second Gaussian noise for implementing differential privacy using a second Gaussian distribution determined based on the second clipping threshold; and superposing the second Gaussian noise and the second cutting gradient to obtain the second noise gradient.

More specifically, in one embodiment, the second parameters specifically include an encoder parameter, a generator parameter, and a decoder parameter. In an example, the second parameter adjusting unit 46 is specifically configured to: determining, by gradient backpropagation, a third original gradient corresponding to the decoder parameters, a fourth original gradient corresponding to the encoder parameters, and a fifth original gradient corresponding to the generator parameters, respectively; respectively adding noise in the third original gradient, the fourth original gradient and the fifth original gradient by using a differential privacy mode to obtain a corresponding third noise gradient, a corresponding fourth noise gradient and a corresponding fifth noise gradient; adjusting the decoder parameters using the third noise gradient; adjusting the encoder parameters using the fourth noise gradient; adjusting the generator parameter using the fifth noise gradient.

In another example, the second parameter adjusting unit 46 is specifically configured to: determining, by gradient backpropagation, a third original gradient corresponding to the decoder parameters, a fourth original gradient corresponding to the encoder parameters, and a fifth original gradient corresponding to the generator parameters, respectively; adding noise in the third original gradient by using a differential privacy mode to obtain a corresponding third noise gradient; adjusting the decoder parameters using the third noise gradient; adjusting the encoder parameters using the fourth original gradient; adjusting the generator parameter using the fifth raw gradient.

It should be noted that the apparatus 400 shown in fig. 4 is an apparatus embodiment corresponding to the method embodiment shown in fig. 2, and the corresponding description in the method embodiment shown in fig. 2 is also applicable to the apparatus 400, and is not repeated herein.

According to an embodiment of a further aspect, there is also provided a computer-readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method described in connection with fig. 2.

According to an embodiment of yet another aspect, there is also provided a computing device comprising a memory and a processor, the memory having stored therein executable code, the processor, when executing the executable code, implementing the method described in connection with fig. 2.

Those skilled in the art will recognize that, in one or more of the examples described above, the functions described in the embodiments of this specification may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium.

The above-mentioned embodiments are intended to explain the technical idea, technical solutions and advantages of the present specification in further detail, and it should be understood that the above-mentioned embodiments are merely specific embodiments of the technical idea of the present specification, and are not intended to limit the scope of the technical idea of the present specification, and any modification, equivalent replacement, improvement, etc. made on the basis of the technical solutions of the embodiments of the present specification should be included in the scope of the technical idea of the present specification.

Claims

1. A training method of a data generation system based on differential privacy comprises a self-coding network and a discriminator, wherein the self-coding network comprises an encoder, a generator and a decoder; the method comprises the following steps:

inputting a first original vector corresponding to a first real sample into the encoder to obtain a first feature vector reduced to a first representation space; inputting the first feature vector into the decoder to obtain a first restored sample;

generating, by the generator, a second feature vector in the first characterization space based on a noise signal; inputting the second feature vector into the decoder to obtain a first synthesized sample;

2. The method of claim 1, wherein the encoder is implemented as a first multi-layered perceptron with progressively decreasing numbers of neurons at each layer; the decoder is implemented as a second multi-layered perceptron, with the number of neurons in each layer increasing layer by layer.

3. The method of claim 1, wherein determining a sample reconstruction loss comprises:

determining a vector distance between a first original vector corresponding to the first real sample and a first restored vector corresponding to the first restored sample;

determining the sample reconstruction loss as being positively correlated to the vector distance.

4. The method of claim 1, wherein adding noise to a gradient targeted to reduce the first prediction penalty, adjusting the first parameter based on the resulting first noise gradient, comprises:

determining, for the first parameter, a first original gradient that reduces the first prediction loss;

based on a preset first cutting threshold value, cutting the first original gradient to obtain a first cutting gradient;

determining a first Gaussian noise for implementing differential privacy using a first Gaussian distribution determined based on the first clipping threshold;

and superposing the first Gaussian noise and the first cutting gradient to obtain the first noise gradient.

5. The method of claim 1, wherein adding noise to a gradient targeted to reduce a second prediction penalty, adjusting the second parameter based on the resulting second noise gradient comprises:

determining, for the second parameter, a second original gradient that reduces the second prediction loss;

based on a preset second cutting threshold value, cutting the second original gradient to obtain a second cutting gradient;

determining a second Gaussian noise for implementing differential privacy using a second Gaussian distribution determined based on the second clipping threshold;

and superposing the second Gaussian noise and the second cutting gradient to obtain the second noise gradient.

6. The method of claim 1, wherein the second parameters include, encoder parameters, generator parameters, and decoder parameters; adding noise to a gradient obtained with the goal of reducing the second prediction loss, and adjusting the second parameter according to the obtained second noise gradient, including:

determining, by gradient backpropagation, a third original gradient corresponding to the decoder parameters, a fourth original gradient corresponding to the encoder parameters, and a fifth original gradient corresponding to the generator parameters, respectively;

respectively adding noise in the third original gradient, the fourth original gradient and the fifth original gradient by using a differential privacy mode to obtain a corresponding third noise gradient, a corresponding fourth noise gradient and a corresponding fifth noise gradient;

adjusting the decoder parameters using the third noise gradient; adjusting the encoder parameters using the fourth noise gradient; adjusting the generator parameter using the fifth noise gradient.

7. The method of claim 1, wherein the second parameters include, encoder parameters, generator parameters, and decoder parameters; adding noise to a gradient obtained with the goal of reducing the second prediction loss, and adjusting the second parameter according to the obtained second noise gradient, including:

adding noise in the third original gradient by using a differential privacy mode to obtain a corresponding third noise gradient;

adjusting the decoder parameters using the third noise gradient; adjusting the encoder parameters using the fourth original gradient; adjusting the generator parameter using the fifth raw gradient.

8. The method of claim 1, wherein the first real sample comprises one of: picture samples, audio samples, text samples, business object samples.

9. A training device of a data generation system based on differential privacy, the data generation system comprises a self-coding network and a discriminator, the self-coding network comprises an encoder, a generator and a decoder; the device comprises:

a restored sample obtaining unit configured to input a first original vector corresponding to a first real sample into the encoder to obtain a first feature vector reduced to a first characterization space; inputting the first feature vector into the decoder to obtain a first restored sample;

a synthetic sample acquisition unit configured to generate, by the generator, a second feature vector in the first characterization space based on a noise signal; inputting the second feature vector into the decoder to obtain a first synthesized sample;

10. The apparatus of claim 9, wherein the encoder is implemented as a first multi-layered perceptron with a decreasing number of neurons per layer, layer by layer; the decoder is implemented as a second multi-layered perceptron, with the number of neurons in each layer increasing layer by layer.

11. The apparatus of claim 9, wherein the reconstruction loss determination unit is configured to:

12. The apparatus of claim 9, wherein the first parameter adjustment unit is configured to:

13. The apparatus of claim 9, wherein the second parameter adjustment unit is configured to:

14. The apparatus of claim 9, wherein the second parameters comprise an encoder parameter, a generator parameter, and a decoder parameter;

the second parameter adjustment unit is configured to:

15. The apparatus of claim 9, wherein the second parameters comprise an encoder parameter, a generator parameter, and a decoder parameter;

the second parameter adjustment unit is configured to:

16. The apparatus of claim 9, wherein the first true sample comprises one of: picture samples, audio samples, text samples, business object samples.

17. A computer-readable storage medium, on which a computer program is stored which, when executed in a computer, causes the computer to carry out the method of any one of claims 1-8.

18. A computing device comprising a memory and a processor, wherein the memory has stored therein executable code that, when executed by the processor, performs the method of any of claims 1-8.