CN113642029B

CN113642029B - Method and system for measuring correlation between data sample and model decision boundary

Info

Publication number: CN113642029B
Application number: CN202111188034.4A
Authority: CN
Inventors: 王琛; 刘高扬; 田泽豪; 彭凯
Original assignee: Huazhong University of Science and Technology
Current assignee: Huazhong University of Science and Technology
Priority date: 2021-10-12
Filing date: 2021-10-12
Publication date: 2021-12-24
Anticipated expiration: 2041-10-12
Also published as: CN113642029A

Abstract

The invention discloses a method and a system for measuring the correlation between a data sample and a model decision boundary, belonging to the field of Internet of Things (IoT) data protection. The method comprises: after an input sample of a model to be evaluated is obtained from the IoT, generating an initial adversarial sample on the model decision boundary; performing gradient estimation to obtain a normal vector perpendicular to the decision boundary; computing the correlation between this normal vector and the difference vector from the input sample to the initial adversarial sample, and updating the sample on the decision boundary accordingly; and finally, by calculating the distance between the final sample and the input sample, obtaining a distance matrix from each sample to the decision boundary of each model produced during deep learning training, thereby measuring the correlation between each data sample and the model decision boundary. Data privacy protection can therefore be achieved without requiring the internal information of the deep learning model and without modifying the model's training procedure, so the method has high practicability and universality.

Description

Method and system for measuring correlation between data sample and model decision boundary

Technical Field

The invention belongs to the field of Internet of Things (IoT) data protection, and particularly relates to a method and a system for measuring the correlation between a data sample and a model decision boundary.

Background

With the growth of IoT data volumes and the increasing computing power of computing equipment, deep learning has been widely applied. However, deep learning requires large amounts of training data, so current deep learning models face serious data security and privacy problems. For example, most companies train models in a centralized manner, which requires large-scale collection of user data, yet there is no uniform standard for protecting user privacy; moreover, an attacker can shift a model's decision boundary by modifying, deleting or injecting malicious data, causing wrong predictions. The General Data Protection Regulation (GDPR) has improved user data privacy and security to some extent, but the privacy protection of data samples in deep learning models still faces great challenges. Accurately characterizing and measuring the correlation between data samples and the model decision boundary can provide technical and theoretical support for evaluating the security of deep learning models and the privacy of data.

At present, researchers in China and abroad have studied the correlation between data and models in deep learning systematically and in depth, but existing work has the following shortcomings. 1. Most existing work evaluates the correlation between the model and the data on the premise that the internal parameters and training settings of the deep learning model are known. In practice, however, to protect the model and its training data, the model owner typically exposes only the model's prediction interface to the evaluator, so most existing methods cannot be used in real scenarios. 2. Some metrics require retraining the deep learning model on different combinations of training data and then deriving the correlation between the resulting models and the data. Their training overhead grows markedly with the data volume, which greatly limits their use in practice. 3. Some work evaluates the relationship between the model and the data using adversarial samples. However, most existing adversarial sample generation techniques focus only on controlling the magnitude of the perturbation and ignore the geometric relationship between the sample and the decision boundary. The resulting adversarial samples cannot accurately characterize the decision boundary, which biases the correlation analysis. 4. Existing work is static: it analyzes the decision boundary of the model after training and ignores how the relationship between the decision boundary and the data changes over the whole training process.

In summary, how to dynamically evaluate the correlation between data samples and the model decision boundary when only the black-box prediction interface of a deep learning model is available is an urgent problem for the privacy and security of deep learning.

Disclosure of Invention

In view of the above problems, an object of the present invention is to provide a method and a system for measuring the correlation between a data sample and a model decision boundary, which evaluate the correlation between data samples and the model throughout the training process, thereby enabling data privacy protection with high practicability and universality.

To achieve the above object, the present invention provides a method for measuring the correlation between data samples and model decision boundaries, comprising the following steps: S1, obtaining an input sample of the model to be evaluated from the IoT, and generating an initial adversarial sample on the model decision boundary by adding Gaussian noise to the input sample; S2, calculating a normal vector of the initial adversarial sample on the model decision boundary and a difference vector from the input sample to the initial adversarial sample, and calculating an angle difference loss between the unit vector of the difference vector and the unit vector of the normal vector; S3, calculating an updated adversarial sample using the angle difference loss as the loss function for updating the initial adversarial sample; S4, projecting the updated adversarial sample onto the model decision boundary, and taking the projected sample as the initial adversarial sample for the next iteration; S5, repeating steps S2 to S4 until the loss function converges or the number of iterations reaches a set value, to obtain a final adversarial sample, and calculating the distance between the input sample and the final adversarial sample as the distance between the input sample and the model decision boundary; and S6, taking the final adversarial sample as the initial adversarial sample, repeating step S5 for the next round of model training, successively calculating the distance from the input sample to the model decision boundary in each round of training to obtain a distance matrix, and measuring the correlation between the input sample and the model decision boundary according to the distance matrix.

Further, in S1, generating an initial adversarial sample on the model decision boundary by adding Gaussian noise to the input sample includes: adding multiple groups of random Gaussian noise to the input sample until a first noise that causes the model to misclassify is obtained; and projecting the perturbed sample onto the model decision boundary by binary search to obtain the initial adversarial sample, the perturbed sample being the superposition of the input sample and the first noise.

Further, in S2, calculating the normal vector of the initial adversarial sample on the model decision boundary includes: S21, applying Gaussian perturbations in $B$ directions, $u_b$, to the initial adversarial sample $x_t$ to obtain $B$ perturbed samples $x_t + \delta u_b$, $b = 1, 2, \dots, B$, where $\delta$ is a perturbation constant; S22, calculating the decision value of each perturbed sample,

$$\phi(x_t + \delta u_b) = \operatorname{sign}\big(S(x_t + \delta u_b)\big), \qquad S(x') = \max_{k \neq y} F_k(x') - F_y(x'),$$

where $\phi(x') = 1$ indicates that the perturbed sample $x'$ is adversarial, i.e., its predicted label $\hat{y}$ differs from its true label $y$; $F_y(x')$ is the probability that the model predicts the true label, and $\max_{k \neq y} F_k(x')$ is the largest probability the model assigns to any non-true label; S23, the normal vector of the initial adversarial sample $x_t$ on the model decision boundary is expressed as:

$$\nabla S(x_t) \approx \frac{1}{B}\sum_{b=1}^{B} \phi(x_t + \delta u_b)\, u_b.$$

Further, in S2, the angle difference loss is expressed as:

$$\cos\big(x_t - x, \nabla S(x_t)\big) = \left\langle \frac{x_t - x}{\lVert x_t - x\rVert_2},\ \frac{\nabla S(x_t)}{\lVert \nabla S(x_t)\rVert_2} \right\rangle,$$

where $x$ denotes the input sample, $\langle\cdot,\cdot\rangle$ denotes the inner product, and $\lVert\cdot\rVert_2$ denotes the two-norm.

Further, S3 includes: S31, taking the negative of the angle difference loss as the loss function; S32, obtaining the update direction of the initial adversarial sample by a Monte Carlo gradient estimation method, and calculating the updated adversarial sample by a first-order gradient optimization method.

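Further, S4 includes: on the line segment connecting the updated adversarial sample $\tilde{x}_{t+1}$ and the input sample $x$, searching for $\alpha$, and taking the sample that satisfies $\phi(x_{t+1}) = 1$ as the initial adversarial sample $x_{t+1}$ for the next iteration, where

$$x_{t+1} = \alpha^{*} x + (1 - \alpha^{*})\,\tilde{x}_{t+1}, \qquad \alpha^{*} = \max\big\{\alpha \in [0, 1] : \phi\big(\alpha x + (1 - \alpha)\,\tilde{x}_{t+1}\big) = 1\big\}.$$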

Further, in S5, calculating the distance between the input sample and the final adversarial sample as the distance between the input sample and the model decision boundary includes: calculating the norm of the difference between the input sample and the final adversarial sample as the distance between the input sample and the model decision boundary.

Further, in S6, if the model is trained for $K$ rounds in total, the distances from the input sample to the model decision boundary over the $K$ rounds are expressed as $D^{T} = [d_1, d_2, \dots, d_K]$, where $d_k$ denotes the distance between the input sample and the final adversarial sample in the $k$-th round of training, $k = 1, 2, \dots, K$. For the $k$-th round, all adversarial samples generated in the last $U$ iterations before the final adversarial sample is obtained are taken, and the distance between each of them and the input sample is calculated, expressed as $D_k = [d_k^{T-U}, d_k^{T-U+1}, \dots, d_k^{T}]$, where $d_k^{u}$ denotes the distance between the input sample and the adversarial sample generated in the $u$-th iteration of the $k$-th round, $u = T-U, T-U+1, \dots, T$, and $T$ is the total number of iterations needed to obtain the final adversarial sample in the $k$-th round (so that $d_k = d_k^{T}$). After the $K$ rounds of training are finished, the distance matrix $D$ between the input sample and the model decision boundary is expressed as:

$$D = \begin{bmatrix} d_1^{T-U} & d_1^{T-U+1} & \cdots & d_1^{T} \\ d_2^{T-U} & d_2^{T-U+1} & \cdots & d_2^{T} \\ \vdots & \vdots & \ddots & \vdots \\ d_K^{T-U} & d_K^{T-U+1} & \cdots & d_K^{T} \end{bmatrix},$$

where the entry $d_k^{u}$ is the distance between the input sample and the adversarial sample generated in the $u$-th iteration of the $k$-th round of training.

To achieve the above object, the present invention further provides a system for measuring the correlation between data samples and model decision boundaries, comprising a data initialization module, a difference calculation module, a data update module, a distance calculation module and a correlation measurement module. The data initialization module is used for obtaining an input sample of the model to be evaluated from the IoT and generating an initial adversarial sample on the model decision boundary by adding Gaussian noise to the input sample; the difference calculation module is used for calculating the normal vector of the initial adversarial sample on the model decision boundary and the difference vector from the input sample to the initial adversarial sample, and calculating the angle difference loss between the unit vector of the difference vector and the unit vector of the normal vector; the data update module is used for calculating an updated adversarial sample using the angle difference loss as the loss function for updating the initial adversarial sample, projecting the updated adversarial sample onto the model decision boundary, and taking the projected sample as the initial adversarial sample for the next iteration; the distance calculation module is used for repeatedly executing the operations of the difference calculation module and the data update module until the loss function converges or the number of iterations reaches a set value, to obtain a final adversarial sample, and calculating the distance between the input sample and the final adversarial sample as the distance between the input sample and the model decision boundary; and the correlation measurement module is used for taking the final adversarial sample as the initial adversarial sample, repeatedly executing the operation of the distance calculation module for the next round of model training, successively calculating the distance from the input sample to the model decision boundary in each round of model training to obtain a distance matrix, and measuring the correlation between the input sample and the model decision boundary according to the distance matrix.

In general, the above technical solution conceived by the present invention achieves the following beneficial effects:

(1) After an input sample of the model to be evaluated is obtained from the IoT, an initial adversarial sample is generated on the model decision boundary; gradient estimation yields a normal vector perpendicular to the decision boundary; the correlation between this normal vector and the difference vector from the input sample to the initial adversarial sample is computed, and the sample on the decision boundary is updated accordingly; finally, by calculating the distance between the final sample and the input sample, a distance matrix from each sample to the decision boundary of each model produced during deep learning training is obtained, which measures the correlation between each data sample and the model decision boundary. Data privacy protection can therefore be achieved without requiring the internal information of the deep learning model and without modifying the model's training procedure, so the method has high practicability and universality.

(2) The method obtains the adversarial sample closest to the original sample and calculates the minimum distance from the original sample to the model decision boundary, so that the robustness and stability of the model can be evaluated.

(3) The invention uses the correlation between the data sample and the model decision boundary as the loss function for updating the adversarial sample, achieving better accuracy with fewer queries.

(4) The invention can obtain all data satisfying the adversarial condition within a certain range, so that the stability of the model can be judged more reliably; it supports privacy protection of the model and has good generalization capability.

(5) The invention captures changes of the decision boundary over the whole training process of the model and calculates the distance between the data sample and the decision boundary in real time, so that the security of the model can be evaluated more effectively.

Drawings

Fig. 1 is a flowchart of a method for measuring correlation between data samples and model decision boundaries according to an embodiment of the present invention.

Fig. 2 is a block diagram of a system for measuring correlation between data samples and model decision boundaries according to an embodiment of the present invention.

Fig. 3 is a second block diagram of a system for measuring correlation between data samples and model decision boundaries according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.

In the present application, the terms "first," "second," and the like (if any) in the description and the drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order.

In this embodiment, the invention is divided into two stages: a data processing stage and a correlation measurement stage. The user uploads the query API of the model to be evaluated, which operates as a black box, together with a certain number of training samples, i.e., data used to train the model. In the data processing stage, the model at each moment of deep learning training is taken, and the data samples to be evaluated are selected. In the correlation measurement stage, the samples are processed one by one: an initial sample is first generated on the decision boundary; gradient estimation yields a vector perpendicular to the decision boundary; the correlation between this vector and the vector from the input sample to the boundary sample is computed, and the sample on the decision boundary is updated; finally, by calculating the distance between the final sample and the original input sample, the distance matrix from each sample to the decision boundary of each model during deep learning training is obtained, which measures the correlation between each data sample and the model decision boundary.

Fig. 1 is a flowchart of a method for measuring correlation between data samples and model decision boundaries according to an embodiment of the present invention. The method includes operations S1 to S6.

In operation S1, an input sample of the model to be evaluated is obtained from the IoT, and an initial adversarial sample is generated on the model decision boundary by adding Gaussian noise to the input sample.

It should be noted that, in this embodiment, the model to be evaluated and the input sample are provided by the end user, and the input sample comes from an IoT data set, i.e., a data set formed by integrating data collected by devices such as sensors in the IoT. For example, if the model to be evaluated is an image recognition model, the data representing an image is extracted from the IoT data set as the features and the data representing the image name is extracted as the label, together forming an input sample.

Specifically, in S1, generating an initial adversarial sample on the model decision boundary by adding Gaussian noise to the input sample comprises:

adding multiple groups of random Gaussian noise to the input sample until a first noise that causes the model to misclassify is obtained; and projecting the perturbed sample onto the model decision boundary by binary search to obtain the initial adversarial sample, the perturbed sample being the superposition of the input sample and the first noise.

In this embodiment, an initial input sample $(x, y)$ is given, where $x$ represents the input features and $y$ is the class label. Multiple groups of random Gaussian noise $\eta_i$ are added to the sample, where $i$ is a counter, until a noise is obtained that causes the model to misclassify, i.e., $\hat{y}(x + \eta_i) \neq y$, where $\hat{y}$ is the predicted label of the model, $\hat{y}(x) = \arg\max_k F_k(x)$, $x$ is the model input, and $F_k(x)$ is the probability that the model predicts class $k$. The perturbed sample $x + \eta_i$ is then projected onto the decision boundary by binary search to obtain the initial adversarial sample $x_0$.
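Operation S1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the linear two-class model `predict_label` (with hypothetical weights `W` and bias `B_BIAS`) stands in for the black-box query API, the noise scale `sigma`, retry limit `max_tries` and search tolerance `tol` are assumed parameters, and the binary search runs along the segment from the input to the first misclassifying perturbed sample.

```python
import random

# Hypothetical toy black-box model: class 1 if W.x + B_BIAS > 0, else class 0.
# The search below only calls predict_label(), mimicking query-only access.
W = [1.0, -2.0]
B_BIAS = 0.5


def predict_label(x):
    score = sum(wi * xi for wi, xi in zip(W, x)) + B_BIAS
    return 1 if score > 0 else 0


def find_initial_adversarial(x, y, sigma=1.0, max_tries=1000, tol=1e-6, seed=0):
    """Add random Gaussian noise until the predicted label differs from y,
    then binary-search the segment from x to x + eta for the boundary."""
    rng = random.Random(seed)
    for _ in range(max_tries):
        eta = [rng.gauss(0.0, sigma) for _ in x]
        x_pert = [xi + ei for xi, ei in zip(x, eta)]
        if predict_label(x_pert) != y:
            break  # first noise that makes the model misclassify
    else:
        raise RuntimeError("no misclassifying noise found")

    def point(alpha):
        # x_alpha = (1 - alpha) * x + alpha * x_pert
        return [(1 - alpha) * xi + alpha * pi for xi, pi in zip(x, x_pert)]

    lo, hi = 0.0, 1.0  # point(lo) keeps label y, point(hi) is misclassified
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if predict_label(point(mid)) != y:
            hi = mid
        else:
            lo = mid
    return point(hi)  # adversarial side, within tol of the boundary


x0 = find_initial_adversarial([0.0, 1.0], y=0)
```

The sketch returns the endpoint on the misclassified side so that the result already satisfies the adversarial condition; it assumes the input sample itself is correctly classified.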

In operation S2, the normal vector of the initial adversarial sample on the model decision boundary and the difference vector from the input sample to the initial adversarial sample are calculated, and the angle difference loss between the unit vector of the difference vector and the unit vector of the normal vector is calculated.

In this embodiment, operation S2 includes sub-operations S21 through S25.

In sub-operation S21, the sample is updated on the model decision boundary through multiple iterations of steps S2 to S4, and the sample generated on the model decision boundary in the $t$-th round is recorded as $x_t$, where $t = 0, 1, \dots, T$ and $T$ is the total number of iterations; the sample at $t = 0$ is the initial adversarial sample $x_0$ generated in step S1. Gaussian perturbations in multiple directions, $u_b \sim \mathcal{N}(0, \Sigma)$, where $\Sigma$ is a covariance matrix, are applied to the obtained adversarial sample $x_t$, yielding $B$ groups of perturbed samples $x_t + \delta u_b$, $b = 1, 2, \dots, B$, where $\delta$ is a perturbation constant, for example $\delta = 1.01$ or $\delta = 1.001$.

In sub-operation S22, the decision value of each perturbed sample $x_t + \delta u_b$ is calculated as

$$\phi(x_t + \delta u_b) = \operatorname{sign}\big(S(x_t + \delta u_b)\big), \qquad S(x') = \max_{k \neq y} F_k(x') - F_y(x'),$$

where $\phi(x') = 1$ indicates that the perturbed sample $x'$ is adversarial; $y$ is the true label of the sample, $\hat{y}(x')$ is its predicted label, $F_y(x')$ is the probability that the model predicts the true label, and $\max_{k \neq y} F_k(x')$ is the largest probability the model assigns to any non-true label.

If the prediction output by the model differs from the label of the original sample, then $\phi(x') = 1$; otherwise $\phi(x') = -1$.
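The decision value of sub-operation S22 needs only the probability vector returned by the model. A minimal sketch, assuming the black-box API returns class probabilities as a Python list indexed by class; the function names `boundary_score` and `decision_value` are hypothetical:

```python
def boundary_score(probs, y):
    """S(x') = max_{k != y} F_k(x') - F_y(x') from a probability vector."""
    others = [p for k, p in enumerate(probs) if k != y]
    return max(others) - probs[y]


def decision_value(probs, y):
    """phi(x') = +1 if x' is adversarial (S(x') > 0), otherwise -1."""
    return 1 if boundary_score(probs, y) > 0 else -1


print(decision_value([0.7, 0.2, 0.1], y=0))  # -1: still classified as the true label 0
print(decision_value([0.3, 0.6, 0.1], y=0))  # 1: predicted label 1 differs from y
```

$S(x') > 0$ holds exactly when some non-true class outscores the true class, which is the "prediction differs from the original label" rule stated above (exact ties are treated as non-adversarial here).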

In sub-operation S23, each decision value is used as the sign of the corresponding perturbation vector, and the signed perturbation vectors are averaged. The result is the estimated gradient at sample $x_t$, i.e., the normal vector of the adversarial sample $x_t$ on the model decision boundary:

$$\nabla S(x_t) \approx \frac{1}{B}\sum_{b=1}^{B} \phi(x_t + \delta u_b)\, u_b.$$
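Sub-operation S23 is a Monte Carlo estimate of the boundary normal from label-only queries. The sketch below assumes the covariance matrix is the identity, takes `phi` as a hypothetical callable returning +1 for adversarial points and -1 otherwise, and uses an assumed small `delta`; for the toy linear boundary used here any positive `delta` gives the same decision values.

```python
import math
import random


def estimate_normal(phi, x_t, n_dirs=500, delta=0.01, seed=0):
    """Monte Carlo estimate of the boundary normal at x_t:
    (1/B) * sum_b phi(x_t + delta * u_b) * u_b, with u_b ~ N(0, I)."""
    rng = random.Random(seed)
    normal = [0.0] * len(x_t)
    for _ in range(n_dirs):
        u = [rng.gauss(0.0, 1.0) for _ in x_t]
        sign = phi([xi + delta * ui for xi, ui in zip(x_t, u)])
        for j, uj in enumerate(u):
            normal[j] += sign * uj / n_dirs
    return normal


# Hypothetical toy boundary x0 + x1 = 1; points with x0 + x1 > 1 count as adversarial.
def phi(z):
    return 1 if z[0] + z[1] > 1 else -1


n = estimate_normal(phi, [0.5, 0.5])
# Cosine between the estimate and the true normal direction (1, 1).
cos_to_true = (n[0] + n[1]) / (math.sqrt(2) * math.hypot(n[0], n[1]))
```

Directions whose perturbed point lands on the adversarial side contribute $+u_b$ and the others $-u_b$, so the average points toward the adversarial side of the boundary, which is the same orientation as $\nabla S$.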

In sub-operation S24, the vector from the original input sample $x$ to the adversarial sample $x_t$ generated on the model boundary in the $t$-th round is computed:

$$v_t = x_t - x.$$

In sub-operation S25, the cosine similarity between the vector $v_t$ and the normal vector $\nabla S(x_t)$ is calculated:

$$\cos\big(v_t, \nabla S(x_t)\big) = \frac{\langle v_t, \nabla S(x_t)\rangle}{\lVert v_t\rVert_2\,\lVert \nabla S(x_t)\rVert_2},$$

where the numerator is the inner product of the two vectors and the denominator is the product of their lengths, each measured by the two-norm.

The invention aims to obtain the distance from the original input sample $x$ to the model decision boundary, which requires generating the sample $x^{*}$ on the decision boundary that is closest to $x$. At that closest point the displacement $x^{*} - x$ is perpendicular to the boundary, i.e., parallel to its normal, so the closer the sample $x_t$ is to $x^{*}$, the closer the direction of $v_t$ is to the gradient direction $\nabla S(x_t)$, and the larger $\cos\big(v_t, \nabla S(x_t)\big)$ becomes. Therefore, the negative of the similarity is used as the loss function for updating the adversarial sample, which makes the update of the data sample more effective:

$$\mathcal{L}(x_t) = -\cos\big(v_t, \nabla S(x_t)\big).$$

The optimization objective of the invention can then be expressed as:

$$\min_{x'}\; -\cos\big(x' - x,\ \nabla S(x')\big) \quad \text{subject to} \quad \phi(x') = 1.$$
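The loss above can be written directly from its definition. A minimal sketch, assuming samples and the estimated normal are plain Python lists of equal length; `cosine_similarity` and `angle_loss` are hypothetical names:

```python
import math


def cosine_similarity(a, b):
    """<a, b> / (||a||_2 * ||b||_2)."""
    dot = sum(ai * bi for ai, bi in zip(a, b))
    norm_a = math.sqrt(sum(ai * ai for ai in a))
    norm_b = math.sqrt(sum(bi * bi for bi in b))
    return dot / (norm_a * norm_b)


def angle_loss(x, x_t, normal):
    """Loss = -cos(x_t - x, normal); it reaches -1 when x_t - x is parallel to the normal."""
    v_t = [a - b for a, b in zip(x_t, x)]
    return -cosine_similarity(v_t, normal)
```

For example, `angle_loss([0, 0], [1, 1], [2, 2])` is -1 (displacement aligned with the normal, the optimum), while `angle_loss([0, 0], [1, 0], [0, 1])` is 0 (displacement tangent to the boundary).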

In operation S3, an updated adversarial sample is calculated using the angle difference loss as the loss function for updating the initial adversarial sample.

In this embodiment, operation S3 includes sub-operations S31 through S33.

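In sub-operation S31, the gradient of the objective function $\mathcal{L}$ (the negative cosine similarity) at sample $x_t$ is estimated by a finite-difference method (the symmetric difference quotient):

$$\hat{g}_i = \frac{\partial \mathcal{L}(x_t)}{\partial x_{t,i}} \approx \frac{\mathcal{L}(x_t + h e_i) - \mathcal{L}(x_t - h e_i)}{2h},$$

where $\mathcal{L}$ is the objective function, $e_i$ is a standard basis vector whose $i$-th component is 1 and whose other components are 0, and $h$ is a small constant.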
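The finite-difference estimate of sub-operation S31 queries the objective twice per coordinate. A minimal sketch, assuming the symmetric difference quotient and an assumed step `h`; here `f` is any callable objective (in the method it would be the angle loss, itself computed from model queries):

```python
def finite_difference_grad(f, x, h=1e-4):
    """Coordinate-wise symmetric difference quotient:
    g_i = (f(x + h*e_i) - f(x - h*e_i)) / (2h), where e_i is the i-th basis vector."""
    grad = []
    for i in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += h
        x_minus[i] -= h
        grad.append((f(x_plus) - f(x_minus)) / (2 * h))
    return grad


g = finite_difference_grad(lambda z: z[0] ** 2 + 3 * z[1], [1.0, 2.0])
```

For the quadratic test function the exact gradient at (1, 2) is (2, 3), and the symmetric quotient recovers it up to floating-point rounding. Each estimate costs $2d$ evaluations of the objective for a $d$-dimensional sample.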

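In sub-operation S32, the objective function is optimized with a first-order gradient method to obtain the best coordinate update $\delta^{*}$. Taking the Adam algorithm as an example, the moving average of the gradient $M$ and the moving average of the squared gradient $V$ are updated with the estimated gradient $\hat{g}$,

$$M \leftarrow \beta_1 M + (1 - \beta_1)\,\hat{g}, \qquad V \leftarrow \beta_2 V + (1 - \beta_2)\,\hat{g}^{2},$$

and the bias-corrected estimates $\hat{M} = M/(1 - \beta_1^{n})$ and $\hat{V} = V/(1 - \beta_2^{n})$ are computed, $n$ being the number of updates so far, giving the best coordinate update

$$\delta^{*} = -\eta\,\frac{\hat{M}}{\sqrt{\hat{V}} + \epsilon}.$$

Other optimizers such as SGD and RMSprop can also be used.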
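The Adam step of sub-operation S32 can be sketched as below. This is an assumption-laden simplification: it updates all coordinates at once rather than one coordinate at a time, uses the commonly quoted Adam defaults for `lr`, `beta1`, `beta2` and `eps`, and the class name `AdamCoordinateUpdater` is hypothetical.

```python
import math


class AdamCoordinateUpdater:
    """Adam step producing the update delta* from an estimated gradient."""

    def __init__(self, dim, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = [0.0] * dim  # moving average of the gradient (M)
        self.v = [0.0] * dim  # moving average of the squared gradient (V)
        self.n = 0            # number of updates so far

    def step(self, grad):
        self.n += 1
        delta = []
        for i, g in enumerate(grad):
            self.m[i] = self.beta1 * self.m[i] + (1 - self.beta1) * g
            self.v[i] = self.beta2 * self.v[i] + (1 - self.beta2) * g * g
            m_hat = self.m[i] / (1 - self.beta1 ** self.n)  # bias correction
            v_hat = self.v[i] / (1 - self.beta2 ** self.n)
            delta.append(-self.lr * m_hat / (math.sqrt(v_hat) + self.eps))
        return delta


delta = AdamCoordinateUpdater(dim=2).step([2.0, -0.5])
```

On the first step the bias-corrected ratio is close to the sign of the gradient, so each coordinate moves by roughly `lr` against its gradient, independent of the gradient's scale; this scale invariance is one reason Adam suits noisy, query-based gradient estimates.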

In sub-operation S33, the adversarial sample is updated along the direction obtained in step S32:

$$\tilde{x}_{t+1} = x_t + \delta^{*}.$$

During optimization, the updated sample $\tilde{x}_{t+1}$ must satisfy the adversarial condition $\phi(\tilde{x}_{t+1}) = 1$.

In operation S4, the updated adversarial sample is projected onto the model decision boundary, and the projected sample is taken as the initial adversarial sample for the next iteration.

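Specifically, on the line segment connecting the updated adversarial sample $\tilde{x}_{t+1}$ and the input sample $x$, $\alpha$ is searched for, and the sample satisfying $\phi(x_{t+1}) = 1$ is taken as

$$x_{t+1} = \alpha^{*} x + (1 - \alpha^{*})\,\tilde{x}_{t+1}, \qquad \alpha^{*} = \max\big\{\alpha \in [0, 1] : \phi\big(\alpha x + (1 - \alpha)\,\tilde{x}_{t+1}\big) = 1\big\}.$$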
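The projection of operation S4 is a binary search on $\alpha$. A minimal sketch, assuming `is_adversarial` is a hypothetical callable wrapping the black-box decision, that the updated sample is adversarial while the input sample is not, and that the segment between them crosses the boundary only once (binary search relies on this):

```python
def project_to_boundary(is_adversarial, x, x_tilde, tol=1e-6):
    """Approximate the largest alpha in [0, 1] such that
    alpha * x + (1 - alpha) * x_tilde is still adversarial, and return that point."""

    def point(alpha):
        return [alpha * xi + (1 - alpha) * ti for xi, ti in zip(x, x_tilde)]

    lo, hi = 0.0, 1.0  # point(lo) is adversarial, point(hi) is not
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if is_adversarial(point(mid)):
            lo = mid
        else:
            hi = mid
    return point(lo)


# Hypothetical toy boundary x0 + x1 = 1, crossed at alpha = 0.75 on this segment.
p = project_to_boundary(lambda z: z[0] + z[1] > 1, [0.0, 0.0], [2.0, 2.0])
```

Returning `point(lo)` rather than `point(hi)` keeps the projected sample on the adversarial side, so it satisfies the condition $\phi = 1$ required for the next iteration.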

In operation S5, steps S2 to S4 are repeated until the loss function converges or the number of iterations reaches a set value, giving the final adversarial sample; the distance between the input sample and the final adversarial sample is calculated as the distance between the input sample and the model decision boundary.

Specifically, the sample $x_{t+1}$ is used as the initial adversarial sample of the next iteration, and steps S2 to S4 are repeated until the loss function converges or the number of iterations reaches the set value, giving the final adversarial sample $x_T$. The $p$-norm between the original input sample $x$ and the final adversarial sample $x_T$ is calculated as the distance from the input sample to the model decision boundary:

$$d = \lVert x - x_T \rVert_p.$$
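The distance of operation S5 is an ordinary $p$-norm of the difference vector. A minimal sketch on plain Python lists; `lp_distance` is a hypothetical name, and passing `math.inf` as `p` selects the max-abs norm:

```python
import math


def lp_distance(x, x_final, p=2):
    """||x - x_final||_p; p = math.inf gives the max-abs (L-infinity) distance."""
    diffs = [abs(a - b) for a, b in zip(x, x_final)]
    if p == math.inf:
        return max(diffs)
    return sum(d ** p for d in diffs) ** (1.0 / p)


print(lp_distance([0.0, 0.0], [3.0, 4.0]))            # 5.0
print(lp_distance([0.0, 0.0], [3.0, 4.0], p=1))       # 7.0
print(lp_distance([0.0, 0.0], [3.0, 4.0], p=math.inf))  # 4.0
```

With $p = 2$ this is the two-norm distance used by the distance calculation module described for Fig. 3.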

In operation S6, the final adversarial sample is taken as the initial adversarial sample and step S5 is repeated for the next round of model training; the distances from the input sample to the model decision boundary in each round of training are calculated successively to obtain a distance matrix, and the correlation between the input sample and the model decision boundary is measured according to the distance matrix.

Specifically, if the model is trained for $K$ rounds in total, the distances from the input sample to the model decision boundary over the $K$ rounds are expressed as $D^{T} = [d_1, d_2, \dots, d_K]$, where $d_k$ denotes the distance between the input sample and the final adversarial sample in the $k$-th round of training, $k = 1, 2, \dots, K$.

For the $k$-th round, all adversarial samples generated in the last $U$ iterations before the final adversarial sample is obtained are taken, and the distance between each of them and the input sample is calculated, expressed as $D_k = [d_k^{T-U}, d_k^{T-U+1}, \dots, d_k^{T}]$, where $d_k^{u}$ denotes the distance between the input sample and the adversarial sample generated in the $u$-th iteration of the $k$-th round, $u = T-U, T-U+1, \dots, T$, and $T$ is the total number of iterations needed to obtain the final adversarial sample in the $k$-th round (so that $d_k = d_k^{T}$).

After the $K$ rounds of training are finished, the distance matrix $D$ between the input sample and the model decision boundary is expressed as:

$$D = \begin{bmatrix} d_1^{T-U} & d_1^{T-U+1} & \cdots & d_1^{T} \\ d_2^{T-U} & d_2^{T-U+1} & \cdots & d_2^{T} \\ \vdots & \vdots & \ddots & \vdots \\ d_K^{T-U} & d_K^{T-U+1} & \cdots & d_K^{T} \end{bmatrix},$$

where the entry $d_k^{u}$ is the distance between the input sample and the adversarial sample generated in the $u$-th iteration of the $k$-th round of training.
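Assembling the distance matrix of operation S6 is bookkeeping over the per-iteration distances of each training round. A minimal sketch, assuming a hypothetical input format in which `round_histories[k]` lists the distances $d_k^0, \dots, d_k^T$ recorded in round $k$ (rounds may use different $T$); rows of the result are the vectors $D_k$:

```python
def distance_matrix(round_histories, U):
    """Build the K x (U + 1) matrix D whose k-th row is
    D_k = [d_k^{T-U}, ..., d_k^{T}], the distances of the last U + 1
    adversarial samples of round k."""
    D = []
    for k, history in enumerate(round_histories):
        if len(history) < U + 1:
            raise ValueError(f"round {k} has fewer than U + 1 iterations")
        D.append(list(history[-(U + 1):]))
    return D


def final_distances(D):
    """D^T = [d_1, ..., d_K]: the last column of D (distance of each round's final sample)."""
    return [row[-1] for row in D]


D = distance_matrix([[5, 4, 3, 2], [6, 5, 4.5, 4.2, 4.1]], U=2)
```

Here `D` is `[[4, 3, 2], [4.5, 4.2, 4.1]]` and `final_distances(D)` is `[2, 4.1]`; a row-per-round layout keeps the matrix well-formed even when $T$ differs between rounds.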

It should be noted that the distance matrix represents the correlation between the data sample and only one decision boundary. For a multi-class model, the invention can generate a distance matrix for the decision boundary of each class; together these represent the correlation between the data sample and all decision boundaries of the model.

Fig. 2 is a block diagram of a system for measuring correlation between data samples and model decision boundaries according to an embodiment of the present invention. Referring to fig. 2, the system 200 includes a data initialization module 210, a difference calculation module 220, a data update module 230, a distance calculation module 240, and a correlation metric module 250.

The data initialization module 210 performs, for example, operation S1, and is configured to obtain an input sample of the model to be evaluated from the IoT and to generate an initial adversarial sample on the model decision boundary by adding Gaussian noise to the input sample;

the difference calculation module 220 performs, for example, operation S2, and is configured to calculate the normal vector of the initial adversarial sample on the model decision boundary and the difference vector from the input sample to the initial adversarial sample, and to calculate the angle difference loss between the unit vector of the difference vector and the unit vector of the normal vector;

the data update module 230 performs, for example, operations S3 and S4, and is configured to calculate an updated adversarial sample using the angle difference loss as the loss function for updating the initial adversarial sample, to project the updated adversarial sample onto the model decision boundary, and to take the projected sample as the initial adversarial sample for the next iteration;

the distance calculation module 240 performs, for example, operation S5, and is configured to repeat the operations of the difference calculation module and the data update module until the loss function converges or the number of iterations reaches a set value, to obtain a final adversarial sample, and to calculate the distance between the input sample and the final adversarial sample as the distance between the input sample and the model decision boundary;

the correlation metric module 250 performs, for example, operation S6, and is configured to take the final adversarial sample as the initial adversarial sample, repeat the operation of the distance calculation module for the next round of model training, successively calculate the distances from the input sample to the model decision boundary in each round of training to obtain a distance matrix, and measure the correlation between the input sample and the model decision boundary according to the distance matrix.

The system 200 is used to perform the method for measuring correlation between data samples and model decision boundaries in the embodiment shown in Fig. 1. For details not described in this embodiment, refer to the method for measuring the correlation between data samples and model decision boundaries in the embodiment shown in Fig. 1; they are not repeated here.

Fig. 3 is a second block diagram of a system for measuring correlation between data samples and model decision boundaries according to an embodiment of the present invention, in which the system includes an initial data generation module, a gradient estimation module, a correlation calculation module, a data update module and a distance calculation module. The data sample set and the model provided by the user are input into the system. The initial data generation module generates, on the model decision boundary, an adversarial sample close to the original sample. The gradient estimation module estimates the gradient at a data point on the decision boundary to represent the direction and magnitude of the normal vector of the model decision boundary. The correlation calculation module computes, by cosine similarity, the correlation between the gradient vector and the vector from the original input sample to the adversarial sample. The data update module comprises an optimal gradient update part and a sample projection part: it calculates the optimal data update direction at the current moment, projects the updated data onto the model decision boundary, and proceeds to the next update; through repeated iterative updates, the adversarial sample with the minimum loss is obtained, and its distance to the original sample is taken as the shortest distance from the original data to the model decision boundary. The distance calculation module calculates the two-norm distance between the original sample and the adversarial sample as the distance between the original sample and the model decision boundary, and repeats this for each model in the training process to generate the final distance matrix of the data sample, which represents the correlation between the data sample and the model decision boundary.

The effects of the present invention are further illustrated by the following experimental results. The method was applied to membership inference attacks on deep learning. The model at each stage of the training process was selected, and results were tested on the Adult, MNIST, and Purchase (10) datasets. Using the method for measuring the correlation between data samples and model decision boundaries, corresponding adversarial samples were generated at the decision boundaries, and the matrix of changes in the distance from each data sample to each model's decision boundary was calculated. As the deep learning model trains, its decision boundary changes continuously. Training data participate in training and test data do not, so the distances of training data to the model decision boundary change differently from those of test data. Training data and test data were selected separately to obtain their respective distance change matrices. A membership inference attack model was then trained with the distance features of each sample as input and whether the sample is training data as the output label. Table 1 shows the adversarial sample success rate and the membership inference attack accuracy of the method on the three datasets in simulation tests.
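The attack step above can be sketched as a small binary classifier over per-round distance vectors. The source does not say which classifier serves as the membership inference attack model, so logistic regression trained by plain gradient descent is assumed here. The function names are hypothetical. The synthetic distances in the check are invented solely to exercise the code: member distances drift across rounds while non-member distances stay flat. They are not the data behind Table 1.

```python
import numpy as np


def train_attack_model(D, is_member, lr=0.5, epochs=1000):
    """Fit a logistic-regression attack model on per-round distance features.

    D: (n_samples, K) array; row i is the distance vector of sample i over K rounds.
    is_member: (n_samples,) array of 0/1 labels (1 = the sample was training data).
    """
    mu, sigma = D.mean(axis=0), D.std(axis=0) + 1e-12  # standardize each round's distances
    X = np.hstack([(D - mu) / sigma, np.ones((len(D), 1))])  # append a bias column
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))  # predicted membership probability
        w -= lr * X.T @ (p - is_member) / len(is_member)  # gradient of the mean log-loss
    return mu, sigma, w


def predict_member(model, D):
    """Return 1 for samples the attack model classifies as training data."""
    mu, sigma, w = model
    X = np.hstack([(D - mu) / sigma, np.ones((len(D), 1))])
    return (X @ w > 0).astype(int)
```

Standardizing each column first keeps plain gradient descent well conditioned, because raw distances can differ in scale from one round to the next.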

The results show that the method achieves a high adversarial sample success rate on every dataset, exceeding the baseline. The accuracy of the resulting membership inference attack also exceeds that of most existing approaches. The method can therefore accurately measure the correlation between the data samples of a deep learning model and the model's decision boundary, which enables data privacy protection with high practicability and universality.

Those skilled in the art will understand that the foregoing describes only preferred embodiments of the present invention and is not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within its scope of protection.

Claims

1. A method for measuring the correlation between data samples and model decision boundaries, used for data protection in the Internet of Things, characterized by comprising the following steps:

S1, obtaining an input sample of the model to be evaluated from the Internet of Things, and adding Gaussian noise to the input sample to generate an initial adversarial sample on the model decision boundary, the model to be evaluated being a deep learning model;

S2, calculating the normal vector of the model decision boundary at the initial adversarial sample and the difference vector from the input sample to the initial adversarial sample, and calculating the angular difference loss between the unit vector of the difference vector and the unit vector of the normal vector;

S3, calculating an updated adversarial sample, using the angular difference loss as the loss function for updating the initial adversarial sample;

S4, projecting the updated adversarial sample onto the model decision boundary, and taking the projected sample as the initial adversarial sample for the next iteration;

S5, repeating steps S2 to S4 until the loss function converges or the number of iterations reaches a set value, to obtain a final adversarial sample, and calculating the distance between the input sample and the final adversarial sample as the distance between the input sample and the model decision boundary;

and S6, taking the final adversarial sample as the initial adversarial sample for the next round of model training and repeating step S5, thereby calculating in turn the distance from the input sample to the model decision boundary in each round of model training to obtain a distance matrix, and measuring the correlation between the input sample and the model decision boundary according to the distance matrix, the correlation being used to evaluate the privacy of the input sample.

2. The method of claim 1, wherein generating the initial adversarial sample on the model decision boundary by adding Gaussian noise to the input sample in S1 comprises:

adding multiple groups of random Gaussian noise to the input sample until a first noise that causes the model to misclassify is obtained;

projecting the perturbed sample onto the model decision boundary by binary search (the dichotomy method) to obtain the initial adversarial sample, the perturbed sample being the superposition of the input sample and the first noise.
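The two steps of this claim can be sketched as follows. The function name, the noise scale `sigma`, the retry cap `max_tries` and the search tolerance `tol` are illustrative assumptions. `predict` is any callable that returns a hard label. The binary search keeps the invariant that the `lo` end of the segment is correctly classified and the `hi` end is misclassified. The returned point therefore sits just on the misclassified side of the boundary.

```python
import numpy as np


def initial_boundary_sample(predict, x, y_true, rng, sigma=1.0, max_tries=1000, tol=1e-6):
    """Hypothetical helper: Gaussian-noise search followed by binary-search projection."""
    # Step 1: draw random Gaussian noise until the model misclassifies x + noise.
    for _ in range(max_tries):
        x_pert = x + rng.normal(scale=sigma, size=x.shape)
        if predict(x_pert) != y_true:
            break
    else:
        raise RuntimeError("no misclassifying noise found; try a larger sigma")
    # Step 2: dichotomy on the segment x -> x_pert (fraction 0 is x, fraction 1 is x_pert).
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if predict(x + mid * (x_pert - x)) != y_true:
            hi = mid
        else:
            lo = mid
    return x + hi * (x_pert - x)
```

With the toy rule `predict = lambda z: int(z[0] > 1.0)` and `x = np.zeros(2)`, the returned sample has a first coordinate just above 1.0, which is the boundary of that rule.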

3. The method of claim 1 or 2, wherein calculating the normal vector of the initial adversarial sample on the model decision boundary in S2 comprises:

S21, applying Gaussian perturbations in $B$ directions to the initial adversarial sample $x_t$ to obtain $B$ perturbed samples $x_t + \delta u_b$, where $u_b \sim \mathcal{N}(0, I)$, $b = 1, 2, \dots, B$, and $\delta$ is a perturbation constant;

S22, calculating the decision value $\phi(x_t + \delta u_b)$ of each perturbed sample:

$$\phi(x_t + \delta u_b) = \begin{cases} 1, & \hat{y} \neq y \\ -1, & \hat{y} = y \end{cases}$$

wherein $\phi = 1$ indicates that the perturbed sample is adversarial, $y$ is the true label of the sample, $\hat{y}$ is the predicted label of the sample, and $F_y(\cdot)$ denotes the probability with which the model predicts the true label, so that $\hat{y} \neq y$ exactly when $F_y$ is not the largest class probability;

S23, expressing the normal vector $n(x_t)$ of the model decision boundary at the initial adversarial sample $x_t$ as:

$$n(x_t) \approx \frac{1}{B} \sum_{b=1}^{B} \phi(x_t + \delta u_b)\, u_b.$$
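The normal-vector estimate of this claim can be sketched with hard-label queries only. The formulas are lost from this text. The ±1 decision value and the plain average over the $B$ directions are assumptions that follow the standard HopSkipJumpAttack construction. The function name and the defaults for `n_dirs` (standing for $B$) and `delta` (standing for $\delta$) are also hypothetical.

```python
import numpy as np


def estimate_normal(predict, x_b, y_true, rng, n_dirs=1000, delta=1e-3):
    """Monte Carlo estimate of the decision-boundary normal at a boundary point x_b."""
    u = rng.standard_normal((n_dirs, x_b.size))  # B Gaussian directions u_b
    # Decision value phi: +1 if the perturbed sample is adversarial, -1 otherwise.
    phi = np.array([1.0 if predict(x_b + delta * ub) != y_true else -1.0 for ub in u])
    return (phi[:, None] * u).mean(axis=0)  # (1/B) * sum_b phi_b * u_b
```

For a linear model with weight vector `w`, the estimate at a point on the hyperplane points along `w`, towards the adversarial side. Its direction aligns more closely with `w` as `n_dirs` grows.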

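4. The method of claim 3, wherein in S2 the angular difference loss is expressed as:

$$\mathcal{L}(x_t) = \left\langle \frac{x_t - x}{\|x_t - x\|_2},\ \frac{n(x_t)}{\|n(x_t)\|_2} \right\rangle$$

wherein $x$ denotes the input sample, $x_t$ the initial adversarial sample, $n(x_t)$ the normal vector of the model decision boundary at $x_t$, $\langle \cdot, \cdot \rangle$ the inner product, and $\|\cdot\|_2$ the two-norm.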
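A direct sketch of the angular difference follows. It assumes, from the definitions in this claim, that the loss is the inner product of the two unit vectors, which is the cosine similarity named for the correlation calculation module. The reasoning behind it is geometric. At the point of a locally flat decision boundary nearest to the input sample, the difference vector is parallel to the normal and the value reaches 1. Driving the value towards 1 therefore moves the boundary sample towards the nearest boundary point. The function name is hypothetical.

```python
import numpy as np


def angular_difference(x_in, x_b, normal):
    """<unit(x_b - x_in), unit(normal)>; equals 1.0 when the two vectors are parallel."""
    diff = x_b - x_in
    return float(np.dot(diff / np.linalg.norm(diff), normal / np.linalg.norm(normal)))
```

For instance, `angular_difference(np.zeros(2), np.array([1.0, 1.0]), np.array([2.0, 2.0]))` gives 1 up to rounding. With a boundary sample at `[1.0, 0.0]` and a normal `[0.0, 3.0]`, the result is 0.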

5. The method of claim 4, wherein S3 comprises:

S31, taking the negative of the angular difference loss as the loss function;

S32, obtaining the update direction of the initial adversarial sample by Monte Carlo gradient estimation, and calculating the updated adversarial sample by a first-order gradient optimization method.
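This claim names Monte Carlo gradient estimation and a first-order method but gives no formulas. A common zeroth-order choice, assumed here, is the Gaussian-smoothing estimator $g \approx \frac{1}{B\sigma}\sum_b \big(f(x + \sigma u_b) - f(x)\big)\, u_b$ followed by a plain gradient-descent step. The quadratic objective in the check is only a stand-in that makes the sketch verifiable. In the method, `f` would be the negated angular difference loss, evaluated through the black-box model. The function names and the defaults `n_samples`, `sigma` and `lr` are hypothetical.

```python
import numpy as np


def mc_gradient(f, x, rng, n_samples=50, sigma=1e-3):
    """Monte Carlo (zeroth-order) gradient estimate of a black-box scalar function f at x."""
    u = rng.standard_normal((n_samples, x.size))
    fx = f(x)
    diffs = np.array([f(x + sigma * ui) - fx for ui in u])  # forward differences along u_b
    return (diffs[:, None] * u).mean(axis=0) / sigma


def first_order_step(f, x, rng, lr=0.1, **kw):
    """One plain gradient-descent update that uses the Monte Carlo gradient estimate."""
    return x - lr * mc_gradient(f, x, rng, **kw)
```

Minimizing `f(z) = ||z - c||²` from the origin for a couple of hundred steps brings `z` close to `c`. Any first-order optimizer (momentum, Adam) could replace the plain step. The claim does not say which one is used.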

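6. The method of claim 5, wherein S4 comprises:

searching for $\alpha$ on the line connecting the updated adversarial sample $\tilde{x}_{t+1}$ and the input sample $x$, and taking the sample $x_{t+1} = \alpha x + (1 - \alpha)\,\tilde{x}_{t+1}$, $\alpha \in [0, 1]$, that lies on the model decision boundary as the projected sample.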

7. The method of claim 1 or 6, wherein calculating the distance between the input sample and the final adversarial sample as the distance between the input sample and the model decision boundary in S5 comprises: calculating the norm of the difference between the input sample and the final adversarial sample as the distance between the input sample and the model decision boundary.

8. The method of claim 7, wherein in S6, if the model is trained for $K$ rounds in total, the distances from the input sample to the model decision boundary over the $K$ rounds of training are expressed as $D^T = [d_1, d_2, \dots, d_K]$, where $d_k$ is the distance between the input sample and the final adversarial sample in the $k$-th round of training, $k = 1, 2, \dots, K$.
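A sketch of how the per-round distances can be stacked into a matrix follows, with one row $D^T = [d_1, \dots, d_K]$ per sample and one column per training round. To keep it self-contained, each round's model is a hypothetical linear classifier `(w, b)`. For such a model the closed-form distance $|w \cdot x + b| / \|w\|_2$ is the value that the iterative search of S5 approximates, so it stands in for that search. For the same reason, the warm start of S6 does not appear here. In the iterative version, round $k-1$'s final adversarial sample would seed round $k$'s search. The row-to-row differences give the distance change matrix used in the membership inference experiment.

```python
import numpy as np


def distance_matrix(samples, round_models):
    """Row i is D^T = [d_1, ..., d_K] for sample i; column k corresponds to round k's model.

    Each model is a hypothetical linear classifier (w, b); its exact boundary distance
    |w.x + b| / ||w||_2 stands in for the iterative adversarial search.
    """
    D = np.empty((len(samples), len(round_models)))
    for k, (w, b) in enumerate(round_models):
        D[:, k] = np.abs(samples @ w + b) / np.linalg.norm(w)
    return D


def distance_changes(D):
    """Change in each sample's boundary distance between consecutive training rounds."""
    return np.diff(D, axis=1)
```

For two samples `[0, 0]` and `[3, 4]` and rounds `(w=[1, 0], b=-1)` and `(w=[0, 2], b=0)`, the matrix is `[[1, 0], [2, 4]]` and the change matrix is `[[-1], [2]]`.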

9. A system for measuring the correlation between data samples and model decision boundaries, used for data protection in the Internet of Things, characterized by comprising:

a data initialization module, configured to obtain an input sample of the model to be evaluated from the Internet of Things and to generate an initial adversarial sample on the model decision boundary by adding Gaussian noise to the input sample, the model to be evaluated being a deep learning model;

a difference calculation module, configured to calculate the normal vector of the model decision boundary at the initial adversarial sample and the difference vector from the input sample to the initial adversarial sample, and to calculate the angular difference loss between the unit vector of the difference vector and the unit vector of the normal vector;

a data updating module, configured to calculate an updated adversarial sample using the angular difference loss as the loss function for updating the initial adversarial sample, to project the updated adversarial sample onto the model decision boundary, and to take the projected sample as the initial adversarial sample for the next iteration;

a distance calculation module, configured to repeat the operations of the difference calculation module and the data updating module until the loss function converges or the number of iterations reaches a set value, to obtain a final adversarial sample, and to calculate the distance between the input sample and the final adversarial sample as the distance between the input sample and the model decision boundary;

and a correlation measurement module, configured to take the final adversarial sample as the initial adversarial sample for the next round of model training and repeat the operation of the distance calculation module, thereby calculating in turn the distance from the input sample to the model decision boundary in each round of model training to obtain a distance matrix, and to measure the correlation between the input sample and the model decision boundary according to the distance matrix, the correlation being used to evaluate the privacy of the input sample.