CN111914928A - Method for defending against adversarial samples for an image classifier - Google Patents

Method for defending against adversarial samples for an image classifier

Info

Publication number
CN111914928A
Authority
CN
China
Prior art keywords
image
model
training data
mini-batch
image training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010749009.8A
Other languages
Chinese (zh)
Other versions
CN111914928B (en)
Inventor
诸渝 (ZHU Yu)
许封元 (XU Fengyuan)
仲盛 (ZHONG Sheng)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University
Original Assignee
Nanjing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University filed Critical Nanjing University
Priority to CN202010749009.8A priority Critical patent/CN111914928B/en
Publication of CN111914928A publication Critical patent/CN111914928A/en
Application granted granted Critical
Publication of CN111914928B publication Critical patent/CN111914928B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method of defending against adversarial samples for an image classifier. The method first constructs a model, prepares image training data, and initializes the hyper-parameters; next, it divides the image training data into a number of batches; it then updates the model parameters using one batch of image training data, which comprises: generating adversarial samples, mixing the adversarial samples with the image training data, adjusting the relative positions of the image data among the adversarial samples, and updating the model parameters with the back-propagation algorithm; the image training data of the remaining batches are used in turn to update the model parameters; a new round of training is then started, until training is finished; finally, the trained model is output. The invention combines the Siamese architecture with adversarial training; it improves on the traditional adversarial training algorithm and can better withstand adversarial sample attacks on image classifiers.

Description

Method for defending against adversarial samples for an image classifier
Technical Field
The invention relates to a method of defending against adversarial samples for an image classifier. It concerns image classifiers based on neural networks and belongs to the field of image classification.
Background
In recent years, with the explosive growth of data volume and computing power, deep learning has developed rapidly, and neural networks in particular serve a wide range of applications with excellent performance. For example, in image classification, an image classifier built on neural network techniques can achieve excellent classification results. However, neural networks also face severe security issues, of which adversarial samples are a typical example.
An adversarial sample is a malicious picture formed by deliberately adding a weak perturbation to a normal picture; it can mislead a neural-network-based image classification model into producing a wrong output. The existence of adversarial samples seriously threatens the robustness of classification models, and the threat is all the more serious when the security requirements placed on the model are high.
Adversarial training is widely used as an effective defense method. Its core idea is that, in each iteration of model training, adversarial samples are dynamically generated from the current model with some attack algorithm and used as training data, together with the original image training data, to carry out the current round of training. A model obtained through adversarial training shows a markedly improved ability to defend itself against adversarial samples. However, adversarial training does not use the image training data sufficiently: it neglects the interrelationships between different image data, which brings certain drawbacks. In the feature space of an adversarially trained model, features of same-class data are not close enough, while features of different-class data are not far enough apart and overlap with one another, which limits the robustness of the model.
Disclosure of Invention
The purpose of the invention is as follows: aiming at the problems and shortcomings of the prior art, the invention provides a method of defending against adversarial samples for an image classifier, which effectively overcomes the shortcomings of the traditional adversarial training algorithm for image classifiers and further improves the classification model's ability to defend against adversarial sample attacks.
There are three possible relationships between any set of adversarial samples and normal image data: 1) the adversarial samples are generated from the normal image data, and the two correspond one-to-one; 2) the adversarial samples and the normal image data belong to the same category but do not correspond one-to-one; 3) the adversarial samples and the normal image data belong to different categories. The invention combines the traditional adversarial training algorithm with the Siamese architecture and designs a rearrangement mechanism for the adversarial samples, thereby fully exploiting these three relationships between adversarial samples and normal image data, effectively reducing the intra-class distance and enlarging the inter-class distance in the feature space, so that the trained model has a stronger ability to resist adversarial samples.
The technical scheme is as follows: a method of adversarial sample defense for an image classifier, comprising the steps of:
step 1, constructing a model and preparing image training data;
step 2, randomly dividing the image training data into a number of mini-batches;
step 3, performing one parameter update of the model using the image training data of one mini-batch;
a) selecting the image training data of a mini-batch that has not yet participated in the calculation, and generating the corresponding adversarial samples;
b) mixing the adversarial samples generated in the previous step with the corresponding image training data, and adjusting the relative position of each image datum among the adversarial samples;
c) updating the model parameters once by means of the back-propagation algorithm;
step 4, repeating step 3 until all the mini-batches divided in step 2 have participated in the calculation;
step 5, repeating steps 2-4 until the model finishes training;
step 6, outputting the model trained in step 5.
Random partitioning of the image training set into a number of mini-batches
Assume the image training set contains D pieces of image training data and the preset mini-batch size is n. First, the D pieces of image training data are randomly shuffled, and then selected sequentially according to the mini-batch size, so that the image training set is divided into m mini-batches, where

m = ⌈D/n⌉
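For illustration only, this partitioning can be sketched in a few lines of Python; the helper name make_minibatches and the use of index lists are assumptions, not part of the patent:

```python
import math
import random

def make_minibatches(num_items, n):
    """Randomly shuffle D item indices and cut them into m = ceil(D/n) mini-batches."""
    indices = list(range(num_items))
    random.shuffle(indices)                      # randomly shuffle the D pieces of training data
    m = math.ceil(num_items / n)                 # m = ceil(D / n)
    return [indices[i * n:(i + 1) * n] for i in range(m)]
```

For D = 50000 and n = 256 this yields the m = 196 mini-batches used in the embodiment below.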
Selecting the image training data of a mini-batch that has not yet participated in the calculation, and generating the corresponding adversarial samples
For any unused mini-batch of image training data X = (x₁, x₂, …, xₙ), under the current model, a chosen adversarial attack algorithm is used to generate the adversarial samples corresponding to X:

X̂ = (x̂₁, x̂₂, …, x̂ₙ)
Mixing the generated adversarial samples with the corresponding image training data, and adjusting the relative position of each image datum among the adversarial samples
Preset a division ratio λ, λ ∈ [0,1]. According to λ, divide the adversarial samples X̂ into two parts X̂₁ and X̂₂, where:

X̂₁ = (x̂₁, …, x̂ₖ),  X̂₂ = (x̂ₖ₊₁, …, x̂ₙ),  k = ⌊λ·n⌋

Randomly adjust the position of each image instance in X̂₂, i.e. rearrange X̂₂. Splice X̂₁ and the rearranged X̂₂ to obtain X̃ = (x̃₁, x̃₂, …, x̃ₙ). Perform the same operation on the data labels Y of the image training data X to obtain the labels Ỹ = (ỹ₁, ỹ₂, …, ỹₙ) corresponding to the adversarial samples X̃. Compute the identity vector T = (t₁, t₂, …, tₙ): when xᵢ and x̃ᵢ belong to the same class, i.e. when the corresponding data labels yᵢ and ỹᵢ are equal, tᵢ = 1; otherwise tᵢ = 0.
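A minimal PyTorch sketch of this rearrangement mechanism (the function name and tensor layout are assumptions; the patent does not prescribe an implementation):

```python
import torch

def rearrange_adversarial_batch(x_hat, y, lam):
    """Split the adversarial batch X_hat by ratio lam, randomly rearrange the
    second part, splice the parts back together, and compute the identity vector T."""
    n = x_hat.size(0)
    k = int(lam * n)                                  # size of the fixed part X_hat_1
    perm = torch.randperm(n - k) + k                  # random order for X_hat_2
    idx = torch.cat([torch.arange(k), perm])          # splice X_hat_1 with rearranged X_hat_2
    x_tilde, y_tilde = x_hat[idx], y[idx]             # rearranged samples and their labels
    t = (y == y_tilde).float()                        # t_i = 1 iff x_i and x_tilde_i share a class
    return x_tilde, y_tilde, t
```

Because adversarial samples keep the labels of the images they were generated from, comparing y with the rearranged labels directly gives the identity vector T.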
Updating the model parameters once by means of the back-propagation algorithm
Let the parameter of the model be W and set the learning rate to a. The loss function is defined as follows:

L = α·l(f(X; W), Y) + β·l(f(X̃; W), Ỹ) + γ·l_con(φ(X; W), φ(X̃; W), T)

where α, β and γ are preset hyper-parameters, l(·) is the cross-entropy loss function, l_con(·) is the contrastive loss function, f(·; W) denotes the classifier output and φ(·; W) the features it extracts. During training, a Siamese architecture is adopted: X and X̃ serve simultaneously as inputs of the network, the gradient ∇_W L of W is calculated from the loss function L(·), and W is updated as

W ← W − a·∇_W L
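A sketch of one such update in PyTorch. The patent does not spell out l_con, so the classic margin-based contrastive loss is assumed here, and the model is assumed to return both logits and features:

```python
import torch
import torch.nn.functional as F

def siamese_update(model, optimizer, x, y, x_tilde, y_tilde, t,
                   alpha=0.5, beta=0.5, gamma=1.0, margin=1.0):
    """One back-propagation update of the shared weights W under the combined loss."""
    t = t.float()
    logits, feat = model(x)                       # both branches share the same weights
    logits_t, feat_t = model(x_tilde)
    d = F.pairwise_distance(feat, feat_t)         # feature-space distance of each pair
    # assumed contrastive term: pull same-class pairs (t=1) together,
    # push different-class pairs (t=0) at least `margin` apart
    l_con = (t * d.pow(2) + (1 - t) * F.relu(margin - d).pow(2)).mean()
    loss = (alpha * F.cross_entropy(logits, y)
            + beta * F.cross_entropy(logits_t, y_tilde)
            + gamma * l_con)
    optimizer.zero_grad()
    loss.backward()                               # gradient of W via back propagation
    optimizer.step()                              # W <- W - a * grad_W(L)
    return loss.item()
```

Because the two branches share parameters, a single network evaluated twice is sufficient; no second copy of the weights is needed.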
Repeating steps 2-4 until the model finishes training
Preset the number of training rounds N. In each round, randomly divide the image training set into m mini-batches and perform m parameter updates of the model; model training is then finished.
The hyper-parameters include: the model learning rate, the mini-batch size, the maximum number of iterations, the coefficients α, β and γ of the components of the loss function, and the division ratio of the adversarial samples.
The model is an image classifier based on a neural network.
The image training data is data in a picture format.
The chosen adversarial attack algorithm may be: a targeted or untargeted attack, an iterative or single-step attack, or an attack under different norms, where the different norms include L₀, L₁, L₂ and L∞.
Has the advantages that: compared with the prior art, the method of defending against adversarial samples for an image classifier provided by the invention has the following advantages: the invention combines adversarial training with the Siamese architecture for the first time and designs a rearrangement mechanism for the adversarial samples, so that the image classification model fully exploits the interrelationships between image data during training, further improving the image classification model's ability to defend against adversarial samples.
Drawings
Fig. 1 is a diagram of the Siamese architecture in an embodiment of the present invention.
Detailed Description
The present invention is further illustrated by the following examples, which are intended to be purely exemplary and are not intended to limit the scope of the invention; various equivalent modifications of the invention will occur to those skilled in the art upon reading the present disclosure, and these fall within the scope of the appended claims.
The present invention will be described in detail using the classification of the Cifar10 dataset. The Cifar10 dataset contains 60000 color pictures of size 32 × 32, grouped into 10 categories, with 5000 training pictures and 1000 test pictures per category. We choose the neural network ResNet18 as the classification model to classify the Cifar10 dataset. It should be noted that the classification of the Cifar10 dataset is merely illustrative and not restrictive; various equivalent modifications of the invention may be made by those skilled in the art within the scope defined by the appended claims.
For the classification problem of the Cifar10 data set, the specific implementation corresponds to the following specific steps:
step 1, constructing the neural network ResNet18 as the image classifier, preparing the 50000 pictures of the Cifar10 training set, and setting the hyper-parameters: the model learning rate is 0.1, the mini-batch size is 256, the maximum number of iterations is 100, the loss-function component coefficients are α = 0.5, β = 0.5 and γ = 1.0, and the adversarial sample division ratio is λ = 0.5;
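A sketch of this setup, assuming PyTorch and torchvision (the patent names no framework; note that torchvision's ResNet18 is ImageNet-shaped, and CIFAR implementations usually adapt its first convolution):

```python
import torch
import torchvision
import torchvision.transforms as transforms
from torchvision.models import resnet18

train_set = torchvision.datasets.CIFAR10(root="./data", train=True, download=True,
                                         transform=transforms.ToTensor())  # 50000 training pictures
model = resnet18(num_classes=10)                                 # ResNet18 classifier for 10 classes
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)          # model learning rate 0.1
batch_size, max_epochs = 256, 100                                # mini-batch size and iteration cap
alpha, beta, gamma, lam = 0.5, 0.5, 1.0, 0.5                     # loss coefficients and division ratio
```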
step 2, randomly dividing the training data into m mini-batches according to the preset mini-batch size, where

m = ⌈50000/256⌉ = 196;
step 3, performing one parameter update of the model using the training data of one mini-batch. Before this step is carried out, the PGD algorithm is selected for generating the adversarial samples, with the number of PGD iterations set to S = 7, the step size set to 2/255, and the perturbation bound set to ε = 8/255. The method then proceeds as follows:
a) taking the current model as the target model, for a mini-batch of training data X = (x₁, x₂, …, xₙ), generate the corresponding adversarial samples X̂ = (x̂₁, x̂₂, …, x̂ₙ) with the PGD algorithm. The PGD recursion formulas are as follows:

x̂ᵢ⁽⁰⁾ = xᵢ
x̂ᵢ⁽ᵗ⁺¹⁾ = Clip_{xᵢ,ε}( x̂ᵢ⁽ᵗ⁾ + (2/255)·sign(∇ₓ l(x̂ᵢ⁽ᵗ⁾, yᵢ)) )
x̂ᵢ = x̂ᵢ⁽ˢ⁾

where x̂ᵢ⁽ᵗ⁾ is the intermediate result after t iterations on the way from xᵢ to x̂ᵢ, ε is the perturbation bound, yᵢ is the data label corresponding to xᵢ, the Clip function limits the output range, sign is the sign function, and ∇ₓ computes the gradient of the loss function l with respect to the input.
b) adjusting the position of each element in the batch of adversarial samples generated in a). Since the preset division ratio is λ = 0.5, first divide X̂ into two equal parts X̂₁ and X̂₂, then randomly rearrange X̂₂ so that the adversarial samples X̃ no longer correspond one-to-one with the normal training data X; obtain the data labels Ỹ corresponding to the adversarial samples in the same way, and compute the identity vector T = (t₁, t₂, …, tₙ);
c) let the model parameters of the current ResNet18 be W and b. Two ResNet18 networks with the same parameters are combined into a Siamese architecture, as shown in FIG. 1, and the adversarial samples and the normal training data are respectively used as the two inputs of the network. The gradient of the parameters is calculated using the loss function

L = α·l(f(X; W, b), Y) + β·l(f(X̃; W, b), Ỹ) + γ·l_con(φ(X; W, b), φ(X̃; W, b), T)

where α = 0.5, β = 0.5 and γ = 1.0, and the parameters of the model are updated:

W ← W − a·∇_W L,  b ← b − a·∇_b L
and 4, repeating the step 3, and updating the model parameters W and b by using the divided mini-batch in sequence until all the mini-batch participate in calculation.
step 5, repeating steps 2, 3 and 4 for 100 rounds according to the preset maximum number of iterations, each repetition completing one round of training of the model. Meanwhile, so that the training of the model obtains a better result, the learning rate of the model is decayed: specifically, every 40 rounds the learning rate is reduced to 1/10 of its current value.
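This step decay corresponds to a standard step scheduler; for example, in PyTorch (assuming the optimizer from the step-1 sketch):

```python
from torch.optim.lr_scheduler import StepLR

scheduler = StepLR(optimizer, step_size=40, gamma=0.1)  # divide the learning rate by 10 every 40 rounds
for epoch in range(100):
    # ... one full round of training over all mini-batches (steps 2-4) ...
    scheduler.step()
```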
step 6, outputting the trained ResNet18 model. Since the Siamese architecture consists of two ResNet18 networks with the same parameters, either one of the two may be output.
We denote the classification model obtained by the above procedure as SAT. Meanwhile, to better evaluate model performance, the traditional adversarial training method is applied with the same network structure and consistent basic hyper-parameters, yielding a classification model of the Cifar10 dataset denoted AT. Table 1 compares the accuracy of the two models under different attacks.
Table 1. Comparison of model accuracy under different attacks on the Cifar10 dataset
The performance of the algorithm is evaluated by model accuracy, defined as the number of correctly classified adversarial samples divided by the total number of adversarial test samples. Five common attacks were chosen: FGSM, PGD, BIM, CW and JSMA, where PGD is also the attack used during model training. Under every attack, the model accuracy of the adversarial training algorithm provided by the invention is higher than that of the traditional adversarial training algorithm, with improvements between 2% and 7.3%, so adversarial sample attacks are better defended against.
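The accuracy metric used here can be computed as in the following sketch (illustrative; `attack` stands for any of the five attack routines, e.g. the pgd_attack sketch above):

```python
import torch

def robust_accuracy(model, loader, attack):
    """Model accuracy = correctly classified adversarial samples / total adversarial samples."""
    model.eval()
    correct = total = 0
    for x, y in loader:
        x_adv = attack(model, x, y)                  # craft adversarial versions of the test batch
        with torch.no_grad():
            pred = model(x_adv).argmax(dim=1)
        correct += (pred == y).sum().item()
        total += y.size(0)
    return correct / total
```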
Common adversarial attack algorithms can be found in the following papers:
FGSM: GOODFELLOW I J, SHLENS J, SZEGEDY C. Explaining and harnessing adversarial examples [J]. arXiv preprint arXiv:1412.6572, 2014;
PGD: MADRY A, MAKELOV A, SCHMIDT L, et al. Towards deep learning models resistant to adversarial attacks [J]. arXiv preprint arXiv:1706.06083, 2017;
BIM: KURAKIN A, GOODFELLOW I, BENGIO S. Adversarial machine learning at scale [J]. arXiv preprint arXiv:1611.01236, 2016;
CW: CARLINI N, WAGNER D. Towards evaluating the robustness of neural networks [C] // 2017 IEEE Symposium on Security and Privacy (SP). 2017: 39-57;
JSMA: PAPERNOT N, MCDANIEL P, JHA S, et al. The limitations of deep learning in adversarial settings [C] // 2016 IEEE European Symposium on Security and Privacy (EuroS&P). 2016: 372-387.

Claims (9)

1. A method of adversarial sample defense for an image classifier, comprising the steps of:
step 1, constructing a model and preparing image training data;
step 2, randomly dividing the image training data into a number of mini-batches;
step 3, performing one parameter update of the model using the image training data of one mini-batch;
a) selecting the image training data of a mini-batch that has not yet participated in the calculation, and generating the corresponding adversarial samples;
b) mixing the adversarial samples generated in the previous step with the corresponding image training data, and adjusting the relative position of each image datum among the adversarial samples;
c) updating the model parameters once by means of the back-propagation algorithm;
step 4, repeating step 3 until all the mini-batches divided in step 2 have participated in the calculation;
step 5, repeating steps 2-4 until the model finishes training;
step 6, outputting the model trained in step 5.
2. The method of claim 1, wherein the random partitioning of the image training data into a number of mini-batches is implemented as follows:
assume the image training set contains D pieces of image training data and the preset mini-batch size is n; first, the D pieces of image training data are randomly shuffled, and then selected sequentially according to the mini-batch size, so that the image training set is divided into m mini-batches, where

m = ⌈D/n⌉.
3. The method of claim 1, wherein selecting the image training data of a mini-batch that has not yet participated in the calculation and generating the corresponding adversarial samples is implemented as follows:
for any unused mini-batch of image training data X = (x₁, x₂, …, xₙ), under the current model, a chosen adversarial attack algorithm is used to generate the adversarial samples corresponding to X:

X̂ = (x̂₁, x̂₂, …, x̂ₙ).
4. The method of claim 1, wherein the generated adversarial samples are mixed with the corresponding image training data and the relative position of each image datum among the adversarial samples is adjusted as follows:
preset a division ratio λ, λ ∈ [0,1]; according to λ, divide the adversarial samples X̂ into two parts X̂₁ and X̂₂, where:

X̂₁ = (x̂₁, …, x̂ₖ),  X̂₂ = (x̂ₖ₊₁, …, x̂ₙ),  k = ⌊λ·n⌋

randomly adjust the position of each image instance in X̂₂, i.e. rearrange X̂₂; splice X̂₁ and the rearranged X̂₂ to obtain X̃ = (x̃₁, x̃₂, …, x̃ₙ); perform the same operation on the data labels Y of the image training data X to obtain the labels Ỹ = (ỹ₁, ỹ₂, …, ỹₙ) corresponding to the adversarial samples X̃; compute the identity vector T = (t₁, t₂, …, tₙ): when xᵢ and x̃ᵢ belong to the same class, i.e. when yᵢ and ỹᵢ are equal, tᵢ = 1; otherwise tᵢ = 0.
5. The method of claim 1, wherein the model parameters are updated once by the back-propagation algorithm as follows:
let the parameter of the model be W and set the learning rate to a; the loss function is defined as follows:

L = α·l(f(X; W), Y) + β·l(f(X̃; W), Ỹ) + γ·l_con(φ(X; W), φ(X̃; W), T)

where α, β and γ are preset hyper-parameters, l(·) is the cross-entropy loss function, and l_con(·) is the contrastive loss function; during training, a Siamese architecture is adopted: X and X̃ serve simultaneously as inputs of the network, the gradient ∇_W L of W is calculated from the loss function L(·), and W is updated as W ← W − a·∇_W L.
6. The method of claim 1, wherein steps 2-4 are repeated until the model finishes training: the number of training rounds N is preset; in each round the image training set is randomly divided into m mini-batches and m parameter updates of the model are performed, whereupon model training is finished.
7. The method of claim 1, wherein the model hyper-parameters comprise: the model learning rate, the mini-batch size, the maximum number of iterations, the coefficients α, β and γ of the components of the loss function, and the division ratio of the adversarial samples.
8. The method of claim 1, wherein the image training data is in a picture format.
9. The method of claim 1, wherein the chosen adversarial attack algorithm comprises: a targeted or untargeted attack, an iterative or single-step attack, or an attack under different norms, wherein the different norms comprise L₀, L₁, L₂ and L∞.
CN202010749009.8A 2020-07-30 2020-07-30 Method for defending against adversarial samples for an image classifier Active CN111914928B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010749009.8A CN111914928B (en) 2020-07-30 2020-07-30 Method for defending against adversarial samples for an image classifier

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010749009.8A CN111914928B (en) 2020-07-30 2020-07-30 Method for defending against adversarial samples for an image classifier

Publications (2)

Publication Number Publication Date
CN111914928A true CN111914928A (en) 2020-11-10
CN111914928B CN111914928B (en) 2024-04-09

Family

ID=73287643

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010749009.8A Active CN111914928B (en) 2020-07-30 2020-07-30 Method for defending against adversarial samples for an image classifier

Country Status (1)

Country Link
CN (1) CN111914928B (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107832353A (en) * 2017-10-23 2018-03-23 同济大学 A kind of social media platform deceptive information recognition methods
CN108681774A (en) * 2018-05-11 2018-10-19 电子科技大学 Based on the human body target tracking method for generating confrontation network negative sample enhancing
CN110674938A (en) * 2019-08-21 2020-01-10 浙江工业大学 Anti-attack defense method based on cooperative multi-task training

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
XIAN ZHONG et al.: "Visible-infrared Person Re-identification via Colorization-based Siamese Generative Adversarial Network", ICMR '20: Proceedings of the 2020 International Conference on Multimedia Retrieval, p. 421 *
YU ZHU et al.: "Improving Deep Neural Network Robustness with Siamese Empowered Adversarial Training", International Conference on Security, Privacy and Anonymity in Computation, Communication and Storage, p. 62 *
WANG Zhihao: "Research on the Robustness of Person Re-identification Based on Deep Learning", China Masters' Theses Full-text Database, Information Science and Technology, no. 08 *
ZHU Yu: "Research on Deep Learning Security Based on Adversarial Examples", China Masters' Theses Full-text Database, Information Science and Technology, no. 09 *

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112597993B (en) * 2020-11-24 2024-05-31 中国空间技术研究院 Patch detection-based countermeasure model training method
CN112597993A (en) * 2020-11-24 2021-04-02 中国空间技术研究院 Confrontation defense model training method based on patch detection
CN112560901A (en) * 2020-12-01 2021-03-26 南京航空航天大学 Method for defending and confronting sample based on combination of image preprocessing and confronting training
CN112699737A (en) * 2020-12-10 2021-04-23 陈艳 Genus species identification system and identification method based on biological three-dimensional contour
CN112651459A (en) * 2020-12-31 2021-04-13 厦门易仕特仪器有限公司 Defense method, device, equipment and storage medium for confrontation sample of deep learning image
CN112329931B (en) * 2021-01-04 2021-05-07 北京智源人工智能研究院 Countermeasure sample generation method and device based on proxy model
CN112329931A (en) * 2021-01-04 2021-02-05 北京智源人工智能研究院 Countermeasure sample generation method and device based on proxy model
CN112329894A (en) * 2021-01-04 2021-02-05 北京智源人工智能研究院 Countermeasure sample generation method and device based on proxy model and computing equipment
CN113837370A (en) * 2021-10-20 2021-12-24 北京房江湖科技有限公司 Method and apparatus for training a model based on contrast learning
CN113837370B (en) * 2021-10-20 2023-12-05 贝壳找房(北京)科技有限公司 Method and apparatus for training a model based on contrast learning
CN114638322A (en) * 2022-05-20 2022-06-17 南京大学 Full-automatic target detection system and method based on given description in open scene
CN114638322B (en) * 2022-05-20 2022-09-13 南京大学 Full-automatic target detection system and method based on given description in open scene
CN115797732A (en) * 2023-02-15 2023-03-14 杭州实在智能科技有限公司 Image retrieval model training method and system used in open category scene

Also Published As

Publication number Publication date
CN111914928B (en) 2024-04-09

Similar Documents

Publication Publication Date Title
CN111914928A (en) Method for defending against adversarial samples for an image classifier
Su et al. One pixel attack for fooling deep neural networks
Olatunji et al. Membership inference attack on graph neural networks
Wang et al. Defensive dropout for hardening deep neural networks under adversarial attacks
CN110334742B (en) Graph confrontation sample generation method based on reinforcement learning and used for document classification and adding false nodes
KR102304661B1 (en) Attack-less Adversarial Training Method for a Robust Adversarial Defense
CN110322003B (en) Gradient-based graph confrontation sample generation method for document classification by adding false nodes
Li et al. Semi-supervised robust training with generalized perturbed neighborhood
CN112668044A (en) Privacy protection method and device for federal learning
Liu et al. Generative model: Membership attack, generalization and diversity
Zhang et al. Broadening differential privacy for deep learning against model inversion attacks
Sitawarin et al. Demystifying the adversarial robustness of random transformation defenses
Guo et al. Resisting distributed backdoor attacks in federated learning: A dynamic norm clipping approach
CN114494771B (en) Federal learning image classification method capable of defending back door attack
Xue et al. Use the spear as a shield: An adversarial example based privacy-preserving technique against membership inference attacks
Hooda et al. Towards adversarially robust deepfake detection: an ensemble approach
Goodman Transferability of adversarial examples to attack cloud-based image classifier service
CN116644433A (en) Data privacy and model safety test method for longitudinal federal learning
Peng et al. Evaluating deep learning for image classification in adversarial environment
CN111368908A (en) HRRP (high-resolution Radar) non-target confrontation sample generation method based on deep learning
CN115510986A (en) Countermeasure sample generation method based on AdvGAN
CN115187449A (en) Method for improving anti-sample mobility based on perspective transformation
Stock et al. Lessons learned: How (not) to defend against property inference attacks
CN112215272A (en) Bezier curve-based image classification neural network attack method
Zhang et al. Confined gradient descent: Privacy-preserving optimization for federated learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant