CN109948658B - Feature-map attention mechanism-oriented adversarial attack defense method and application - Google Patents
- Publication number: CN109948658B (application number CN201910138087.1A)
- Authority: CN (China)
- Legal status: Active
Landscapes
- Image Analysis (AREA)
Abstract
The invention discloses a feature-map attention mechanism-oriented adversarial attack defense method, which comprises the following steps: (1) extracting the contour features of the target with an attention mechanism, adding a small perturbation designed from the contour features to obtain an adversarial sample, and optimizing the perturbation variable by momentum iteration to update the adversarial sample, thereby realizing an adversarial attack on the deep model; (2) performing adversarial training on the deep model with the adversarial samples under a multi-strength adversarial training strategy, thereby realizing defense of the deep model against adversarial attacks. The method improves the robustness and generalization ability of the classifier against adversarial sample attacks, making the classifier more reliable and stable and improving the security of the deep learning model in practical application. The application of the feature-map attention mechanism-oriented adversarial attack defense method to image classification is also disclosed.
Description
Technical Field
The invention belongs to the field of security research on deep learning algorithms for the image classification task in artificial intelligence, and in particular relates to a feature-map attention mechanism-oriented adversarial attack defense method and the application of that method to image classification.
Background
In recent years, deep learning has been widely applied across industries by virtue of its strong feature learning ability and has achieved good results in, for example, computer vision, bioinformatics, complex networks, and natural language processing. However, with this wide application, its weaknesses have gradually been revealed; one of the main weaknesses is that deep learning models are vulnerable to adversarial example attacks. For example, a normal picture taken under natural conditions may be classified with the correct label at high confidence, but once a carefully designed small perturbation is added to produce an adversarial image, that image is misclassified by the deep learning model. Worse still, the human visual system cannot distinguish these carefully designed adversarial examples, because the added perturbations are very small.
As research has progressed, adversarial attack patterns against deep models have gradually been systematized. According to the attacker's degree of knowledge of the deep model, attacks can be divided into black-box, white-box, and gray-box attacks: a black-box attack mounts the adversarial attack without knowing any parameters or structure of the model; a white-box attack knows all attributes of the model; and a gray-box attack knows only part of the model's parameters and structure. According to the misclassification result achieved by the adversarial example, attacks can be divided into non-targeted attacks and targeted attacks: a non-targeted adversarial example only needs to be misclassified, while a targeted attack must not only cause misclassification but also force the sample into a target class preset by the attacker. Generally, the optimization objective function differs according to whether the attack is non-targeted or targeted. Furthermore, these attack methods exist not only in the digital space but also in the physical world. An attacker wearing carefully designed glasses can impersonate another person and deceive a face recognition system; an attacker can also attach small stickers to license plates or road signs, causing false identifications that fool a license plate recognition system or the road sign recognition system of an autonomous vehicle. It can be seen that adversarial attacks seriously damage the performance of deep learning models, threatening the security of systems built on them and even the safety of people's lives and property. Therefore, it is necessary to study the vulnerabilities of deep models and to defend against them.
Meanwhile, research on defense methods against adversarial attacks on deep models has gradually drawn attention. Current defenses fall mainly into three categories: modifying the input data, for example adding random noise to the input image to be recognized or flipping and scaling it, so as to destroy the added adversarial perturbation; modifying the model's network structure, such as changing the convolution kernel size or pooling range, increasing the number of network layers, or modifying the activation function; and attaching an external network to the model, for example adding an external model at test time to detect or recover adversarial examples. Although most defense methods have some effect against adversarial attacks, their transferability is limited, and they cannot defend well against novel adversarial attacks.
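As an illustration of the first category (input-transformation defenses), the following is a minimal NumPy sketch; the particular transformations and all parameter values are illustrative assumptions, not the invention's method:

```python
import numpy as np

def randomized_input_defense(img, noise_std=0.02, max_pad=4, rng=None):
    """Input-transformation defense sketch: random noise, flip, and padding.

    Random transformations disrupt the precise pixel alignment that an
    adversarial perturbation relies on. Parameters are illustrative.
    """
    rng = rng or np.random.default_rng()
    out = img + rng.normal(0.0, noise_std, img.shape)     # random noise
    if rng.random() < 0.5:
        out = out[:, ::-1, :]                             # random horizontal flip
    pad = int(rng.integers(0, max_pad + 1))
    out = np.pad(out, ((pad, pad), (pad, pad), (0, 0)))   # random zero padding
    return np.clip(out, 0.0, 1.0)

defended = randomized_input_defense(np.random.rand(32, 32, 3))
```

The defended image is then passed to the classifier in place of the raw input.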
Meanwhile, the latest research shows that modifying the model's training data set, i.e., adding adversarial examples to the training data to perform adversarial training, is currently one of the more effective defenses. However, the defensive effect of adversarial training depends on the quality of the generated adversarial examples, and the transferability of adversarial examples generated by current attack methods is weak, so a good adversarial training effect is difficult to achieve.
Disclosure of Invention
The invention aims to provide a feature-map attention mechanism-oriented adversarial attack defense method, which focuses on the contour features of the target in an image through the feature-map attention mechanism and adds the perturbation onto those focused contour features, so that adversarial examples for attacking the deep model are easy to generate, and which then trains the deep model with the adversarial and normal samples to improve the robustness of the classification model against adversarial attacks.
The invention also aims to provide an application of the feature-map attention mechanism-oriented adversarial attack defense method to image classification: with this method an attack-resistant image classification model can be obtained, which can greatly improve the accuracy of image classification.
To achieve the above objects, the invention provides the following technical solutions:
A feature-map attention mechanism-oriented adversarial attack defense method comprises the following steps:
(1) extracting the contour features of the target contour in an image with an attention mechanism, designing a small perturbation based on the extracted contour features and adding it to the original normal sample to obtain an adversarial sample, and optimizing the perturbation variable by momentum iteration to update the adversarial sample, thereby realizing an adversarial attack on the deep model;
(2) performing adversarial training on the deep model, based on a multi-strength adversarial training strategy, with the data set obtained by mixing the adversarial samples and the normal samples, thereby realizing defense of the deep model against adversarial attacks.
The method uses the feature map of the spatial attention mechanism to locate the spatial key information of the target contour that is decisive for correct classification, obtains the positions where the adversarial perturbation should be added through the gradient of the output loss function value, and optimizes each perturbation value with a momentum iteration method, so as to generate high-quality adversarial examples that attack effectively. Multi-strength adversarial training is then performed on the deep model to realize robust and transferable defense against adversarial attacks.
Extracting the contour features of the target contour in the image with an attention mechanism, designing a small perturbation based on the extracted contour features, and adding it to the original normal sample to obtain an adversarial sample comprises:
a reconstructed feature extraction step: extracting a shallow feature image of the input original image as the feature image, based on the shallow network features of the deep model, and up-sampling the feature image to obtain a reconstructed feature image;
a channel spatial attention weight calculation step: calculating a channel spatial attention weight matrix from the original image and the reconstructed feature image;
a pixel spatial attention weight calculation step: calculating a pixel spatial attention weight matrix from the reconstructed channel spatial attention weight matrix and the original image;
an adversarial sample generation step: calculating the perturbation to be added from the pixel spatial attention weight matrix, and adding the perturbation to the original image to obtain the adversarial sample.
Attention mechanisms can be divided into soft and hard attention: hard attention is a stochastic weight assignment process based on a Bernoulli distribution, while soft attention is an embeddable weighting method parameterized by a neural network, which can exploit global information in the deep model and achieve better results with end-to-end training. Therefore, the invention uses a soft attention mechanism for the adversarial perturbation computation.
In a deep model classifier, deep features have a larger receptive field than shallow features, but much of the spatial information of the deep feature map is lost. Therefore, the shallow feature output of the deep neural network is reconstructed by bilinear interpolation to the same H and W as the input sample, where H is the number of pixels in the vertical direction of the image and W the number in the horizontal direction. The attention mechanisms used to search for the perturbation distribution include channel spatial attention, which focuses on the channel feature distribution by weighting the feature maps of different channels, and pixel spatial attention, which focuses on the pixel feature distribution by weighting different pixel regions.
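The bilinear reconstruction of the shallow feature map can be sketched in NumPy as follows; the align-corners sampling convention and the concrete sizes are illustrative assumptions:

```python
import numpy as np

def bilinear_upsample(feat, H, W):
    """Bilinearly resize a [h, w, c] feature map to [H, W, c]."""
    h, w, c = feat.shape
    # Sample positions in the source grid (align-corners convention)
    ys = np.linspace(0, h - 1, H)
    xs = np.linspace(0, w - 1, W)
    y0 = np.floor(ys).astype(int); y1 = np.minimum(y0 + 1, h - 1)
    x0 = np.floor(xs).astype(int); x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None, None]; wx = (xs - x0)[None, :, None]
    # Interpolate along x on the two bracketing rows, then along y
    top = feat[y0][:, x0] * (1 - wx) + feat[y0][:, x1] * wx
    bot = feat[y1][:, x0] * (1 - wx) + feat[y1][:, x1] * wx
    return top * (1 - wy) + bot * wy

# Reconstruct an 8x8x16 shallow feature map to a 32x32 input resolution
f_m = bilinear_upsample(np.random.rand(8, 8, 16), 32, 32)
```

In a real pipeline the framework's own resize operator (e.g. a bilinear interpolation layer) would be used instead.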
Specifically, in the channel spatial attention weight calculation step:
the original image x of size [H, W, 3] is converted by a reshape operation into a matrix x_re of size [3, l], where H is the number of pixels in the vertical direction of the image, W the number in the horizontal direction, 3 the number of RGB channels, and l = H × W;
the up-sampled shallow hidden layer output, i.e. the reconstructed feature image f_m of size [H, W, c], is converted by a reshape operation into a matrix f_mm of size [c, l];
the channel spatial attention weight matrix W_c of size [3, c] is then obtained by the formula W_c = softmax(x_re ⊗ f_mm^T), where softmax(·) is the activation function and ⊗ denotes matrix multiplication.
In the pixel spatial attention weight calculation step:
the reconstructed channel spatial attention weight W'_c of size [3, l] is calculated by the formula W'_c = W_c ⊗ f_mm, where ⊗ denotes matrix multiplication;
the pixel spatial attention weight W_p of size [1, l] is calculated by the formula W_p = softmax(Σ(x_re ⊙ W'_c)), where ⊙ denotes multiplication of corresponding matrix elements, Σ sums the [3, l] product over the channel dimension to size [1, l], and softmax(·) is the activation function.
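The two attention computations can be sketched in NumPy; the softmax axes and the channel-direction sum are inferred from the stated matrix sizes, and all concrete dimensions are illustrative assumptions:

```python
import numpy as np

def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

H, W, c = 32, 32, 16
l = H * W
# Reshape image [H, W, 3] -> [3, l] and feature map [H, W, c] -> [c, l]
x_re = np.random.rand(H, W, 3).transpose(2, 0, 1).reshape(3, l)
f_mm = np.random.rand(H, W, c).transpose(2, 0, 1).reshape(c, l)

# Channel spatial attention: [3, l] x [l, c] -> [3, c]
W_c = softmax(x_re @ f_mm.T)
# Reconstructed channel attention: [3, c] x [c, l] -> [3, l]
W_c_rec = W_c @ f_mm
# Pixel spatial attention: element-wise product, summed over channels -> [1, l]
W_p = softmax((x_re * W_c_rec).sum(axis=0, keepdims=True))
```

`W_p` is subsequently reshaped to [H, W, 1] to serve as the attention mapping weight.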
In the adversarial sample generation step:
the pixel spatial attention weight W_p of size [1, l] is reshaped by a reshape function operation into the attention mapping weight W_map of size [H, W, 1];
the perturbation ρ to be added is calculated channel by channel by the formula ρ_i = W_map ⊙ (∇_{x_i} J(x, y) / ‖∇_{x_i} J(x, y)‖_1),
where ⊙ denotes multiplication of corresponding elements of the two matrices; y is the correct class label of the original image x; ‖·‖_1 is the 1-norm of the calculated gradient, i.e., the sum of the absolute values of the vector elements; and x_i is the pixel matrix of the i-th channel;
finally the adversarial sample x* is obtained by the formula x* = x ⊕ ρ, where ⊕ denotes addition of corresponding matrix elements.
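A minimal NumPy sketch of the attention-weighted perturbation and sample update; the per-channel 1-norm normalization follows the description above, while names and sizes are illustrative:

```python
import numpy as np

def finefool_perturbation(grad, W_map):
    """Attention-weighted, per-channel L1-normalised perturbation (sketch).

    grad:  [H, W, 3] gradient of the loss w.r.t. the input image
    W_map: [H, W, 1] attention mapping weights
    """
    rho = np.empty_like(grad)
    for i in range(grad.shape[2]):            # per channel, as in the text
        g = grad[:, :, i]
        rho[:, :, i] = W_map[:, :, 0] * g / (np.abs(g).sum() + 1e-12)
    return rho

x = np.random.rand(32, 32, 3)                 # original image
grad = np.random.randn(32, 32, 3)             # stand-in for the true gradient
W_map = np.random.rand(32, 32, 1)             # stand-in attention weights
x_adv = x + finefool_perturbation(grad, W_map)   # x* = x (+) rho
```

The attention weighting concentrates the perturbation budget on contour pixels rather than spreading it uniformly.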
Specifically, optimizing the perturbation variable by momentum iteration to update the adversarial sample comprises:
the maximum number of iterations of the trained deep learning classifier f is set to T, the original image is x, and the correct class label of x is y; at the start of the iteration, let x*_0 = x and set the initial velocity vector g_0 = 0;
Defining an attack optimization objective function of an iterative process as follows:
the over parameter kappa is more than or equal to 0, the confidence coefficient of the misclassification class target of the generated countermeasure sample is represented, the requirement for producing the countermeasure sample is higher when the value of kappa is larger, and the obtained sample attack performance is more reliable; x is the number of0Representing an initial image without added disturbance, i.e. an original image x; z (x)yConfidence that the sample is classified as y, Z (x)y′Represents the confidence that the sample is classified as y';represents x-x0Is used to limit the magnitude of the counterdisturbance, i.e. the sum of the squares of the absolute values of the vector elements is then root-opened, yt' represents a specific target label preset by an attacker;
(1) the image x*_i is input to the deep learning classifier f, the gradient ∇_{x*_i} J of the classifier with respect to the input is calculated, and the shallow feature image x_f^i of x*_i in the network is captured; x_f^i is up-sampled by bilinear interpolation to obtain the reconstructed feature image f_m^i; the pixel spatial attention weight W_p^i is then obtained by the calculation formula:
W_p^i = softmax(Σ(x_re^i ⊙ (W_c^i ⊗ f_mm^i))), with W_c^i = softmax(x_re^i ⊗ (f_mm^i)^T)
where W_c^i ⊗ f_mm^i is the reconstructed channel spatial attention weight and W_c^i the channel spatial attention weight before reconstruction; f_mm^i is obtained from f_m^i by a reshape function operation; ⊗ denotes matrix multiplication, softmax(·) is the activation function, x_re^i is the reshaped image matrix, and ⊙ denotes multiplication of corresponding matrix elements; before the outer softmax(·) is executed, the calculated matrix is summed once in the column direction so that W_p^i has size [1, l];
(2) the pixel spatial attention weight W_p^i is reshaped by a reshape operation into the attention mapping weight W_map^i;
(3) the velocity vector g_{i+1} is updated along the gradient direction:
g_{i+1} = μ · g_i + W_map^i ⊙ (∇_{x*_i} J / ‖∇_{x*_i} J‖_1)
where μ is the momentum decay factor;
(4) the perturbation ρ_i to be added is calculated from the velocity vector g_{i+1}:
ρ_i = g_{i+1} × α
where α is the perturbation step size added at each iteration;
(5) steps (1) to (4) are repeated until the perturbation exceeds the preset bound, ‖x* − x‖_∞ > ε, or the attack succeeds, f(x*) ≠ y, i.e., the adversarial sample has been successfully generated, where ‖·‖_∞ is the infinity norm, i.e., the maximum absolute value of the elements of x* − x, ε is the preset perturbation size, and y is the correct class label of the original image x.
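The momentum iteration loop above can be sketched as follows; `grad_fn`, `attention_fn`, and `predict_fn` are assumed helper callables standing in for the classifier, and the momentum decay factor `mu` is an assumption not fixed by the text:

```python
import numpy as np

def momentum_attack(x, y, grad_fn, attention_fn, alpha=0.01, eps=0.05,
                    mu=0.9, T=50, predict_fn=None):
    """Momentum-iterative, attention-weighted attack loop (hedged sketch).

    grad_fn(x)      -> gradient of the loss w.r.t. x        (assumed helper)
    attention_fn(x) -> [H, W, 1] attention mapping weights  (assumed helper)
    predict_fn(x)   -> predicted class label or None        (assumed helper)
    """
    x_adv, g = x.copy(), np.zeros_like(x)
    for _ in range(T):
        grad = grad_fn(x_adv)
        # Attention-weighted, L1-normalised gradient
        grad = attention_fn(x_adv) * grad / (np.abs(grad).sum() + 1e-12)
        g = mu * g + grad                    # velocity update g_{i+1}
        x_adv = x_adv + g * alpha            # rho_i = g_{i+1} * alpha
        if np.abs(x_adv - x).max() > eps:    # perturbation budget exceeded
            break                            # (sketch: stop without projecting)
        if predict_fn is not None and predict_fn(x_adv) != y:
            return x_adv                     # attack succeeded
    return x_adv
```

A full implementation would additionally track the iteration counter against T and report failure, as described below.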
Performing adversarial training on the deep model with the adversarial samples, based on a multi-strength adversarial training strategy, comprises:
(1) based on a preset perturbation amplitude parameter ε, generating a batch adversarial sample subset {x_adv1} with step (1) of the feature-map attention mechanism-oriented adversarial attack defense method, and then successively adjusting the perturbation amplitude to ε/2, ε/3, and ε/4 to obtain the adversarial sample subsets {x_adv2}, {x_adv3}, and {x_adv4};
(2) mixing all adversarial sample subsets obtained in step (1) into a total adversarial sample set with different attack capabilities, and mixing the adversarial samples with normal samples at attack strength values AIn of 0.1, 0.2, 0.3, …, 1.0 to obtain new training data sets of different attack strengths;
(3) fine-tuning the weight parameters of the deep model with the new training data sets of different attack strengths obtained in step (2).
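The multi-strength mixing strategy can be sketched as follows; the dataset representation, sample counts, and sampling details are illustrative assumptions:

```python
import random

def build_mixed_datasets(normal, adv_subsets, ains=(0.1, 0.2, 0.3, 1.0)):
    """Mix adversarial and normal samples at several attack strengths.

    AIn = Num(Adv) / Num(Nor): for each strength value, draw
    round(AIn * len(normal)) adversarial samples from the pooled subsets.
    """
    pool = [s for subset in adv_subsets for s in subset]  # total adversarial set
    datasets = {}
    for ain in ains:
        n_adv = min(round(ain * len(normal)), len(pool))
        datasets[ain] = normal + random.sample(pool, n_adv)
    return datasets

# Toy data: 100 normal samples, four adversarial subsets of 60 samples each
normal = [("img%d" % i, "clean") for i in range(100)]
subsets = [[("adv%d_%d" % (j, i), "adv") for i in range(60)] for j in range(4)]
mixed = build_mixed_datasets(normal, subsets)
```

Each resulting data set is then used for one round of fine-tuning of the deep model.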
The application of the feature-map attention mechanism-oriented adversarial attack defense method to image classification is characterized by comprising the following process:
first, an image set with features similar to the images to be classified is taken as the original images, a deep neural network is taken as the image classification model, a large number of adversarial samples are generated with the above feature-map attention mechanism-oriented adversarial attack defense method, and multi-strength adversarial training is performed on the trained image classification model with these adversarial samples to discover and repair existing vulnerabilities, obtaining an image classification model able to defend against adversarial samples;
then, the images to be classified are classified with the trained, defense-capable image classification model to obtain reliable classification results.
The invention provides a feature-map attention mechanism-oriented adversarial attack defense method: adversarial samples with smaller perturbations that nevertheless reliably mislead the classifier are obtained through the feature-map attention mechanism, and multi-strength adversarial training of the original classifier with these samples improves the robustness and generalization ability of the classifier against adversarial sample attacks, making the classifier more reliable and stable and improving the security of the deep learning model in practical application.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the invention; for those skilled in the art, other drawings can be obtained from these drawings without creative effort.
FIG. 1 is a schematic diagram of the adversarial sample generation method FineFool based on the feature-map attention mechanism;
FIG. 2 shows the adversarial samples generated against the deep model ResNet-v2 under the MI-FGSM, PGD, and FineFool attack methods;
FIG. 3 shows the adversarial samples generated against the deep model Inception-v3 under the MI-FGSM, PGD, and FineFool attack methods;
FIG. 4 shows the confidence reduction curves of the original correct class labels of the adversarial samples generated against the deep model ResNet-v2 under the MI-FGSM, PGD, and FineFool attack methods;
FIG. 5 shows the confidence curves of the misclassified class labels of the adversarial samples generated against the deep model Inception-v3 under the MI-FGSM, PGD, and FineFool attack methods.
Detailed Description
In order to make the objects, technical solutions, and advantages of the invention clearer, the invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the detailed description and specific examples are intended for purposes of illustration only and are not intended to limit the scope of the invention.
In order to improve the robustness of deep learning models, this embodiment provides a feature-map attention mechanism-oriented adversarial attack defense method, which mainly comprises two stages: an adversarial sample generation stage and an adversarial training stage for the deep learning model. The specific process is as follows:
For the adversarial sample generation stage:
In this stage, an attention mechanism is used to extract the contour features of the target contour, a small perturbation is added based on the contour features, and the perturbation variable is optimized by momentum iteration, thereby realizing an adversarial attack on the deep model. This attack method, named FineFool, can generate adversarial samples and, as shown in FIG. 1, specifically comprises a reconstructed feature extraction step, a channel spatial attention weight calculation step, a pixel spatial attention weight calculation step, and an adversarial sample generation step.
The reconstructed feature extraction step is mainly used to extract the shallow network feature map of the deep learning model. An original image x of size [H, W, 3], where H is the number of pixels in the vertical direction of the image, W the number in the horizontal direction, and 3 the number of RGB channels, is input to the deep classification model (i.e., classifier f), and a shallow feature image x_f of size [H1, W1, c] is extracted by computation. This shallow feature image, which retains better spatial features, serves as the feature image; it is then bilinearly up-sampled, i.e., x_f is up-sampled by bilinear interpolation to obtain the reconstructed feature image f_m of size [H, W, c].
The channel spatial attention weight calculation step is mainly used to calculate the channel spatial attention weight W_c. The specific process is as follows: the original image of size [H, W, 3] is converted by a reshape operation into a matrix x_re of size [3, l], where H is the number of pixels in the vertical direction, W the number in the horizontal direction, and l = H × W; the reconstructed feature image f_m of size [H, W, c] is converted by a reshape operation into a matrix f_mm of size [c, l]; then the channel spatial attention weight matrix W_c of size [3, c] is obtained by the formula W_c = softmax(x_re ⊗ f_mm^T), where softmax(·) is the activation function and ⊗ denotes matrix multiplication.
The pixel spatial attention weight calculation step is mainly used to calculate the pixel spatial attention weight W_p. The specific process is as follows: first, the reconstructed channel spatial attention weight W'_c of size [3, l] is calculated by the formula W'_c = W_c ⊗ f_mm, where ⊗ denotes matrix multiplication; then the pixel spatial attention weight W_p of size [1, l] is calculated by the formula W_p = softmax(Σ(x_re ⊙ W'_c)), where ⊙ denotes multiplication of corresponding matrix elements, Σ sums over the channel dimension, and softmax(·) is the activation function.
The adversarial sample generation step is mainly used to generate the adversarial sample x*. The specific process is as follows: first, the pixel spatial attention weight W_p of size [1, l] is reshaped by a reshape function operation into the attention mapping weight W_map of size [H, W, 1]; then the perturbation ρ to be added is calculated channel by channel by the formula ρ_i = W_map ⊙ (∇_{x_i} J(x, y) / ‖∇_{x_i} J(x, y)‖_1), where ⊙ denotes multiplication of corresponding elements of the two matrices, y is the correct class label of the original image x, ‖·‖_1 is the 1-norm of the calculated gradient (the sum of the absolute values of the vector elements), and x_i is the pixel matrix of the i-th channel; finally the adversarial sample x* is obtained by the formula x* = x ⊕ ρ, where ⊕ denotes addition of corresponding matrix elements.
On the basis of adversarial sample generation, the specific process of updating the adversarial sample by the momentum iteration method is as follows:
the maximum number of iterations of the trained deep learning classifier f is set to T, the original image is x, and the correct class label of x is y; at the start of the iteration, let x*_0 = x and set the initial velocity vector g_0 = 0.
The attack optimization objective function of the iterative process is defined as:
min ‖x − x_0‖_2 + max(Z(x)_y − Z(x)_{y'}, −κ)
where the hyperparameter κ ≥ 0 denotes the confidence of the misclassification target of the generated adversarial sample: the larger κ, the stricter the requirement on the produced adversarial sample and the more reliable its attack performance; x_0 is the initial image without added perturbation, i.e., the original image x; Z(x)_y is the confidence that the sample is classified as y, and Z(x)_{y'} the confidence that it is classified as y'; ‖x − x_0‖_2 is the 2-norm of x − x_0, used to limit the magnitude of the adversarial perturbation (the square root of the sum of the squared absolute values of the vector elements); y'_t denotes a specific target label preset by the attacker, in which case y' = y'_t.
On this basis, the iteration process is as follows:
(1) the image x*_i is input to the deep learning classifier f, the gradient ∇_{x*_i} J of the classifier with respect to the input is calculated, and the shallow feature image x_f^i of x*_i in the network is captured; x_f^i is up-sampled by bilinear interpolation to obtain the reconstructed feature image f_m^i; the pixel spatial attention weight W_p^i is then obtained by the calculation formula:
W_p^i = softmax(Σ(x_re^i ⊙ (W_c^i ⊗ f_mm^i))), with W_c^i = softmax(x_re^i ⊗ (f_mm^i)^T)
where W_c^i ⊗ f_mm^i is the reconstructed channel spatial attention weight and W_c^i the channel spatial attention weight before reconstruction; f_mm^i is obtained from f_m^i by a reshape function operation; ⊗ denotes matrix multiplication, softmax(·) is the activation function, x_re^i is the reshaped image matrix, and ⊙ denotes multiplication of corresponding matrix elements; before the outer softmax(·) is executed, the calculated matrix is summed once in the column direction so that W_p^i has size [1, l];
(2) the pixel spatial attention weight W_p^i is reshaped by a reshape operation into the attention mapping weight W_map^i;
(3) the velocity vector g_{i+1} is updated along the gradient direction:
g_{i+1} = μ · g_i + W_map^i ⊙ (∇_{x*_i} J / ‖∇_{x*_i} J‖_1)
where μ is the momentum decay factor;
(4) the perturbation ρ_i to be added is calculated from the velocity vector g_{i+1}:
ρ_i = g_{i+1} × α
where α is the perturbation step size added at each iteration;
(5) steps (1) to (4) are repeated until the perturbation exceeds the preset bound, ‖x* − x‖_∞ > ε, or the attack succeeds, f(x*) ≠ y, i.e., the adversarial sample has been successfully generated, where ‖·‖_∞ is the infinity norm, i.e., the maximum absolute value of the elements of x* − x, ε is the preset perturbation size, and y is the correct class label of the original image x;
if the adversarial sample is successfully generated, the iteration ends and the adversarial sample is output; otherwise, it is judged whether the current iteration count i exceeds the maximum iteration count T: if not, the momentum iteration continues; if so, the iteration stops and attack failure is output.
The resulting adversarial sample visualizations are shown in the last columns of FIGS. 2 and 3, where ρ_FineFool denotes the visualization of the adversarial perturbation produced by the FineFool method, and Adv_FineFool denotes the adversarial sample after the adversarial perturbation is added to the original normal sample.
Adversarial training stage for the deep model:
In this stage, the adversarial samples generated in the adversarial sample generation stage are used to perform multi-strength adversarial training on the deep model. The multi-strength adversarial training method is specifically as follows:
With other conditions unchanged, different upper limits for the adversarial perturbation, i.e., different ε values, are set to obtain adversarial samples with attack capabilities of different strengths. Adversarial samples of different strengths are mixed with normal samples in certain proportions to obtain different training data sets for adversarial training, and the deep model is adversarially trained on these training data sets batch by batch, so that the deep model improves its generalization ability in defending against adversarial attacks while reducing the classification accuracy on normal samples as little as possible, i.e., it can defend against adversarial samples generated by different attack methods.
The attack strength (AIn) of a training data set is defined as:
AIn = Num(Adv) / Num(Nor)
where Num(Adv) and Num(Nor) are the numbers of adversarial and normal samples, respectively. Generally, the number of normal image samples in the training data set is fixed, while adversarial samples can be generated under different attack method parameters, so their number can far exceed that of the normal samples; the value range of AIn is AIn ≥ 0.
The specific process of performing countermeasure training on the depth model comprises the following steps:
(1) based on a preset disturbance amplitude parameter epsilon, generating a batch of confrontation sample subsets { x ] through the confrontation attack method attack depth model based on the feature map attention machine mechanismadv1And then continuously adjusting the disturbance amplitude values to be epsilon/2, epsilon/3 and epsilon/4 to obtain more data sample subset { x }adv2}、{xadv3}、{xadv4And as the preset disturbance amplitude is reduced, the attack success rate is reduced, the number of corresponding confrontation samples is reduced, and the overall attack capability of the confrontation samples of each set is weakened.
(2) Mixing all the confrontation samples obtained in the step (1) to obtain a total set of the confrontation samples with different attack capabilities, ensuring the balance and diversity of data distribution, and then mixing the confrontation samples with normal samples from 0.1, 0.2, 0.3, … and 1.0 according to the value of AIn to obtain a new training data set with different attack strengths; the normal samples in these new training data sets are all the same, and the challenge samples have a certain randomness.
(3) Fine-tuning training is performed on the weight parameters of the depth model using the training data sets of different attack intensities obtained in step (2), so that the depth model is more robust to adversarial-sample attacks and the reliability of the depth model in application is improved.
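Steps (1)-(3) above can be sketched end to end as follows; `attack_fn` and `fine_tune_fn` are hypothetical stand-ins for the feature-map-attention attack and the model's weight fine-tuning routine, which the patent does not specify at code level:

```python
import random

def multi_strength_adversarial_training(model, normal_set, attack_fn,
                                        fine_tune_fn, epsilon, seed=0,
                                        ain_values=tuple(i / 10 for i in range(1, 11))):
    """Multi-strength adversarial training (steps (1)-(3)).

    attack_fn(model, samples, eps) -> list of adversarial samples;
    fine_tune_fn(model, mixed_set) fine-tunes the model weights in place.
    """
    rng = random.Random(seed)
    # Step (1): generate subsets at eps, eps/2, eps/3, eps/4.
    pool = []
    for divisor in (1, 2, 3, 4):
        pool.extend(attack_fn(model, normal_set, epsilon / divisor))
    # Steps (2)-(3): one mixed set per attack intensity AIn, then fine-tune.
    for ain in ain_values:
        n_adv = min(round(ain * len(normal_set)), len(pool))
        mixed = list(normal_set) + rng.sample(pool, n_adv)
        rng.shuffle(mixed)
        fine_tune_fn(model, mixed)
    return model
```

In this sketch every AIn value yields one fine-tuning pass over a freshly mixed data set, matching the "batch-wise" training described above.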
Application example
The feature-map-attention-oriented adversarial attack defense method is applied to image classification; specifically, it can be used to classify animal images and target images such as face images.
When the method is applied, an image set with features similar to the images to be classified is first taken as the original images, and a deep learning network (which may be ResNet-v2 or Inception-v3) is taken as the image classification model. A large number of adversarial samples are generated using the feature-map-attention-oriented adversarial attack defense method, and multi-strength adversarial training is performed on the trained image classification model with these adversarial samples to discover and repair existing vulnerabilities, yielding an image classification model capable of defending against adversarial samples. The trained image classification model with this defense capability is then used to classify the images to be classified, obtaining reliable classification results.
Specific experiments are as follows:
The image dataset used in this experiment is a subset of the ImageNet image dataset from http://www.image-net.org/. Its basic characteristics are: (a) the image dataset has 130,000 training image samples, 100,000 test image samples and 50,000 validation samples, each image sample having a size of 64 × 64 pixels; (b) the dataset is divided into 1,000 classes, each class having the same number of image samples, namely 130 samples per class in the training set, 50 per class in the validation set and 100 per class in the test set; (c) a simple normalization operation was performed on each picture for ease of experiment.
Parameter fine-tuning training is performed on the trained image classification model using the training set, and adversarial samples are generated with the FineFool method.
The image classification models used in this experiment are ResNet-v2 and Inception-v3. The resulting adversarial-sample visualizations are shown in FIG. 2 and the last column of FIG. 3. In FIG. 2, "original" denotes the original normal image, and ρ_MI-FGSM, Adv_MI-FGSM, ρ_PGD, Adv_PGD, ρ_FineFool and Adv_FineFool denote the perturbations and adversarial samples obtained by the MI-FGSM, PGD and FineFool attack methods, respectively. FIG. 2 and FIG. 3 show the results of attacking the depth models ResNet-v2 and Inception-v3, respectively. FIG. 4 and FIG. 5 show, for the adversarial samples of FIG. 2 and FIG. 3, the decreasing confidence curve of the original correct class and the increasing confidence curve of the misclassified class during the attack.
Here, PGD and MI-FGSM are the comparison attack methods. PGD applies one step of standard gradient descent and then clips all coordinates back into the allowed region; research shows that the local maxima found by PGD have similar loss values whether the network is normally trained or adversarially trained, which indicates that the adversarial samples generated by this method are robust. The MI-FGSM attack method introduces a generalized momentum-iteration algorithm to enhance attack capability: embedding a momentum term into the attack iteration stabilizes the update direction of the perturbation and avoids poor local optima.
The MI-FGSM, PGD and FineFool attack methods are used to attack the ResNet-v2 and Inception-v3 depth models, and the generated adversarial samples are then used for the multi-strength adversarial training defense; the resulting defense effect is shown in Table 1. Table 1 reports the attack success rate: the smaller the value, the less often the model is successfully attacked and the better its defense capability. It can be seen that the FineFool method proposed by the invention generates better adversarial samples, so the model defends better after adversarial training. In Table 1, the different attack methods attack the model after it has been adversarially trained with adversarial samples generated by the FineFool attack method.
TABLE 1 Attack success rates after adversarial training based on the FineFool attack method
The above-mentioned embodiments are intended to illustrate the technical solutions and advantages of the present invention, and it should be understood that the above-mentioned embodiments are only the most preferred embodiments of the present invention, and are not intended to limit the present invention, and any modifications, additions, equivalents, etc. made within the scope of the principles of the present invention should be included in the scope of the present invention.
Claims (2)
1. An attention-mechanism-oriented adversarial attack defense method, comprising the following steps:
Step 1: extracting contour features of a target contour in an image using an attention mechanism, and designing a small perturbation to be added to an original normal sample based on the extracted contour features, obtaining an adversarial sample; this comprises the following steps:
a reconstructed feature extraction step: based on the shallow network features of the depth classification model, using an attention mechanism to extract a shallow feature image of the input original image as the feature image, and performing an up-sampling operation on the feature image to obtain a reconstructed feature image;
a channel-space attention weight calculation step, for calculating a channel-space attention weight matrix from the original image and the reconstructed feature image, comprising: converting the original image x of size [H, W, 3] into an image x_re of size [3, l] by a reshape operation, where H is the number of pixels in the vertical direction of the image, W is the number of pixels in the horizontal direction, 3 denotes the three RGB channels of a color image, and l = H × W; converting the up-sampled shallow hidden-layer reconstructed feature image f_m of size [H, W, c] into a reconstructed feature image f_mm of size [c, l] by a reshape operation; and obtaining the channel-space attention weight matrix W_c of size [3, c] by the formula W_c = softmax(x_re ⊗ f_mm^T), where softmax(·) is an activation function;
a pixel-space attention weight calculation step, for calculating a pixel-space attention weight matrix from the reconstructed channel-space attention weight matrix and the original image, comprising: computing the reconstructed channel-space attention weight W_c_rec = W_c ⊗ f_mm of size [3, l], where ⊗ denotes matrix multiplication; and computing the pixel-space attention weight W_p of size [1, l] as W_p = softmax(Σ_k (W_c_rec ⊙ x_re)), where ⊙ denotes element-wise multiplication of corresponding matrix elements, the sum runs over the channel dimension, and softmax(·) is an activation function;
an adversarial sample generation step, for calculating the perturbation to be added according to the pixel-space attention weight matrix and adding the perturbation to the original image to obtain the adversarial sample, comprising: reshaping the pixel-space attention weight W_p of size [1, l] into the attention mapping weight W_map of size [H, W, 1] by a reshape operation; calculating the added perturbation ρ by formula (1); and obtaining the adversarial sample x* = x ⊕ ρ, where ⊕ denotes element-wise addition of corresponding matrix elements;
where ⊙ denotes element-wise multiplication of two matrices; y denotes the correct class label of the original image x; ‖·‖_1 denotes the 1-norm of the computed gradient, i.e. the sum of the absolute values of the vector elements; and x_k denotes the pixel matrix of the k-th channel;
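The attention-weight pipeline of step 1 can be sketched in NumPy as follows; the softmax axes are an assumption inferred from the stated matrix sizes, since the claim's formula images are not reproduced in this text:

```python
import numpy as np

def softmax(a, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(a - a.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_mapping_weight(x, f_m):
    """Channel-space and pixel-space attention weights of step 1.

    x:   original image, shape [H, W, 3]
    f_m: up-sampled shallow feature image, shape [H, W, c]
    Returns the attention mapping weight W_map of shape [H, W, 1].
    """
    h, w, _ = x.shape
    l = h * w
    x_re = x.reshape(l, 3).T                    # [3, l]
    f_mm = f_m.reshape(l, f_m.shape[2]).T       # [c, l]
    w_c = softmax(x_re @ f_mm.T)                # [3, c] channel-space weights
    w_c_rec = w_c @ f_mm                        # [3, l] reconstructed weights
    w_p = softmax((w_c_rec * x_re).sum(axis=0, keepdims=True))  # [1, l]
    return w_p.reshape(h, w, 1)                 # W_map
```

Because the final softmax runs over all l = H × W positions, the returned weights are non-negative and sum to one over the image.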
Step 2: optimizing the perturbation variable by momentum iteration to update the adversarial sample, thereby realizing the adversarial attack on the depth classification model, comprising:
first, setting the maximum number of iterations of the trained depth classification model f as T, the original image as x, and the correct class label of x as y; at the start of the iteration, initializing the adversarial sample to the original image and setting the initial velocity vector g_0 = 0;
then, defining the attack optimization objective function of the iterative process as:
the over parameter kappa is more than or equal to 0, the confidence coefficient of the misclassification class target of the generated countermeasure sample is represented, the requirement for producing the countermeasure sample is higher when the value of kappa is larger, and the obtained sample attack performance is more reliable; x is the number of0Representing an initial image without added disturbance, i.e. an original image; z (x)yConfidence that the sample is classified as y, Z (x)y′Represents the confidence that the sample is classified as y';represents x-x0Is used to limit the magnitude of the counterdisturbance, i.e. the sum of the squares of the absolute values of the vector elements is then root-opened, yt' represents a specific target label preset by an attacker;
finally, the process of optimizing the perturbation variable by momentum iteration to update the adversarial sample is as follows:
(1) inputting the image x*_i to the depth classification model f, calculating the gradient of f with respect to the input, and capturing the shallow feature image of x*_i in the network; performing an up-sampling operation on the shallow feature image by bilinear interpolation to obtain the reconstructed feature image; and obtaining the pixel-space attention weight W_p,i by the following calculation formula:
where W_c_rec,i denotes the reconstructed channel-space attention weight and W_c,i the channel-space attention weight before reconstruction; a reconstruction operation is performed by the reshape function to obtain the required shapes; ⊗ denotes matrix multiplication, softmax(·) is an activation function, x_re,i denotes the reshaped image matrix, and ⊙ denotes element-wise multiplication of corresponding matrix elements; before the softmax(·) function is executed, the computed matrix is summed once in the column direction, so that W_p,i has size [1, l];
(2) reshaping the pixel-space attention weight W_p,i into the attention mapping weight W_map,i by a reconstruction operation;
(3) updating the velocity vector g_{i+1} based on the gradient direction:
(4) based on the velocity vector g_{i+1}, calculating the perturbation ρ_i to be added:
ρ_i = g_{i+1} × α (5)
where α denotes the perturbation step size added at each iteration;
repeating steps (1) to (4) until the perturbation exceeds the preset bound, i.e. ‖x* − x‖_∞ > ε, or the attack succeeds, i.e. f(x*) ≠ y, meaning the adversarial sample has been successfully generated; where ‖·‖_∞ denotes the infinity norm, i.e. the maximum absolute value of the elements of x* − x, ε is the preset perturbation size, and y is the correct class label of the original image x;
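The momentum-iteration loop of step 2 can be sketched as follows; `grad_fn` and `attention_fn` are hypothetical stand-ins for the model gradient and the attention-weight computation, and the sign step is an assumption in the spirit of MI-FGSM, since the claim's formulas (2)-(5) are partly images not reproduced in this text:

```python
import numpy as np

def momentum_iterative_attack(x, grad_fn, attention_fn,
                              epsilon, alpha, max_iter, mu=1.0):
    """Momentum-iterative update of the adversarial sample (steps (1)-(4)).

    grad_fn(x) -> gradient of the loss w.r.t. x (same shape as x);
    attention_fn(x) -> attention mapping weight W_map, shape [H, W, 1].
    Stops after max_iter iterations or once the L_inf perturbation
    exceeds epsilon.
    """
    x0 = x.copy()
    g = np.zeros_like(x)                      # velocity vector g_0 = 0
    for _ in range(max_iter):
        grad = grad_fn(x)
        g = mu * g + grad / (np.abs(grad).sum() + 1e-12)  # 1-norm-normalised momentum
        rho = attention_fn(x) * np.sign(g) * alpha        # attention-weighted step rho_i
        x = x + rho
        if np.abs(x - x0).max() > epsilon:    # perturbation bound reached
            break
    return x
```

The attention weight concentrates each step on the target contour, while the momentum term keeps the update direction stable across iterations.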
Step 3: performing adversarial training on the depth classification model, based on a multi-strength adversarial training strategy, with a data set obtained by mixing adversarial samples and normal samples, to realize the defense of the depth classification model against adversarial attacks, comprising:
(1) generating a batch subset of adversarial samples {x_adv1} using steps 1 and 2 of the attention-mechanism-oriented adversarial attack defense method with the preset perturbation size ε, and then successively adjusting the perturbation amplitude to ε/2, ε/3 and ε/4 to obtain the adversarial sample subsets {x_adv2}, {x_adv3} and {x_adv4};
(2) mixing all the adversarial sample subsets to obtain a total adversarial sample set with different attack capabilities, and mixing the adversarial samples with normal samples according to attack intensity AIn values of 0.1, 0.2, 0.3, …, 1.0 to obtain new training data sets of different attack intensities, where the attack intensity is the ratio of the number of adversarial samples to the number of normal samples;
(3) performing fine-tuning training on the weight parameters of the depth classification model using the new training data sets of different attack intensities.
2. Use of the attention-mechanism-oriented adversarial attack defense method according to claim 1 in image classification, characterized by comprising the following processes:
first, taking an image set with features similar to the images to be classified as the original images and a deep neural network as the image classification model, generating a large number of adversarial samples using the attention-mechanism-oriented adversarial attack defense method of claim 1, and performing multi-strength adversarial training on the trained image classification model with these adversarial samples to discover and repair existing vulnerabilities, obtaining an image classification model capable of defending against adversarial samples;
then, classifying the images to be classified using the trained image classification model with adversarial-sample defense capability, obtaining reliable classification results.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910138087.1A CN109948658B (en) | 2019-02-25 | 2019-02-25 | Feature diagram attention mechanism-oriented anti-attack defense method and application |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910138087.1A CN109948658B (en) | 2019-02-25 | 2019-02-25 | Feature diagram attention mechanism-oriented anti-attack defense method and application |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109948658A CN109948658A (en) | 2019-06-28 |
CN109948658B true CN109948658B (en) | 2021-06-15 |
Family
ID=67006468
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910138087.1A Active CN109948658B (en) | 2019-02-25 | 2019-02-25 | Feature diagram attention mechanism-oriented anti-attack defense method and application |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109948658B (en) |
Families Citing this family (55)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110472672B (en) * | 2019-07-25 | 2023-04-18 | 创新先进技术有限公司 | Method and apparatus for training machine learning models |
CN110444208A (en) * | 2019-08-12 | 2019-11-12 | 浙江工业大学 | A kind of speech recognition attack defense method and device based on gradient estimation and CTC algorithm |
CN110674938B (en) * | 2019-08-21 | 2021-12-21 | 浙江工业大学 | Anti-attack defense method based on cooperative multi-task training |
CN110633655A (en) * | 2019-08-29 | 2019-12-31 | 河南中原大数据研究院有限公司 | Attention-attack face recognition attack algorithm |
CN110782420A (en) * | 2019-09-19 | 2020-02-11 | 杭州电子科技大学 | Small target feature representation enhancement method based on deep learning |
CN110705652B (en) * | 2019-10-17 | 2020-10-23 | 北京瑞莱智慧科技有限公司 | Countermeasure sample, generation method, medium, device and computing equipment thereof |
CN110852363B (en) * | 2019-10-31 | 2022-08-02 | 大连理工大学 | Anti-sample defense method based on deception attacker |
CN110941794B (en) * | 2019-11-27 | 2023-08-22 | 浙江工业大学 | Challenge attack defense method based on general inverse disturbance defense matrix |
CN111046673B (en) * | 2019-12-17 | 2021-09-03 | 湖南大学 | Training method for defending text malicious sample against generation network |
CN111191717B (en) * | 2019-12-30 | 2022-05-10 | 电子科技大学 | Black box confrontation sample generation algorithm based on hidden space clustering |
CN111046847A (en) * | 2019-12-30 | 2020-04-21 | 北京澎思科技有限公司 | Video processing method and device, electronic equipment and medium |
CN111275106B (en) * | 2020-01-19 | 2022-07-01 | 支付宝(杭州)信息技术有限公司 | Countermeasure sample generation method and device and computer equipment |
CN111325319B (en) * | 2020-02-02 | 2023-11-28 | 腾讯云计算(北京)有限责任公司 | Neural network model detection method, device, equipment and storage medium |
CN111340180B (en) * | 2020-02-10 | 2021-10-08 | 中国人民解放军国防科技大学 | Countermeasure sample generation method and device for designated label, electronic equipment and medium |
CN111325341B (en) * | 2020-02-18 | 2023-11-14 | 中国空间技术研究院 | Countermeasure training method with self-adaptive countermeasure intensity |
CN111368908B (en) * | 2020-03-03 | 2023-12-19 | 广州大学 | HRRP non-target countermeasure sample generation method based on deep learning |
CN111368725B (en) * | 2020-03-03 | 2023-10-03 | 广州大学 | HRRP targeted countermeasure sample generation method based on deep learning |
CN111488916B (en) * | 2020-03-19 | 2023-01-24 | 天津大学 | Anti-attack method based on training set data |
CN111414964A (en) * | 2020-03-23 | 2020-07-14 | 上海金桥信息股份有限公司 | Image security identification method based on defense sample |
CN111476228A (en) * | 2020-04-07 | 2020-07-31 | 海南阿凡题科技有限公司 | White-box confrontation sample generation method for scene character recognition model |
CN115063790A (en) * | 2020-05-11 | 2022-09-16 | 北京航空航天大学 | Anti-attack method and device based on three-dimensional dynamic interaction scene |
CN112115761B (en) * | 2020-05-12 | 2022-09-13 | 吉林大学 | Countermeasure sample generation method for detecting vulnerability of visual perception system of automatic driving automobile |
CN111754519B (en) * | 2020-05-27 | 2024-04-30 | 浙江工业大学 | Class activation mapping-based countermeasure method |
CN111625820A (en) * | 2020-05-29 | 2020-09-04 | 华东师范大学 | Federal defense method based on AIoT-oriented security |
CN111783085B (en) * | 2020-06-29 | 2023-08-22 | 浙大城市学院 | Defense method and device for resisting sample attack and electronic equipment |
CN111783629B (en) * | 2020-06-29 | 2023-04-07 | 浙大城市学院 | Human face in-vivo detection method and device for resisting sample attack |
CN111860681B (en) * | 2020-07-30 | 2024-04-30 | 江南大学 | Deep network difficulty sample generation method under double-attention mechanism and application |
CN111881436A (en) * | 2020-08-04 | 2020-11-03 | 公安部第三研究所 | Method and device for generating black box face anti-attack sample based on feature consistency and storage medium thereof |
CN112016686B (en) * | 2020-08-13 | 2023-07-21 | 中山大学 | Antagonistic training method based on deep learning model |
CN112085069B (en) * | 2020-08-18 | 2023-06-20 | 中国人民解放军战略支援部队信息工程大学 | Multi-target countermeasure patch generation method and device based on integrated attention mechanism |
CN112035834A (en) * | 2020-08-28 | 2020-12-04 | 北京推想科技有限公司 | Countermeasure training method and device, and application method and device of neural network model |
CN112215151B (en) * | 2020-10-13 | 2022-10-25 | 电子科技大学 | Method for enhancing anti-interference capability of target detection system by using 3D (three-dimensional) countermeasure sample |
CN112541404A (en) * | 2020-11-22 | 2021-03-23 | 同济大学 | Physical attack counterattack sample generation method facing traffic information perception |
CN112507811A (en) * | 2020-11-23 | 2021-03-16 | 广州大学 | Method and system for detecting face recognition system to resist masquerading attack |
CN112488321B (en) * | 2020-12-07 | 2022-07-01 | 重庆邮电大学 | Antagonistic machine learning defense method oriented to generalized nonnegative matrix factorization algorithm |
CN112580822B (en) * | 2020-12-16 | 2023-10-17 | 北京百度网讯科技有限公司 | Countermeasure training method device for machine learning model, electronic equipment and medium |
CN112804231B (en) * | 2021-01-13 | 2021-09-24 | 广州大学 | Distributed construction method, system and medium for attack graph of large-scale network |
CN112949678B (en) * | 2021-01-14 | 2023-05-02 | 西安交通大学 | Deep learning model countermeasure sample generation method, system, equipment and storage medium |
CN115019050A (en) * | 2021-03-05 | 2022-09-06 | 腾讯科技(深圳)有限公司 | Image processing method, device, equipment and storage medium |
CN113076980B (en) * | 2021-03-24 | 2023-11-14 | 中山大学 | Method for detecting images outside distribution based on attention enhancement and input disturbance |
CN113611323B (en) * | 2021-05-07 | 2024-02-20 | 北京至芯开源科技有限责任公司 | Voice enhancement method and system based on double-channel convolution attention network |
CN113344090B (en) * | 2021-06-18 | 2022-11-22 | 成都井之丽科技有限公司 | Image processing method for resisting attack by target in middle layer |
CN113571067B (en) * | 2021-06-21 | 2023-12-26 | 浙江工业大学 | Voiceprint recognition countermeasure sample generation method based on boundary attack |
CN113485313A (en) * | 2021-06-25 | 2021-10-08 | 杭州玳数科技有限公司 | Anti-interference method and device for automatic driving vehicle |
CN113392932B (en) * | 2021-07-06 | 2024-01-30 | 中国兵器工业信息中心 | Anti-attack system for deep intrusion detection |
CN113780557B (en) * | 2021-11-11 | 2022-02-15 | 中南大学 | Method, device, product and medium for resisting image attack based on immune theory |
CN114092856B (en) * | 2021-11-18 | 2024-02-06 | 西安交通大学 | Video weak supervision abnormality detection system and method for antagonism and attention combination mechanism |
CN114241268A (en) * | 2021-12-21 | 2022-03-25 | 支付宝(杭州)信息技术有限公司 | Model training method, device and equipment |
CN114492832A (en) * | 2021-12-24 | 2022-05-13 | 北京航空航天大学 | Selective attack method and device based on associative learning |
CN114332569B (en) * | 2022-03-17 | 2022-05-27 | 南京理工大学 | Low-disturbance attack resisting method based on attention mechanism |
CN114742170B (en) * | 2022-04-22 | 2023-07-25 | 马上消费金融股份有限公司 | Countermeasure sample generation method, model training method, image recognition method and device |
CN114978654B (en) * | 2022-05-12 | 2023-03-10 | 北京大学 | End-to-end communication system attack defense method based on deep learning |
CN114612688B (en) * | 2022-05-16 | 2022-09-09 | 中国科学技术大学 | Countermeasure sample generation method, model training method, processing method and electronic equipment |
CN114943641B (en) * | 2022-07-26 | 2022-10-28 | 北京航空航天大学 | Method and device for generating confrontation texture image based on model sharing structure |
CN116450187B (en) * | 2023-05-05 | 2024-06-25 | 北京慧和伙科技有限公司 | Digital online application processing method and AI application system applied to AI analysis |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108322349A (en) * | 2018-02-11 | 2018-07-24 | 浙江工业大学 | The deep learning antagonism attack defense method of network is generated based on confrontation type |
CN108446765A (en) * | 2018-02-11 | 2018-08-24 | 浙江工业大学 | The multi-model composite defense method of sexual assault is fought towards deep learning |
CN108932527A (en) * | 2018-06-06 | 2018-12-04 | 上海交通大学 | Using cross-training model inspection to the method for resisting sample |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10636141B2 (en) * | 2017-02-09 | 2020-04-28 | Siemens Healthcare Gmbh | Adversarial and dual inverse deep learning networks for medical image analysis |
- 2019-02-25 CN CN201910138087.1A patent/CN109948658B/en active Active
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108322349A (en) * | 2018-02-11 | 2018-07-24 | 浙江工业大学 | The deep learning antagonism attack defense method of network is generated based on confrontation type |
CN108446765A (en) * | 2018-02-11 | 2018-08-24 | 浙江工业大学 | The multi-model composite defense method of sexual assault is fought towards deep learning |
CN108932527A (en) * | 2018-06-06 | 2018-12-04 | 上海交通大学 | Using cross-training model inspection to the method for resisting sample |
Non-Patent Citations (1)
Title |
---|
FineFool: Fine Object Contour Attack via Attention; Jinyin Chen et al.; arXiv:1812.01713v1; 2018-12-01; pp. 1-8 *
Also Published As
Publication number | Publication date |
---|---|
CN109948658A (en) | 2019-06-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109948658B (en) | Feature diagram attention mechanism-oriented anti-attack defense method and application | |
CN111475797B (en) | Method, device and equipment for generating countermeasure image and readable storage medium | |
CN110941794A (en) | Anti-attack defense method based on universal inverse disturbance defense matrix | |
CN111460426B (en) | Deep learning resistant text verification code generation system and method based on antagonism evolution framework | |
CN111753881A (en) | Defense method for quantitatively identifying anti-attack based on concept sensitivity | |
CN112364915A (en) | Imperceptible counterpatch generation method and application | |
CN113255816B (en) | Directional attack countermeasure patch generation method and device | |
CN111754519B (en) | Class activation mapping-based countermeasure method | |
CN111178504B (en) | Information processing method and system of robust compression model based on deep neural network | |
CN113283599A (en) | Anti-attack defense method based on neuron activation rate | |
CN114387449A (en) | Image processing method and system for coping with adversarial attack of neural network | |
Yu et al. | A multi-task learning CNN for image steganalysis | |
CN113435264A (en) | Face recognition attack resisting method and device based on black box substitution model searching | |
CN113221388A (en) | Method for generating confrontation sample of black box depth model constrained by visual perception disturbance | |
CN117011508A (en) | Countermeasure training method based on visual transformation and feature robustness | |
CN115510986A (en) | Countermeasure sample generation method based on AdvGAN | |
CN115238271A (en) | AI security detection method based on generative learning | |
CN112766401B (en) | Countermeasure sample defense method based on significance countermeasure training | |
CN115270891A (en) | Method, device, equipment and storage medium for generating signal countermeasure sample | |
CN114693973A (en) | Black box confrontation sample generation method based on Transformer model | |
CN114359653A (en) | Attack resisting method, defense method and device based on reinforced universal patch | |
CN114842242A (en) | Robust countermeasure sample generation method based on generative model | |
CN111340066B (en) | Confrontation sample generation method based on geometric vector | |
CN113222480A (en) | Training method and device for confrontation sample generation model | |
Zhu et al. | Adversarial example defense via perturbation grading strategy |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||