CN113255816B - Directional attack countermeasure patch generation method and device - Google Patents

Directional attack countermeasure patch generation method and device

Info

Publication number
CN113255816B
CN113255816B
Authority
CN
China
Prior art keywords
loss
countermeasure
patch
attack
white
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110646139.3A
Other languages
Chinese (zh)
Other versions
CN113255816A (en)
Inventor
蒋玲玲
罗娟娟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Posts and Telecommunications
Original Assignee
Beijing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Posts and Telecommunications filed Critical Beijing University of Posts and Telecommunications
Priority to CN202110646139.3A priority Critical patent/CN113255816B/en
Publication of CN113255816A publication Critical patent/CN113255816A/en
Application granted granted Critical
Publication of CN113255816B publication Critical patent/CN113255816B/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/047Probabilistic or stochastic networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a directional attack countermeasure patch generation method and device. The method iteratively updates a countermeasure patch over a sequence of white-box models with different structures, so that the resulting target universal countermeasure patch achieves a strong attack effect on a black-box model whose structure is unknown. Introducing a triplet loss improves the success rate of making the model output the target class during a directional attack. Introducing an attention transfer loss improves how well the target universal countermeasure patch migrates the model's attention region, greatly strengthening its directional attack effect. Introducing a smoothing loss reduces the differences between adjacent pixels of the target universal countermeasure patch, making it less likely to draw the attention of human eyes. Furthermore, because the attack takes the form of an added patch, the directional attack can be carried out at both the physical and digital levels, which makes it more convenient to implement.

Description

Directional attack countermeasure patch generation method and device
Technical Field
The invention relates to the technical field of artificial intelligence security, in particular to a directional attack countermeasure patch generation method and device.
Background
Deep neural networks (DNNs) have achieved tremendous success in image classification, object detection, text classification, speech recognition and other fields, and are widely used in production and daily life. However, research in recent years has shown that deep neural networks are fragile and susceptible to countermeasure samples. A countermeasure sample is produced by modifying and perturbing a clean sample so that a trained neural network misclassifies or misidentifies it and fails to complete its target task.
The significance of countermeasure samples is twofold. On the one hand, countermeasure samples attack or mislead applications built on deep learning, such as autonomous driving and face recognition systems, creating potential security threats that may cause economic losses or casualties. On the other hand, countermeasure samples are valuable for training deep neural networks: using them for adversarial training can effectively enhance a network's defense capability and robustness. Therefore, research on countermeasure samples plays an important role in advancing the field of artificial intelligence security. However, the prior art lacks a method for generating a countermeasure patch against a black-box model with an unknown structure, and it is difficult to meet application requirements for attacking black-box models and improving their defenses.
Disclosure of Invention
The embodiments of the invention provide a directional attack countermeasure patch generation method and device, which solve the problems in the prior art that generated countermeasure patches ignore the features that models jointly attend to, migrate the model's attention region poorly, and achieve a low success rate when performing a directional attack on a black-box model with an unknown structure.
The technical scheme of the invention is as follows:
in one aspect, the present invention provides a method for generating a directional attack countermeasure patch, including:
acquiring a plurality of white box models that perform the same task as the black box model to be attacked, wherein the model structures and parameters of the white box models differ from one another;
acquiring a randomly initialized countermeasure patch, determining the target category of the directional attack, and iteratively updating the initialized countermeasure patch with each white box model over a plurality of successive iteration loops to obtain a target universal countermeasure patch; wherein the output of a preceding iteration loop is taken as the input of the following iteration loop, and each iteration loop comprises:
obtaining a plurality of undisturbed clean pictures, inputting each clean picture into a first white box model corresponding to the current iteration loop, and outputting a first prediction contribution weight matrix and a first attention key region corresponding to each clean picture according to the attention features of the first white box model;
replacing random positions in each clean picture with the first countermeasure patch input to the current iteration loop to obtain a countermeasure sample corresponding to each clean picture;
adding the target category to the label of each countermeasure sample, inputting the countermeasure samples into the first white box model, and calculating a joint loss with a preset loss function, wherein the preset loss function comprises at least a countermeasure loss, an attention transfer loss, a triplet loss and a smoothing loss, and the attention transfer loss is calculated from the first prediction contribution weight matrix corresponding to each clean picture, the first attention key region, and the random position at which the first countermeasure patch was connected;
performing back propagation by a gradient descent method to update the countermeasure patch according to the joint loss value, repeating the iteration, inputting the countermeasure sample of each iteration into the black box model to obtain a first confidence of the output target class, and stopping the iteration and outputting the current first countermeasure patch when the first confidence is greater than a preset confidence or the number of iterations reaches a preset value.
In some embodiments, the preset loss function is a joint loss of the countermeasure loss, the attention transfer loss, the triplet loss and the smoothing loss, calculated as follows:

$$L = L_{adv} + \lambda_1 L_{att} + \lambda_2 L_{smooth} + \lambda_3 L_{trip}$$

where $L$ is the preset loss function; $L_{adv}$ is the countermeasure loss, related to the output probability of the target class label; $L_{att}$ is the attention transfer loss, related to the migration of the first white-box model's region of interest, with weight coefficient $\lambda_1$; $L_{smooth}$ is the smoothing loss, with weight coefficient $\lambda_2$; and $L_{trip}$ is the triplet loss, with weight coefficient $\lambda_3$.
In some embodiments, the countermeasure loss $L_{adv}$ is calculated as:

$$L_{adv} = -\log P_t(x_{adv})$$

where $x_{adv}$ is the countermeasure sample and $P_t(x_{adv})$ is the probability of the target class $t$ output by the softmax layer after the countermeasure sample is input into the first white-box model.
In some embodiments, the attention transfer loss $L_{att}$ is calculated as:

$$L_{att} = \sum_{i,j}\big(M^t \odot S \odot (1 - mask)\big)_{ij}$$

where $M^t$ is the first prediction contribution weight matrix, representing the contribution of each region of the countermeasure sample to the model prediction; $S$ is the first attention key region; and $mask$ is a binary mask marking the location of the first countermeasure patch, with value 1 in the area of the first countermeasure patch and 0 elsewhere.

Further, the channel weights are calculated as:

$$\alpha_k^t = \frac{1}{Z}\sum_i \sum_j \frac{\partial y^t}{\partial A_{ij}^k}$$

where $\alpha_k^t$ represents the sensitivity of the $k$-th channel of the feature map $A$, output by the last convolutional layer of the first white-box model, to the class-$t$ target; $y^t$ represents the output probability of the $t$-th class target of the first white-box model; $A$ is the feature map output by the last convolutional layer of the white-box model; $Z$ is a normalization constant; and $i$ and $j$ respectively denote the column and row indices of a pixel in the image.

Further, $M^t$ is calculated as:

$$M^t = ReLU\Big(\sum_k \alpha_k^t A^k\Big)$$
in some embodiments, the triplet penalty is calculated as:
Figure 100002_DEST_PATH_IMAGE032
wherein the content of the first and second substances,
Figure 100002_DEST_PATH_IMAGE033
Figure 100002_DEST_PATH_IMAGE034
for the purpose of the challenge sample,
Figure 100002_DEST_PATH_IMAGE035
is a one-hot vector of the target class,
Figure 100002_DEST_PATH_IMAGE036
is a one-hot vector of the true class,
Figure 100002_DEST_PATH_IMAGE037
a logits value for a target category label derived for the challenge sample input to the first white-box model,
Figure 100002_DEST_PATH_IMAGE038
is a threshold value.
In some embodiments, the smoothing loss is calculated as:

$$L_{smooth} = \sum_{i,j}\big((p_{i,j} - p_{i+1,j})^2 + (p_{i,j} - p_{i,j+1})^2\big)$$

where $p_{i,j}$ is the pixel value in the $i$-th column and $j$-th row of the first countermeasure patch.
In some embodiments, the initialization countermeasure patch is generated in a set size and shape.
In some embodiments, the initialization countermeasure patch is noise that follows a Gaussian distribution.
In another aspect, the present invention further provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the processor executes the computer program, the steps of the method are implemented.
In another aspect, the present invention also provides a computer-readable storage medium, on which a computer program is stored, characterized in that the program, when executed by a processor, implements the steps of the above-mentioned method.
The invention has the beneficial effects that:
in the directional attack countermeasure patch generation method and device, the method iteratively updates the countermeasure patch over a sequence of white-box models with different structures, so that the resulting target universal countermeasure patch achieves a strong attack effect on a black-box model whose structure is unknown. Introducing a triplet loss improves the success rate of outputting the target class during a directional attack. Introducing an attention transfer loss improves how well the target universal countermeasure patch migrates the model's attention region, greatly strengthening its directional attack effect. Introducing a smoothing loss reduces the differences between adjacent pixels of the target universal countermeasure patch, making it less likely to draw the attention of human eyes.
Furthermore, because the attack takes the form of an added patch, the directional attack can be carried out at both the physical and digital levels, which makes it more convenient to implement.
Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
It will be appreciated by those skilled in the art that the objects and advantages that can be achieved with the present invention are not limited to the specific details set forth above, and that these and other objects that can be achieved with the present invention will be more clearly understood from the detailed description that follows.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the principles of the invention. In the drawings:
fig. 1 is a schematic flow chart of a method for generating a directional attack countermeasure patch according to an embodiment of the present invention;
FIG. 2 is a comparison of the regions of interest of the Vgg16, Resnet50 and Inception V3 models on the same image;
fig. 3 is a logic diagram of a method for generating a directional attack countermeasure patch according to an embodiment of the present invention;
FIG. 4 is a logic diagram of a single iteration loop in the directional attack countermeasure patch generation method of FIG. 3;
fig. 5 is a logic diagram of a method for generating a directional attack countermeasure patch according to another embodiment of the present invention;
fig. 6 is a logic diagram of a single iteration loop in the directional attack countermeasure patch generation method described in fig. 5.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the following embodiments and accompanying drawings. The exemplary embodiments and descriptions of the present invention are provided to explain the present invention, but not to limit the present invention.
It should be noted that, in order to avoid obscuring the present invention with unnecessary details, only the structures and/or processing steps closely related to the scheme according to the present invention are shown in the drawings, and other details not so relevant to the present invention are omitted.
It should be emphasized that the term "comprises/comprising" when used herein, is taken to specify the presence of stated features, elements, steps or components, but does not preclude the presence or addition of one or more other features, elements, steps or components.
In the era of data computation driven by deep learning (DL) algorithms, ensuring the safety and robustness of these algorithms is essential. Growth in computing power has allowed deep learning to be applied widely to machine learning (ML) tasks such as image classification, natural language processing and game playing, but it has also exposed deep learning's potential security risks. It has been found that by adding specific noise or perturbations to benign samples, a neural network model can easily be tricked into making false judgments, while the added noise or perturbations remain difficult to perceive. Such counter-attacks pose a significant risk to neural network models deployed in real life. Counter-attacks can be classified into white-box attacks and black-box attacks according to how much the attacker knows about the model structure and parameters. In a white-box attack, the attacker has complete knowledge of the white-box model, including its architecture and parameters; in a black-box attack, the attacker does not know the structure and parameters of the black-box model.
To research and defend against such attacks, one first needs to generate countermeasure samples that cause the model to mispredict; a countermeasure sample can be generated by perturbing a clean picture (the original benign sample). For the white-box model in a white-box attack, since its structure and parameters are known, countermeasure samples can be produced by various methods. For the black-box model in a black-box attack, generating countermeasure samples mostly relies on the results returned by query access. Based on the attack effect, attacks can further be divided into directional and non-directional attacks. A directional attack means the attack method can specify the class predicted after the attack for a given input sample; specifying the post-attack class is more difficult. A non-directional attack does not care about the specific class, as long as the prediction for the generated countermeasure sample is not the correct class.
In the prior art, methods of generating countermeasure samples for white-box models are mature, and a strong countermeasure effect can be obtained by generating a countermeasure patch and placing it at a random position in a clean picture. When such methods are applied to a black-box model, however, the countermeasure effect is poor because the countermeasure patch cannot be generated efficiently. Furthermore, in the case of a directional attack, since the structure and parameters of the black-box model are unknown, completing the attack is even more difficult, and the prior art cannot generate the corresponding countermeasure samples.
The invention provides a directional attack countermeasure patch generation method for generating a universal patch that can be applied to various pictures to carry out a directional attack on a specific black-box model. As shown in figures 1, 3 and 5, the method comprises the following steps S101-S102:
it should be noted that, the steps S101 to S102 and S1021 to S1024 described in this embodiment are not limited to the order of the steps, and it should be understood that, under certain conditions, some steps may be parallel or the order may be changed.
Step S101: acquiring a plurality of white box models that perform the same task as the black box model to be attacked, wherein the model structures and parameters of the white box models differ from one another.
Step S102: obtaining a randomly initialized countermeasure patch, determining the target category of the directional attack, and iteratively updating the initialized countermeasure patch with each white box model over a plurality of successive iteration loops to obtain a target universal countermeasure patch.
The output of the previous iteration loop is used as the input of the next iteration loop; as shown in fig. 4 and 6, each iteration loop comprises steps S1021 to S1024:
Step S1021: obtaining a plurality of undisturbed clean pictures, inputting each clean picture into the first white box model corresponding to the current iteration loop, and outputting a first prediction contribution weight matrix and a first attention key region corresponding to each clean picture according to the attention features of the first white box model.
Step S1022: replacing random positions in each clean picture with the first countermeasure patch input to the current iteration loop to obtain a countermeasure sample corresponding to each clean picture.
Step S1023: adding the target category to the label of each countermeasure sample, inputting the countermeasure samples into the first white box model, and calculating a joint loss with a preset loss function, wherein the preset loss function comprises at least a countermeasure loss, an attention transfer loss, a triplet loss and a smoothing loss, and the attention transfer loss is calculated from the first prediction contribution weight matrix corresponding to each clean picture, the first attention key region, and the random position at which the first countermeasure patch was connected.
Step S1024: performing back propagation by a gradient descent method to update the countermeasure patch according to the joint loss value, repeating the iteration, inputting the countermeasure sample of each iteration into the black box model to obtain a first confidence of the output target class, and stopping the iteration and outputting the current first countermeasure patch when the first confidence is greater than a preset confidence or the number of iterations reaches a preset value.
In step S101, in order to obtain an efficient countermeasure patch for the black-box model, this embodiment iteratively updates the countermeasure patch with a plurality of white-box models of different structures and parameters to obtain higher robustness. Specifically, the white-box models perform the same task as the black-box model to be attacked; the white-box models may adopt different neural network structures, or may be models with different parameters trained on the same neural network structure. The architecture, parameter values and training method of each white-box model should be known.
In step S102, an initialization patch is randomly generated, and it may be generated in a set size and shape. Further, the patch size can be derived from the clean picture size according to a set scaling ratio. The shape of the initialization patch can be chosen according to the needs of the actual application scenario, such as a circle, an ellipse, a square, a rectangle or another shape. Since this embodiment performs a directional attack, the target category of the attack, i.e., the result the black-box model is desired to output, must also be determined. In step S102, the initialization patch is continuously updated with a plurality of white-box models of known structure and parameters based on a gradient descent method, so that the continuously updated countermeasure patch adapts to white-box models with different structures and parameters, and thus adapts better to the black-box model. Each white-box model runs one iteration loop, so that the input countermeasure patch adapts to the white-box model of the current loop.
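As a minimal illustration of this initialization step (a PyTorch sketch; the patch size, square shape and noise scale are assumptions for illustration, not values from the patent):

```python
import torch

def init_patch(channels=3, h=64, w=64):
    # Gaussian noise around mid-gray, clipped to the valid pixel range [0, 1].
    # A square patch is assumed; other shapes could be cut out with a mask.
    return (0.5 + 0.1 * torch.randn(channels, h, w)).clamp_(0.0, 1.0)
```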
Specifically, in an iteration loop, the countermeasure patch is updated by back propagation with a gradient descent method, under the condition that the structure and parameters of a single white-box model are known. In step S1021, the clean pictures may be obtained from an existing database or collected according to actual needs. In order to capture the model's attention features on the input image, which serve as the basis for subsequently adjusting the countermeasure patch, each clean picture is input into the first white-box model corresponding to the current iteration loop; since the structure and parameters of the first white-box model are known, the first prediction contribution weight matrix and the first attention key region can be calculated. Specifically, after the clean picture is input into the first white-box model, the feature map output by the last convolutional layer of the first white-box model is obtained. The sensitivity of the $k$-th channel of this feature map to the class-$t$ target is denoted $\alpha_k^t$; a weighted linear combination of the last-layer feature maps, with $\alpha_k^t$ as the weights, is then passed through an activation function to obtain the first prediction contribution weight matrix $M^t$. The calculation formulas are:

$$\alpha_k^t = \frac{1}{Z}\sum_i \sum_j \frac{\partial y^t}{\partial A_{ij}^k} \qquad (1)$$

$$M^t = ReLU\Big(\sum_k \alpha_k^t A^k\Big) \qquad (2)$$

where $\alpha_k^t$ represents the sensitivity of the $k$-th channel of the feature map $A$, output by the last convolutional layer of the first white-box model, to the class-$t$ target; $y^t$ represents the output probability of the $t$-th class target of the first white-box model; $A$ is the feature map output by the last convolutional layer of the white-box model; $Z$ is a normalization constant; and $i$ and $j$ respectively denote the column and row indices of a pixel in the image.

Further, the first attention key region $S$ of the first white-box model is calculated as:

$$S_{ij} = \begin{cases}1, & M^t_{ij} > threshold\\ 0, & \text{otherwise}\end{cases} \qquad (3)$$
where the threshold can be adjusted according to the actual application requirements.
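A minimal PyTorch sketch of equations (1)-(3) in the style of Grad-CAM follows; the hook-based feature capture, the bilinear upsampling to image resolution, and the normalization before thresholding are implementation assumptions rather than details from the patent:

```python
import torch
import torch.nn.functional as F

def attention_map_and_key_region(model, last_conv, image, target_class, threshold=0.5):
    # Capture the feature map A of the last convolutional layer via a forward hook.
    feats = {}
    handle = last_conv.register_forward_hook(lambda m, i, o: feats.update(A=o))
    logits = model(image.unsqueeze(0))                 # 1 x num_classes
    handle.remove()

    y_t = logits[0, target_class]                      # output for class t
    A = feats['A']                                     # 1 x K x H x W
    grads = torch.autograd.grad(y_t, A, create_graph=True)[0]  # dy^t/dA  (eq. 1)
    alpha = grads.mean(dim=(2, 3), keepdim=True)       # channel sensitivities alpha_k^t
    M = F.relu((alpha * A).sum(dim=1, keepdim=True))   # M^t = ReLU(sum_k alpha_k A^k)  (eq. 2)
    M = F.interpolate(M, size=image.shape[-2:], mode='bilinear', align_corners=False)
    M = (M / (M.max() + 1e-12)).squeeze()              # normalize to [0, 1]
    S = (M > threshold).float()                        # binarized attention key region  (eq. 3)
    return M, S
```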
In step S1022, each clean picture is processed with the first countermeasure patch input to the current iteration loop to obtain a countermeasure sample. The first countermeasure patch is added to the clean picture by replacement connection at a random position; the connection position can be marked with a two-dimensional $mask$ whose value is 1 in the area of the countermeasure patch and 0 elsewhere. Thus, the countermeasure sample can be expressed as:

$$x_{adv} = (1 - mask) \odot x + mask \odot p \qquad (4)$$

where $\odot$ denotes the element-wise (Hadamard) product, $x$ is the clean picture, and $p$ is the first countermeasure patch.
In a specific implementation, the first countermeasure patch can also be randomly translated, scaled and rotated before being connected to the clean picture.
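A sketch of the replacement connection of equation (4) with a random position and a random rotation (the use of torchvision's functional rotate and the parameter ranges are assumptions; scaling could be added the same way):

```python
import random
import torch
import torchvision.transforms.functional as TF

def apply_patch(x, patch, max_angle=20.0):
    # Randomly rotate the patch, then paste it at a random position:
    # x_adv = (1 - mask) * x + mask * p   (eq. 4)
    p = TF.rotate(patch, random.uniform(-max_angle, max_angle))
    _, H, W = x.shape
    _, h, w = p.shape
    top, left = random.randint(0, H - h), random.randint(0, W - w)
    mask = torch.zeros_like(x)
    mask[:, top:top + h, left:left + w] = 1.0      # 1 on the patch area, 0 elsewhere
    padded = torch.zeros_like(x)
    padded[:, top:top + h, left:left + w] = p
    return (1.0 - mask) * x + mask * padded, mask
```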
In step S1023, based on the first white-box model with known structure and parameters, the first countermeasure patch is updated iteratively so as to complete the directional attack on the first white-box model. Specifically, in order to enable the first countermeasure patch to effectively migrate the region of interest of the first white-box model to the position of the patch, this embodiment introduces the attention transfer loss into the loss function. In order to distinguish the target class obtained by the generated countermeasure sample from the original class, this embodiment introduces the triplet loss, so that features of the same label are as close as possible in the feature space while features of different labels are as far apart as possible; meanwhile, to prevent the features of a sample from being aggregated into a very small space, the negative example is required to be farther from the anchor than the positive example by at least a threshold $\gamma$. Further, in order to improve the naturalness of the countermeasure patch so that it conforms to human vision, a smoothing loss is also introduced. In addition, there is the countermeasure loss related to the attack success rate, i.e., the output probability of the target class label. Together, the countermeasure loss, the attention transfer loss, the triplet loss and the smoothing loss constitute the preset loss function.
Specifically, in some embodiments, the preset loss function is the joint loss of the countermeasure loss, the attention transfer loss, the triplet loss and the smoothing loss, calculated as follows:

$$L = L_{adv} + \lambda_1 L_{att} + \lambda_2 L_{smooth} + \lambda_3 L_{trip} \qquad (5)$$

where $L$ is the preset loss function; $L_{adv}$ is the countermeasure loss, related to the output probability of the target class label; $L_{att}$ is the attention transfer loss, related to the migration of the first white-box model's region of interest, with weight coefficient $\lambda_1$; $L_{smooth}$ is the smoothing loss, with weight coefficient $\lambda_2$; and $L_{trip}$ is the triplet loss, with weight coefficient $\lambda_3$.
In some embodiments, the countermeasure loss $L_{adv}$ is calculated as:

$$L_{adv} = -\log P_t(x_{adv}) \qquad (6)$$

where $x_{adv}$ is the countermeasure sample and $P_t(x_{adv})$ is the probability of the target class $t$ output by the softmax layer after the countermeasure sample is input into the first white-box model.
In some embodiments, the attention transfer loss $L_{att}$ is calculated as:

$$L_{att} = \sum_{i,j}\big(M^t \odot S \odot (1 - mask)\big)_{ij} \qquad (7)$$

where $M^t$ is the first prediction contribution weight matrix, representing the contribution of each region of the countermeasure sample to the model prediction; $S$ is the first attention key region; and $mask$ is the binary mask marking the location of the first countermeasure patch, with value 1 in the area of the first countermeasure patch and 0 elsewhere.
Specifically, different neural network models first extract features before making a decision and then assign appropriate weights, i.e., appropriate attention, to the extracted features. Although network architectures differ, the features the models attend to tend to be the same. As shown in fig. 2, when Vgg16, Resnet50 and Inception V3 identify images of cats, there are noticeable differences in the regions of interest of the three models (the highlighted areas indicated by the arrows are the regions of interest): Vgg16 focuses only on the cat's face, Resnet50 focuses on the face and neck, and Inception V3 combines features of the face, neck and part of the forelimbs, but overall all of the models tend to focus on features related to the cat's face. In view of this characteristic, by introducing the attention transfer loss, the present embodiment makes the first countermeasure patch suppress the attention features of the first white-box model during the update iterations and transfer them from the target region to a non-target region; since the first white-box model no longer focuses on objects within the key region, the model misclassifies.
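Under the reconstruction of equation (7) above, a minimal sketch of the attention transfer loss: it sums the attention weight that remains on the key region outside the patch, so minimizing it pushes the model's attention off the key region. The exact combination of M, S and mask is a reconstruction from the patent's description, not a verbatim formula:

```python
def attention_transfer_loss(M, S, mask):
    # M: H x W prediction contribution weight matrix of the countermeasure sample
    # S: H x W binary attention key region from the clean picture
    # mask: C x H x W (or H x W) binary patch-location mask
    patch_mask = mask[0] if mask.dim() == 3 else mask
    return (M * S * (1.0 - patch_mask)).sum()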
In some embodiments, the triplet loss is calculated as:

$$L_{trip} = \max\big(D(l(x_{adv}), 1_t) - D(l(x_{adv}), 1_o) + \gamma,\; 0\big) \qquad (8)$$

$$D(a, b) = 1 - \frac{a \cdot b}{\lVert a\rVert\,\lVert b\rVert} \qquad (9)$$

where $x_{adv}$ is the countermeasure sample; $1_t$ is the one-hot vector of the target class; $1_o$ is the one-hot vector of the true class; $l(x_{adv})$ is the logits obtained by inputting the countermeasure sample into the first white-box model; and $\gamma$ is a threshold.
In a directional attack, the loss function is usually related only to the target class. However, the generated countermeasure sample may remain too close to its original class, so that the target model still classifies it as the original class. This embodiment therefore introduces the triplet loss $L_{trip}$, which aims to bring features of the same label as close as possible in the feature space while pushing features of different labels as far apart as possible; and, to avoid the features of a sample being aggregated into a very small space, the negative example is required to be farther from the anchor than the positive example by at least the threshold $\gamma$.
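A sketch of the triplet loss of equations (8)-(9) on the white-box logits; the cosine form of the distance D follows the reconstruction of equation (9) above, which is an assumption:

```python
import torch.nn.functional as F

def triplet_loss(logits, target_onehot, true_onehot, gamma=0.2):
    # D(a, b) = 1 - cos(a, b): distance between logits and a one-hot label (eq. 9, assumed)
    d_target = 1.0 - F.cosine_similarity(logits, target_onehot, dim=-1)
    d_true = 1.0 - F.cosine_similarity(logits, true_onehot, dim=-1)
    # Pull logits toward the target class, push them away from the true class
    # by at least the margin gamma (eq. 8).
    return F.relu(d_target - d_true + gamma).mean()
```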
In some embodiments, the smoothing loss is calculated as:

$$L_{smooth} = \sum_{i,j}\big((p_{i,j} - p_{i+1,j})^2 + (p_{i,j} - p_{i,j+1})^2\big) \qquad (10)$$

where $p_{i,j}$ is the pixel value in the $i$-th column and $j$-th row of the first countermeasure patch.
To further improve the naturalness of the countermeasure patch so that it conforms to human vision, its smoothness can be improved by reducing the squared differences between adjacent pixels. Smoothing is also useful for improving the robustness of the countermeasure sample in a physical environment; this embodiment therefore introduces the smoothing loss.
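A sketch of the smoothing (total-variation style) loss of equation (10):

```python
def smoothing_loss(patch):
    # Squared differences between horizontally and vertically adjacent pixels (eq. 10)
    dh = (patch[..., :, 1:] - patch[..., :, :-1]) ** 2
    dv = (patch[..., 1:, :] - patch[..., :-1, :]) ** 2
    return dh.sum() + dv.sum()
```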
In step S1024, the first countermeasure patch is updated by a gradient descent method according to the joint loss. The countermeasure sample obtained in each iteration is input into the black-box model to be attacked, which outputs a first confidence for the target class; this first confidence serves as the criterion for deciding whether to end the update iterations of the first countermeasure patch. Only when the first confidence meets the requirement is the first countermeasure patch obtained by the update iterations considered able to produce the expected attack effect on the black-box model to be attacked. Alternatively, updating stops when the number of iterations reaches a set value.
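Putting the pieces together, a single iteration loop of steps S1021-S1024 can be sketched as follows, using the helper sketches above. The optimizer choice, loss weights, learning rate and stopping thresholds are illustrative assumptions, and `white_box.last_conv` merely stands in for however the last convolutional layer is referenced in a concrete model:

```python
import torch
import torch.nn.functional as F

def update_patch_on_white_box(white_box, black_box, patch, samples, target_class,
                              lr=0.01, weights=(1.0, 0.1, 1.0),
                              max_iters=1000, conf_threshold=0.9):
    # samples: list of (clean image x, true-class one-hot, target-class one-hot)
    patch = patch.clone().requires_grad_(True)
    optimizer = torch.optim.Adam([patch], lr=lr)
    w_att, w_smooth, w_trip = weights
    for _ in range(max_iters):
        total = 0.0
        for x, true_onehot, target_onehot in samples:
            _, S = attention_map_and_key_region(white_box, white_box.last_conv,
                                                x, target_class)      # key region of the clean picture
            x_adv, mask = apply_patch(x, patch)                       # eq. (4)
            M, _ = attention_map_and_key_region(white_box, white_box.last_conv,
                                                x_adv, target_class)  # attention on the countermeasure sample
            logits = white_box(x_adv.unsqueeze(0))
            p_t = F.softmax(logits, dim=-1)[0, target_class]
            total = (total - torch.log(p_t)                           # countermeasure loss (eq. 6)
                     + w_att * attention_transfer_loss(M, S, mask)    # eq. (7)
                     + w_smooth * smoothing_loss(patch)               # eq. (10)
                     + w_trip * triplet_loss(logits, target_onehot, true_onehot))  # eq. (8)
        optimizer.zero_grad()
        total.backward()
        optimizer.step()
        with torch.no_grad():
            patch.clamp_(0.0, 1.0)
            # First confidence of the target class on the black box
            # (checked here on the last sample only, for brevity).
            conf = F.softmax(black_box(x_adv.unsqueeze(0)), dim=-1)[0, target_class]
            if conf > conf_threshold:
                break
    return patch.detach()
```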
In another aspect, the present invention further provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the processor executes the computer program, the steps of the method are implemented.
In another aspect, the present invention also provides a computer-readable storage medium, on which a computer program is stored, characterized in that the program, when executed by a processor, implements the steps of the above-mentioned method.
The invention is illustrated below with reference to specific examples:
example 1
This embodiment provides a directional attack countermeasure patch generation method for performing a directional attack on a black-box model executing a specific task. As shown in fig. 3 and 4, the method specifically includes the following steps:
1. Acquiring a plurality of white box models that perform the same task as the black box model to be attacked, wherein the model structures and parameters of the white box models differ from one another.
2. Obtaining a randomly initialized countermeasure patch, determining the target category of the directional attack, and iteratively updating the initialized countermeasure patch with each white box model over a plurality of successive iteration loops to obtain a target universal countermeasure patch. By continuously updating the countermeasure patch through several white-box models with known structures and parameters, the final countermeasure patch gains universality across all the white-box models, so that a directional attack on the black-box model to be attacked can be achieved for the task.
Specifically, when a white-box model is used to iteratively update the countermeasure patch, the method includes:
2.1. Obtaining a plurality of undisturbed clean pictures, inputting each clean picture into the first white box model corresponding to the current iteration loop, and outputting a first prediction contribution weight matrix and a first attention key region corresponding to each clean picture according to the attention features of the first white box model. The features of interest of the first white-box model for each clean picture are obtained here so that the migration effect of the first countermeasure patch on them can be calculated as a parameter in step 2.3.
2.2. Replacing random positions in each clean picture with the first countermeasure patch input to the current iteration loop to obtain a countermeasure sample corresponding to each clean picture.
2.3. Adding the target category to the label of each countermeasure sample, inputting the countermeasure samples into the first white box model, and calculating a joint loss with a preset loss function, wherein the preset loss function comprises at least a countermeasure loss, an attention transfer loss, a triplet loss and a smoothing loss, and the attention transfer loss is calculated from the first prediction contribution weight matrix corresponding to each clean picture, the first attention key region, and the random position at which the first countermeasure patch was connected.
2.4. Performing back propagation by a gradient descent method to update the countermeasure patch according to the joint loss value, repeating the iteration, stopping the iteration when the number of iterations reaches a preset value, and outputting the current first countermeasure patch.
Finally, the countermeasure patch output by the previous iteration loop is used as the input of the next iteration loop; through continuous update iterations, a universal countermeasure patch is finally obtained, which can be used for a directional attack on the black-box model to be attacked.
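As a sketch of this serial chaining of iteration loops (using the `update_patch_on_white_box` sketch above; all names are illustrative):

```python
def generate_universal_patch(white_boxes, black_box, samples, target_class, patch):
    # Serial loops: the patch produced on one white box is the input to the next.
    for white_box in white_boxes:
        patch = update_patch_on_white_box(white_box, black_box, patch,
                                          samples, target_class)
    return patch
```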
Example 2
On the basis of embodiment 1, as shown in fig. 5 and fig. 6, in each iteration loop the countermeasure sample of each update iteration is input into the black-box model to be attacked, which outputs a first confidence for the target class. The first confidence is used as the condition for stopping the update: when it reaches the preset confidence, the update in the current iteration loop stops and the current first countermeasure patch is output.
The advantages of the method are as follows. Most existing attack methods generate pixel-level perturbations superimposed on the original image, which are difficult to realize in the physical world; the countermeasure patch generated by the method can be printed out and applied in the physical world, which has practical significance. Existing methods ignore the features that models jointly attend to and do not exploit them; their attacks work on white boxes but transfer poorly to black boxes. The attention transfer loss adopted in the invention suppresses the features attended to by different models and transfers the model's attention from the key region to the region where the countermeasure patch is located; because the models no longer attend to objects in the key region, they misclassify, so the attack effect on the black box is good. For the requirements of directional attack, the method introduces the triplet loss from face recognition, which improves the success rate of directional attacks. From a visual perspective, if the difference between adjacent pixel values is too large, the patch looks unnatural and easily draws the attention of human eyes; the smoothing loss is therefore proposed to avoid excessive differences between adjacent pixel values. In terms of attack mode, the method chains a plurality of white-box models in series to continuously fit the gradient of the black box, finally generating a universal countermeasure patch and improving the success rate of the countermeasure sample's attack on the black box.
In summary, in the directional attack countermeasure patch generation method and device, the method iteratively updates the countermeasure patch over a sequence of white-box models with different structures, so that the resulting target universal countermeasure patch achieves a strong attack effect on a black-box model whose structure is unknown. Introducing a triplet loss improves the success rate of outputting the target class during a directional attack. Introducing an attention transfer loss improves how well the target universal countermeasure patch migrates the model's attention region, greatly strengthening its directional attack effect. Introducing a smoothing loss reduces the differences between adjacent pixels of the target universal countermeasure patch, making it less likely to draw the attention of human eyes.
Furthermore, because the attack takes the form of an added patch, the directional attack can be carried out at both the physical and digital levels, which makes it more convenient to implement.
Those of ordinary skill in the art will appreciate that the various illustrative components, systems, and methods described in connection with the embodiments disclosed herein may be implemented as hardware, software, or combinations of both. Whether this is done in hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention. When implemented in hardware, it may be, for example, an electronic circuit, an Application Specific Integrated Circuit (ASIC), suitable firmware, plug-in, function card, or the like. When implemented in software, the elements of the invention are the programs or code segments used to perform the required tasks. The program or code segments may be stored in a machine-readable medium or transmitted by a data signal carried in a carrier wave over a transmission medium or a communication link. A "machine-readable medium" may include any medium that can store or transfer information. Examples of a machine-readable medium include electronic circuits, semiconductor memory devices, ROM, flash memory, Erasable ROM (EROM), floppy disks, CD-ROMs, optical disks, hard disks, fiber optic media, Radio Frequency (RF) links, and so forth. The code segments may be downloaded via computer networks such as the internet, intranet, etc.
It should also be noted that the exemplary embodiments mentioned in this patent describe some methods or systems based on a series of steps or devices. However, the present invention is not limited to the order of the above-described steps, that is, the steps may be performed in the order mentioned in the embodiments, may be performed in an order different from the order in the embodiments, or may be performed simultaneously.
Features that are described and/or illustrated with respect to one embodiment may be used in the same way or in a similar way in one or more other embodiments and/or in combination with or instead of the features of the other embodiments in the present invention.
The above description is only a preferred embodiment of the present invention, and is not intended to limit the present invention, and various modifications and changes may be made to the embodiment of the present invention by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A directional attack countermeasure patch generation method is characterized by comprising the following steps:
acquiring a plurality of white box models that perform the same task as the black box model to be attacked, wherein the model structures and parameters of the white box models differ from one another;
acquiring a randomly initialized countermeasure patch, determining the target category of the directional attack, and iteratively updating the initialized countermeasure patch with each white box model over a plurality of successive iteration loops to obtain a target universal countermeasure patch; wherein the output of a preceding iteration loop is taken as the input of the following iteration loop, and each iteration loop comprises:
obtaining a plurality of undisturbed clean pictures, inputting each clean picture into a first white box model corresponding to the current iteration loop, and outputting a first prediction contribution weight matrix and a first attention key region corresponding to each clean picture according to the attention features of the first white box model;
replacing random positions in each clean picture with the first countermeasure patch input to the current iteration loop to obtain a countermeasure sample corresponding to each clean picture;
adding the target category to the label of each countermeasure sample, inputting the countermeasure samples into the first white box model, and calculating a joint loss with a preset loss function, wherein the preset loss function comprises at least a countermeasure loss, an attention transfer loss, a triplet loss and a smoothing loss, and the attention transfer loss is calculated from the first prediction contribution weight matrix corresponding to each clean picture, the first attention key region, and the random position at which the first countermeasure patch was connected;
performing back propagation by a gradient descent method to update the countermeasure patch according to the joint loss value, repeating the iteration, inputting the countermeasure sample of each iteration into the black box model to obtain a first confidence of the output target class, and stopping the iteration and outputting the current first countermeasure patch when the first confidence is greater than a preset confidence or the number of iterations reaches a preset value.
2. The directional attack countermeasure patch generation method according to claim 1, wherein the preset loss function is a joint loss of the countermeasure loss, the attention transfer loss, the triplet loss and the smoothing loss, calculated as follows:

$$L = L_{adv} + \lambda_1 L_{att} + \lambda_2 L_{smooth} + \lambda_3 L_{trip}$$

where $L$ is the preset loss function; $L_{adv}$ is the countermeasure loss, related to the output probability of the target class label; $L_{att}$ is the attention transfer loss, related to the migration of the first white-box model's region of interest, with weight coefficient $\lambda_1$; $L_{smooth}$ is the smoothing loss, with weight coefficient $\lambda_2$; and $L_{trip}$ is the triplet loss, with weight coefficient $\lambda_3$.
3. The directional attack countermeasure patch generation method according to claim 2, wherein the countermeasure loss $L_{adv}$ is calculated as:

$$L_{adv} = -\log P_t(x_{adv})$$

where $x_{adv}$ is the countermeasure sample and $P_t(x_{adv})$ is the probability of the target class $t$ output by the softmax layer after the countermeasure sample is input into the first white-box model.
4. The directional attack countermeasure patch generation method according to claim 3, wherein the attention transfer loss $L_{att}$ is calculated as:

$$L_{att} = \sum_{i,j}\big(M^t \odot S \odot (1 - mask)\big)_{ij}$$

where $M^t$ is the first prediction contribution weight matrix, representing the contribution of each region of the countermeasure sample to the model prediction; $S$ is the first attention key region; and $mask$ is a binary mask marking the location of the first countermeasure patch, with value 1 in the area of the first countermeasure patch and 0 elsewhere;

further,

$$\alpha_k^t = \frac{1}{Z}\sum_i \sum_j \frac{\partial y^t}{\partial A_{ij}^k}$$

where $\alpha_k^t$ represents the sensitivity of the $k$-th channel of the feature map $A$, output by the last convolutional layer of the first white-box model, to the class-$t$ target; $y^t$ represents the output probability of the $t$-th class target of the first white-box model; $A$ is the feature map output by the last convolutional layer of the white-box model; $Z$ is a normalization constant; and $i$ and $j$ respectively denote the column and row indices of a pixel in the image;

further, $M^t$ is calculated as:

$$M^t = ReLU\Big(\sum_k \alpha_k^t A^k\Big)$$
5. the method of generating a directional-attack-countermeasure patch according to claim 4, wherein the triple penalty is calculated as:
Figure DEST_PATH_IMAGE033
wherein the content of the first and second substances,
Figure DEST_PATH_IMAGE034
Figure DEST_PATH_IMAGE035
for the purpose of the challenge sample,
Figure DEST_PATH_IMAGE036
is a one-hot vector of the target class,
Figure DEST_PATH_IMAGE037
is a one-hot vector of the true class,
Figure DEST_PATH_IMAGE038
a logits value for a target category label derived for the challenge sample input to the first white-box model,
Figure DEST_PATH_IMAGE039
is a threshold value.
6. The directional attack countermeasure patch generation method according to claim 5, wherein the smoothing loss is calculated as:

$$L_{smooth} = \sum_{i,j}\big((p_{i,j} - p_{i+1,j})^2 + (p_{i,j} - p_{i,j+1})^2\big)$$

where $p_{i,j}$ is the pixel value in the $i$-th column and $j$-th row of the first countermeasure patch.
7. A directional attack countermeasure patch generation method according to claim 1, wherein the initialization countermeasure patch is generated in a set size and shape.
8. The directional attack countermeasure patch generation method according to claim 1, wherein the initialization countermeasure patch is noise that follows a Gaussian distribution.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the method according to any of claims 1 to 8 are implemented when the processor executes the program.
10. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 8.
CN202110646139.3A 2021-06-10 2021-06-10 Directional attack countermeasure patch generation method and device Active CN113255816B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110646139.3A CN113255816B (en) 2021-06-10 2021-06-10 Directional attack countermeasure patch generation method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110646139.3A CN113255816B (en) 2021-06-10 2021-06-10 Directional attack countermeasure patch generation method and device

Publications (2)

Publication Number Publication Date
CN113255816A CN113255816A (en) 2021-08-13
CN113255816B true CN113255816B (en) 2021-10-01

Family

ID=77187320

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110646139.3A Active CN113255816B (en) 2021-06-10 2021-06-10 Directional attack countermeasure patch generation method and device

Country Status (1)

Country Link
CN (1) CN113255816B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113689338B (en) * 2021-09-08 2024-03-22 北京邮电大学 Method for generating scaling robustness countermeasure patch
CN113792806A (en) * 2021-09-17 2021-12-14 中南大学 Anti-patch generation method
CN114742170B (en) * 2022-04-22 2023-07-25 马上消费金融股份有限公司 Countermeasure sample generation method, model training method, image recognition method and device
CN115544499B (en) * 2022-11-30 2023-04-07 武汉大学 Migratable black box anti-attack sample generation method and system and electronic equipment
CN117253094B (en) * 2023-10-30 2024-05-14 上海计算机软件技术开发中心 Method, system and electronic equipment for generating contrast sample by image classification system

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11636332B2 (en) * 2019-07-09 2023-04-25 Baidu Usa Llc Systems and methods for defense against adversarial attacks using feature scattering-based adversarial training
CN110956185B (en) * 2019-11-21 2023-04-18 大连理工大学人工智能大连研究院 Method for detecting image salient object
CN111898645A (en) * 2020-07-03 2020-11-06 贵州大学 Movable sample attack resisting method based on attention mechanism
CN112085069B (en) * 2020-08-18 2023-06-20 中国人民解放军战略支援部队信息工程大学 Multi-target countermeasure patch generation method and device based on integrated attention mechanism

Also Published As

Publication number Publication date
CN113255816A (en) 2021-08-13

Similar Documents

Publication Publication Date Title
CN113255816B (en) Directional attack countermeasure patch generation method and device
CN109948658B (en) Feature diagram attention mechanism-oriented anti-attack defense method and application
Wiyatno et al. Adversarial examples in modern machine learning: A review
Carlini et al. Towards evaluating the robustness of neural networks
Huang et al. Adversarial attacks on neural network policies
Wang et al. Fca: Learning a 3d full-coverage vehicle camouflage for multi-view physical adversarial attack
CN112215251A (en) System and method for defending against attacks using feature dispersion based countermeasure training
CN111753881A (en) Defense method for quantitatively identifying anti-attack based on concept sensitivity
CN111027628B (en) Model determination method and system
CN111737691A (en) Method and device for generating confrontation sample
CN113643278B (en) Method for generating countermeasure sample for unmanned aerial vehicle image target detection
Gragnaniello et al. Perceptual quality-preserving black-box attack against deep learning image classifiers
CN111754519B (en) Class activation mapping-based countermeasure method
CN111178504B (en) Information processing method and system of robust compression model based on deep neural network
CN113066002A (en) Generation method of countermeasure sample, training method of neural network, training device of neural network and equipment
CN115481716A (en) Physical world counter attack method based on deep network foreground activation feature transfer
Khan et al. A hybrid defense method against adversarial attacks on traffic sign classifiers in autonomous vehicles
Guesmi et al. Advart: Adversarial art for camouflaged object detection attacks
CN113935396A (en) Manifold theory-based method and related device for resisting sample attack
CN113240080A (en) Prior class enhancement based confrontation training method
CN117011508A (en) Countermeasure training method based on visual transformation and feature robustness
Du et al. Local aggregative attack on SAR image classification models
CN114359653A (en) Attack resisting method, defense method and device based on reinforced universal patch
CN114202678A (en) Anti-attack method, system and storage medium in license plate character recognition
WO2022222087A1 (en) Method and apparatus for generating adversarial patch

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant