CN112861601A - Method for generating confrontation sample and related equipment - Google Patents

Method for generating confrontation sample and related equipment

Info

Publication number
CN112861601A
Authority
CN
China
Prior art keywords
sample
model
classification
samples
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011413261.8A
Other languages
Chinese (zh)
Inventor
吴炜滨
赵沛霖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202011413261.8A priority Critical patent/CN112861601A/en
Publication of CN112861601A publication Critical patent/CN112861601A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174 - Facial expression recognition
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172 - Classification, e.g. identification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application relates to the technical field of artificial intelligence, and provides a method for generating countermeasure samples and related equipment. The method includes: obtaining a first sample set, where the first sample set includes a plurality of first samples and labels corresponding to the first samples, the label corresponding to a first sample being the first classification category obtained by classifying the first sample through a target model; training a first model through the first sample set to obtain a second model, the first model being constructed according to a trained open source model; and performing a white-box attack on the second model through second samples to generate countermeasure samples corresponding to the second samples, where the countermeasure samples are used for performing countermeasure training on the target model. The scheme can generate countermeasure samples targeted at the target model.

Description

Method for generating confrontation sample and related equipment
Technical Field
The application relates to the technical field of artificial intelligence, in particular to a method for generating a confrontation sample and related equipment.
Background
With the maturation of deep learning techniques, models constructed on the basis of neural networks are widely used in various classification tasks, such as classifying images, texts, and speech. After a model is trained on samples, it can automatically classify objects (e.g., images, text, speech). In practical applications, some objects inevitably contain interference; if such objects are to be classified accurately by the model, the model must have a strong anti-interference capability.
In order to improve the anti-interference capability of a model, the prior art generally performs countermeasure training on the model through countermeasure samples. How to generate countermeasure samples targeted at a specific model is a problem to be solved in the prior art.
Disclosure of Invention
The embodiment of the application provides a method and related equipment for generating countermeasure samples, which can realize targeted generation of countermeasure samples for a target model.
Other features and advantages of the present application will be apparent from the following detailed description, or may be learned by practice of the application.
According to an aspect of an embodiment of the present application, there is provided a method of generating a challenge sample, including: obtaining a first sample set, wherein the first sample set comprises a plurality of first samples and labels corresponding to the first samples, and the labels corresponding to the first samples are first classification categories obtained by classifying the first samples through a target model; training the first model through the first sample set to obtain a second model; the first model is constructed according to a trained open source model; and performing white-box attack on the second model through second samples to generate countermeasure samples corresponding to the second samples, wherein the countermeasure samples are used for performing countermeasure training on the target model.
According to an aspect of an embodiment of the present application, there is provided an apparatus for generating countermeasure samples, including: an obtaining module, configured to obtain a first sample set, where the first sample set includes a plurality of first samples and labels corresponding to the first samples, the label corresponding to a first sample being the first classification category obtained by classifying the first sample through a target model; a training module, configured to train the first model through the first sample set to obtain a second model, the first model being constructed according to a trained open source model; and an attack module, configured to perform a white-box attack on the second model through a second sample and generate a countermeasure sample corresponding to the second sample, where the countermeasure sample is used for performing countermeasure training on the target model.
According to an aspect of an embodiment of the present application, there is provided an electronic device including: a processor; a memory having computer readable instructions stored thereon which, when executed by the processor, implement the method as described above.
According to an aspect of embodiments of the present application, there is provided a computer-readable storage medium having stored thereon computer-readable instructions which, when executed by a processor, implement a method as described above.
In the scheme of the application, the first classification category obtained by classifying a first sample through the target model is used as the label of that first sample, and the first model is trained through the first samples and their labels, so that the first model learns the decision behavior of the target model during training. The second model obtained from this training can therefore accurately simulate the decision behavior of the target model in the process of attacking the second model through a second sample, so that the countermeasure samples generated by attacking the second model are targeted at the target model.
Moreover, the first model is constructed according to a known open source model and then trained to obtain the second model. Since the structure and parameters of the open source model are known, the structure and parameters of the first model are correspondingly known; and because the second model is obtained by locally training the first model, its parameters and structure are also known. On the basis that the structure and parameters of the second model are known, the countermeasure samples can be generated by a white-box attack on the second model, without resorting to a more expensive black-box attack method.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application. It is obvious that the drawings in the following description are only some embodiments of the application, and that for a person skilled in the art, other drawings can be derived from them without inventive effort. In the drawings:
FIG. 1 illustrates a system architecture diagram to which the subject technology of embodiments of the present application may be applied;
FIG. 2 is a flow diagram illustrating a method of generating a challenge sample according to one embodiment of the present application;
FIG. 3 is a flow diagram of step 220 in one embodiment;
FIG. 4 is a schematic diagram illustrating the structure of a model in accordance with one embodiment;
FIG. 5 is a flow diagram of step 310 in one embodiment;
FIG. 6 is a flow chart of step 310 in another embodiment;
FIG. 7 is a flow diagram of step 230 in one embodiment;
FIG. 8 is a block diagram illustrating an apparatus to generate countermeasure samples in accordance with one embodiment;
FIG. 9 illustrates a schematic structural diagram of a computer system suitable for use in implementing the electronic device of an embodiment of the present application.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the application. One skilled in the relevant art will recognize, however, that the subject matter of the present application can be practiced without one or more of the specific details, or with other methods, components, devices, steps, and so forth. In other instances, well-known methods, devices, implementations, or operations have not been shown or described in detail to avoid obscuring aspects of the application.
The block diagrams shown in the figures are functional entities only and do not necessarily correspond to physically separate entities. I.e. these functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor means and/or microcontroller means.
The flow charts shown in the drawings are merely illustrative and do not necessarily include all of the contents and operations/steps, nor do they necessarily have to be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
Artificial Intelligence (AI) is a theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human Intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence is the research of the design principle and the realization method of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making.
The artificial intelligence technology is a comprehensive subject and relates to the field of extensive technology, namely the technology of a hardware level and the technology of a software level. The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like.
With the development of artificial intelligence technology, the application of automatic classification and recognition based on artificial intelligence is more and more extensive, such as image recognition classification, speech classification, text recognition classification, and the like. Image classification recognition, such as recognizing expressions of figures in an image, classifying and recognizing animals in the image, classifying and recognizing plants in the image, classifying and recognizing buildings in the image, and the like; text recognition classification, for example, automatically subject matter classification of content in text, and the like.
Specifically, the automatic classification and identification based on artificial intelligence is realized through a neural network model, the neural network model is trained through a sample, and then the classification and identification are carried out through the trained neural network model.
For a neural network model used for classification, countermeasure training through countermeasure samples is generally required in order to improve its robustness and anti-interference capability. A countermeasure sample refers to a sample formed by deliberately adding a slight perturbation to a normal sample so as to mislead the decision of the neural network model. Countermeasure training refers to training the neural network model with countermeasure samples, so that the model learns to resist local disturbances and its anti-interference capability is improved.
The scheme of the application can be realized by generating the countermeasure sample for the model in a targeted manner, so that the generated countermeasure sample is used for performing countermeasure training on the model, and the anti-interference capability of the model is improved.
Fig. 1 shows a system architecture diagram to which the technical solution of the embodiment of the present application can be applied.
As shown in fig. 1, the system architecture may include a terminal device (e.g., one or more of a smartphone 101, a tablet computer 102, and a portable computer 103 shown in fig. 1, but may also be a desktop computer, etc.), a network 104, and a server 105. The network 104 serves as a medium for providing communication links between terminal devices and the server 105. Network 104 may include various connection types, such as wired communication links, wireless communication links, and so forth.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation. For example, server 105 may be a server cluster comprised of multiple servers, or the like.
In an embodiment of the application, a server may obtain a first sample set and a second sample uploaded by a terminal device, and train a first model according to a first sample in the first sample set and a label corresponding to the first sample, where the label corresponding to the first sample is a first classification category obtained by classifying the first sample by a target model; after the training is finished, taking the first model as a second model; and performing white-box attack on the second model through the second sample to obtain a confrontation sample corresponding to the second sample.
After the server generates the confrontation sample, the confrontation sample can be returned to the terminal device, so that the terminal device can use the confrontation sample for training the target model. In some embodiments of the present application, the server performs a challenge training on the target model through the challenge sample after generating the challenge sample for the second sample.
It should be noted that the method for generating the countermeasure sample provided in the embodiment of the present application is generally executed by the server 105, and accordingly, the apparatus for generating the countermeasure sample is generally disposed in the server 105. However, in other embodiments, the terminal device may have a similar function as the server, so as to execute the method for generating the countermeasure sample provided in the embodiment of the present application.
The implementation details of the technical solution of the embodiment of the present application are set forth in detail below:
fig. 2 is a flow diagram illustrating a method of generating a challenge sample, which may be performed by a processing-capable computer device, such as a server, according to one embodiment of the present application. Referring to fig. 2, the method for generating a challenge sample at least includes steps 210 to 230, which are described in detail below.
Step 210, obtaining a first sample set, where the first sample set includes a plurality of first samples and labels corresponding to the first samples, where the labels corresponding to the first samples are first classification categories obtained by classifying the first samples through a target model.
Step 220, training the first model through the first sample set to obtain a second model; the first model is constructed from a trained open source model.
The first sample refers to a sample used for training the first model. The first classification category is a classification category obtained by classifying the first sample by the target model.
It can be understood that the data types of the first sample and of the second sample described below depend on the object to be classified by the target model. Specifically, if the object to be classified by the target model is an image, the first sample and the second sample described below are also images; if the object to be classified by the target model is text, the first sample and the second sample are also texts; if the object to be classified by the target model is speech, the first sample and the second sample are also speech.
It is understood that, before step 210, the first sample is classified by the target model to obtain a first classification category corresponding to the first sample, and then the classification category corresponding to the first sample is used as a label of the first sample to construct the first sample set.
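By way of illustration, a minimal PyTorch-style sketch of this labeling step is given below. The function name target_predict (assumed to return the target model's class scores for a batch) and the tensor types are assumptions made for the sketch, not part of any specific embodiment.

```python
import torch

def build_first_sample_set(first_samples, target_predict):
    """Label each first sample with the first classification category
    obtained by classifying it through the target model."""
    first_sample_set = []
    for x in first_samples:                               # x: a single sample tensor
        with torch.no_grad():
            scores = target_predict(x.unsqueeze(0))       # target model's class scores
            first_category = scores.argmax(dim=1).item()  # first classification category
        first_sample_set.append((x, first_category))
    return first_sample_set
```

The returned pairs of sample and target-model category form the first sample set used to train the first model.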
An open source model refers to a model whose structure is known and which has been trained. In some embodiments of the present application, in order to ensure that the trained second model can generate countermeasure samples for the target model in a targeted manner, the object selected for classification by the open source model used for constructing the first model is the same as the object classified by the target model. For example, if the target model is used to classify the image, the selected open-source model is also used to classify the image; if the target model is used for classifying the text, the selected open-source model is also used for classifying the text.
In some embodiments of the present application, if the target model is used for classifying images, an open source model for image classification, such as ResNet, VGG Net, or Inception, may be selected to construct the first model.
The first model may be constructed according to one open source model, or may be constructed according to two or more open source models. For example, an open source model may be used directly as the first model; the multiple open source models may also be weighted, and the model obtained by weighting the multiple open source models may be used as the first model.
In a scene where the first model is constructed by weighting a plurality of open source models, the weighting coefficients of the open source models may be equal or unequal, and may be configured specifically according to actual needs. Preferably, the weighting coefficients of the open source models are positive numbers, and the sum of the weighting coefficients is 1.
In some embodiments of the present application, the number of classification categories that the open source model can output may differ from the number of classification categories that the target model can output. To avoid this affecting the generation of countermeasure samples, before step 220 the structure of the open source model is fine-tuned: the number of neurons in the classification layer of the open source model is adjusted to be the same as the number of classification categories that the target model can output. For a neural network model, the number of neurons in the classification layer equals the number of classification categories the model can output; therefore, adjusting the number of neurons in the classification layer of the open source model ensures that the adjusted open source model can output the same number of classification categories as the target model, and hence that the first model constructed according to the open source model can do so as well.
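For illustration, a sketch of this adjustment is shown below using a torchvision ResNet-18 as the trained open source model; the choice of ResNet-18 and the variable names are assumptions of the sketch, not a requirement of the method.

```python
import torch.nn as nn
from torchvision import models

def adapt_open_source_model(num_target_categories: int) -> nn.Module:
    """Fine-tune the structure of a trained open source model so that its
    classification layer has as many neurons as the target model has
    output categories."""
    open_source_model = models.resnet18(pretrained=True)   # trained open source model
    in_features = open_source_model.fc.in_features          # width of the feature vector fed to the classification layer
    open_source_model.fc = nn.Linear(in_features, num_target_categories)  # resized classification layer
    return open_source_model
```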
In the process of training the first model through the first sample set, each first sample is classified through the first model to obtain the second classification category corresponding to the first sample; if the second classification category of the first sample is different from the first classification category indicated by the label corresponding to the first sample, the parameters of the first model are adjusted and the first sample is classified again through the first model after parameter adjustment, until the obtained second classification category corresponding to the first sample is the same as the first classification category indicated by the label of the first sample.
In the scheme of the application, the first classification category obtained by classifying a first sample through the target model is used as the label of the first sample, and the first model is trained through the first samples and their labels, so that the first model can learn the decision behavior of the target model in the training process; the countermeasure samples generated with the second model obtained through this training are therefore targeted at the target model.
And 230, performing white-box attack on the second model through the second sample, and generating a countermeasure sample corresponding to the second sample, wherein the countermeasure sample is used for performing countermeasure training on the target model.
The second sample refers to a sample used to attack the second model to generate a challenge sample.
The white-box attack refers to an attack process of adding disturbance to an input sample under the condition of knowing the structure and parameters of a model so that the model outputs an incorrect classification result. The samples to which the perturbations are added and which cause the model to output a misclassification result are the generated countermeasure samples.
Since the challenge sample is obtained by adding a perturbation to the initial sample, the challenge sample corresponds to the input initial sample.
In the solution of the present application, since the first model is constructed by a known open source model, the structure and parameters of the open source model are known, and correspondingly, the parameters and structure of the first model are also known. The structure and parameters of the second model obtained by training the first model through the first sample set may also be known, so in this case, the second model may be attacked by a white-box attack to obtain a countersample corresponding to the second sample.
In some embodiments of the present application, the white-box attack may be performed by the Fast Gradient Sign Method. Specifically, the process of generating the countermeasure sample by the fast gradient sign method can be expressed as:

\tilde{x} = x + \epsilon \cdot \operatorname{sign}\left(\nabla_x J(x, y)\right)    (1)

wherein \epsilon is the allowable disturbance quantity, and \operatorname{sign}(t) is the sign function, whose expression is:

\operatorname{sign}(t) = \begin{cases} 1, & t > 0 \\ 0, & t = 0 \\ -1, & t < 0 \end{cases}    (2)

\nabla_x J denotes the gradient of the target training function J relative to x. The expression of the target training function J is:

J(x, y) = -\sum_{c=1}^{M} y_{x,c} \log p_{x,c}    (3)

where M is the number of predictable classification categories; p_{x,c} is the probability that the sample x is predicted by the model to be of category c; and y_{x,c} is an indicator function (0 or 1) of whether the true label of the sample x is the classification category c: if the true label of the sample x is the classification category c, the indicator function takes the value 1; otherwise it takes the value 0.
In the process of performing the white-box attack according to the fast gradient sign method, only one iteration is carried out on the second sample: the second sample is classified through the second model to obtain the classification category corresponding to the second sample; the gradient of the target training function with respect to the second sample is then determined according to the classification category output by the second model for the second sample; the sign of the determined gradient is multiplied by the allowable disturbance quantity \epsilon to obtain the target disturbance amount; and the target disturbance amount is added to the second sample to obtain the countermeasure sample corresponding to the second sample.
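A compact sketch of this single-iteration attack on the second model is given below, assuming a PyTorch classifier and using the cross-entropy loss of equation (3); the function and variable names are illustrative only.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(second_model, second_sample, epsilon):
    """One-step white-box attack per equation (1): add the target
    disturbance amount epsilon * sign(gradient) to the second sample."""
    x = second_sample.clone().detach().requires_grad_(True)
    logits = second_model(x)
    category = logits.argmax(dim=1)               # classification category of the second sample
    loss = F.cross_entropy(logits, category)      # target training function J, equation (3)
    loss.backward()
    disturbance = epsilon * x.grad.sign()         # target disturbance amount
    return (x + disturbance).detach()             # countermeasure sample corresponding to the second sample
```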
Of course, in other embodiments, other algorithms may be employed to make a white-box attack on the second model based on the second sample to generate the countersample accordingly.
In the scheme of the application, the first classification category obtained by classifying a first sample through the target model is used as the label of the first sample, and the first model is trained through the first samples and their labels, so that the first model can learn the decision behavior of the target model in the training process; the countermeasure samples generated with the second model obtained through this training are therefore targeted at the target model.
Moreover, the known open source model is adopted to construct the first model, and the first model is then trained to obtain the second model; since the structure and parameters of the open source model are known, the structure and parameters of the second model are also known. This ensures that the countermeasure samples can be generated by a white-box attack method on the basis of the known parameters and structure, without resorting to a more expensive black-box attack method.
In some embodiments of the present application, as shown in FIG. 3, step 220 comprises:
and 310, performing classification prediction on each first sample through the first model to obtain a second classification category corresponding to the first sample.
The second classification category is a classification category obtained by classifying the first sample by the first model.
Step 320, if the second classification type corresponding to the first sample is different from the first classification type corresponding to the first sample, adjusting parameters of a classification layer of the first model so that the second classification type obtained by the first model for the first sample through prediction is the same as the first classification type corresponding to the first sample.
The classification realized by the model is that the feature extraction is firstly carried out on the object to be classified, and then the classification category corresponding to the object is predicted according to the extracted feature. Therefore, from this perspective, the structure of the model can be divided into a feature extraction layer and a classification layer.
FIG. 4 is a block diagram illustrating a model, according to an embodiment, that implements classification through multiple interconnected neural network layers, each of which includes a number of neurons. As shown in fig. 4, the model includes 5 neural network layers: neural network layer 1, neural network layer 2, neural network layer 3, neural network layer 4 and neural network layer 5. By functional division, the first 4 layers of the model (namely, neural network layers 1-4) are used for extracting the features of the object to be classified to obtain the feature vector of the object to be classified, so the whole formed by neural network layers 1-4 can be called the feature extraction layer. The last neural network layer of the model (neural network layer 5) is used for outputting the classification category corresponding to the object to be classified according to its feature vector, and can therefore be called the classification layer.
Similarly, in the scheme of the application, the structure of the open source model may be divided into a feature extraction layer and a classification layer, and the structure of the first model may also be divided into a feature extraction layer and a classification layer.
Since the first model is constructed according to the trained open source models, it can be understood that the feature extraction layer of the first model is constructed according to the feature extraction layers of the open source models, and the classification layer of the first model is constructed according to the classification layers of the open source models.
For example, suppose the first model is constructed from n (n being a positive integer) trained open source models, denoted f'_1(x), f'_2(x), ..., f'_n(x). Dividing each model into a feature extraction layer and a classification layer as described above, the i-th open source model can further be expressed as f'_i(x) = c_i(g_i(x)), i = 1, 2, ..., n, where c_i(·) denotes the classification layer of the i-th open source model and g_i(·) denotes the feature extraction layer of the i-th open source model.
If the n open source models are weighted by equal coefficients to construct a first model, the first model may be represented as:
f(x) = \frac{1}{n} \sum_{i=1}^{n} f'_i(x)    (4)
correspondingly, the feature extraction layer of the first model is obtained by weighting the feature extraction layers of the n open-source models, and the classification layer of the first model is obtained by weighting the classification layers of the n open-source models.
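An equal-coefficient weighting of n open source models as in equation (4) could look like the following sketch (PyTorch; the class and attribute names are illustrative assumptions):

```python
import torch
import torch.nn as nn

class FirstModel(nn.Module):
    """First model constructed by weighting n trained open source models
    with equal coefficients 1/n (equation (4))."""
    def __init__(self, open_source_models):
        super().__init__()
        self.models = nn.ModuleList(open_source_models)

    def forward(self, x):
        outputs = [m(x) for m in self.models]      # f'_i(x) for each open source model
        return torch.stack(outputs).mean(dim=0)    # (1/n) * sum of the n outputs
```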
Since each open source model is trained, and the feature extraction layer of each open source model has the capability of accurately extracting features, the feature extraction layer of the first model also has the capability of accurately extracting features. On the basis, in order to reduce the training time of the first model, in the process of training the first model, parameters of a feature extraction layer of the first model are not adjusted, and only parameters of a classification layer of the first model are adjusted.
Continuing with the above example, in the process of training the first model, the parameters of the feature extraction layers g_1(x), g_2(x), ..., g_n(x) of the open source models are kept constant, and only the parameters of the classification layers c_1(x), c_2(x), ..., c_n(x) of the open source models are adjusted.
In the training process of the first model, if a second classification category obtained by classifying a first sample by the first model is different from a first classification category corresponding to the first sample, adjusting parameters of a classification layer of the first model, classifying the first sample again through the first model after parameter adjustment, and if the second classification category corresponding to the first sample obtained again is the same as the corresponding first classification category, continuing to train the first model by using a next first sample until a training end condition is reached; and if the second classification category corresponding to the first sample is obtained again and is different from the corresponding first classification category, repeating the process, adjusting the parameters of the classification layer again and classifying again until the second classification category output by the first model for the first sample is the same as the first classification category corresponding to the first sample.
And step 330, when the training end condition is reached, taking the first model after parameter adjustment as the second model.
The training end condition may be that the set number of model iterations is reached, or the set loss function for the first model converges, where the set loss function for the first model may be set according to actual needs, and is not specifically limited herein.
In this embodiment, since the first model is constructed according to trained open source models, the feature extraction layer of the first model already has the capability of extracting features of the specific type of object. On this basis, in the process of training the first model, only the parameters of the classification layer of the first model are adjusted, and the parameters of the feature extraction layer are not adjusted. Compared with training a new model from scratch, adjusting only the parameters of the classification layer greatly reduces the amount of parameter adjustment, the number of samples required for training, and the model training time.
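A sketch of this training loop is shown below, assuming the FirstModel ensemble sketched earlier with ResNet-style members whose classification layer is the fc attribute (an assumption of the sketch), and a fixed number of epochs as the training end condition.

```python
import itertools
import torch
import torch.nn.functional as F

def train_first_model(first_model, first_sample_set, epochs=10, lr=1e-3):
    """Adjust only the classification-layer parameters; the feature
    extraction layers keep the parameters learned by the open source models."""
    for p in first_model.parameters():
        p.requires_grad = False                                 # freeze all parameters, including feature extraction layers
    classifier_params = list(itertools.chain.from_iterable(
        m.fc.parameters() for m in first_model.models))         # classification layers c_1 ... c_n
    for p in classifier_params:
        p.requires_grad = True                                  # only these are adjusted
    optimizer = torch.optim.Adam(classifier_params, lr=lr)
    for _ in range(epochs):                                     # training end condition: set number of iterations
        for x, first_category in first_sample_set:
            logits = first_model(x.unsqueeze(0))
            loss = F.cross_entropy(logits, torch.tensor([first_category]))
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return first_model                                          # the trained first model is taken as the second model
```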
In some embodiments of the present application, the first model is constructed from at least two open source models; as shown in fig. 5, step 310 includes:
and 510, respectively extracting the features of the first samples by the feature extraction layers of the at least two open source models to obtain respectively extracted feature vectors of the first samples.
The first sample feature vector is a feature vector obtained by extracting features of the first sample by a feature extraction layer of the open source model.
It can be understood that the feature extraction layer of each open source model correspondingly extracts a first sample feature vector for the first sample. Thus, in step 510, the number of first sample feature vectors obtained is the same as the number of open source models from which the first model was constructed.
And step 520, weighting the respectively extracted first sample feature vectors to obtain a target feature vector of the first sample.
The weighting coefficients of the first sample feature vectors may be the same or different, and are not particularly limited herein.
In one embodiment, the weighting coefficients of each first sample feature vector are positive numbers, and the sum of the weighting coefficients of the plurality of first sample feature vectors is 1.
In one embodiment of the present application, the weighting coefficients of the first sample feature vectors are equal and their sum is 1. Continuing with the above example in which the first model is built from n open source models, and assuming that the feature vector extracted by the i-th open source model for the first sample t is l_i(t), the target feature vector of the first sample is:

l_t = \frac{1}{n} \sum_{i=1}^{n} l_i(t)    (5)
step 530, predicting, by the classification layer of the first model, a second classification type corresponding to the first sample according to the target feature vector of the first sample.
In some embodiments of the present application, the classification layer may classify through a normalized exponential function (also called the Softmax function). Assuming that an array A has m elements and a_k denotes the k-th element of A, the Softmax value corresponding to the element a_k is:

\operatorname{Softmax}(a_k) = \frac{e^{a_k}}{\sum_{j=1}^{m} e^{a_j}}    (6)

Each calculated Softmax value lies in (0, 1), and the Softmax values corresponding to all elements of the array A sum to 1, so they satisfy the properties of a probability; the calculated Softmax value can therefore be understood as a probability, and the value with the maximum probability is selected for output.

Continuing with the above example, after the target feature vector l_t of the first sample t is obtained, if the classification layer of the first model classifies by the Softmax function, then according to equation (6) the Softmax value (probability) of the first sample t for category c can be calculated as:

p_{t,c} = \frac{e^{l_{t,c}}}{\sum_{j=1}^{C} e^{l_{t,j}}}    (7)

where C is the number of classification categories and l_{t,c} denotes the c-th component of the target feature vector l_t. A probability range corresponding to each classification category may be preset; after the probability of each classification category is obtained through calculation of the Softmax function, the classification category corresponding to the maximum probability is determined, and this classification category is the second classification category corresponding to the first sample.
In the scheme of this embodiment, a plurality of first feature vectors of a first sample are weighted and fused to obtain a target feature vector, and then a second classification category corresponding to the first sample is correspondingly determined according to the target feature vector.
In this embodiment, first sample feature vectors extracted from the first samples by the open source model are weighted to obtain target feature vectors, and then the classification layer of the first model determines the second classification type of the first sample according to the target feature vectors of the first samples.
Due to the fact that the structures and the training data of the open source models are different, the features extracted by the open source models for the same object have some differences, namely the features concerned by the open source models for the object to be classified have differences, and therefore the features of the first sample can be obtained through the target feature vectors obtained by weighting at least two first feature vectors in a multi-dimension mode. On the basis, the accuracy of the second classification type determined for the first sample according to the target feature vector can be ensured.
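The feature-level fusion of steps 510-530 might be sketched as follows; feature_extractors and classification_layer are assumed callables standing for the g_i(·) of the open source models and the classification layer of the first model, and the sketch assumes the feature extraction layers produce feature vectors of the same dimension.

```python
import torch.nn.functional as F

def predict_second_category(feature_extractors, classification_layer, first_sample):
    """Steps 510-530: weight the first sample feature vectors extracted by
    each open source model into a target feature vector (equation (5)),
    then classify it with the Softmax-based classification layer."""
    feats = [g(first_sample) for g in feature_extractors]           # l_i(t) for each open source model
    target_feature = sum(feats) / len(feats)                        # equal coefficients 1/n, summing to 1
    probs = F.softmax(classification_layer(target_feature), dim=1)  # equations (6)-(7)
    return probs.argmax(dim=1)                                      # second classification category
```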
In other embodiments of the present application, the first model is constructed from at least two open source models; as shown in fig. 6, step 310 includes:
and step 610, performing classification prediction on the first sample by each open source model to obtain a classification probability corresponding to the first sample, wherein the classification probability is used for indicating the predicted classification category.
The classification prediction of the first sample by each open source model comprises the following steps: extracting the features of the first sample through a feature extraction layer to construct a feature vector corresponding to the first sample; then, the probability of the first sample corresponding to each classification category is predicted according to the feature vector, and the predicted maximum probability is used as the classification probability of the first sample.
If the classification layer calculates the classification probability by using the aforementioned Softmax function, the classification probability corresponding to the first sample can be calculated according to the aforementioned equation (6).
The classification probability predicted for the first sample by each open source model is obtained via step 610.
And step 620, weighting the classification probabilities predicted by the first sample by at least two open source models respectively to obtain a target classification probability corresponding to the first sample.
Step 630, determining a classification category corresponding to the target classification probability in a mapping relationship between the classification probability and the classification category, where the classification category corresponding to the determined target classification probability is used as a second classification category corresponding to the first sample.
The weighting of the at least two classification probabilities corresponding to the first sample may be performed according to the same weighting coefficient, and of course, the weighting coefficients of the classification probabilities may also be different, which is not specifically limited herein.
In some embodiments of the present application, in order to avoid the classification probabilities being scaled up or down by the introduction of weighting coefficients, the sum of the weighting coefficients corresponding to the classification probabilities of the first sample may be set to 1; for example, if the number of open source models is n, the weighting coefficient of each classification probability may be 1/n. Assuming that the classification probability predicted by the i-th open source model for the first sample t is P_i(t), the target classification probability of the first sample t may be:

P(t) = \frac{1}{n} \sum_{i=1}^{n} P_i(t)    (8)
on this basis, the classification category corresponding to the calculated target classification probability can be determined according to the mapping relation between the set classification probability and the classification category, namely, the classification category is the second classification category corresponding to the first sample.
In the scheme of this embodiment, the classification probabilities are weighted and fused first, and then the second classification category corresponding to the first sample is determined according to the target classification probability obtained by the weighted and fused.
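The probability-level fusion of steps 610-630 could be sketched as below; probability_to_category stands for the preset mapping between classification probability and classification category and is a hypothetical callable supplied by the caller.

```python
import torch

def predict_by_probability_fusion(open_source_models, first_sample, probability_to_category):
    """Steps 610-630: weight the classification probabilities predicted by
    the n open source models with coefficients 1/n (equation (8)) and map
    the target classification probability back to a classification category."""
    probabilities = []
    for m in open_source_models:
        with torch.no_grad():
            p = torch.softmax(m(first_sample), dim=1).max().item()  # P_i(t): predicted maximum probability
        probabilities.append(p)
    target_probability = sum(probabilities) / len(probabilities)    # equation (8)
    return probability_to_category(target_probability)              # second classification category
```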
In some embodiments of the present application, as shown in fig. 7, step 230, comprises:
step 710, obtaining an input sample of the second model in the last iteration, wherein the input sample of the second model in the first iteration process is the second sample.
Step 720, calculating a gradient of the target training function of the second model relative to the second sample according to a third classification category corresponding to the input sample, where the third classification category corresponding to the input sample is a classification category obtained by classifying the input sample through the second model.
And step 730, determining the sample disturbance quantity according to the calculated gradient.
Step 740, superposing the sample disturbance quantity to the input sample to obtain an output sample; and when the iteration end condition is not met, taking the output sample as an input sample of the second model in the next iteration.
And step 750, when the iteration end condition is reached, taking the output sample as a confrontation sample corresponding to the second sample and outputting the confrontation sample.
In the present embodiment, a Basic Iterative Method is employed to perform the white-box attack and generate the countermeasure sample. The basic iterative method can be expressed as:

x^{adv}_{t+1} = x^{adv}_{t} + \alpha \cdot \operatorname{sign}\left(\nabla_x J(x^{adv}_{t}, y)\right)    (9)

where x^{adv}_{t} is the output sample obtained from the t-th iteration (the input sample of the first iteration being the second sample), \alpha is the disturbance amount set for each iteration, and J is the target training function of the model, whose expression is the above equation (3).

For convenience of description, let x^{adv}_{t} denote the input sample of the second model in the last iteration. After the second model predicts the third classification category corresponding to x^{adv}_{t}, the gradient of the target training function relative to the sample, \nabla_x J(x^{adv}_{t}, y), is calculated according to equation (3). The sample disturbance amount \alpha \cdot \operatorname{sign}(\nabla_x J(x^{adv}_{t}, y)) corresponding to the input sample is then determined, and the determined sample disturbance amount is added to the input sample to obtain the output sample of the second model, x^{adv}_{t+1}, according to equation (9).
in some embodiments of the present application, the end-of-iteration condition may be that the second sample succeeds in attacking the second model. In this embodiment, before step 750, the method further includes: obtaining a classification category obtained by classifying the second sample by a second model; obtaining a classification category obtained by classifying the output sample by a second model; and if the classification category corresponding to the output sample is different from the classification category corresponding to the second sample, determining that an iteration end condition is reached.
That is, if the classification category obtained for the output sample of the second model is different from the classification category corresponding to the second sample, the second sample has successfully attacked the second model, and the output sample obtained in this iteration is correspondingly used as the countermeasure sample corresponding to the second sample.
In some embodiments of the present application, the iteration end condition may also be that a set number of iterations is reached, and if the set number of iterations is reached, the iteration end condition is considered to be reached, and an output sample obtained in the last iteration is taken as a countermeasure sample of the second sample and is output.
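Combining steps 710-750 with the two iteration-end conditions described above, a sketch of the basic iterative attack might read as follows (PyTorch; names and default values are illustrative assumptions):

```python
import torch
import torch.nn.functional as F

def basic_iterative_attack(second_model, second_sample, alpha=0.01, max_iters=20):
    """Iteratively superimpose the sample disturbance amount
    alpha * sign(gradient) (equation (9)); stop when the output sample is
    classified differently from the second sample, or when the set number
    of iterations is reached."""
    with torch.no_grad():
        original_category = second_model(second_sample).argmax(dim=1)
    x = second_sample.clone().detach()                     # input sample of the first iteration
    for _ in range(max_iters):                             # iteration end condition: set number of iterations
        x.requires_grad_(True)
        logits = second_model(x)
        third_category = logits.argmax(dim=1)              # third classification category of the input sample
        loss = F.cross_entropy(logits, third_category)     # target training function J, equation (3)
        loss.backward()
        with torch.no_grad():
            x = (x + alpha * x.grad.sign()).detach()       # output sample of this iteration
            if second_model(x).argmax(dim=1).ne(original_category).all():
                break                                      # iteration end condition: attack on the second model succeeded
    return x                                               # countermeasure sample corresponding to the second sample
```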
Embodiments of the apparatus of the present application are described below, which may be used to perform the methods of the above-described embodiments of the present application. For details which are not disclosed in the embodiments of the apparatus of the present application, reference is made to the above-described embodiments of the method of the present application.
Fig. 8 is a block diagram illustrating an apparatus for generating a countermeasure sample according to an embodiment, as shown in fig. 8, the apparatus for generating a countermeasure sample includes:
an obtaining module 810, configured to obtain a first sample set, where the first sample set includes a plurality of first samples and tags corresponding to the first samples, where the tags corresponding to the first samples are first classification categories obtained by classifying the first samples through a target model;
a training module 820, configured to train a first model through the first sample set to adjust parameters of a classification layer in the first model to obtain a second model; the first model is constructed according to a trained open source model;
and the attack module 830 is configured to perform a white-box attack on the second model through the second sample, and generate a countermeasure sample corresponding to the second sample, where the countermeasure sample is used for performing countermeasure training on the target model.
In some embodiments of the present application, training module 820 includes: a second classification type determining unit, configured to perform classification prediction on each first sample through the first model to obtain a second classification type corresponding to the first sample; a parameter adjusting unit, configured to adjust a parameter of a classification layer of the first model so that a second classification type obtained by the first model by predicting the first sample again is the same as a first classification type corresponding to the first sample, if the second classification type corresponding to the first sample is different from the first classification type corresponding to the first sample; and the second model determining unit is used for taking the first model after parameter adjustment as the second model when the training end condition is reached.
In some embodiments of the present application, the first model is constructed from at least two open source models; the second classification category determination unit includes: the first extraction unit is used for respectively extracting the features of the first samples by the feature extraction layers of the at least two open-source models to obtain respectively extracted first sample feature vectors; a first weighting unit, configured to weight the first sample feature vectors extracted respectively to obtain a target feature vector of the first sample; and the prediction unit is used for predicting the classification layer of the first model according to the target feature vector of the first sample to obtain a second classification type corresponding to the first sample.
In some embodiments of the present application, the first model is constructed from at least two open source models; a second classification category determination unit including: the second prediction unit is used for performing classification prediction on the first sample by each open source model to obtain a classification probability corresponding to the first sample, and the classification probability is used for indicating the predicted classification category; the second weighting unit is used for weighting the classification probabilities predicted by the first sample by at least two open source models respectively to obtain target classification probabilities corresponding to the first sample; and the classification category determining unit is used for determining a classification category corresponding to the target classification probability in a mapping relation between the classification probability and the classification category, and the classification category corresponding to the determined target classification probability is used as a second classification category corresponding to the first sample.
In some embodiments of the present application, attack module 830 includes: an input sample acquiring unit, configured to acquire an input sample of the second model in a last iteration, where the input sample of the second model is the second sample in a first iteration process; a gradient calculation unit, configured to calculate a gradient of a target training function of the second model relative to the second sample according to a third classification category corresponding to the input sample, where the third classification category corresponding to the input sample is a classification category obtained by classifying the input sample through the second model; the sample disturbance quantity calculation unit is used for determining a sample disturbance quantity according to the calculated gradient; an output sample determining unit, configured to superimpose the sample perturbation amount on the input sample to obtain an output sample; when the iteration end condition is not met, taking the output sample as an input sample of the second model in the next iteration; and the confrontation sample determining unit is used for taking the output sample as the confrontation sample corresponding to the second sample and outputting the confrontation sample when the iteration end condition is reached.
In some embodiments of the present application, the end-of-iteration condition includes that the second sample succeeds in attacking the second model, and the means for generating a countersample further includes: the first obtaining module is used for obtaining a classification category obtained by classifying the second sample by the second model; the second obtaining module is used for obtaining classification categories obtained by classifying the output samples by the second model; and the judging module is used for determining that an iteration end condition is reached if the classification category corresponding to the output sample is different from the classification category corresponding to the second sample.
In some embodiments of the present application, before the training of the first model by the first sample set to obtain the second model, the number of neurons in the classification layer of the open source model is adjusted to be the same as the number of classification classes that the target model can output.
FIG. 9 illustrates a schematic structural diagram of a computer system suitable for use in implementing the electronic device of an embodiment of the present application.
It should be noted that the computer system 900 of the electronic device shown in fig. 9 is only an example, and should not bring any limitation to the functions and the scope of the application of the embodiments.
As shown in fig. 9, the computer system 900 includes a Central Processing Unit (CPU)901, which can perform various appropriate actions and processes, such as executing the method in the above-described embodiment, according to a program stored in a Read-Only Memory (ROM) 902 or a program loaded from a storage portion 908 into a Random Access Memory (RAM) 903. In the RAM 903, various programs and data necessary for system operation are also stored. The CPU 901, ROM 902, and RAM 903 are connected to each other via a bus 904. An Input/Output (I/O) interface 905 is also connected to bus 904.
The following components are connected to the I/O interface 905: an input portion 906 including a keyboard, a mouse, and the like; an output section 907 including a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, a speaker, and the like; a storage portion 908 including a hard disk and the like; and a communication section 909 including a Network interface card such as a LAN (Local Area Network) card, a modem, or the like. The communication section 909 performs communication processing via a network such as the internet. The drive 910 is also connected to the I/O interface 905 as necessary. A removable medium 911 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 910 as necessary, so that a computer program read out therefrom is mounted into the storage section 908 as necessary.
In particular, according to embodiments of the present application, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present application include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated by the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 909, and/or installed from the removable medium 911. When executed by the Central Processing Unit (CPU) 901, the computer program performs the various functions defined in the system of the present application.
It should be noted that the computer readable medium shown in the embodiments of the present application may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM), a flash Memory, an optical fiber, a portable Compact Disc Read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In this application, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wired, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. Each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present application may be implemented by software or by hardware, and the described units may also be disposed in a processor. The names of these units do not, in any case, constitute a limitation on the units themselves.
As another aspect, the present application also provides a computer-readable storage medium, which may be contained in the electronic device described in the above embodiments; or may exist separately without being assembled into the electronic device. The computer-readable storage medium carries computer-readable instructions that, when executed by a processor, implement the method in the embodiments described above.
According to an aspect of the present application, there is also provided an electronic device, including: a processor; a memory having computer readable instructions stored thereon which, when executed by the processor, implement the method of the above embodiments.
According to an aspect of an embodiment of the present application, there is provided a computer program product or a computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to cause the computer device to perform the method provided in the above-mentioned alternative embodiments.
It should be noted that although several modules or units of the device for action execution are mentioned in the above detailed description, such a division is not mandatory. Indeed, according to embodiments of the present application, the features and functions of two or more modules or units described above may be embodied in one module or unit. Conversely, the features and functions of one module or unit described above may be further divided so as to be embodied by a plurality of modules or units.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present application can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (which can be a CD-ROM, a USB flash drive, a removable hard disk, etc.) or on a network, and includes several instructions to enable a computing device (which can be a personal computer, a server, a touch terminal, a network device, etc.) to execute the method according to the embodiments of the present application.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the embodiments disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains.
It will be understood that the present application is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims (10)

1. A method of generating a countermeasure sample, comprising:
obtaining a first sample set, wherein the first sample set comprises a plurality of first samples and labels corresponding to the first samples, and the labels corresponding to the first samples are first classification categories obtained by classifying the first samples through a target model;
training the first model through the first sample set to obtain a second model; the first model is constructed according to a trained open source model;
and performing white-box attack on the second model through second samples to generate countermeasure samples corresponding to the second samples, wherein the countermeasure samples are used for performing countermeasure training on the target model.
2. The method of claim 1, wherein training the first model through the first sample set to obtain a second model comprises:
performing classification prediction on each first sample through the first model to obtain a second classification category corresponding to the first sample;
if the second classification category corresponding to the first sample is different from the first classification category corresponding to the first sample, adjusting parameters of a classification layer of the first model so that the second classification category obtained by the first model when re-predicting the first sample is the same as the first classification category corresponding to the first sample;
and when the training end condition is reached, taking the first model after parameter adjustment as the second model.
3. The method of claim 2, wherein the first model is constructed from at least two open source models;
the performing classification prediction on the first sample through the first model to obtain the second classification category corresponding to the first sample comprises:
extracting features of the first sample respectively through the feature extraction layers of the at least two open source models to obtain respectively extracted first sample feature vectors;
weighting the respectively extracted first sample feature vectors to obtain a target feature vector of the first sample;
and predicting, through the classification layer of the first model and according to the target feature vector of the first sample, the second classification category corresponding to the first sample.
4. The method of claim 2, wherein the first model is constructed from at least two open source models;
the performing classification prediction on the first sample through the first model to obtain the second classification category corresponding to the first sample comprises:
performing classification prediction on the first sample by each open source model to obtain a classification probability corresponding to the first sample, wherein the classification probability is used for indicating the predicted classification category;
weighting the classification probabilities respectively predicted for the first sample by the at least two open source models to obtain a target classification probability corresponding to the first sample;
and determining, in a mapping relation between classification probabilities and classification categories, the classification category corresponding to the target classification probability, and taking the determined classification category as the second classification category corresponding to the first sample.
5. The method of claim 1, wherein the performing a white-box attack on the second model through the second sample to generate a countermeasure sample corresponding to the second sample comprises:
obtaining an input sample of the second model in the current iteration, wherein the input sample of the second model is the second sample in the first iteration;
calculating the gradient of the target training function of the second model relative to the second sample according to a third classification category corresponding to the input sample, wherein the third classification category corresponding to the input sample is a classification category obtained by classifying the input sample through the second model;
determining a sample disturbance quantity according to the calculated gradient;
superposing the sample disturbance quantity to the input sample to obtain an output sample;
when an iteration end condition is reached, taking the output sample as the countermeasure sample corresponding to the second sample and outputting the countermeasure sample;
and when the iteration end condition is not met, taking the output sample as an input sample of the second model in the next iteration.
6. The method of claim 5, wherein the iteration end condition comprises that the second sample succeeds in attacking the second model, and before the output sample is taken as the countermeasure sample corresponding to the second sample and output when the iteration end condition is reached, the method further comprises:
obtaining a classification category obtained by classifying the second sample through the second model; and
obtaining a classification category obtained by classifying the output sample through the second model;
and if the classification category corresponding to the output sample is different from the classification category corresponding to the second sample, determining that an iteration end condition is reached.
7. The method of claim 1, wherein, before the first model is trained through the first sample set to obtain the second model, the number of neurons in the classification layer of the open source model is adjusted to be the same as the number of classification categories that the target model can output.
8. An apparatus for generating a countermeasure sample, comprising:
an acquisition module, configured to acquire a first sample set, wherein the first sample set comprises a plurality of first samples and labels corresponding to the first samples, and the labels corresponding to the first samples are first classification categories obtained by classifying the first samples through a target model;
a training module, configured to train the first model through the first sample set to obtain a second model, wherein the first model is constructed according to a trained open source model;
and an attack module, configured to perform a white-box attack on the second model through a second sample to generate a countermeasure sample corresponding to the second sample, wherein the countermeasure sample is used for performing countermeasure training on the target model.
9. An electronic device, comprising:
a processor;
a memory having computer-readable instructions stored thereon which, when executed by the processor, implement the method of any one of claims 1-7.
10. A computer-readable storage medium having computer-readable instructions stored thereon, which when executed by a processor, implement the method of any one of claims 1-7.
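To make the claimed construction and training of the substitute model more concrete, the sketch below illustrates, under stated assumptions, the feature-level and probability-level fusion recited in claims 3 and 4 and the label-matching training recited in claim 2. It assumes PyTorch, backbones that output feature vectors of a common dimension, and equal fusion weights by default; the names EnsembleFirstModel, weighted_probabilities and fit_to_target_labels are illustrative and do not appear in the claims.

import torch
import torch.nn as nn
import torch.nn.functional as F

class EnsembleFirstModel(nn.Module):
    # First model built from at least two trained open source models (claim 3): each
    # backbone extracts a first sample feature vector, the vectors are combined with
    # fixed weights into a target feature vector, and a shared classification layer
    # predicts the second classification category from that vector.
    def __init__(self, backbones, feature_dim, num_target_classes, weights=None):
        super().__init__()
        self.backbones = nn.ModuleList(backbones)  # feature extraction layers; not updated by the optimizer below
        self.weights = weights or [1.0 / len(backbones)] * len(backbones)
        self.classifier = nn.Linear(feature_dim, num_target_classes)  # classification layer of the first model

    def forward(self, x):
        feats = [b(x) for b in self.backbones]                            # first sample feature vectors
        target_feature = sum(w * f for w, f in zip(self.weights, feats))  # weighted target feature vector
        return self.classifier(target_feature)

def weighted_probabilities(open_source_models, weights, x):
    # Probability-level fusion (claim 4): weight the classification probabilities that
    # each open source model predicts for the sample to obtain the target classification
    # probability, from which the second classification category can be read off.
    probs = [F.softmax(m(x), dim=1) for m in open_source_models]
    return sum(w * p for w, p in zip(weights, probs))

def fit_to_target_labels(first_model, first_sample_loader, epochs=5, lr=1e-3):
    # Label-matching training (claim 2): adjust only the parameters of the classification
    # layer so that the first model's predictions agree with the first classification
    # categories obtained by querying the target model.
    optimizer = torch.optim.Adam(first_model.classifier.parameters(), lr=lr)
    first_model.train()
    for _ in range(epochs):
        for first_sample, first_category in first_sample_loader:  # labels produced by the target model
            logits = first_model(first_sample)
            loss = F.cross_entropy(logits, first_category)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return first_model  # serves as the second model to be attacked in the white-box step

Either fusion strategy yields a differentiable substitute model whose input gradients can then drive the white-box attack of claims 5 and 6.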
CN202011413261.8A 2020-12-02 2020-12-02 Method for generating confrontation sample and related equipment Pending CN112861601A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011413261.8A CN112861601A (en) 2020-12-02 2020-12-02 Method for generating confrontation sample and related equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011413261.8A CN112861601A (en) 2020-12-02 2020-12-02 Method for generating confrontation sample and related equipment

Publications (1)

Publication Number Publication Date
CN112861601A true CN112861601A (en) 2021-05-28

Family

ID=75996972

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011413261.8A Pending CN112861601A (en) 2020-12-02 2020-12-02 Method for generating confrontation sample and related equipment

Country Status (1)

Country Link
CN (1) CN112861601A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117521674A (en) * 2024-01-08 2024-02-06 腾讯科技(深圳)有限公司 Method, device, computer equipment and storage medium for generating countermeasure information
CN117521674B (en) * 2024-01-08 2024-04-09 腾讯科技(深圳)有限公司 Method, device, computer equipment and storage medium for generating countermeasure information

Similar Documents

Publication Publication Date Title
KR102318772B1 (en) Domain Separation Neural Networks
CN110633745B (en) Image classification training method and device based on artificial intelligence and storage medium
EP3648014A1 (en) Model training method, data identification method and data identification device
CN110659723B (en) Data processing method and device based on artificial intelligence, medium and electronic equipment
CN111414987B (en) Training method and training device of neural network and electronic equipment
CN114332578A (en) Image anomaly detection model training method, image anomaly detection method and device
KR102011788B1 (en) Visual Question Answering Apparatus Using Hierarchical Visual Feature and Method Thereof
CN113254785B (en) Recommendation model training method, recommendation method and related equipment
US20220067588A1 (en) Transforming a trained artificial intelligence model into a trustworthy artificial intelligence model
CN113674140A (en) Physical countermeasure sample generation method and system
CN116664719B (en) Image redrawing model training method, image redrawing method and device
CN111178520A (en) Data processing method and device of low-computing-capacity processing equipment
US11941867B2 (en) Neural network training using the soft nearest neighbor loss
CN115761900B (en) Internet of things cloud platform for practical training base management
CN112446888A (en) Processing method and processing device for image segmentation model
CN114170484B (en) Picture attribute prediction method and device, electronic equipment and storage medium
CN115238909A (en) Data value evaluation method based on federal learning and related equipment thereof
CN112420125A (en) Molecular attribute prediction method and device, intelligent equipment and terminal
CN116827685B (en) Dynamic defense strategy method of micro-service system based on deep reinforcement learning
CN112861601A (en) Method for generating confrontation sample and related equipment
CN110866609B (en) Method, device, server and storage medium for acquiring interpretation information
US20240020531A1 (en) System and Method for Transforming a Trained Artificial Intelligence Model Into a Trustworthy Artificial Intelligence Model
US20190156182A1 (en) Data inference apparatus, data inference method and non-transitory computer readable medium
CN117010480A (en) Model training method, device, equipment, storage medium and program product
US20230394304A1 (en) Method and Apparatus for Neural Network Based on Energy-Based Latent Variable Models

Legal Events

Date Code Title Description
PB01 Publication
REG Reference to a national code (Ref country code: HK; Ref legal event code: DE; Ref document number: 40043543; Country of ref document: HK)
SE01 Entry into force of request for substantive examination