WO2021109695A1 - Adversarial attack detection method and device - Google Patents

Adversarial attack detection method and device

Info

Publication number
WO2021109695A1
WO2021109695A1 · PCT/CN2020/118659 · CN2020118659W
Authority
WO
WIPO (PCT)
Prior art keywords
input data
adversarial
target model
sample space
attack
Prior art date
Application number
PCT/CN2020/118659
Other languages
French (fr)
Chinese (zh)
Inventor
宗志远
Original Assignee
Alipay (Hangzhou) Information Technology Co., Ltd. (支付宝(杭州)信息技术有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alipay (Hangzhou) Information Technology Co., Ltd. (支付宝(杭州)信息技术有限公司)
Publication of WO2021109695A1 publication Critical patent/WO2021109695A1/en

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55Detecting local intrusion or implementing counter-measures
    • G06F21/552Detecting local intrusion or implementing counter-measures involving long-term monitoring or reporting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • G06F18/232Non-hierarchical techniques
    • G06F18/2321Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55Detecting local intrusion or implementing counter-measures

Definitions

  • This specification relates to the field of artificial intelligence, and in particular to a method and device for detecting adversarial attacks.
  • In an adversarial attack, the attacker makes subtle modifications to a sample to form an adversarial sample and feeds it to the model so that the model outputs an incorrect prediction result.
  • Adversarial attacks may bring security risks. For example, in a scenario that relies on face recognition for identity authentication, if an attacker constructs an adversarial sample and inputs it to the face recognition model, and the model recognizes the adversarial sample as a legitimate user, the attacker can pass identity authentication, which brings security risks such as private data leakage and capital loss.
  • In view of this, this specification provides an adversarial attack detection method and device.
  • An adversarial attack detection method includes: obtaining an adversarial sample space of a target model; collecting input data used to call the target model; judging whether the input data falls into the adversarial sample space; and calculating, according to the judgment result, a monitoring parameter of the input data that falls into the adversarial sample space during a monitoring period, and when the monitoring parameter meets a preset attack condition, determining that an adversarial attack against the target model has been detected.
  • An adversarial attack detection device includes: an acquisition unit, which acquires an adversarial sample space of a target model; a collection unit, which collects input data used to call the target model; a judgment unit, which judges whether the input data falls into the adversarial sample space; and a monitoring unit, which calculates, according to the judgment result, a monitoring parameter of the input data that falls into the adversarial sample space during a monitoring period, and when the monitoring parameter meets a preset attack condition, determines that an adversarial attack against the target model has been detected.
  • An adversarial attack detection device includes: a processor; and a memory for storing machine-executable instructions. By reading and executing the machine-executable instructions stored in the memory that correspond to the adversarial attack detection logic, the processor is prompted to: acquire the adversarial sample space of a target model; collect input data used to call the target model; judge whether the input data falls into the adversarial sample space; and calculate, according to the judgment result, a monitoring parameter of the input data that falls into the adversarial sample space during a monitoring period, and when the monitoring parameter meets a preset attack condition, determine that an adversarial attack against the target model has been detected.
  • An embodiment of this specification collects the input data used to call the target model, judges whether the input data falls into the adversarial sample space of the target model, and calculates, according to the judgment result, a monitoring parameter of the input data that falls into the adversarial sample space during the monitoring period; if the monitoring parameter meets the attack condition, it is confirmed that an adversarial attack against the target model has been detected.
  • The above method does not affect the normal use of the target model and can detect adversarial attacks in a timely manner, effectively reducing security risks such as private data leakage and capital loss.
  • Fig. 1 is a schematic flowchart of an adversarial attack detection method shown in an exemplary embodiment of this specification.
  • Fig. 2 is a schematic flowchart of another adversarial attack detection method shown in an exemplary embodiment of this specification.
  • Fig. 3 is a schematic flowchart of a method for obtaining the adversarial sample space of a target model shown in an exemplary embodiment of this specification.
  • Fig. 4 is a schematic flowchart of yet another adversarial attack detection method shown in an exemplary embodiment of this specification.
  • Fig. 5 is a schematic structural diagram for an adversarial attack detection device shown in an exemplary embodiment of this specification.
  • Fig. 6 is a block diagram of an adversarial attack detection device shown in an exemplary embodiment of this specification.
  • Although the terms first, second, third, etc. may be used in this specification to describe various kinds of information, the information should not be limited by these terms. These terms are only used to distinguish information of the same type from each other.
  • For example, without departing from the scope of this specification, first information may also be referred to as second information, and similarly, second information may also be referred to as first information.
  • Depending on the context, the word "if" as used herein can be interpreted as "when", "while", or "in response to determining".
  • An adversarial sample is formed by making subtle modifications to a sample; inputting the adversarial sample into the model can make the model output a wrong prediction result.
  • For an image recognition model, this subtle modification can be adding some disturbing noise to the image.
  • After the modified image is input, the image recognition model may recognize a picture of a puppy as a picture of a car, producing a completely wrong recognition result.
  • Adversarial attacks can exist in image recognition, speech recognition, text recognition and other fields.
  • In some scenarios, adversarial attacks may bring security risks.
  • For example, in a scenario that relies on face recognition for identity authentication, if an attacker constructs an adversarial sample and inputs it to the face recognition model, and the model recognizes the adversarial sample as a legitimate user, the attacker can pass identity authentication, which brings security risks such as private data leakage and capital loss.
  • This specification provides an adversarial attack detection method and device.
  • Fig. 1 is a schematic flowchart of an adversarial attack detection method shown in an exemplary embodiment of this specification.
  • The adversarial attack detection method can be applied to an electronic device with a processor and a memory, such as a server or a server cluster, which is not particularly limited in this specification.
  • Referring to Fig. 1, the adversarial attack detection method may include steps S101 to S104.
  • Step S101 Obtain the adversarial sample space of the target model.
  • In the application scenario dimension, the target model can be a speech recognition model, an image recognition model, a text recognition model, etc.; in the model structure dimension, the target model can be a neural-network-based model, etc. This specification does not impose special restrictions on this.
  • the adversarial sample space may be obtained through pre-calculation after the target model is trained and before it is officially launched.
  • the adversarial sample space can also be calculated after the target model is online, which is not particularly limited in this specification.
  • adversarial samples can be obtained through attack testing, and the adversarial sample space can be generated based on the adversarial samples.
  • In one example, the attack test may be a black-box test based on a boundary attack.
  • A boundary attack first constructs a heavily perturbed adversarial sample to test the target model, then continuously reduces the perturbation while ensuring the sample remains adversarial, finally obtaining an adversarial sample with minimal perturbation.
  • In practical applications, when generating an adversarial sample from an original image, an adversarial sample with a large perturbation can be generated first.
  • For example, the pixel values of some pixels of the original image can be randomly changed, and the modified image can be input to the target model; if the target model outputs a misjudged prediction result, the modified image is used as the adversarial sample.
  • After the adversarial sample is obtained, based on the spatial coordinates of the adversarial sample and of the original image, the adversarial sample can be randomly perturbed in the direction of the original image, taking the adversarial sample as the starting point and, on the premise that the sample remains adversarial, continuously reducing the distance between the perturbed adversarial sample and the original image.
  • For example, the perturbed adversarial sample can be input to the target model; if the target model outputs an incorrect prediction result, indicating that the sample is still adversarial, the sample can be further randomly perturbed in the above direction to bring it closer to the original image, finally obtaining the adversarial sample closest to the original image, that is, the adversarial sample with the smallest perturbation.
  • Using the above method, multiple adversarial samples of the target model can be obtained, as sketched below.
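The following is a minimal sketch of such a boundary-attack-style search, assuming a black-box `target_predict` function that returns the model's predicted label; the step sizes, the clipping to [0, 1], and the helper names are illustrative assumptions, not part of this specification.

```python
import numpy as np

def is_adversarial(target_predict, candidate, true_label):
    # The candidate is adversarial if the target model misclassifies it.
    return target_predict(candidate) != true_label

def boundary_attack(target_predict, original, true_label,
                    n_steps=1000, noise_scale=0.1, step_toward=0.01, seed=None):
    rng = np.random.default_rng(seed)
    # Start from a heavily perturbed copy of the original image that the
    # target model already misclassifies.
    candidate = np.clip(original + rng.normal(0.0, 0.5, original.shape), 0.0, 1.0)
    if not is_adversarial(target_predict, candidate, true_label):
        return None  # no initial adversarial sample found
    for _ in range(n_steps):
        # Random perturbation plus a small step toward the original image.
        proposal = candidate + rng.normal(0.0, noise_scale, original.shape)
        proposal = np.clip(proposal + step_toward * (original - proposal), 0.0, 1.0)
        # Accept the proposal only if it stays adversarial and moves closer
        # to the original image.
        if (is_adversarial(target_predict, proposal, true_label) and
                np.linalg.norm(proposal - original) < np.linalg.norm(candidate - original)):
            candidate = proposal
    return candidate  # least-perturbed adversarial sample found
```

Repeating this search from different starting points yields multiple adversarial samples that can then be used to build the adversarial sample space.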
  • In another example, the attack test may also be a white-box test based on a boundary attack.
  • The steps of the white-box test are the same as the steps of the black-box test described above and are not repeated here.
  • It is worth noting that the white-box test needs a complete target model file in advance; the target model file may include the structure and parameters of the target model.
  • the adversarial sample space of the target model can be determined based on the adversarial sample.
  • In one example, the spatial coordinates of each adversarial sample of the target model can be determined, and the adversarial sample space of the target model can be determined based on the spatial coordinates.
  • Take the target model being an image recognition model as an example.
  • Assume an adversarial sample is a 64*64-pixel color image.
  • The adversarial sample has 64*64 pixels and each pixel has 3 pixel values, so it has 64*64*3 = 12288 pixel values in total; the spatial coordinates of the adversarial samples, and hence the adversarial sample space, therefore have 12288 dimensions, and the value of each dimension is one pixel value of the adversarial sample.
  • For example, the first dimension of the adversarial sample space can represent the 1st pixel value of the first pixel of the adversarial sample;
  • the second dimension of the adversarial sample space can represent the 2nd pixel value of the first pixel of the adversarial sample;
  • the third dimension of the adversarial sample space can represent the 3rd pixel value of the first pixel of the adversarial sample;
  • the fourth dimension of the adversarial sample space can represent the 1st pixel value of the second pixel of the adversarial sample, and so on.
  • The adversarial samples can then be clustered based on their spatial coordinates to obtain several adversarial sample clusters. The clustering algorithm can be the K-Means algorithm, the DBSCAN (Density-Based Spatial Clustering of Applications with Noise) algorithm, etc., which is not particularly limited in this specification.
  • In this example, the several adversarial sample clusters can be used as the adversarial sample space.
  • a corresponding convex envelope may be generated for each adversarial sample cluster, and the generated convex envelopes can be used as the adversarial sample space.
  • the calculation method of the convex envelope can be Graham's algorithm, Melkman's algorithm, Andrew's algorithm, etc., which are not particularly limited in this specification.
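As a rough illustration of the clustering and convex-envelope construction just described, the sketch below flattens each adversarial image into a coordinate vector, clusters the vectors with K-Means, and builds one convex hull per cluster. The PCA projection to a low dimension is an added assumption of this example (exact convex hulls are only practical in low-dimensional spaces) and is not part of this specification.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from scipy.spatial import ConvexHull

def build_adversarial_space(adversarial_images, n_clusters=3, n_dims=3):
    # Each 64x64x3 image becomes a 12288-dimensional coordinate vector.
    coords = np.array([img.reshape(-1) for img in adversarial_images])
    # Illustrative low-dimensional projection so that exact hulls are feasible.
    reducer = PCA(n_components=n_dims).fit(coords)
    reduced = reducer.transform(coords)
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(reduced)
    hulls = []
    for k in range(n_clusters):
        cluster_points = reduced[labels == k]
        if len(cluster_points) > n_dims:  # a hull needs at least n_dims + 1 points
            hulls.append(ConvexHull(cluster_points))
    return reducer, hulls

# Example with random stand-in "adversarial samples":
# reducer, hulls = build_adversarial_space(np.random.rand(200, 64, 64, 3))
```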
  • Step S102 Collect input data for calling the target model.
  • After the target model goes online, it can provide the caller with an API (Application Programming Interface) so that the caller can call the target model through the API, and the input data submitted by the caller when calling the model is collected.
  • For example, for an image recognition model the input data can be an image; for a speech recognition model, the input data can be a segment of speech.
  • In one example, the input data of the target model can be collected in real time. For example, calls to the target model can be monitored, and when a call is detected, the input data submitted by the caller is obtained.
  • In another example, the historical input data of the target model may be collected periodically at a preset time interval, and the time interval may be the adversarial attack monitoring period described below.
  • It is worth noting that step S101 may also be performed after step S102.
  • For example, if step S102 periodically collects the historical input data of the target model, the adversarial sample space of the target model can be obtained after the historical input data is collected, and then step S103 is performed.
  • Step S103 It is judged whether the input data falls into the adversarial sample space.
  • the space coordinates of the input data can be determined, and it is determined whether the space coordinates fall into the adversarial sample space of the target model.
  • In one example, the spatial coordinates can be input into a preset fitting function, and whether the coordinates fall into any convex envelope is judged from the output. For example, if the spatial coordinates are x and the fitting function is F, x can be input into F to obtain F(x); if F(x) < 0, the point is determined to fall into the convex envelope, otherwise it is determined not to fall into the convex envelope. If the spatial coordinates fall into any one of the convex envelopes, the input data falls into the adversarial sample space of the target model.
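A minimal sketch of such a membership check, assuming the convex envelopes were built with scipy as in the earlier sketch: for a scipy `ConvexHull`, a point lies inside the hull when every facet inequality A·x + b <= 0 holds, which plays the role of F(x) < 0 here. The `reducer` and `hulls` objects are carried over from that earlier example and are assumptions of this illustration.

```python
import numpy as np

def falls_into_hull(x, hull, tol=1e-9):
    # hull.equations has one row [normal, offset] per facet; points inside the
    # hull satisfy normal . x + offset <= 0 for every facet.
    return bool(np.all(hull.equations[:, :-1] @ x + hull.equations[:, -1] <= tol))

def falls_into_adversarial_space(input_image, reducer, hulls):
    # Project the input onto the same coordinates used to build the hulls.
    x = reducer.transform(input_image.reshape(1, -1))[0]
    # The input falls into the adversarial sample space if it lies inside
    # any of the convex envelopes.
    return any(falls_into_hull(x, hull) for hull in hulls)
```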
  • In another example, the distance between the input data and each adversarial sample cluster may be calculated from the spatial coordinates, and it is judged whether the distance between the input data and any adversarial sample cluster is less than a preset distance threshold.
  • For example, the distance between the input data and the center point of each adversarial sample cluster can be calculated as the distance between the input data and the corresponding cluster; if there is a cluster whose distance to the input data is less than the preset distance threshold, it is confirmed that the input data falls into the adversarial sample space.
  • the distance threshold may be predetermined.
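A minimal sketch of this distance-based alternative, assuming the cluster centers come from the K-Means step above; the variable names are illustrative.

```python
import numpy as np

def falls_into_space_by_distance(x, centroids, distance_threshold):
    # x: coordinate vector of the input data; centroids: one row per adversarial
    # sample cluster (e.g. the K-Means cluster centers).
    distances = np.linalg.norm(centroids - x, axis=1)
    # The input falls into the adversarial sample space if it is closer than
    # the preset threshold to at least one cluster center.
    return bool((distances < distance_threshold).any())
```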
  • Step S104: Calculate, according to the judgment result, the monitoring parameter of the input data that falls into the adversarial sample space during the monitoring period, and when the monitoring parameter meets the preset attack condition, determine that an adversarial attack against the target model has been detected.
  • the monitoring parameter is the number of input data that falls into the adversarial sample space
  • the attack condition is that the number reaches the number threshold.
  • the method for determining the number threshold may be: taking the average number of input data falling into the adversarial sample space of the target model in a number of historical monitoring periods as the number threshold.
  • For example, assuming that the monitoring period is 2 hours and the average number of input data falling into the adversarial sample space every two hours over the last 3 days is 200, then 200 can be used as the number threshold. It is worth noting that, considering that callers' demand for the target model may differ across different time periods of a day, differentiated number thresholds can also be determined for different monitoring periods.
  • As another example, considering possible errors, the above number threshold may be multiplied by a preset error coefficient, and the calculated value used as the final number threshold.
  • the number threshold can also be manually set.
  • the monitoring parameter may also be the proportion of input data falling into the adversarial sample space, and the attack condition may be that the proportion reaches a proportion threshold.
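The check in Step S104 can be sketched as follows; the derivation of the threshold from historical monitoring periods and the error coefficient follow the examples above, while all function and variable names are illustrative assumptions.

```python
def count_threshold(historical_counts, error_coefficient=1.2):
    # Average number of suspicious inputs per historical monitoring period,
    # scaled by a preset error coefficient.
    return error_coefficient * sum(historical_counts) / len(historical_counts)

def attack_detected(suspicious_count, total_count,
                    number_threshold=None, proportion_threshold=None):
    # The monitoring parameter can be the count of suspicious inputs, their
    # proportion among all inputs in the monitoring period, or both.
    if number_threshold is not None and suspicious_count >= number_threshold:
        return True
    if (proportion_threshold is not None and total_count > 0
            and suspicious_count / total_count >= proportion_threshold):
        return True
    return False

# e.g. attack_detected(223, 1200, number_threshold=count_threshold([180, 210, 200]))
```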
  • an attack test can be performed on the target model first to obtain several adversarial samples of the target model, and the adversarial sample space is obtained by calculating the several adversarial samples.
  • Fig. 2 is a schematic flowchart of another adversarial attack detection method shown in an exemplary embodiment of this specification.
  • The adversarial attack detection method can be applied to an electronic device with a processor and a memory, such as a server or a server cluster, which is not particularly limited in this specification.
  • Referring to Fig. 2, the adversarial attack detection method may include steps S201 to S205.
  • Step S201 Obtain the adversarial sample space of the target model.
  • Step S202 Collect input data for calling the target model.
  • Step S203 Determine whether the input data falls into the adversarial sample space.
  • Step S204: Calculate, according to the judgment result, the monitoring parameter of the input data that falls into the adversarial sample space during the monitoring period, and when the monitoring parameter meets the preset attack condition, determine that an adversarial attack against the target model has been detected.
  • For steps S201-S204, please refer to steps S101-S104, which are not repeated here.
  • Step S205: Send alarm information.
  • After it is determined that an adversarial attack against the target model has been detected, alarm information may also be sent.
  • In one example, the alarm information may include the current monitoring period, the number/proportion of input data falling into the adversarial sample space, and so on.
  • For example, the alarm information may be: "223 pieces of suspicious input data were detected within 10 minutes; a suspected adversarial attack is in progress." If the number of input data falling into the adversarial sample space keeps increasing, the number/proportion of suspicious input data can be updated and the alarm continued.
  • In another example, the alarm information may also include the identifier of the caller of the target model corresponding to the input data; the identifier may be the caller's ID, name, IP address, etc.
  • For example, the alarm information can be: "223 pieces of suspicious input data were detected within 10 minutes; a suspected adversarial attack is in progress. Among them, 80% of the suspicious input data comes from user A."
  • The caller identification information can be obtained from the invocation logs of the target model.
  • In another example, the alarm information may also include the prediction results output by the target model for the input data falling into the adversarial sample space, so as to judge whether the adversarial attack succeeded.
  • For example, the alarm information can be: "223 pieces of suspicious input data were detected within 10 minutes; a suspected adversarial attack is in progress. Among them, 220 pieces of input data produced the output 'illegitimate user' and 3 pieces produced the output 'legitimate user'." Whether the adversarial attack succeeded can then be judged from the prediction results output by the target model.
  • In this embodiment, after an adversarial attack is detected, an alarm message can also be sent.
  • The alarm information can show the number of attacks and the results of the adversarial attack, and the source of the attack can be traced; measures can then be taken according to the alarm information to defend against the adversarial attack, for example intercepting calls from suspicious callers, thereby effectively reducing security risks such as private data leakage and capital loss.
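A minimal sketch of assembling such alarm information from the suspicious inputs collected in one monitoring period; the record format and field names are illustrative assumptions, not a format prescribed by this specification.

```python
from collections import Counter

def build_alarm(monitoring_period, suspicious_records, total_count):
    # suspicious_records: one dict per suspicious input, e.g.
    # {"caller": "user_A", "prediction": "legitimate user"}
    callers = Counter(r["caller"] for r in suspicious_records)
    predictions = Counter(r["prediction"] for r in suspicious_records)
    return {
        "monitoring_period": monitoring_period,
        "suspicious_count": len(suspicious_records),
        "suspicious_proportion": len(suspicious_records) / max(total_count, 1),
        "top_callers": callers.most_common(3),
        "prediction_breakdown": dict(predictions),
    }
```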
  • In the following, the adversarial attack detection method is applied to a server as an example.
  • The adversarial attack detection method can be divided into two processes: performing an attack test on the target model to obtain the adversarial sample space, and monitoring the input data of the target model to detect adversarial attacks.
  • Fig. 3 is a schematic flowchart of a method for obtaining the adversarial sample space of a target model shown in an exemplary embodiment of this specification.
  • the target model is a face recognition model used for user identity authentication.
  • the method includes steps S301 to S303.
  • Step S301: Call the face recognition model.
  • Step S302 Perform a black box test based on a boundary attack on the face recognition model to obtain a number of adversarial samples.
  • the attack test is a black box test based on a boundary attack.
  • In this embodiment, a heavily perturbed face image is constructed as an adversarial sample and input to the face recognition model.
  • Based on the output of the face recognition model, the perturbation of the adversarial samples is then continuously reduced on the premise that they remain adversarial, finally obtaining several adversarial samples with small perturbations.
  • The perturbation of an adversarial sample may be adding noise to the face image, adjusting the pixel values of specific pixels, and so on.
  • Step S303: Determine the adversarial sample space of the face recognition model based on the adversarial samples, where the adversarial sample space consists of convex envelopes.
  • In this embodiment, a corresponding convex envelope is generated for each adversarial sample cluster based on Graham's algorithm, and the several convex envelopes generated are used as the adversarial sample space of the face recognition model.
  • Fig. 4 is a schematic flowchart of another adversarial attack detection method shown in an exemplary embodiment of this specification, including steps S401 to S407.
  • Step S401: Deploy the face recognition model.
  • Step S402: Obtain the adversarial sample space of the face recognition model.
  • Step S403: Collect the input images used to call the face recognition model.
  • the input image of the face recognition model is collected in real time.
  • Step S404 Determine whether the input image falls into the adversarial sample space.
  • the coordinates of the input image are calculated, and based on a preset fitting function, it is determined whether the coordinates fall into any convex envelope.
  • Step S405: Calculate, according to the judgment results, the proportion of input images that fall into the adversarial sample space during the monitoring period.
  • In this embodiment, the input images of the face recognition model are collected in real time.
  • Each time an input image is collected, step S404 is executed. If the judgment result is that the input image falls into the adversarial sample space, the count of suspicious input images is incremented by 1; if the input image does not fall into the adversarial sample space, the count of safe input images is incremented by 1.
  • Step S406: If the proportion reaches the proportion threshold, determine that an adversarial attack against the face recognition model has been detected.
  • In this embodiment, the proportion threshold can be obtained from the historical input data of the face recognition model. For example, statistics show that the average hourly proportion of input images falling into the convex envelopes of the face recognition model over the past 30 days is 0.05; the proportion threshold is then determined to be 0.05, and the monitoring period is 1 hour.
  • That is, an adversarial attack is determined to be detected when the proportion of input images falling into the convex envelopes within a monitoring period is greater than the proportion threshold of 0.05.
  • Specifically, the number of suspicious input images counted in step S405 can be divided by the sum of the numbers of suspicious and safe input images, and it is judged whether the resulting proportion of suspicious input images is greater than 0.05; if it is greater than 0.05, it is confirmed that an adversarial attack has been detected.
  • Step S407: Send alarm information.
  • In this embodiment, the alarm information may include the current monitoring period, the proportion of input images falling into the convex envelopes, the identifiers of the callers of the face recognition model, and so on.
  • the following table exemplarily shows an example of alarm information:
  • The above alarm information shows the proportion of input images suspected of being adversarial attacks in the current monitoring period, the caller IDs with the highest numbers of calls, and the corresponding numbers of calls, which fully reflects the attack status of the face recognition model in the current monitoring period.
  • As shown, user A has submitted the most suspicious input images in the current monitoring period.
  • Therefore, user A's call requests can subsequently be intercepted, for example user A's call requests within a preset time period.
  • The adversarial attack detection method provided in this specification can be used to monitor adversarial attacks against the face recognition model.
  • After an attack is detected, defense strategies such as intercepting calls can be adopted in time, effectively reducing security risks such as private data leakage and capital loss.
  • This specification also provides embodiments of an adversarial attack detection device.
  • The embodiments of the adversarial attack detection device in this specification can be applied on a server.
  • The device embodiments can be implemented by software, or by hardware, or by a combination of software and hardware.
  • Taking software implementation as an example, as a logical device it is formed by the processor of the server where it is located reading the corresponding computer program instructions from the non-volatile memory into memory and running them.
  • From a hardware perspective, Fig. 5 shows a hardware structure diagram of the server where the adversarial attack detection device in this specification is located; in addition to the processor, memory, network interface, and non-volatile memory shown in Fig. 5, the server where the device is located may usually include other hardware according to its actual functions, which is not described here again.
  • Fig. 6 is a block diagram of an adversarial attack detection device shown in an exemplary embodiment of this specification.
  • Referring to Fig. 6, the adversarial attack detection device 600 can be applied to the server shown in Fig. 5, and includes: an acquisition unit 610, a collection unit 620, a judgment unit 630, and a monitoring unit 640.
  • the acquiring unit 610 acquires the adversarial sample space of the target model.
  • the collection unit 620 collects input data for calling the target model.
  • the judging unit 630 judges whether the input data falls into the adversarial sample space.
  • The monitoring unit 640 calculates, according to the judgment result, the monitoring parameter of the input data that falls into the adversarial sample space during the monitoring period, and when the monitoring parameter meets the preset attack condition, determines that an adversarial attack against the target model has been detected.
  • In one example, the judgment unit 630 determines the spatial coordinates of the input data, judges whether the spatial coordinates fall into any convex envelope, and if so, determines that the input data falls into the adversarial sample space.
  • In another example, the judgment unit 630 determines the spatial coordinates of the input data, judges from the spatial coordinates whether the distance between the input data and any adversarial sample cluster is less than a threshold, and if so, determines that the input data falls into the adversarial sample space.
  • The device may further include an alarm unit configured to send alarm information.
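As a rough illustration only, the unit structure above could be mirrored by a single class whose methods correspond to the collection, judgment, monitoring, and alarm units; the `contains` interface of the adversarial sample space and the attack-condition callback are assumptions of this sketch, not part of the device described here.

```python
class AdversarialAttackMonitor:
    def __init__(self, adversarial_space, attack_condition, alarm_sink=print):
        self.space = adversarial_space            # result of the acquisition unit
        self.attack_condition = attack_condition  # e.g. a count or proportion check
        self.alarm_sink = alarm_sink              # alarm unit output
        self.suspicious = 0
        self.total = 0

    def collect(self, input_data):                # collection unit
        self.total += 1
        if self.judge(input_data):                # judgment unit
            self.suspicious += 1
        self.monitor()                            # monitoring unit

    def judge(self, input_data):
        # "contains" is an assumed interface of the adversarial sample space,
        # e.g. a convex-envelope or cluster-distance membership test.
        return self.space.contains(input_data)

    def monitor(self):
        if self.attack_condition(self.suspicious, self.total):
            self.alarm_sink(
                f"Adversarial attack suspected: {self.suspicious}/{self.total} "
                f"suspicious inputs in the current monitoring period")
```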
  • Since the device embodiments basically correspond to the method embodiments, for relevant parts refer to the description of the method embodiments.
  • The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution in this specification, which those of ordinary skill in the art can understand and implement without creative work.
  • a typical implementation device is a computer.
  • The specific form of the computer can be a personal computer, a laptop computer, a cellular phone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email receiving and sending device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
  • the device includes a processor and a memory for storing machine executable instructions.
  • the processor and the memory are usually connected to each other via an internal bus.
  • the device may also include an external interface to be able to communicate with other devices or components.
  • The processor is prompted to: obtain the adversarial sample space of the target model; collect the input data used to call the target model; judge whether the input data falls into the adversarial sample space; and calculate, according to the judgment result, the monitoring parameter of the input data that falls into the adversarial sample space during the monitoring period, and when the monitoring parameter meets the preset attack condition, determine that an adversarial attack against the target model has been detected.
  • When determining the adversarial sample space of the target model, the processor is prompted to: perform an attack test on the target model to obtain at least one adversarial sample of the target model; and determine the adversarial sample space of the target model based on the adversarial samples.
  • When performing the attack test, the processor is prompted to: perform a black-box test based on a boundary attack; or perform a white-box test based on a boundary attack.
  • When determining the adversarial sample space of the target model based on the adversarial samples, the processor is prompted to: determine the spatial coordinates of each adversarial sample; cluster the adversarial samples based on the spatial coordinates to obtain several adversarial sample clusters; and generate a corresponding convex envelope for each adversarial sample cluster as the adversarial sample space.
  • When judging whether the input data falls into the adversarial sample space, the processor is prompted to: determine the spatial coordinates of the input data; judge whether the spatial coordinates fall into any convex envelope; and if so, determine that the input data falls into the adversarial sample space.
  • When judging whether the input data falls into the adversarial sample space, the processor is prompted to: determine the spatial coordinates of the input data; judge, based on the spatial coordinates, whether the distance between the input data and any adversarial sample cluster is less than the distance threshold; and if so, determine that the input data falls into the adversarial sample space.
  • the processor is also prompted to send alarm information.
  • this specification also provides a computer-readable storage medium with a computer program stored on the computer-readable storage medium, and when the program is executed by a processor, the following steps are implemented:
  • obtaining the adversarial sample space of the target model; collecting the input data used to call the target model; judging whether the input data falls into the adversarial sample space; and calculating, according to the judgment result, the monitoring parameter of the input data that falls into the adversarial sample space during the monitoring period, and when the monitoring parameter meets a preset attack condition, determining that an adversarial attack against the target model has been detected.
  • the method for determining the adversarial sample space of the target model includes: performing an attack test on the target model to obtain at least one adversarial sample of the target model; and determining the target model based on the adversarial sample The adversarial sample space.
  • The attack test includes: a black-box test based on a boundary attack; or a white-box test based on a boundary attack.
  • the determining the adversarial sample space of the target model based on the adversarial samples includes: determining the space coordinates of each adversarial sample; clustering the adversarial samples based on the space coordinates to obtain several adversarial samples Sample clusters; generate a corresponding convex envelope for each confrontation sample cluster as the confrontation sample space.
  • the judging whether the input data falls into the adversarial sample space includes: determining the spatial coordinates of the input data; judging whether the spatial coordinates fall into any convex envelope; if so, determining the The input data falls into the adversarial sample space.
  • The judging whether the input data falls into the adversarial sample space includes: determining the spatial coordinates of the input data; judging, based on the spatial coordinates, whether the distance between the input data and any adversarial sample cluster is less than the distance threshold; and if so, determining that the input data falls into the adversarial sample space.
  • the monitoring parameter is the quantity of input data that falls into the adversarial sample space, and the attack condition is that the quantity reaches a quantity threshold.
  • the monitoring parameter is a proportion of input data falling into the adversarial sample space, and the attack condition is that the proportion reaches a proportion threshold.
  • the method further includes: sending alarm information.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Probability & Statistics with Applications (AREA)
  • Alarm Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The description discloses an adversarial attack detection method and device. The method comprises: acquiring an adversarial sample space of a target model; collecting input data of the target model when being called; determining whether or not the input data falls into the adversarial sample space; and calculating, according to a determination result, a detection parameter of the input data falling into the adversarial sample space during a detection period, and if the detection parameter meets a preset attack condition, determining that an adversarial attack against the target model has been detected. The solution enables effective detection of adversarial attacks, thus effectively reducing security risks such as privacy leakage and capital loss, and ensuring data security.

Description

Adversarial attack detection method and device
Technical Field
This specification relates to the field of artificial intelligence, and in particular to an adversarial attack detection method and device.
Background
With the continuous development of artificial intelligence, machine learning models are becoming more and more complex and more accurate. However, the more accurate a model is, the worse its robustness may be, which creates opportunities for attacks.
Taking adversarial attacks as an example, an attacker makes subtle modifications to a sample to form an adversarial sample and feeds it to the model so that the model outputs an incorrect prediction result. Adversarial attacks may bring security risks. For example, in a scenario that relies on face recognition for identity authentication, if an attacker constructs an adversarial sample and inputs it to the face recognition model, and the model recognizes the adversarial sample as a legitimate user, the attacker can pass identity authentication, which brings security risks such as private data leakage and capital loss.
Summary of the Invention
In view of this, this specification provides an adversarial attack detection method and device.
Specifically, this specification is implemented through the following technical solutions. An adversarial attack detection method includes: obtaining an adversarial sample space of a target model; collecting input data used to call the target model; judging whether the input data falls into the adversarial sample space; and calculating, according to the judgment result, a monitoring parameter of the input data that falls into the adversarial sample space during a monitoring period, and when the monitoring parameter meets a preset attack condition, determining that an adversarial attack against the target model has been detected.
An adversarial attack detection device includes: an acquisition unit, which acquires an adversarial sample space of a target model; a collection unit, which collects input data used to call the target model; a judgment unit, which judges whether the input data falls into the adversarial sample space; and a monitoring unit, which calculates, according to the judgment result, a monitoring parameter of the input data that falls into the adversarial sample space during a monitoring period, and when the monitoring parameter meets a preset attack condition, determines that an adversarial attack against the target model has been detected.
An adversarial attack detection device includes: a processor; and a memory for storing machine-executable instructions. By reading and executing the machine-executable instructions stored in the memory that correspond to the adversarial attack detection logic, the processor is prompted to: acquire the adversarial sample space of a target model; collect input data used to call the target model; judge whether the input data falls into the adversarial sample space; and calculate, according to the judgment result, a monitoring parameter of the input data that falls into the adversarial sample space during a monitoring period, and when the monitoring parameter meets a preset attack condition, determine that an adversarial attack against the target model has been detected.
An embodiment of this specification collects the input data used to call the target model, judges whether the input data falls into the adversarial sample space of the target model, and calculates, according to the judgment result, a monitoring parameter of the input data that falls into the adversarial sample space during the monitoring period; if the monitoring parameter meets the attack condition, it is confirmed that an adversarial attack against the target model has been detected. The above method does not affect the normal use of the target model and can detect adversarial attacks in a timely manner, effectively reducing security risks such as private data leakage and capital loss.
Description of the Drawings
Fig. 1 is a schematic flowchart of an adversarial attack detection method shown in an exemplary embodiment of this specification.
Fig. 2 is a schematic flowchart of another adversarial attack detection method shown in an exemplary embodiment of this specification.
Fig. 3 is a schematic flowchart of a method for obtaining the adversarial sample space of a target model shown in an exemplary embodiment of this specification.
Fig. 4 is a schematic flowchart of yet another adversarial attack detection method shown in an exemplary embodiment of this specification.
Fig. 5 is a schematic structural diagram for an adversarial attack detection device shown in an exemplary embodiment of this specification.
Fig. 6 is a block diagram of an adversarial attack detection device shown in an exemplary embodiment of this specification.
Detailed Description
Exemplary embodiments will be described in detail here, examples of which are shown in the accompanying drawings. When the following description refers to the drawings, unless otherwise indicated, the same numbers in different drawings represent the same or similar elements. The implementations described in the following exemplary embodiments do not represent all implementations consistent with this specification; rather, they are merely examples of devices and methods consistent with some aspects of this specification as detailed in the appended claims.
The terms used in this specification are only for the purpose of describing specific embodiments and are not intended to limit this specification. The singular forms "a", "said" and "the" used in this specification and the appended claims are also intended to include the plural forms, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and includes any or all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used in this specification to describe various kinds of information, the information should not be limited by these terms. These terms are only used to distinguish information of the same type from each other. For example, without departing from the scope of this specification, first information may also be referred to as second information, and similarly, second information may also be referred to as first information. Depending on the context, the word "if" as used herein can be interpreted as "when", "while", or "in response to determining".
With the continuous development of artificial intelligence, researchers continue to design deeper and more complex machine learning models so that the models output more accurate prediction results. However, as the accuracy of a model continues to improve, its robustness may become worse and worse, which makes the model vulnerable to attacks.
Taking adversarial attacks as an example, an adversarial sample is formed by making subtle modifications to a sample; inputting the adversarial sample into the model can make the model output a wrong prediction result. For example, in an image recognition model, this subtle modification can be adding some disturbing noise to the image. After the modified image is input into the image recognition model, the model may recognize a picture of a puppy as a picture of a car, producing a completely wrong recognition result. Adversarial attacks can exist in image recognition, speech recognition, text recognition and other fields.
In some scenarios, adversarial attacks may bring security risks. For example, in a scenario that relies on face recognition for identity authentication, if an attacker constructs an adversarial sample and inputs it to the face recognition model, and the model recognizes the adversarial sample as a legitimate user, the attacker can pass identity authentication, which brings security risks such as private data leakage and capital loss.
This specification provides an adversarial attack detection method and device.
Fig. 1 is a schematic flowchart of an adversarial attack detection method shown in an exemplary embodiment of this specification.
The adversarial attack detection method can be applied to an electronic device with a processor and a memory, such as a server or a server cluster, which is not particularly limited in this specification.
Referring to Fig. 1, the adversarial attack detection method may include steps S101 to S104.
Step S101: Obtain the adversarial sample space of the target model.
In this specification, in the application scenario dimension, the target model can be a speech recognition model, an image recognition model, a text recognition model, etc.; in the model structure dimension, the target model can be a neural-network-based model, etc. This specification does not impose special restrictions on this.
In this specification, the adversarial sample space may be obtained through pre-calculation after the target model is trained and before it is officially launched. Of course, the adversarial sample space can also be calculated after the target model goes online, which is not particularly limited in this specification.
In this specification, adversarial samples can be obtained through an attack test, and the adversarial sample space can be generated based on the adversarial samples.
In one example, the attack test may be a black-box test based on a boundary attack.
A boundary attack first constructs a heavily perturbed adversarial sample to test the target model, then continuously reduces the perturbation while ensuring the sample remains adversarial, finally obtaining an adversarial sample with minimal perturbation.
In practical applications, when generating an adversarial sample from an original image, an adversarial sample with a large perturbation can be generated first. For example, the pixel values of some pixels of the original image can be randomly changed, and the modified image can be input to the target model; if the target model outputs a misjudged prediction result, the modified image is used as the adversarial sample. After the adversarial sample is obtained, based on the spatial coordinates of the adversarial sample and of the original image, the adversarial sample can be randomly perturbed in the direction of the original image, taking the adversarial sample as the starting point and, on the premise that the sample remains adversarial, continuously reducing the distance between the perturbed adversarial sample and the original image.
For example, the perturbed adversarial sample can be input to the target model; if the target model outputs an incorrect prediction result, indicating that the sample is still adversarial, the sample can be further randomly perturbed in the above direction to bring it closer to the original image, finally obtaining the adversarial sample closest to the original image, that is, the adversarial sample with the smallest perturbation. Using the above method, multiple adversarial samples of the target model can be obtained.
Adversarial samples can also be constructed by other methods, which is not particularly limited in this specification.
In another example, the attack test may also be a white-box test based on a boundary attack. The steps of the white-box test are the same as the steps of the black-box test described above and are not repeated here.
It is worth noting that the white-box test needs a complete target model file in advance; the target model file may include the structure and parameters of the target model. In this specification, the adversarial sample space of the target model can be determined based on the adversarial samples.
In one example, the spatial coordinates of each adversarial sample of the target model can be determined, and the adversarial sample space of the target model can be determined based on the spatial coordinates.
Take the target model being an image recognition model as an example. Assume an adversarial sample is a 64*64-pixel color image; the adversarial sample has 64*64 pixels and each pixel has 3 pixel values, so the sample has 64*64*3 = 12288 pixel values in total. The spatial coordinates of the adversarial samples of the image recognition model therefore have 12288 dimensions, that is, the adversarial sample space has 12288 dimensions, and the value of each dimension is one pixel value of the corresponding pixel of the adversarial sample.
For example, the first dimension of the adversarial sample space can represent the 1st pixel value of the first pixel of the adversarial sample; the second dimension can represent the 2nd pixel value of the first pixel; the third dimension can represent the 3rd pixel value of the first pixel; the fourth dimension can represent the 1st pixel value of the second pixel of the adversarial sample, and so on.
The adversarial samples are clustered based on their spatial coordinates to obtain several adversarial sample clusters. The clustering algorithm can be the K-Means algorithm, the DBSCAN (Density-Based Spatial Clustering of Applications with Noise) algorithm, etc., which is not particularly limited in this specification.
In this example, the several adversarial sample clusters can be used as the adversarial sample space.
In another example, after several adversarial sample clusters are obtained, a corresponding convex envelope can also be generated for each adversarial sample cluster, and the generated convex envelopes used as the adversarial sample space. The convex envelope can be computed with Graham's algorithm, Melkman's algorithm, Andrew's algorithm, etc., which is not particularly limited in this specification.
Step S102: Collect the input data used to call the target model.
After the model goes online, the target model can provide the caller with an API (Application Programming Interface) so that the caller can call the target model through the API. The input data submitted by the caller when calling the model is collected. For example, for an image recognition model the input data can be an image; for a speech recognition model, the input data can be a segment of speech.
In one example, the input data of the target model can be collected in real time. For example, calls to the target model can be monitored, and when a call is detected, the input data submitted by the caller is obtained.
In another example, the historical input data of the target model may be collected periodically at a preset time interval, and the time interval may be the adversarial attack monitoring period described below.
It is worth noting that step S101 may also be performed after step S102. For example, if step S102 periodically collects the historical input data of the target model, the adversarial sample space of the target model can be obtained after the historical input data is collected, and then step S103 is performed.
步骤S103,判断所述输入数据是否落入所述对抗样本空间。Step S103: It is judged whether the input data falls into the adversarial sample space.
在一个例子中,可以确定输入数据的空间坐标,判断所述空间坐标是否落入目标模型的对抗样本空间。In an example, the space coordinates of the input data can be determined, and it is determined whether the space coordinates fall into the adversarial sample space of the target model.
在一个例子中,可以将所述空间坐标输入预设的拟合函数,然后根据输出结果判断所述空间坐标是否落入任意一个凸包络。例如,所述空间坐标为x,所述拟合函数为 F,则可将x输入F得到F(x),若F(x)<0,则确定落入凸包络,否则,确定未落入凸包络。若所述空间坐标落入了任意一个凸包络,则所述空间坐标落入了目标模型的对抗样本空间。In an example, the space coordinates can be input into a preset fitting function, and then it is determined whether the space coordinates fall into any convex envelope according to the output result. For example, if the spatial coordinate is x and the fitting function is F, then x can be input to F to obtain F(x). If F(x)<0, it is determined to fall into the convex envelope, otherwise, it is determined not to fall Into the convex envelope. If the space coordinates fall into any convex envelope, the space coordinates fall into the adversarial sample space of the target model.
In another example, the distance between the input data and each adversarial sample cluster may be computed from the spatial coordinates, and it is then judged whether any such distance is smaller than a preset distance threshold. For example, the distance between the input data and the centre point of each adversarial sample cluster can be taken as the distance between the input data and that cluster.
If there is an adversarial sample cluster whose distance to the input data is smaller than the preset distance threshold, the input data is confirmed to fall into the adversarial sample space.
The distance threshold may be determined in advance.
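A corresponding sketch of the distance-based test, assuming the cluster centres come from the earlier K-Means step and the threshold has been chosen in advance:

```python
import numpy as np

def falls_near_any_cluster(x, cluster_centers, distance_threshold):
    """Alternative membership test: compare the input's coordinates with
    each cluster centre and flag it if any distance is below the threshold."""
    distances = np.linalg.norm(cluster_centers - x, axis=1)
    return bool(np.any(distances < distance_threshold))
```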
Step S104: Compute, from the judgment results, the monitoring parameter of the input data that falls into the adversarial sample space within a monitoring period; when the monitoring parameter satisfies a preset attack condition, determine that an adversarial attack against the target model has been detected.
In one example, the monitoring parameter is the number of input data items falling into the adversarial sample space, and the attack condition is that this number reaches a number threshold. In practice, it can be monitored whether the number of input data items falling into the adversarial sample space within a preset monitoring period reaches the number threshold. If it does, an adversarial attack against the target model is determined to have been detected.
The number threshold may be determined as follows: take, as the number threshold, the average number of input data items that fell into the adversarial sample space per monitoring period over several historical monitoring periods of the target model.
For example, if the monitoring period is 2 hours and, over the last 3 days, an average of 200 input data items fell into the adversarial sample space in each two-hour window, then 200 can be used as the number threshold. Note that, since callers' demand for the target model may differ across the time periods of a day, differentiated number thresholds may also be determined for different monitoring periods.
As another example, to allow for error, the above number threshold may be multiplied by a preset error coefficient, and the resulting value used as the final number threshold.
As yet another example, the number threshold may be set manually.
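For illustration, the number threshold described above could be derived from historical monitoring periods roughly as follows; the history values and error coefficient are made-up examples, not data from the disclosure:

```python
import numpy as np

def number_threshold_from_history(counts_per_period, error_coefficient=1.0):
    """Derive the number threshold from historical monitoring periods:
    the average count of suspicious inputs per period, optionally scaled
    by an error coefficient."""
    return float(np.mean(counts_per_period)) * error_coefficient

# e.g. 2-hour windows over the last 3 days, averaging around 200 suspicious inputs
history = [198, 205, 201, 196, 200, 202]
threshold = number_threshold_from_history(history, error_coefficient=1.2)
```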
In another example, the monitoring parameter may be the proportion of input data items falling into the adversarial sample space, and the attack condition may be that this proportion reaches a proportion threshold.
In practice, it can be monitored whether, within a preset monitoring period, the ratio of the number of input data items falling into the adversarial sample space to the total number of input data items in that period reaches the proportion threshold. If it does, an adversarial attack against the target model is confirmed to have been detected.
The proportion threshold can be determined in the same way as the number threshold above, which is not repeated here.
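A simplified sketch of the per-period attack condition, covering both the count-based and proportion-based variants (the threshold values passed in are assumptions):

```python
def attack_detected(suspicious_count, total_count,
                    number_threshold=None, ratio_threshold=None):
    """Evaluate the attack condition for one monitoring period using either
    the count of suspicious inputs or their proportion."""
    if number_threshold is not None and suspicious_count >= number_threshold:
        return True
    if ratio_threshold is not None and total_count > 0:
        return suspicious_count / total_count >= ratio_threshold
    return False
```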
As can be seen from the above description, in one embodiment of this specification, an attack test may first be performed on the target model to obtain several adversarial samples, from which the adversarial sample space is computed.
When monitoring the target model for adversarial attacks, the input data used to call the target model is collected, it is judged whether that input data falls into the precomputed adversarial sample space, and the monitoring parameter of the input data falling into the adversarial sample space within the monitoring period is computed from the judgment results. If the monitoring parameter satisfies the attack condition, an adversarial attack against the target model is considered to have been detected. The method of this embodiment does not affect the normal use of the target model and is still able to detect adversarial attacks.
Fig. 2 is a schematic flowchart of another method for monitoring adversarial attacks according to an exemplary embodiment of this specification.
The monitoring method may be applied to an electronic device having a processor and a memory, such as a server or a server cluster; this specification places no particular limit on the device.
Referring to Fig. 2, the method for monitoring adversarial attacks may include steps S201 to S205.
Step S201: Obtain the adversarial sample space of the target model.
Step S202: Collect the input data used to call the target model.
Step S203: Determine whether the input data falls into the adversarial sample space.
Step S204: Compute, from the judgment results, the monitoring parameter of the input data that falls into the adversarial sample space within the monitoring period; when the monitoring parameter satisfies a preset attack condition, determine that an adversarial attack against the target model has been detected.
For steps S201 to S204, refer to steps S101 to S104, which are not repeated here.
Step S205: Send alarm information.
When the monitoring parameter satisfies the preset attack condition and an adversarial attack against the target model is determined to have been detected, alarm information may also be sent.
In one example, the alarm information may include the current monitoring period, the number or proportion of input data items falling into the adversarial sample space, and so on.
For example, the alarm information may read: "223 suspicious input data items detected within 10 minutes; a suspected adversarial attack is under way." If the number of input data items falling into the adversarial sample space keeps rising, the count or proportion of suspicious input data can be updated and the alarm continued.
In another example, the alarm information may further include the identifier of the target-model caller that supplied the input data, such as the caller's ID, name, or IP address.
For example, the alarm information may read: "223 suspicious input data items detected within 10 minutes; a suspected adversarial attack is under way, with 80% of the suspicious input data coming from user A." The caller identification information can be obtained from the call logs produced when the target model is invoked.
In another example, the alarm information may also include the target model's prediction results for the input data falling into the adversarial sample space, so as to judge whether the adversarial attack succeeded.
For example, if an attacker tries to feed the target model an image of an illegitimate user with perturbations added, hoping the model will predict a legitimate user, the alarm information may read: "223 suspicious input data items detected within 10 minutes; a suspected adversarial attack is under way. Of these, 220 were predicted as illegitimate users and 3 as legitimate users." Whether the adversarial attack succeeded can then be judged from the prediction results output by the target model.
As can be seen from the above description, in another embodiment of this specification, alarm information may also be sent after an adversarial attack against the target model is detected. The alarm information can show the number of attack attempts and their outcomes, and can be traced back to the attack source; measures such as blocking calls from suspicious callers can then be taken on the basis of the alarm information to defend against the attack, effectively reducing security risks such as private data leakage and financial loss.
The adversarial-attack monitoring method of this specification is described below with reference to a specific embodiment.
The monitoring method may be applied to a server.
Referring to Fig. 3 and Fig. 4, the monitoring method can be divided into two flows: performing an attack test on the target model to obtain the adversarial sample space; and monitoring the input data of the target model to detect adversarial attacks.
Fig. 3 is a schematic flowchart of a method for obtaining the adversarial sample space of a target model according to an exemplary embodiment of this specification.
In this embodiment, the target model is a face recognition model used for user identity authentication. The method includes steps S301 to S303.
Step S301: Call the face recognition model.
In this embodiment, the documentation describing how the face recognition model is called, and its calling interface, need to be obtained.
Step S302: Perform a black-box test based on a boundary attack on the face recognition model to obtain several adversarial samples.
An attack test is performed on the face recognition model. In this embodiment, the attack test is a black-box test based on a boundary attack: a heavily perturbed face image is first constructed as an adversarial sample and fed to the face recognition model; guided by the model's outputs, the perturbation of the adversarial sample is then reduced step by step while its adversarial property is preserved, eventually yielding several adversarial samples with relatively small perturbations. In this embodiment, the perturbation of an adversarial sample may be adding noise to the face image, adjusting the pixel values of specific pixels, and so on.
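The following heavily simplified sketch illustrates the general idea of such a boundary-attack-style refinement loop; `model_predict`, the step sizes, and the iteration count are assumptions for illustration and do not reproduce the exact test procedure of the disclosure:

```python
import numpy as np

def boundary_attack_sketch(model_predict, source_image, start_adv, target_label,
                           n_steps=1000, step_size=0.01):
    """Simplified boundary-attack-style refinement: starting from a heavily
    perturbed image that the model already classifies as target_label, walk
    toward the original image while staying on the adversarial side of the
    decision boundary, so the final perturbation becomes as small as possible."""
    rng = np.random.default_rng(0)
    adv = start_adv.copy()
    for _ in range(n_steps):
        # Candidate: move a little toward the source image, plus small random noise.
        direction = source_image - adv
        candidate = adv + step_size * direction
        candidate += step_size * rng.normal(size=adv.shape) * np.linalg.norm(direction)
        candidate = np.clip(candidate, 0.0, 1.0)
        # Keep the step only if the candidate is still adversarial.
        if model_predict(candidate) == target_label:
            adv = candidate
    return adv
```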
Step S303: Determine the adversarial sample space of the face recognition model based on the adversarial samples, the adversarial sample space consisting of convex envelopes.
The spatial coordinates of the adversarial samples are determined and clustered with the K-Means algorithm to obtain several adversarial sample clusters. A corresponding convex envelope is generated for each cluster with the Graham scan, and the generated convex envelopes are used as the adversarial sample space of the face recognition model.
Fig. 4 is a schematic flowchart of another adversarial-attack monitoring method according to an exemplary embodiment of this specification, including steps S401 to S407.
Step S401: Deploy the face recognition model.
Step S402: Obtain the adversarial sample space of the face recognition model.
Step S403: Collect the input images used to call the face recognition model.
In this embodiment, the input images of the face recognition model are collected in real time.
Step S404: Determine whether the input image falls into the adversarial sample space.
In this embodiment, the coordinates of the input image are computed, and a preset fitting function is used to judge whether those coordinates fall inside any convex envelope.
Step S405: Compute, from the judgment results, the proportion of input images falling into the adversarial sample space within the monitoring period.
In this embodiment, the input images of the face recognition model are collected in real time within a preset monitoring period. Each time an input image is collected, step S404 is executed: if the image is judged to fall into the adversarial sample space, the count of suspicious input images is incremented by one; otherwise, the count of safe input images is incremented by one.
Step S406: If the proportion reaches the proportion threshold, determine that an adversarial attack against the face recognition model has been detected.
In this embodiment, the proportion threshold can be derived from the historical input data of the face recognition model. For example, statistics may show that over the past 30 days, an average of 0.05 of the face recognition model's hourly input images fell inside a convex envelope; the proportion threshold is then set to 0.05, with a monitoring period of 1 hour.
Within the monitoring period, it can be judged in real time whether the proportion of input images falling inside a convex envelope exceeds the proportion threshold of 0.05. For example, the number of suspicious input images counted in step S405 can be divided by the sum of the suspicious and safe input image counts; if the resulting proportion exceeds 0.05, an adversarial attack is confirmed to have been detected.
Step S407: Send alarm information.
After an adversarial attack is detected, alarm information can be sent. In this embodiment, the alarm information may include the current monitoring period, the proportion of input images falling inside a convex envelope, the identifiers of the face recognition model's callers, and so on. The table below shows an example of such alarm information:
[Table image PCTCN2020118659-appb-000001: example alarm information for the current monitoring period]
The table above shows, for the current monitoring period, the proportion of input images suspected of being adversarial, the identifiers of the callers with the most calls, and their corresponding call counts, giving a full picture of the attacks faced by the face recognition model during that period.
Taking the alarm information in the table as an example, user A submitted the most suspicious input images in the current monitoring period. To guard against the adversarial attack, user A's call requests can subsequently be blocked, for example for a preset period of time.
As can be seen from the above description, the adversarial-attack monitoring method provided in this specification can be used to monitor adversarial attacks against a face recognition model. Once an adversarial attack against the face recognition model is confirmed, defensive measures such as blocking calls can be taken in time, effectively reducing security risks such as private data leakage and financial loss.
Corresponding to the foregoing embodiments of the adversarial-attack monitoring method, this specification also provides embodiments of an apparatus for monitoring adversarial attacks.
The apparatus embodiments of this specification can be applied on a server. The apparatus embodiments may be implemented in software, in hardware, or in a combination of the two. Taking a software implementation as an example, as a logical apparatus it is formed by the processor of the server on which it resides reading the corresponding computer program instructions from non-volatile storage into memory and running them. At the hardware level, Fig. 5 shows a hardware structure diagram of the server on which the monitoring apparatus of this specification resides; in addition to the processor, memory, network interface, and non-volatile storage shown in Fig. 5, the electronic device hosting the apparatus may also include other hardware according to the actual functions of the server, which is not described further here.
Fig. 6 is a block diagram of an apparatus for monitoring adversarial attacks according to an exemplary embodiment of this specification.
Referring to Fig. 6, the apparatus 600 for monitoring adversarial attacks can be applied in the server shown in Fig. 5 and includes: an obtaining unit 610, a collecting unit 620, a judging unit 630, and a monitoring unit 640.
The obtaining unit 610 obtains the adversarial sample space of the target model.
The collecting unit 620 collects the input data used to call the target model.
The judging unit 630 determines whether the input data falls into the adversarial sample space.
The monitoring unit 640 computes, from the judgment results, the monitoring parameter of the input data that falls into the adversarial sample space within the monitoring period; when the monitoring parameter satisfies a preset attack condition, it determines that an adversarial attack against the target model has been detected.
Optionally, the judging unit 630: determines the spatial coordinates of the input data; judges whether the spatial coordinates fall inside any convex envelope; and if so, determines that the input data falls into the adversarial sample space.
Optionally, the judging unit 630: determines the spatial coordinates of the input data; judges, from the spatial coordinates, whether the distance between the input data and an adversarial sample cluster is smaller than a threshold; and if so, determines that the input data falls into the adversarial sample space.
Optionally, the apparatus further includes an alarm unit that sends alarm information.
The implementation of the functions and roles of the units in the above apparatus is detailed in the implementation of the corresponding steps of the above method and is not repeated here.
Since the apparatus embodiments basically correspond to the method embodiments, reference may be made to the description of the method embodiments for the relevant parts. The apparatus embodiments described above are merely illustrative; the units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units, that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this specification. Those of ordinary skill in the art can understand and implement it without creative effort.
The system, apparatus, module, or unit illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product having a certain function. A typical implementation device is a computer, which may take the form of a personal computer, laptop computer, cellular phone, camera phone, smart phone, personal digital assistant, media player, navigation device, email device, game console, tablet computer, wearable device, or a combination of any of these devices.
Corresponding to the foregoing embodiments of the adversarial-attack monitoring method, this specification further provides an apparatus for monitoring adversarial attacks, comprising a processor and a memory for storing machine-executable instructions. The processor and the memory are typically connected to each other by an internal bus. In other possible implementations, the device may also include an external interface to communicate with other devices or components.
In this embodiment, by reading and executing the machine-executable instructions stored in the memory that correspond to the adversarial-attack monitoring logic, the processor is caused to: obtain the adversarial sample space of the target model; collect the input data used to call the target model; determine whether the input data falls into the adversarial sample space; and compute, from the judgment results, the monitoring parameter of the input data falling into the adversarial sample space within the monitoring period, and when the monitoring parameter satisfies a preset attack condition, determine that an adversarial attack against the target model has been detected.
Optionally, when determining the adversarial sample space of the target model, the processor is caused to: perform an attack test on the target model to obtain at least one adversarial sample of the target model; and determine the adversarial sample space of the target model based on the adversarial sample.
Optionally, when performing the attack test, the processor is caused to: perform a black-box test based on a boundary attack; or perform a white-box test based on a boundary attack.
Optionally, when determining the adversarial sample space of the target model based on the adversarial samples, the processor is caused to: determine the spatial coordinates of each adversarial sample; cluster the adversarial samples based on the spatial coordinates to obtain several adversarial sample clusters; and generate a corresponding convex envelope for each adversarial sample cluster as the adversarial sample space.
Optionally, when determining whether the input data falls into the adversarial sample space, the processor is caused to: determine the spatial coordinates of the input data; judge whether the spatial coordinates fall inside any convex envelope; and if so, determine that the input data falls into the adversarial sample space.
Optionally, when determining whether the input data falls into the adversarial sample space, the processor is caused to: determine the spatial coordinates of the input data; judge, from the spatial coordinates, whether the distance between the input data and any adversarial sample cluster is smaller than a distance threshold; and if so, determine that the input data falls into the adversarial sample space.
Optionally, after determining that an adversarial attack against the target model has been detected, the processor is further caused to: send alarm information.
Corresponding to the foregoing embodiments of the adversarial-attack monitoring method, this specification also provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the program implements the following steps: obtaining the adversarial sample space of the target model; collecting the input data used to call the target model; determining whether the input data falls into the adversarial sample space; and computing, from the judgment results, the monitoring parameter of the input data falling into the adversarial sample space within the monitoring period, and when the monitoring parameter satisfies a preset attack condition, determining that an adversarial attack against the target model has been detected.
Optionally, determining the adversarial sample space of the target model includes: performing an attack test on the target model to obtain at least one adversarial sample of the target model; and determining the adversarial sample space of the target model based on the adversarial sample.
Optionally, the attack test includes: a black-box test based on a boundary attack; or a white-box test based on a boundary attack.
Optionally, determining the adversarial sample space of the target model based on the adversarial samples includes: determining the spatial coordinates of each adversarial sample; clustering the adversarial samples based on the spatial coordinates to obtain several adversarial sample clusters; and generating a corresponding convex envelope for each adversarial sample cluster as the adversarial sample space.
Optionally, determining whether the input data falls into the adversarial sample space includes: determining the spatial coordinates of the input data; judging whether the spatial coordinates fall inside any convex envelope; and if so, determining that the input data falls into the adversarial sample space.
Optionally, determining whether the input data falls into the adversarial sample space includes: determining the spatial coordinates of the input data; judging, from the spatial coordinates, whether the distance between the input data and any adversarial sample cluster is smaller than a distance threshold; and if so, determining that the input data falls into the adversarial sample space.
Optionally, the monitoring parameter is the number of input data items falling into the adversarial sample space, and the attack condition is that the number reaches a number threshold.
Optionally, the monitoring parameter is the proportion of input data items falling into the adversarial sample space, and the attack condition is that the proportion reaches a proportion threshold.
Optionally, after determining that an adversarial attack against the target model has been detected, the method further includes: sending alarm information.
The foregoing describes specific embodiments of this specification. Other embodiments fall within the scope of the appended claims. In some cases, the actions or steps recited in the claims may be performed in an order different from that in the embodiments and still achieve the desired results. In addition, the processes depicted in the drawings do not necessarily require the particular order shown, or a sequential order, to achieve the desired results. In some implementations, multitasking and parallel processing are also possible or may be advantageous.
The above are only preferred embodiments of this specification and are not intended to limit it; any modification, equivalent replacement, improvement, and the like made within the spirit and principles of this specification shall fall within its scope of protection.

Claims (19)

  1. A method for monitoring adversarial attacks, comprising:
    obtaining an adversarial sample space of a target model;
    collecting input data used to call the target model;
    determining whether the input data falls into the adversarial sample space;
    computing, from the determination results, a monitoring parameter of the input data falling into the adversarial sample space within a monitoring period, and when the monitoring parameter satisfies a preset attack condition, determining that an adversarial attack against the target model has been detected.
  2. The method according to claim 1, wherein the adversarial sample space of the target model is determined by:
    performing an attack test on the target model to obtain at least one adversarial sample of the target model;
    determining the adversarial sample space of the target model based on the adversarial sample.
  3. The method according to claim 2, wherein the attack test comprises:
    a black-box test based on a boundary attack; or
    a white-box test based on a boundary attack.
  4. The method according to claim 2, wherein determining the adversarial sample space of the target model based on the adversarial sample comprises:
    determining spatial coordinates of each adversarial sample;
    clustering the adversarial samples based on the spatial coordinates to obtain several adversarial sample clusters;
    generating a corresponding convex envelope for each adversarial sample cluster as the adversarial sample space.
  5. The method according to claim 4, wherein determining whether the input data falls into the adversarial sample space comprises:
    determining spatial coordinates of the input data;
    determining whether the spatial coordinates fall inside any convex envelope;
    if so, determining that the input data falls into the adversarial sample space.
  6. The method according to claim 4, wherein determining whether the input data falls into the adversarial sample space comprises:
    determining spatial coordinates of the input data;
    determining, from the spatial coordinates, whether the distance between the input data and any adversarial sample cluster is smaller than a distance threshold;
    if so, determining that the input data falls into the adversarial sample space.
  7. The method according to claim 1, wherein the monitoring parameter is the number of input data items falling into the adversarial sample space, and the attack condition is that the number reaches a number threshold.
  8. The method according to claim 1, wherein the monitoring parameter is the proportion of input data items falling into the adversarial sample space, and the attack condition is that the proportion reaches a proportion threshold.
  9. The method according to claim 1, wherein after determining that an adversarial attack against the target model has been detected, the method further comprises:
    sending alarm information.
  10. An apparatus for monitoring adversarial attacks, comprising:
    an obtaining unit, which obtains an adversarial sample space of a target model;
    a collecting unit, which collects input data used to call the target model;
    a judging unit, which determines whether the input data falls into the adversarial sample space;
    a monitoring unit, which computes, from the determination results, a monitoring parameter of the input data falling into the adversarial sample space within a monitoring period, and when the monitoring parameter satisfies a preset attack condition, determines that an adversarial attack against the target model has been detected.
  11. The apparatus according to claim 10, wherein the adversarial sample space of the target model is determined by:
    performing an attack test on the target model to obtain at least one adversarial sample of the target model;
    determining the adversarial sample space of the target model based on the adversarial sample.
  12. The apparatus according to claim 11, wherein the attack test comprises:
    a black-box test based on a boundary attack; or
    a white-box test based on a boundary attack.
  13. The apparatus according to claim 11, wherein determining the adversarial sample space of the target model based on the adversarial sample comprises:
    determining spatial coordinates of each adversarial sample;
    clustering the adversarial samples based on the spatial coordinates to obtain several adversarial sample clusters;
    generating a corresponding convex envelope for each adversarial sample cluster as the adversarial sample space.
  14. The apparatus according to claim 13, wherein the judging unit:
    determines spatial coordinates of the input data;
    determines whether the spatial coordinates fall inside any convex envelope;
    if so, determines that the input data falls into the adversarial sample space.
  15. The apparatus according to claim 13, wherein the judging unit:
    determines spatial coordinates of the input data;
    determines, from the spatial coordinates, whether the distance between the input data and an adversarial sample cluster is smaller than a distance threshold;
    if so, determines that the input data falls into the adversarial sample space.
  16. The apparatus according to claim 10, wherein the monitoring parameter is the number of input data items falling into the adversarial sample space, and the attack condition is that the number reaches a number threshold.
  17. The apparatus according to claim 10, wherein the monitoring parameter is the proportion of input data items falling into the adversarial sample space, and the attack condition is that the proportion reaches a proportion threshold.
  18. The apparatus according to claim 10, further comprising:
    an alarm unit, which sends alarm information.
  19. An apparatus for monitoring adversarial attacks, comprising:
    a processor;
    a memory for storing machine-executable instructions;
    wherein, by reading and executing the machine-executable instructions stored in the memory that correspond to adversarial-attack monitoring logic, the processor is caused to:
    obtain an adversarial sample space of a target model;
    collect input data used to call the target model;
    determine whether the input data falls into the adversarial sample space;
    compute, from the determination results, a monitoring parameter of the input data falling into the adversarial sample space within a monitoring period, and when the monitoring parameter satisfies a preset attack condition, determine that an adversarial attack against the target model has been detected.
PCT/CN2020/118659 2019-12-06 2020-09-29 Adversarial attack detection method and device WO2021109695A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911242921.8A CN111046379B (en) 2019-12-06 2019-12-06 Anti-attack monitoring method and device
CN201911242921.8 2019-12-06

Publications (1)

Publication Number Publication Date
WO2021109695A1 true WO2021109695A1 (en) 2021-06-10

Family

ID=70233551

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/118659 WO2021109695A1 (en) 2019-12-06 2020-09-29 Adversarial attack detection method and device

Country Status (3)

Country Link
CN (1) CN111046379B (en)
TW (1) TWI743787B (en)
WO (1) WO2021109695A1 (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113486875A (en) * 2021-09-08 2021-10-08 浙江大学 Cross-domain face representation attack detection method and system based on word separation and self-adaptation
CN113505886A (en) * 2021-07-08 2021-10-15 深圳市网联安瑞网络科技有限公司 Countermeasure sample generation method, system, terminal and medium based on fuzzy test
CN113760358A (en) * 2021-08-30 2021-12-07 河北大学 Countermeasure sample generation method for source code classification model
CN114240951A (en) * 2021-12-13 2022-03-25 电子科技大学 Black box attack method of medical image segmentation neural network based on query
CN114419346A (en) * 2021-12-31 2022-04-29 北京瑞莱智慧科技有限公司 Model robustness detection method, device, equipment and medium
CN114639375A (en) * 2022-05-09 2022-06-17 杭州海康威视数字技术股份有限公司 Intelligent voice recognition security defense method and device based on audio slice adjustment
CN116071797A (en) * 2022-12-29 2023-05-05 北华航天工业学院 Sparse face comparison countermeasure sample generation method based on self-encoder

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111046379B (en) * 2019-12-06 2021-06-18 支付宝(杭州)信息技术有限公司 Anti-attack monitoring method and device
CN112200380B (en) * 2020-10-23 2023-07-25 支付宝(杭州)信息技术有限公司 Method and device for optimizing risk detection model
CN113313404B (en) * 2021-06-15 2022-12-06 支付宝(杭州)信息技术有限公司 Method and device for generating countermeasure sample
TWI810993B (en) * 2022-01-06 2023-08-01 鴻海精密工業股份有限公司 Model generating apparatus and method

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104065622A (en) * 2013-03-20 2014-09-24 腾讯科技(深圳)有限公司 Security early warning method and apparatus of network equipment
WO2016027292A1 (en) * 2014-08-22 2016-02-25 日本電気株式会社 Analysis device, analysis method and computer-readable recording medium
CN108615048A (en) * 2018-04-04 2018-10-02 浙江工业大学 It is evolved based on disturbance and fights the defence method of sexual assault to Image Classifier
CN109165671A (en) * 2018-07-13 2019-01-08 上海交通大学 Confrontation sample testing method based on sample to decision boundary distance
CN109450946A (en) * 2018-12-27 2019-03-08 浙江大学 A kind of unknown attack scene detection method based on alert correlation analysis
CN111046379A (en) * 2019-12-06 2020-04-21 支付宝(杭州)信息技术有限公司 Anti-attack monitoring method and device

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103703487B (en) * 2011-07-25 2016-11-02 国际商业机器公司 Information identifying method and system
CN106155298B (en) * 2015-04-21 2019-11-08 阿里巴巴集团控股有限公司 The acquisition method and device of man-machine recognition methods and device, behavioural characteristic data
US10108850B1 (en) * 2017-04-24 2018-10-23 Intel Corporation Recognition, reidentification and security enhancements using autonomous machines
CN110213208B (en) * 2018-05-09 2021-11-09 腾讯科技(深圳)有限公司 Method and device for processing request and storage medium
CN109961444B (en) * 2019-03-01 2022-12-20 腾讯科技(深圳)有限公司 Image processing method and device and electronic equipment
CN110175513B (en) * 2019-04-15 2021-01-08 浙江工业大学 Guideboard recognition attack defense method based on multi-target path optimization
CN110321790B (en) * 2019-05-21 2023-05-12 华为技术有限公司 Method for detecting countermeasure sample and electronic equipment
CN110516695A (en) * 2019-07-11 2019-11-29 南京航空航天大学 Confrontation sample generating method and system towards Medical Images Classification

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104065622A (en) * 2013-03-20 2014-09-24 腾讯科技(深圳)有限公司 Security early warning method and apparatus of network equipment
WO2016027292A1 (en) * 2014-08-22 2016-02-25 日本電気株式会社 Analysis device, analysis method and computer-readable recording medium
CN108615048A (en) * 2018-04-04 2018-10-02 浙江工业大学 It is evolved based on disturbance and fights the defence method of sexual assault to Image Classifier
CN109165671A (en) * 2018-07-13 2019-01-08 上海交通大学 Confrontation sample testing method based on sample to decision boundary distance
CN109450946A (en) * 2018-12-27 2019-03-08 浙江大学 A kind of unknown attack scene detection method based on alert correlation analysis
CN111046379A (en) * 2019-12-06 2020-04-21 支付宝(杭州)信息技术有限公司 Anti-attack monitoring method and device

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113505886A (en) * 2021-07-08 2021-10-15 深圳市网联安瑞网络科技有限公司 Countermeasure sample generation method, system, terminal and medium based on fuzzy test
CN113760358A (en) * 2021-08-30 2021-12-07 河北大学 Countermeasure sample generation method for source code classification model
CN113760358B (en) * 2021-08-30 2023-08-01 河北大学 Antagonistic sample generation method for source code classification model
CN113486875A (en) * 2021-09-08 2021-10-08 浙江大学 Cross-domain face representation attack detection method and system based on word separation and self-adaptation
CN114240951A (en) * 2021-12-13 2022-03-25 电子科技大学 Black box attack method of medical image segmentation neural network based on query
CN114419346A (en) * 2021-12-31 2022-04-29 北京瑞莱智慧科技有限公司 Model robustness detection method, device, equipment and medium
CN114419346B (en) * 2021-12-31 2022-09-30 北京瑞莱智慧科技有限公司 Model robustness detection method, device, equipment and medium
CN114639375A (en) * 2022-05-09 2022-06-17 杭州海康威视数字技术股份有限公司 Intelligent voice recognition security defense method and device based on audio slice adjustment
CN114639375B (en) * 2022-05-09 2022-08-23 杭州海康威视数字技术股份有限公司 Intelligent voice recognition security defense method and device based on audio slice adjustment
CN116071797A (en) * 2022-12-29 2023-05-05 北华航天工业学院 Sparse face comparison countermeasure sample generation method based on self-encoder
CN116071797B (en) * 2022-12-29 2023-09-26 北华航天工业学院 Sparse face comparison countermeasure sample generation method based on self-encoder

Also Published As

Publication number Publication date
TW202123043A (en) 2021-06-16
TWI743787B (en) 2021-10-21
CN111046379A (en) 2020-04-21
CN111046379B (en) 2021-06-18

Similar Documents

Publication Publication Date Title
WO2021109695A1 (en) Adversarial attack detection method and device
US11310268B2 (en) Systems and methods using computer vision and machine learning for detection of malicious actions
US11792229B2 (en) AI-driven defensive cybersecurity strategy analysis and recommendation system
US20200296137A1 (en) Cybersecurity profiling and rating using active and passive external reconnaissance
US9027134B2 (en) Social threat scoring
US10296739B2 (en) Event correlation based on confidence factor
US20140337973A1 (en) Social risk management
US20150047026A1 (en) Anomaly detection to identify coordinated group attacks in computer networks
US20210360032A1 (en) Cybersecurity risk analysis and anomaly detection using active and passive external reconnaissance
WO2016123522A1 (en) Anomaly detection using adaptive behavioral profiles
CN110912874B (en) Method and system for effectively identifying machine access behaviors
US10104112B2 (en) Rating threat submitter
Yamada et al. Exploiting privacy policy conflicts in online social networks
CN110889117B (en) Method and device for defending model attack
CN111953665B (en) Server attack access identification method and system, computer equipment and storage medium
CN108234486A (en) A kind of network monitoring method and monitoring server
CN109313541A (en) For showing and the user interface of comparison attacks telemetering resource
CN114238885A (en) User abnormal login behavior identification method and device, computer equipment and storage medium
CN112769815B (en) Intelligent industrial control safety monitoring and protecting method and system
CN112560085B (en) Privacy protection method and device for business prediction model
CN114465816A (en) Detection method and device for password spray attack, computer equipment and storage medium
CN113988867A (en) Fraud detection method and device, computer equipment and storage medium
RU2820019C1 (en) Call classification method
CN110975295B (en) Method, device, equipment and storage medium for determining abnormal player
WO2023019970A1 (en) Attack detection method and apparatus

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20897185

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20897185

Country of ref document: EP

Kind code of ref document: A1